From a2cf2999066aa583f6e356a6580862184916a998 Mon Sep 17 00:00:00 2001
From: Matthew Farrellee
Date: Wed, 9 Apr 2025 04:35:19 -0400
Subject: [PATCH] fix: update getting started guide to use `ollama pull`
 (#1855)

# What does this PR do?

download the getting started w/ ollama model instead of downloading and
running it.

directly running it was necessary before
https://github.com/meta-llama/llama-stack/pull/1854

## Test Plan

run the code on the page

---
 docs/source/getting_started/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index ef258a9cf..e9ad51961 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -6,13 +6,13 @@ Llama Stack is a stateful service with REST APIs to support seamless transition
 
 In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/) to run inference on a Llama Model.
 
-### 1. Start Ollama
+### 1. Download a Llama model with Ollama
 
 ```bash
-ollama run llama3.2:3b --keepalive 60m
+ollama pull llama3.2:3b-instruct-fp16
 ```
 
-By default, Ollama keeps the model loaded in memory for 5 minutes which can be too short. We set the `--keepalive` flag to 60 minutes to ensure the model remains loaded for sometime.
+This will instruct the Ollama service to download the Llama 3.2 3B Instruct model, which we'll use in the rest of this guide.
 
 ```{admonition} Note
 :class: tip