From a2cf2999066aa583f6e356a6580862184916a998 Mon Sep 17 00:00:00 2001
From: Matthew Farrellee
Date: Wed, 9 Apr 2025 04:35:19 -0400
Subject: [PATCH] fix: update getting started guide to use `ollama pull`
 (#1855)

# What does this PR do?

download the getting started w/ ollama model instead of downloading and
running it.

directly running it was necessary before
https://github.com/meta-llama/llama-stack/pull/1854

## Test Plan

run the code on the page

---
 docs/source/getting_started/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index ef258a9cf..e9ad51961 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -6,13 +6,13 @@ Llama Stack is a stateful service with REST APIs to support seamless transition
 
 In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/) to run inference on a Llama Model.
 
-### 1. Start Ollama
+### 1. Download a Llama model with Ollama
 
 ```bash
-ollama run llama3.2:3b --keepalive 60m
+ollama pull llama3.2:3b-instruct-fp16
 ```
 
-By default, Ollama keeps the model loaded in memory for 5 minutes which can be too short. We set the `--keepalive` flag to 60 minutes to ensure the model remains loaded for sometime.
+This will instruct the Ollama service to download the Llama 3.2 3B Instruct model, which we'll use in the rest of this guide.
 
 ```{admonition} Note
 :class: tip