update getting started guide to use ollama pull

Matthew Farrellee 2025-04-01 16:09:15 -04:00
parent 66d6c2580e
commit 9f3c1ed545


@@ -8,13 +8,13 @@ In Llama Stack, we provide a server exposing multiple APIs. These APIs are backe
 Ollama is an LLM runtime that allows you to run Llama models locally.
-### 1. Start Ollama
+### 1. Download a Llama model with Ollama
 ```bash
-ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
+ollama pull llama3.2:3b-instruct-fp16
 ```
-By default, Ollama keeps the model loaded in memory for 5 minutes which can be too short. We set the `--keepalive` flag to 60 minutes to ensure the model remains loaded for sometime.
+This will instruct the Ollama service to download the Llama 3.2 3B Instruct model, which we'll use in the rest of this guide.
 ```{admonition} Note
 :class: tip
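
As a supplementary check after pulling, the standard Ollama CLI provides `ollama list` to confirm the model is available locally, and `ollama run` can load it for a quick interactive smoke test before starting the Llama Stack server; a minimal sketch:

```bash
# List locally downloaded models; llama3.2:3b-instruct-fp16 should appear in the output.
ollama list

# Optional smoke test: load the model and open an interactive prompt (Ctrl+D to exit).
ollama run llama3.2:3b-instruct-fp16
```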