From 39e99b39fe60b0064f91cacd52911b9863da54c9 Mon Sep 17 00:00:00 2001
From: Henry Tai
Date: Wed, 20 Nov 2024 02:32:19 +0800
Subject: [PATCH] update quick start to have the working instruction (#467)

# What does this PR do?

Fix the instructions in the quickstart README so that new developers and users can run it without issues.

## Test Plan

None

## Sources

Please link relevant resources if necessary.

## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [X] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Co-authored-by: Henry Tai
---
 docs/zero_to_hero_guide/quickstart.md | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/docs/zero_to_hero_guide/quickstart.md b/docs/zero_to_hero_guide/quickstart.md
index 54a01e219..df8e9abc4 100644
--- a/docs/zero_to_hero_guide/quickstart.md
+++ b/docs/zero_to_hero_guide/quickstart.md
@@ -22,14 +22,22 @@ If you're looking for more specific topics like tool calling or agent setup, we
    - Download and unzip `Ollama-darwin.zip`.
    - Run the `Ollama` application.
 
-2. **Download the Ollama CLI**:
+1. **Download the Ollama CLI**:
    - Ensure you have the `ollama` command line tool by downloading and installing it from the same website.
 
-3. **Verify Installation**:
+1. **Start the Ollama server**:
+   - Open the terminal and run:
+     ```
+     ollama serve
+     ```
+
+1. **Run the model**:
    - Open the terminal and run:
      ```bash
-     ollama run llama3.2:1b
+     ollama run llama3.2:3b-instruct-fp16
      ```
+   **Note**: The models currently supported by Llama Stack are listed [here](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L43).
+
 ---
@@ -84,6 +92,8 @@ If you're looking for more specific topics like tool calling or agent setup, we
    ```bash
    llama stack run /path/to/your/distro/llamastack-ollama/ollama-run.yaml --port 5050
    ```
+   Note:
+   1. Every time you run a new model with `ollama run`, you need to restart the Llama Stack server; otherwise it won't see the new model.
 
    The server will start and listen on `http://localhost:5050`.
 
@@ -97,7 +107,7 @@ After setting up the server, open a new terminal window and verify it's working
 curl http://localhost:5050/inference/chat_completion \
 -H "Content-Type: application/json" \
 -d '{
-    "model": "llama3.2:1b",
+    "model": "Llama3.2-3B-Instruct",
     "messages": [
         {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "Write me a 2-sentence poem about the moon"}
@@ -106,6 +116,8 @@ curl http://localhost:5050/inference/chat_completion \
 }'
 ```
 
+You can check the available models with the command `llama-stack-client models list`.
+
 **Expected Output:**
 ```json
 {
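
Two quick sanity checks can save time when following the updated quickstart. First, before starting Llama Stack, confirm that the Ollama server is up and that the model has actually been pulled. A minimal sketch, assuming Ollama is listening on its default port 11434 (the quickstart does not pin a port):

```bash
# List the models the local Ollama server has installed; a JSON response
# confirms both that the server is running and which models are present.
curl http://localhost:11434/api/tags

# Pull the model explicitly in case `ollama run` has not fetched it yet.
ollama pull llama3.2:3b-instruct-fp16
```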
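
Second, the `"model"` value in the curl example must match an identifier the stack has registered. A minimal sketch of listing those identifiers with the `llama-stack-client` CLI mentioned in the patch, assuming the client supports a `configure --endpoint` step and that the server runs on port 5050 as above:

```bash
# Point the client at the running stack, then list its registered models;
# use one of the printed identifiers in the "model" field of the request.
llama-stack-client configure --endpoint http://localhost:5050
llama-stack-client models list
```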