Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-08-01 16:24:44 +00:00
Update quickstart.md
commit bec4bfd668 (parent ee8684b4cd)
1 changed file with 16 additions and 12 deletions
````diff
@@ -112,21 +112,25 @@ Build Successful! Next steps:
 2. **Set the ENV variables by exporting them to the terminal**:
    ```bash
-   export OLLAMA_URL=""
+   export OLLAMA_URL="http://localhost:11434"
    export LLAMA_STACK_PORT=5001
    export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
    export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
    ```

 3. **Run the Llama Stack**:
-   - Run the stack with the configured YAML file:
+   - Run the stack with the command shared by the API from earlier:
    ```bash
-   llama stack run /path/to/your/distro/llamastack-ollama/ollama-run.yaml --port 5050
+   llama stack run /Users/username/.llama/distributions/llamastack-ollama/ollama-run.yaml \
+     --port $LLAMA_STACK_PORT \
+     --env INFERENCE_MODEL=$INFERENCE_MODEL \
+     --env SAFETY_MODEL=$SAFETY_MODEL \
+     --env OLLAMA_URL=http://localhost:11434
    ```
+   Note:
+   1. Every time you run a new model with `ollama run`, you will need to restart the Llama Stack; otherwise it won't see the new model.

-The server will start and listen on `http://localhost:5050`.
-Note: Everytime you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model

+The server will start and listen on `http://localhost:5051`.

 ---
````
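Since the stack only sees models that Ollama has already pulled (hence the restart note added above), it can help to confirm the backend is reachable before launching. Below is a minimal pre-flight sketch in Python, not part of the commit, assuming the step-2 env vars are exported and using Ollama's standard `/api/tags` model-listing endpoint:

```python
import json
import os
import urllib.request

# Assumes OLLAMA_URL was exported in step 2 (falls back to the default).
ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")

# GET /api/tags lists the models Ollama has pulled locally; the stack can
# only serve models that appear here.
with urllib.request.urlopen(f"{ollama_url}/api/tags", timeout=5) as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

print("Ollama is reachable; local models:", models)
```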
````diff
@@ -135,7 +139,7 @@ The server will start and listen on `http://localhost:5050`.
 After setting up the server, open a new terminal window and verify it's working by sending a `POST` request using `curl`:

 ```bash
-curl http://localhost:5050/inference/chat_completion \
+curl http://localhost:5051/inference/chat_completion \
 -H "Content-Type: application/json" \
 -d '{
   "model": "Llama3.2-3B-Instruct",
````
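For reference, here is a rough Python equivalent of the curl check, a sketch rather than part of the commit; the `messages` payload is an assumption, since the diff truncates the request body after the `"model"` field:

```python
import json
import urllib.request

# Mirrors the curl request above. The messages payload is assumed, as the
# diff truncates the request body.
body = json.dumps({
    "model": "Llama3.2-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

req = urllib.request.Request(
    "http://localhost:5051/inference/chat_completion",
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```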
````diff
@@ -173,9 +177,10 @@ The `llama-stack-client` library offers a robust and efficient python methods fo

 ```bash
 conda activate your-llama-stack-conda-env
 pip install llama-stack-client
 ```

+Note: the client library gets installed by default if you install the server library.

 ### 2. Create Python Script (`test_llama_stack.py`)
 ```bash
 touch test_llama_stack.py
````
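A quick way to confirm the install resolved in the active conda env (illustrative; it assumes only that the pip distribution is named `llama-stack-client`, as in the command above):

```python
# Confirms the package is importable and reports the installed version.
from importlib.metadata import version

import llama_stack_client  # noqa: F401  (import check only)

print("llama-stack-client", version("llama-stack-client"))
```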
````diff
@@ -187,17 +192,16 @@ touch test_llama_stack.py
 from llama_stack_client import LlamaStackClient

 # Initialize the client
-client = LlamaStackClient(base_url="http://localhost:5050")
+client = LlamaStackClient(base_url="http://localhost:5051")

 # Create a chat completion request
 response = client.inference.chat_completion(
     messages=[
-        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "system", "content": "You are a friendly assistant."},
         {"role": "user", "content": "Write a two-sentence poem about llama."}
     ],
-    model="llama3.2:1b",
+    model_id=MODEL_NAME,
 )

 # Print the response
 print(response.completion_message.content)
 ```
````
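The new `model_id=MODEL_NAME` line references a `MODEL_NAME` variable defined outside this hunk. A plausible sketch of that definition, assuming it mirrors the `INFERENCE_MODEL` env var exported in step 2 (an assumption; the commit's actual definition is not visible in the diff):

```python
import os

# Assumption: MODEL_NAME mirrors the INFERENCE_MODEL env var from step 2;
# its real definition lies outside the visible hunk.
MODEL_NAME = os.environ.get("INFERENCE_MODEL", "meta-llama/Llama-3.2-3B-Instruct")
```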