Update quickstart.md

Authored by Sanyam Bhutani on 2024-11-21 15:51:02 -08:00; committed by Justin Lee
parent ee8684b4cd
commit bec4bfd668


@@ -112,21 +112,25 @@ Build Successful! Next steps:
 2. **Set the ENV variables by exporting them to the terminal**:
 ```bash
-export OLLAMA_URL=""
+export OLLAMA_URL="http://localhost:11434"
 export LLAMA_STACK_PORT=5001
 export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
 export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
 ```
 3. **Run the Llama Stack**:
-   - Run the stack with the configured YAML file:
+   - Run the stack with the command shared by the API from earlier:
 ```bash
-llama stack run /path/to/your/distro/llamastack-ollama/ollama-run.yaml --port 5050
+llama stack run /Users/username/.llama/distributions/llamastack-ollama/ollama-run.yaml \
+  --port $LLAMA_STACK_PORT \
+  --env INFERENCE_MODEL=$INFERENCE_MODEL \
+  --env SAFETY_MODEL=$SAFETY_MODEL \
+  --env OLLAMA_URL=http://localhost:11434
 ```
-Note:
-1. Everytime you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model
-The server will start and listen on `http://localhost:5050`.
+Note: Every time you run a new model with `ollama run`, you will need to restart the Llama Stack; otherwise it won't see the new model.
+The server will start and listen on `http://localhost:5051`.

 ---
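Because the updated run command reads `OLLAMA_URL` and the model names from the environment, it can help to confirm Ollama is actually serving at that URL before starting the stack. The following is a sketch, not part of this diff; it relies on Ollama's `/api/tags` endpoint (which lists locally pulled models) and the `OLLAMA_URL` variable exported above.

```python
import json
import os
import urllib.request

# Pre-flight check (sketch, not part of this commit): confirm Ollama is
# reachable at OLLAMA_URL before running `llama stack run`.
ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")

# GET /api/tags returns the models Ollama has pulled locally.
with urllib.request.urlopen(f"{ollama_url}/api/tags") as resp:
    models = json.load(resp).get("models", [])

print(f"Ollama is up at {ollama_url}; {len(models)} model(s) available.")
```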
@@ -135,7 +139,7 @@ The server will start and listen on `http://localhost:5050`.
 After setting up the server, open a new terminal window and verify it's working by sending a `POST` request using `curl`:
 ```bash
-curl http://localhost:5050/inference/chat_completion \
+curl http://localhost:5051/inference/chat_completion \
 -H "Content-Type: application/json" \
 -d '{
   "model": "Llama3.2-3B-Instruct",
@@ -173,9 +177,10 @@ The `llama-stack-client` library offers a robust and efficient python methods fo
 ```bash
 conda activate your-llama-stack-conda-env
-pip install llama-stack-client
 ```
+Note: the client library gets installed by default when you install the server library.

 ### 2. Create Python Script (`test_llama_stack.py`)
 ```bash
 touch test_llama_stack.py
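Since the note above says the client library already ships with the server install, a quick way to confirm it is importable in the active conda environment is sketched below (not part of this diff; it only uses the standard library and the `llama-stack-client` package name used in this quickstart).

```python
from importlib.metadata import PackageNotFoundError, version

# Sketch: confirm the client library is already present in the active
# environment before writing the test script.
try:
    print("llama-stack-client", version("llama-stack-client"))
except PackageNotFoundError:
    print("llama-stack-client is not installed; install it with pip first.")
```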
@@ -187,17 +192,16 @@ touch test_llama_stack.py
 from llama_stack_client import LlamaStackClient

 # Initialize the client
-client = LlamaStackClient(base_url="http://localhost:5050")
+client = LlamaStackClient(base_url="http://localhost:5051")

 # Create a chat completion request
 response = client.inference.chat_completion(
     messages=[
-        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "system", "content": "You are a friendly assistant."},
         {"role": "user", "content": "Write a two-sentence poem about llama."}
     ],
-    model="llama3.2:1b",
+    model_id=MODEL_NAME,
 )

 # Print the response
 print(response.completion_message.content)
 ```
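The updated script references `MODEL_NAME`, which is defined elsewhere in the revised quickstart and does not appear in this hunk. One plausible definition, assuming it simply reuses the `INFERENCE_MODEL` environment variable exported earlier, is sketched below.

```python
import os

# Assumed definition of MODEL_NAME (not shown in this hunk): reuse the
# INFERENCE_MODEL value exported when configuring the server.
MODEL_NAME = os.environ.get("INFERENCE_MODEL", "meta-llama/Llama-3.2-3B-Instruct")
```

Whatever the actual definition in the file, the change from `model="llama3.2:1b"` to `model_id=MODEL_NAME` keeps the script in sync with the model the server was started with, rather than hard-coding a different one.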