Update README and pin zero-to-hero guide to llama-stack v0.0.61 (#624)

# What does this PR do?

Pins the zero-to-hero guide to llama-stack 0.0.61 and updates its README.


## Test Plan
- Ran an end-to-end test of the server and inference against 0.0.61.
 
Server output:
<img width="670" alt="image"
src="https://github.com/user-attachments/assets/66515adf-102d-466d-b0ac-fa91568fcee6"
/>
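
For anyone repeating the check, here is a minimal sketch of an end-to-end inference round trip against the pinned release. It is not the exact script used for this test; it assumes the Ollama distribution from the zero-to-hero guide is running on port 5001 and that `INFERENCE_MODEL` is exported as described in the README.

```python
# Sketch only: assumes llama-stack==0.0.61 and llama-stack-client==0.0.61 are
# installed, and the Ollama distribution from the guide is serving on port 5001.
import os
from importlib.metadata import version

from llama_stack_client import LlamaStackClient

# Confirm the pinned release is what is actually installed.
print("llama-stack:", version("llama-stack"))
print("llama-stack-client:", version("llama-stack-client"))

client = LlamaStackClient(base_url="http://localhost:5001")

# One non-streaming chat completion as a smoke test.
response = client.inference.chat_completion(
    model_id=os.environ.get("INFERENCE_MODEL", "meta-llama/Llama-3.2-3B-Instruct"),
    messages=[{"role": "user", "content": "Write me a 2-sentence poem about the moon"}],
)
print(response.completion_message.content)
```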


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Author: Justin Lee, 2025-01-03 03:18:07 +08:00 (committed by GitHub). Commit 8e5b336792, parent 49ad168336.
2 changed files with 36 additions and 44 deletions

Changed file 1 of 2 (zero-to-hero guide notebook):

@@ -358,7 +358,7 @@
 " if not stream:\n",
 " cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
 " else:\n",
-" async for log in EventLogger().log(response):\n",
+" for log in EventLogger().log(response):\n",
 " log.print()\n",
 "\n",
 "# In a Jupyter Notebook cell, use `await` to call the function\n",
@@ -366,16 +366,6 @@
 "# To run it in a python file, use this line instead\n",
 "# asyncio.run(run_main())\n"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": 11,
-"id": "9399aecc",
-"metadata": {},
-"outputs": [],
-"source": [
-"#fin"
-]
 }
 ],
 "metadata": {

Changed file 2 of 2 (zero-to-hero guide README):

@@ -45,7 +45,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 ---
 ## Install Dependencies and Set Up Environment
 1. **Create a Conda Environment**:
    Create a new Conda environment with Python 3.10:
@@ -73,7 +73,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
    Open a new terminal and install `llama-stack`:
    ```bash
    conda activate ollama
-   pip install llama-stack==0.0.55
+   pip install llama-stack==0.0.61
    ```
 ---
@@ -96,7 +96,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 3. **Set the ENV variables by exporting them to the terminal**:
    ```bash
    export OLLAMA_URL="http://localhost:11434"
-   export LLAMA_STACK_PORT=5051
+   export LLAMA_STACK_PORT=5001
    export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
    export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
    ```
@@ -104,34 +104,29 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 3. **Run the Llama Stack**:
    Run the stack with command shared by the API from earlier:
    ```bash
    llama stack run ollama \
      --port $LLAMA_STACK_PORT \
      --env INFERENCE_MODEL=$INFERENCE_MODEL \
      --env SAFETY_MODEL=$SAFETY_MODEL \
      --env OLLAMA_URL=$OLLAMA_URL
    ```
    Note: Everytime you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.
-   The server will start and listen on `http://localhost:5051`.
+   The server will start and listen on `http://localhost:5001`.
 ---
 ## Test with `llama-stack-client` CLI
-After setting up the server, open a new terminal window and install the llama-stack-client package.
+After setting up the server, open a new terminal window and configure the llama-stack-client.
-1. Install the llama-stack-client package
+1. Configure the CLI to point to the llama-stack server.
    ```bash
-   conda activate ollama
-   pip install llama-stack-client
-   ```
-2. Configure the CLI to point to the llama-stack server.
-   ```bash
-   llama-stack-client configure --endpoint http://localhost:5051
+   llama-stack-client configure --endpoint http://localhost:5001
    ```
    **Expected Output:**
    ```bash
-   Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5051
+   Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5001
    ```
-3. Test the CLI by running inference:
+2. Test the CLI by running inference:
    ```bash
    llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
    ```
@@ -153,16 +148,18 @@ After setting up the server, open a new terminal window and install the llama-st
 After setting up the server, open a new terminal window and verify it's working by sending a `POST` request using `curl`:
 ```bash
-curl http://localhost:$LLAMA_STACK_PORT/inference/chat_completion \
+curl http://localhost:$LLAMA_STACK_PORT/alpha/inference/chat-completion \
 -H "Content-Type: application/json" \
--d '{
-  "model": "Llama3.2-3B-Instruct",
+-d @- <<EOF
+{
+  "model_id": "$INFERENCE_MODEL",
   "messages": [
     {"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Write me a 2-sentence poem about the moon"}
   ],
   "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
-}'
+}
+EOF
 ```
 You can check the available models with the command `llama-stack-client models list`.
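
Two things change in the `curl` example above: the route moves to `/alpha/inference/chat-completion`, and the body is supplied through a heredoc so the shell can expand `$INFERENCE_MODEL` inside the JSON (the old single-quoted `-d '{...}'` form blocks that expansion). For reference, a rough Python equivalent of the same request is sketched below; it is not part of the PR and assumes the `requests` package plus the environment variables from the guide.

```python
import os

import requests  # third-party HTTP client, assumed to be installed

port = os.environ.get("LLAMA_STACK_PORT", "5001")
url = f"http://localhost:{port}/alpha/inference/chat-completion"

payload = {
    "model_id": os.environ["INFERENCE_MODEL"],
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write me a 2-sentence poem about the moon"},
    ],
    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512},
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```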
@@ -186,16 +183,12 @@ You can check the available models with the command `llama-stack-client models l
 You can also interact with the Llama Stack server using a simple Python script. Below is an example:
-### 1. Activate Conda Environment and Install Required Python Packages
-The `llama-stack-client` library offers a robust and efficient python methods for interacting with the Llama Stack server.
+### 1. Activate Conda Environment
 ```bash
 conda activate ollama
-pip install llama-stack-client
 ```
+Note, the client library gets installed by default if you install the server library
 ### 2. Create Python Script (`test_llama_stack.py`)
 ```bash
 touch test_llama_stack.py
@@ -206,19 +199,28 @@ touch test_llama_stack.py
 In `test_llama_stack.py`, write the following code:
 ```python
+import os
+
 from llama_stack_client import LlamaStackClient

+# Get the model ID from the environment variable
+INFERENCE_MODEL = os.environ.get("INFERENCE_MODEL")
+
+# Check if the environment variable is set
+if INFERENCE_MODEL is None:
+    raise ValueError("The environment variable 'INFERENCE_MODEL' is not set.")
+
 # Initialize the client
-client = LlamaStackClient(base_url="http://localhost:5051")
+client = LlamaStackClient(base_url="http://localhost:5001")

 # Create a chat completion request
 response = client.inference.chat_completion(
     messages=[
         {"role": "system", "content": "You are a friendly assistant."},
         {"role": "user", "content": "Write a two-sentence poem about llama."}
     ],
-    model_id=MODEL_NAME,
+    model_id=INFERENCE_MODEL,
 )

 # Print the response
 print(response.completion_message.content)
 ```