some more minor changes

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Francisco Javier Arceo 2025-04-07 09:02:26 -04:00
parent 11b53acfb8
commit b0ed1381e6
2 changed files with 26 additions and 29 deletions


@@ -2,22 +2,22 @@
You can run a Llama Stack server in one of the following ways:
**As a Library**:
## As a Library:
This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and instead rely on an external inference service (e.g. Fireworks, Together, Groq). See [Using Llama Stack as a Library](importing_as_library).
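For example, a minimal sketch of library mode, assuming the `LlamaStackAsLibraryClient` described in that guide, an `ollama` template, and a model that is already available in your setup:
```python
# Sketch only: the template name and model identifier are illustrative.
from llama_stack import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("ollama")  # no separate server process needed
client.initialize()

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content)
```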
**Container**:
## Container:
Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
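For example, a rough sketch of running one of the pre-built images with Docker (the image name, model, and flags are illustrative; check the distribution's documentation for the exact invocation):
```bash
# Illustrative only: substitute the distribution image and model you actually use.
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
  --port 8321 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env OLLAMA_URL=http://host.docker.internal:11434
```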
**Conda**:
## Conda:
If you have a custom or advanced setup, or you are developing on Llama Stack itself, you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run`, you can build and run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
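For example, a rough sketch of that flow (the template name and image type are illustrative; the linked guide covers the options in detail):
```bash
# Illustrative only: pick the template and image type that match your environment.
llama stack build --template ollama --image-type conda
llama stack run ollama --port 8321
```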
**Kubernetes**:
## Kubernetes:
If you have built a container image, you can deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See the [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
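As a very rough sketch of that path (the resource name and image are placeholders; the linked guide covers the full manifests):
```bash
# Illustrative only: deploys a previously built image and exposes the default port.
kubectl create deployment llama-stack --image=<your-llama-stack-image>
kubectl expose deployment llama-stack --port=8321 --target-port=8321
```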


@@ -75,13 +75,13 @@ INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit
```
### ii. Using the Llama Stack Client
Now you can use the llama stack client to run inference and build agents!
Now you can use the llama stack client to run inference and build agents! You can reuse the server setup or use the
[Llama Stack Client](https://github.com/meta-llama/llama-stack-client-python/). Note that the client package is already
included in the `llama-stack` package.
_Note: You can reuse the server setup or the Llama Stack Client_
Open a new terminal and navigate to the same directory you started the server from. Then set up or activate your
virtual environment.
Open a new terminal and navigate to the same directory you started the server from.
Set up the venv (llama-stack already includes the client package)
```bash
source .venv/bin/activate
```
@@ -113,8 +113,8 @@ Total models: 2
```
## Step 4: Run Inference with Llama Stack
You can test basic Llama inference completion using the CLI too.
## Step 4: Run Basic Inference
You can test basic Llama inference completion using the CLI.
```bash
llama-stack-client inference chat-completion --message "tell me a joke"
@@ -136,8 +136,9 @@ ChatCompletionResponse(
],
)
```
### i. Create the Script
Alternatively, you can run inference using the Llama Stack client SDK.
### i. Create the Script
Create a file `inference.py` and add the following code:
```python
from llama_stack_client import LlamaStackClient
@@ -177,9 +178,9 @@ Logic flows through digital night
Beauty in the bits
```
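A minimal end-to-end sketch of such an `inference.py` script, reusing the client and model-selection pattern from the RAG example below (the system message and prompt are illustrative):
```python
# Sketch only: mirrors the client usage shown elsewhere on this page.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Pick the first LLM registered with the server, as in the RAG example below.
llm = next(m for m in client.models.list() if m.model_type == "llm")

response = client.inference.chat_completion(
    model_id=llm.identifier,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
)
print(response.completion_message.content)
```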
## Step 5: Run Your First Agent
### i. Create the Script
## Step 5: Build a Simple Agent
Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
### i. Create the Script
Create a file `agent.py` and add the following code:
```python
@@ -348,9 +349,9 @@ So, who am I? I'm just a computer program designed to help you!
:::
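A minimal sketch of such an `agent.py` script, using the same `Agent`, `create_session`, and `create_turn` calls that appear in the RAG example below (the instructions string and prompt are illustrative, and the top-level imports assume a recent `llama-stack-client` release):
```python
# Sketch only: follows the Agent API used in the RAG example below.
import uuid

from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
llm = next(m for m in client.models.list() if m.model_type == "llm")

agent = Agent(client, model=llm.identifier, instructions="You are a helpful assistant.")
session_id = agent.create_session(session_name=f"s{uuid.uuid4().hex}")

stream = agent.create_turn(
    messages=[{"role": "user", "content": "Who are you?"}],
    session_id=session_id,
    stream=True,
)
for event in AgentEventLogger().log(stream):
    event.print()
```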
## Step 6: Build a RAG Agent
### i. Create the Script
For our last demo, we can build a RAG agent that can answer questions about the Torchtune project using the documents
in a vector database.
### i. Create the Script
Create a file `rag_agent.py` and add the following code:
```python
@@ -360,11 +361,11 @@ from llama_stack_client.types import Document
import uuid
from termcolor import cprint
client = LlamaStackClient(base_url=f"http://localhost:8321")
client = LlamaStackClient(base_url="http://localhost:8321")
# Create a vector database instance
embedlm = next(m for m in client.models.list() if m.model_type == "embedding")
embedding_model = embedlm.identifier
embed_lm = next(m for m in client.models.list() if m.model_type == "embedding")
embedding_model = embed_lm.identifier
vector_db_id = f"v{uuid.uuid4().hex}"
client.vector_dbs.register(
vector_db_id=vector_db_id,
@@ -401,7 +402,7 @@ client.tool_runtime.rag_tool.insert(
llm = next(m for m in client.models.list() if m.model_type == "llm")
model = llm.identifier
# Create RAG agent
# Create the RAG agent
rag_agent = Agent(
client,
model=model,
@@ -416,18 +417,14 @@ rag_agent = Agent(
session_id = rag_agent.create_session(session_name=f"s{uuid.uuid4().hex}")
user_prompts = [
"How to optimize memory usage in torchtune? use the knowledge_search tool to get information.",
]
turns = ["what is torchtune", "tell me about dora"]
# Run the agent loop by calling the `create_turn` method
for prompt in user_prompts:
cprint(f"User> {prompt}", "green")
response = rag_agent.create_turn(
messages=[{"role": "user", "content": prompt}],
session_id=session_id,
for t in turns:
print("user>", t)
stream = rag_agent.create_turn(
messages=[{"role": "user", "content": t}], session_id=session_id, stream=True
)
for event in AgentEventLogger().log(response):
for event in AgentEventLogger().log(stream):
event.print()
```
### ii. Run the Script