docs: Updated docs to show minimal RAG example and some other minor changes (#1935)
# What does this PR do?

Incorporating some feedback into the docs.

- **`docs/source/getting_started/index.md`:**
    - Demo actually does RAG now
    - Simplified the installation command for dependencies.
    - Updated demo script examples to align with the latest API changes.
    - Replaced manual document manipulation with `RAGDocument` for clarity and maintainability.
    - Introduced new logic for model and embedding selection using the Llama Stack Client SDK.
    - Enhanced examples to showcase proper agent initialization and logging.
- **`docs/source/getting_started/detailed_tutorial.md`:**
    - Updated the section for listing models to use proper `bash` code formatting.
    - Removed and reorganized the "Run the Demos" section for clarity.
    - Adjusted tab-item structures and added new instructions for the demo scripts.
- **`docs/_static/css/my_theme.css`:**
    - Updated heading styles to include `h2`, `h3`, and `h4` for consistent font weight.
    - Added a new style for `pre` tags to wrap text and break long words; this is particularly useful for rendering long output from generation.

## Test Plan

Tested locally. Screenshot for reference:

<img width="1250" alt="Screenshot 2025-04-10 at 10 12 12 PM" src="https://github.com/user-attachments/assets/ce1c8986-e072-4c6f-a697-ed0d8fb75b34" />

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
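For quick reference, the heart of the updated `index.md` demo, condensed from the full diff below: the script now discovers an LLM and an embedding model from the running server and ingests a document through the RAG tool via `RAGDocument`, instead of hand-building `Document` objects.

```python
from llama_stack_client import LlamaStackClient, RAGDocument

client = LlamaStackClient(base_url="http://localhost:8321")

# Select the first LLM and the first embedding model the server exposes.
models = client.models.list()
model_id = next(m for m in models if m.model_type == "llm").identifier
embedding_model = next(m for m in models if m.model_type == "embedding")

# Register a vector DB sized to the embedding model's dimension.
client.vector_dbs.register(
    vector_db_id="my_demo_vector_db",
    embedding_model=embedding_model.identifier,
    embedding_dimension=embedding_model.metadata["embedding_dimension"],
    provider_id="faiss",
)

# Ingest a document through the RAG tool rather than manipulating chunks by hand.
document = RAGDocument(
    document_id="document_1",
    content="https://www.paulgraham.com/greatwork.html",
    mime_type="text/html",
    metadata={},
)
client.tool_runtime.rag_tool.insert(
    documents=[document],
    vector_db_id="my_demo_vector_db",
    chunk_size_in_tokens=50,
)
```

The full demo (shown in the `index.md` diff below) then wires the vector DB into an `Agent` via the `builtin::rag/knowledge_search` tool and streams the turn with `AgentEventLogger`.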
parent c1cb6aad11
commit 24d70cedca

3 changed files with 62 additions and 71 deletions
docs/_static/css/my_theme.css (vendored): 6 changes
@@ -17,9 +17,13 @@
     display: none;
 }
 
-h3 {
+h2, h3, h4 {
     font-weight: normal;
 }
 html[data-theme="dark"] .rst-content div[class^="highlight"] {
     background-color: #0b0b0b;
 }
+pre {
+    white-space: pre-wrap !important;
+    word-break: break-all;
+}
docs/source/getting_started/detailed_tutorial.md

@@ -173,9 +173,8 @@ You will see the below:
 Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
 ```
 
-#### iii. List Available Models
 List the models
-```
+```bash
 llama-stack-client models list
 Available Models
 
@@ -190,15 +189,6 @@ Available Models
 Total models: 2
 
 ```
-
-## Step 4: Run the Demos
-
-Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
-Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options.
-
-::::{tab-set}
-
-:::{tab-item} Basic Inference with the CLI
 You can test basic Llama inference completion using the CLI.
 
 ```bash
@@ -221,10 +211,16 @@ ChatCompletionResponse(
     ],
 )
 ```
-:::
+
+## Step 4: Run the Demos
 
-:::{tab-item} Basic Inference with a Script
-Alternatively, you can run inference using the Llama Stack client SDK.
+Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
+Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options.
+
+::::{tab-set}
+
+:::{tab-item} Basic Inference
+Now you can run inference using the Llama Stack client SDK.
 
 ### i. Create the Script
 Create a file `inference.py` and add the following code:
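The body of `inference.py` sits outside the context shown in this hunk. As a rough sketch only, here is the kind of script the tutorial builds at this step; the exact file contents are not part of this commit, and the `client.inference.chat_completion` call and response shape are assumptions about the SDK rather than text from this diff.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Pick the first LLM the server reports (same selection pattern as the new index.md demo).
models = client.models.list()
model_id = next(m for m in models if m.model_type == "llm").identifier

# Run a single chat completion; the CLI example above prints a ChatCompletionResponse
# for a similar request. The response shape below is an assumption, not from this diff.
response = client.inference.chat_completion(
    model_id=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about coding"},
    ],
)
print(response.completion_message.content)
```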
@@ -269,7 +265,7 @@ Beauty in the bits
 :::
 
 :::{tab-item} Build a Simple Agent
-Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
+Next we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
 ### i. Create the Script
 Create a file `agent.py` and add the following code:
 
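Likewise, the contents of `agent.py` are not shown in this hunk. A minimal sketch of the agent pattern, using only the `Agent`, `create_session`, `create_turn`, and `AgentEventLogger` calls that appear in the new `index.md` demo later in this commit; the prompt, session name, and lack of tools here are illustrative, not taken from the file itself.

```python
from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
model_id = next(m for m in client.models.list() if m.model_type == "llm").identifier

# A bare-bones agent; the tutorial's version may also attach tools.
agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant.",
)

session_id = agent.create_session("test_session")
response = agent.create_turn(
    messages=[{"role": "user", "content": "Who are you?"}],
    session_id=session_id,
)

# Stream the agent's events (inference steps, tool calls, final answer).
for log in AgentEventLogger().log(response):
    log.print()
```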
docs/source/getting_started/index.md

@@ -12,9 +12,8 @@ as the inference [provider](../providers/index.md#inference) for a Llama Model.
 Install [uv](https://docs.astral.sh/uv/), setup your virtual environment, and run inference on a Llama model with
 [Ollama](https://ollama.com/download).
 ```bash
-uv pip install llama-stack aiosqlite faiss-cpu ollama openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals
+uv pip install llama-stack
 source .venv/bin/activate
-export INFERENCE_MODEL="llama3.2:3b"
 ollama run llama3.2:3b --keepalive 60m
 ```
 ## Step 2: Run the Llama Stack Server
@@ -24,70 +23,62 @@ INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type ven
 ## Step 3: Run the Demo
 Now open up a new terminal using the same virtual environment and you can run this demo as a script using `uv run demo_script.py` or in an interactive shell.
 ```python
-from termcolor import cprint
-from llama_stack_client.types import Document
-from llama_stack_client import LlamaStackClient
-
-
-vector_db = "faiss"
-vector_db_id = "test-vector-db"
-model_id = "llama3.2:3b-instruct-fp16"
-query = "Can you give me the arxiv link for Lora Fine Tuning in Pytorch?"
-documents = [
-    Document(
-        document_id="document_1",
-        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/lora_finetune.rst",
-        mime_type="text/plain",
-        metadata={},
-    )
-]
+from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient
 
+vector_db_id = "my_demo_vector_db"
 client = LlamaStackClient(base_url="http://localhost:8321")
-client.vector_dbs.register(
-    provider_id=vector_db,
-    vector_db_id=vector_db_id,
-    embedding_model="all-MiniLM-L6-v2",
-    embedding_dimension=384,
-)
 
+models = client.models.list()
+
+# Select the first LLM and first embedding models
+model_id = next(m for m in models if m.model_type == "llm").identifier
+embedding_model_id = (
+    em := next(m for m in models if m.model_type == "embedding")
+).identifier
+embedding_dimension = em.metadata["embedding_dimension"]
+
+_ = client.vector_dbs.register(
+    vector_db_id=vector_db_id,
+    embedding_model=embedding_model_id,
+    embedding_dimension=embedding_dimension,
+    provider_id="faiss",
+)
+document = RAGDocument(
+    document_id="document_1",
+    content="https://www.paulgraham.com/greatwork.html",
+    mime_type="text/html",
+    metadata={},
+)
 client.tool_runtime.rag_tool.insert(
-    documents=documents,
+    documents=[document],
     vector_db_id=vector_db_id,
     chunk_size_in_tokens=50,
 )
-
-response = client.tool_runtime.rag_tool.query(
-    vector_db_ids=[vector_db_id],
-    content=query,
+agent = Agent(
+    client,
+    model=model_id,
+    instructions="You are a helpful assistant",
+    tools=[
+        {
+            "name": "builtin::rag/knowledge_search",
+            "args": {"vector_db_ids": [vector_db_id]},
+        }
+    ],
 )
 
-cprint("" + "-" * 50, "yellow")
-cprint(f"Query> {query}", "red")
-cprint("" + "-" * 50, "yellow")
-for chunk in response.content:
-    cprint(f"Chunk ID> {chunk.text}", "green")
-cprint("" + "-" * 50, "yellow")
+response = agent.create_turn(
+    messages=[{"role": "user", "content": "How do you do great work?"}],
+    session_id=agent.create_session("rag_session"),
+)
+
+for log in AgentEventLogger().log(response):
+    log.print()
 ```
 And you should see output like below.
-```
+```bash
---------------------------------------------------
-Query> Can you give me the arxiv link for Lora Fine Tuning in Pytorch?
---------------------------------------------------
-Chunk ID> knowledge_search tool found 5 chunks:
-BEGIN of knowledge_search tool results.
-
---------------------------------------------------
-Chunk ID> Result 1:
-Document_id:docum
-Content: .. _lora_finetune_label:
-
-============================
-Fine-Tuning Llama2 with LoRA
-============================
-
-This guide will teach you about `LoRA <https://arxiv.org/abs/2106.09685>`_, a
-
---------------------------------------------------
+inference> [knowledge_search(query="What does it mean to do great work")]
+tool_execution> Tool:knowledge_search Args:{'query': 'What does it mean to do great work'}
+tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='Result 2:\nDocument_id:docum\nContent: [<a name="f1n"><font color=#000000>1</font></a>]\nI don\'t think you could give a precise definition of what\ncounts as great work. Doing great work means doing something important\nso well\n', type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: . And if so\nyou're already further along than you might realize, because the\nset of people willing to want to is small.<br /><br />The factors in doing great work are factors in the literal,\nmathematical sense, and\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: \nincreases your morale and helps you do even better work. But this\ncycle also operates in the other direction: if you're not doing\ngood work, that can demoralize you and make it even harder to. Since\nit matters\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: to try to do\ngreat work. But that's what's going on subconsciously; they shy\naway from the question.<br /><br />So I'm going to pull a sneaky trick on you. Do you want to do great\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]
 ```
 Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳
 