docs: Minor updates to docs to make them a little friendlier to new users (#1871)

# What does this PR do?
This PR modifies some of the docs to help them map to (1) the mental model of software engineers building AI applications, starting with RAG and then moving to Agents, and (2) a navbar that aligns somewhat more closely with the diagram on the home page.

## Test Plan
N/A. Tested locally.

# Documentation
Take a look at the screenshots below for before and after.

## Before

## After

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This commit is contained in: parent 66d6c2580e, commit 23a99a4b22
4 changed files with 39 additions and 14 deletions
@@ -1,6 +1,9 @@
-# Llama Stack Agent Framework
+# Agents
 
-The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI applications. This document explains the key components and how they work together.
+An Agent in Llama Stack is a powerful abstraction that allows you to build complex AI applications.
+
+The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI
+applications. This document explains the key components and how they work together.
 
 ## Core Concepts
 
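As an orienting example for the renamed page: a minimal, hedged sketch of the Agent abstraction, using the same `llama_stack_client` imports that appear later in this diff. The server port and model id are placeholders, not taken from the docs being changed.

```python
from llama_stack_client import LlamaStackClient, Agent

# Connect to a running Llama Stack server (port is a placeholder).
client = LlamaStackClient(base_url="http://localhost:8321")

# An Agent bundles a model, instructions, and (optionally) tools.
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    instructions="You are a helpful assistant.",
)

# Each conversation happens inside a session.
session_id = agent.create_session("getting-started")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "Hello!"}],
    session_id=session_id,
    stream=False,
)
print(turn.output_message.content)
```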
@@ -1,6 +1,10 @@
 ## Agent Execution Loop
 
-Agents are the heart of complex AI applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.
+Agents are the heart of Llama Stack applications. They combine inference, memory, safety, and tool usage into coherent
+workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage,
+and safety checks.
 
+### Steps in the Agent Workflow
+
 Each agent turn follows these key steps:
 
@@ -64,7 +68,10 @@ sequenceDiagram
     S->>U: 5. Final Response
 ```
 
-Each step in this process can be monitored and controlled through configurations. Here's an example that demonstrates monitoring the agent's execution:
+Each step in this process can be monitored and controlled through configurations.
 
+### Agent Execution Loop Example
+Here's an example that demonstrates monitoring the agent's execution:
+
 ```python
 from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
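The hunk ends right at the import line. For context, a hedged sketch of how a monitoring example along these lines typically continues, with placeholder port and model id; the agent and session setup are assumptions, as only the import appears in this hunk.

```python
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger

client = LlamaStackClient(base_url="http://localhost:8321")  # placeholder port
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    instructions="You are a helpful assistant.",
)
session_id = agent.create_session("monitoring-demo")

# Stream the turn and log each step of the execution loop as it happens:
# inference, tool calls, safety shields, and the final answer.
response = agent.create_turn(
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
    session_id=session_id,
)
for log in AgentEventLogger().log(response):
    log.print()
```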
@@ -8,9 +8,9 @@ The best way to get started is to look at this notebook which walks through the
 
 Here are some key topics that will help you build effective agents:
 
-- **[RAG (Retrieval-Augmented Generation)](rag)**: Learn how to enhance your agents with external knowledge through retrieval mechanisms.
+- **[Agent](agent)**: Understand the components and design patterns of the Llama Stack agent framework.
 - **[Agent Execution Loop](agent_execution_loop)**: Understand how agents process information, make decisions, and execute actions in a continuous loop.
+- **[RAG (Retrieval-Augmented Generation)](rag)**: Learn how to enhance your agents with external knowledge through retrieval mechanisms.
 - **[Tools](tools)**: Extend your agents' capabilities by integrating with external tools and APIs.
 - **[Evals](evals)**: Evaluate your agents' effectiveness and identify areas for improvement.
 - **[Telemetry](telemetry)**: Monitor and analyze your agents' performance and behavior.
 
@@ -20,12 +20,11 @@ Here are some key topics that will help you build effective agents:
 :hidden:
 :maxdepth: 1
 
-rag
 agent
 agent_execution_loop
+rag
 tools
-telemetry
 evals
 advanced_agent_patterns
+telemetry
 safety
 ```
@@ -3,9 +3,9 @@
 RAG enables your applications to reference and recall information from previous interactions or external documents.
 
 Llama Stack organizes the APIs that enable RAG into three layers:
-- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon.)
-- next is the "Rag Tool", a first-class tool as part of the Tools API that allows you to ingest documents (from URLs, files, etc) with various chunking strategies and query them smartly.
-- finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
+1. The lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon).
+2. The next is the "RAG Tool", a first-class tool as part of the [Tools API](tools.md) that allows you to ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
+3. Finally, it all comes together with the top-level ["Agents" API](agent.md) that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
 
 <img src="rag.png" alt="RAG System" width="50%">
 
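For orientation, a hedged sketch of how those three layers surface in the Python client used throughout this page. The port, vector db id, and query string are placeholders, and the concrete setup code appears in the hunks below.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # placeholder port

# Layer 1: raw storage and retrieval (Vector IO).
low_level = client.vector_io.query(
    vector_db_id="my_documents", query="example query"
)

# Layer 2: the RAG Tool, which chunks and retrieves on your behalf.
tool_level = client.tool_runtime.rag_tool.query(
    content="example query", vector_db_ids=["my_documents"]
)

# Layer 3: the Agents API ties it together; an agent configured with the
# RAG tool calls the layers above for you (see agent.md).
```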
@@ -17,14 +17,19 @@ We may add more storage types like Graph IO in the future.
 
 ### Setting up Vector DBs
 
+For this guide, we will use [Ollama](https://ollama.com/) as the inference provider.
+Ollama is an LLM runtime that allows you to run Llama models locally.
+
+Here's how to set up a vector database for RAG:
+
 ```python
 # Create http client
 import os
 from llama_stack_client import LlamaStackClient
 
 client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")
 
 
 # Register a vector db
 vector_db_id = "my_documents"
 response = client.vector_dbs.register(
@@ -33,17 +38,27 @@ response = client.vector_dbs.register(
     embedding_dimension=384,
     provider_id="faiss",
 )
 ```
 
+### Ingesting Documents
+You can ingest documents into the vector database using two methods: directly inserting pre-chunked
+documents or using the RAG Tool.
 ```python
 # You can insert a pre-chunked document directly into the vector db
 chunks = [
     {
         "document_id": "doc1",
         "content": "Your document text here",
         "mime_type": "text/plain",
+        "metadata": {
+            "document_id": "doc1",
+        },
     },
 ]
 client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks)
+
 ```
+### Retrieval
+You can query the vector database to retrieve documents based on their embeddings.
 ```python
 # You can then query for these chunks
 chunks_response = client.vector_io.query(
     vector_db_id=vector_db_id, query="What do you know about..."
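The hunk boundaries cut both snippets off mid-call. Assembled below is a hedged, self-contained sketch of the setup, ingestion, and retrieval flow. The `embedding_model` argument and the result-printing loop are assumptions, not shown in this diff.

```python
import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

# Register a vector db. Only embedding_dimension and provider_id appear in
# the diff; the embedding_model value is an assumption.
vector_db_id = "my_documents"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",  # assumed value
    embedding_dimension=384,
    provider_id="faiss",
)

# Insert a pre-chunked document, then query it back.
chunks = [
    {
        "document_id": "doc1",
        "content": "Your document text here",
        "mime_type": "text/plain",
        "metadata": {"document_id": "doc1"},
    },
]
client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks)

chunks_response = client.vector_io.query(
    vector_db_id=vector_db_id, query="What do you know about..."
)
# Assumed response shape: retrieved chunks with their content.
for chunk in chunks_response.chunks:
    print(chunk.content)
```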
@@ -52,7 +67,8 @@ chunks_response = client.vector_io.query(
 
 ### Using the RAG Tool
 
-A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc. and automatically chunks them into smaller pieces.
+A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc.
+and automatically chunks them into smaller pieces.
 
 ```python
 from llama_stack_client import RAGDocument
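This last hunk stops at the import. A hedged sketch of how RAG Tool ingestion typically proceeds from there, with a placeholder URL and vector db id; the `tool_runtime.rag_tool.insert` call is an assumption about the client API, not shown in this diff.

```python
from llama_stack_client import LlamaStackClient, RAGDocument

client = LlamaStackClient(base_url="http://localhost:8321")  # placeholder port

# Wrap a source (URL, file contents, etc.) in a RAGDocument; the tool
# fetches and chunks it automatically.
documents = [
    RAGDocument(
        document_id="doc-0",
        content="https://www.example.com/some-document.txt",  # placeholder URL
        mime_type="text/plain",
        metadata={},
    )
]

client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id="my_documents",
    chunk_size_in_tokens=512,
)
```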