mirror of https://github.com/meta-llama/llama-stack.git
synced 2025-08-06 02:32:40 +00:00
docs: Some aesthetic changes to the Building AI Applications docs to make them read a little easier
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This commit is contained in:
parent 66d6c2580e
commit db9eded18a
4 changed files with 41 additions and 14 deletions
@ -1,6 +1,10 @@
# Agents

An Agent in Llama Stack is a powerful abstraction that allows you to build complex AI applications.

The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI applications. This document explains the key components and how they work together.

## Core Concepts
@ -1,6 +1,11 @@
## Agent Execution Loop

Agents are the heart of Llama Stack applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.

### Steps in the Agent Workflow

Each agent turn follows these key steps:

@ -64,7 +69,10 @@ sequenceDiagram
```
S->>U: 5. Final Response
```

Each step in this process can be monitored and controlled through configurations.

### Agent Execution Loop Example

Here's an example that demonstrates monitoring the agent's execution:

```python
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
```
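The shape of the execution loop itself can be sketched as a self-contained toy in plain Python. This is illustrative only — the shield check, model stub, and tool table below are invented for the example and are not Llama Stack APIs:

```python
# Toy agent execution loop: safety shield -> inference -> tool dispatch -> answer.
# Everything here is a stand-in, NOT the Llama Stack implementation.

def input_shield(message: str) -> str:
    # A real shield would call a safety model; here we just block one word.
    if "forbidden" in message:
        raise ValueError("input violates safety policy")
    return message

def fake_inference(messages: list) -> dict:
    # Stand-in for the model: request a tool on the first step,
    # then produce a final answer from the tool result.
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"tool_call": {"name": "get_weather", "args": {"city": "Paris"}}}
    return {"content": f"The answer is: {tool_msgs[-1]['content']}"}

TOOLS = {"get_weather": lambda city: f"sunny in {city}"}

def run_turn(user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": input_shield(user_message)}]
    for _ in range(max_steps):
        step = fake_inference(messages)
        if "tool_call" in step:  # model wants a tool -> execute it and loop again
            call = step["tool_call"]
            messages.append({"role": "tool", "content": TOOLS[call["name"]](**call["args"])})
        else:  # model produced a final answer -> the turn ends
            return step["content"]
    raise RuntimeError("exceeded max inference steps")

print(run_turn("What's the weather in Paris?"))  # The answer is: sunny in Paris
```

The key property this mirrors is that a single turn may involve several inference calls, with tool results fed back into the conversation until the model stops requesting tools.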
@ -8,9 +8,9 @@ The best way to get started is to look at this notebook which walks through the
Here are some key topics that will help you build effective agents:

- **[Agent](agent)**: Understand the components and design patterns of the Llama Stack agent framework.
- **[Agent Execution Loop](agent_execution_loop)**: Understand how agents process information, make decisions, and execute actions in a continuous loop.
- **[RAG (Retrieval-Augmented Generation)](rag)**: Learn how to enhance your agents with external knowledge through retrieval mechanisms.
- **[Tools](tools)**: Extend your agents' capabilities by integrating with external tools and APIs.
- **[Evals](evals)**: Evaluate your agents' effectiveness and identify areas for improvement.
- **[Telemetry](telemetry)**: Monitor and analyze your agents' performance and behavior.

@ -20,12 +20,11 @@ Here are some key topics that will help you build effective agents:
:hidden:
:maxdepth: 1

agent
agent_execution_loop
rag
tools
evals
telemetry
safety
```
@ -3,9 +3,9 @@
RAG enables your applications to reference and recall information from previous interactions or external documents.

Llama Stack organizes the APIs that enable RAG into three layers:

1. The lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon), and Relational IO (also coming soon).
2. Next is the "RAG Tool", a first-class tool in the [Tools API](tools.md) that allows you to ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
3. Finally, it all comes together with the top-level ["Agents" API](agent.md) that allows you to create agents that can use the tools to answer questions, perform tasks, and more.

<img src="rag.png" alt="RAG System" width="50%">

@ -17,14 +17,19 @@ We may add more storage types like Graph IO in the future.

### Setting up Vector DBs

For this guide, we will use [Ollama](https://ollama.com/) as the inference provider. Ollama is an LLM runtime that allows you to run Llama models locally.

Here's how to set up a vector database for RAG:
```python
# Create http client
import os

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

# Register a vector db
vector_db_id = "my_documents"
response = client.vector_dbs.register(
    # ... (arguments elided in this diff)
    embedding_dimension=384,
    provider_id="faiss",
)
```
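Note that `embedding_dimension` must match the output size of whichever embedding model the vector DB is registered with. A toy up-front check (illustrative only — the model names and dimension table below are examples, not part of the Llama Stack API):

```python
# Example output dimensions for two well-known sentence-embedding models.
EMBEDDING_DIMS = {"all-MiniLM-L6-v2": 384, "nomic-embed-text-v1.5": 768}

def check_dimension(model: str, dimension: int) -> bool:
    # A vector store rejects vectors whose length differs from the registered
    # dimension, so it pays to validate the pairing before registering.
    return EMBEDDING_DIMS.get(model) == dimension

print(check_dimension("all-MiniLM-L6-v2", 384))  # True
```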
### Ingesting Documents

You can ingest documents into the vector database using two methods: directly inserting pre-chunked documents or using the RAG Tool.

```python
# You can insert a pre-chunked document directly into the vector db
chunks = [
    {
        "content": "Your document text here",
        "mime_type": "text/plain",
        "metadata": {
            "document_id": "doc1",
        },
    },
]
client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks)
```
### Retrieval

You can query the vector database to retrieve documents based on their embeddings.

```python
# You can then query for these chunks
chunks_response = client.vector_io.query(
    vector_db_id=vector_db_id, query="What do you know about..."
)
```
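Conceptually, a query like this embeds the query string and ranks stored chunks by vector similarity. A toy version of that ranking (illustrative only — real providers such as faiss use optimized indexes, and the hand-written 3-dimensional embeddings below are stand-ins for real model outputs):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy "vector db": chunks with hand-written embeddings standing in for model output.
store = [
    {"content": "Llama Stack overview", "embedding": [1.0, 0.0, 0.0]},
    {"content": "Vector IO details", "embedding": [0.0, 1.0, 0.0]},
    {"content": "Safety shields", "embedding": [0.0, 0.0, 1.0]},
]

def query(query_embedding, top_k=2):
    # Rank every stored chunk by similarity to the query and keep the top_k.
    ranked = sorted(store, key=lambda c: cosine(query_embedding, c["embedding"]), reverse=True)
    return [c["content"] for c in ranked[:top_k]]

print(query([0.9, 0.1, 0.0]))  # ['Llama Stack overview', 'Vector IO details']
```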
@ -52,7 +67,8 @@ chunks_response = client.vector_io.query(
### Using the RAG Tool

A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc. and automatically chunks them into smaller pieces.

```python
from llama_stack_client import RAGDocument
```
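To make "chunking strategies" concrete, here is a minimal fixed-size word chunker with overlap (an illustrative toy, not the RAG Tool's actual implementation):

```python
def chunk_words(text: str, chunk_size: int = 5, overlap: int = 1) -> list:
    # Split text into windows of chunk_size words; consecutive windows share
    # `overlap` words so context is not lost at chunk boundaries.
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

print(chunk_words("one two three four five six seven eight", chunk_size=4, overlap=1))
```

Overlapping chunks like these are a common default because a query whose answer straddles a chunk boundary can still match at least one chunk.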