diff --git a/docs/source/building_applications/agent.md b/docs/source/building_applications/agent.md
index 283fb45e4..6fcc46152 100644
--- a/docs/source/building_applications/agent.md
+++ b/docs/source/building_applications/agent.md
@@ -1,6 +1,9 @@
-# Llama Stack Agent Framework
+# Agents

-The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI applications. This document explains the key components and how they work together.
+An Agent in Llama Stack is a powerful abstraction that allows you to build complex AI applications.
+
+The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI
+applications. This document explains the key components and how they work together.

 ## Core Concepts

diff --git a/docs/source/building_applications/agent_execution_loop.md b/docs/source/building_applications/agent_execution_loop.md
index a180602c6..d66448449 100644
--- a/docs/source/building_applications/agent_execution_loop.md
+++ b/docs/source/building_applications/agent_execution_loop.md
@@ -1,6 +1,10 @@
 ## Agent Execution Loop

-Agents are the heart of complex AI applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.
+Agents are the heart of Llama Stack applications. They combine inference, memory, safety, and tool usage into coherent
+workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage,
+and safety checks.
+
+### Steps in the Agent Workflow

 Each agent turn follows these key steps:

@@ -64,7 +68,10 @@ sequenceDiagram
     S->>U: 5. Final Response
 ```

-Each step in this process can be monitored and controlled through configurations. Here's an example that demonstrates monitoring the agent's execution:
+Each step in this process can be monitored and controlled through configurations.
+
+### Agent Execution Loop Example
+Here's an example that demonstrates monitoring the agent's execution:

 ```python
 from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
diff --git a/docs/source/building_applications/index.md b/docs/source/building_applications/index.md
index abe548971..51dff1702 100644
--- a/docs/source/building_applications/index.md
+++ b/docs/source/building_applications/index.md
@@ -8,9 +8,9 @@ The best way to get started is to look at this notebook which walks through the

 Here are some key topics that will help you build effective agents:

+- **[RAG (Retrieval-Augmented Generation)](rag)**: Learn how to enhance your agents with external knowledge through retrieval mechanisms.
 - **[Agent](agent)**: Understand the components and design patterns of the Llama Stack agent framework.
 - **[Agent Execution Loop](agent_execution_loop)**: Understand how agents process information, make decisions, and execute actions in a continuous loop.
-- **[RAG (Retrieval-Augmented Generation)](rag)**: Learn how to enhance your agents with external knowledge through retrieval mechanisms.
 - **[Tools](tools)**: Extend your agents' capabilities by integrating with external tools and APIs.
 - **[Evals](evals)**: Evaluate your agents' effectiveness and identify areas for improvement.
 - **[Telemetry](telemetry)**: Monitor and analyze your agents' performance and behavior.
@@ -20,12 +20,11 @@ Here are some key topics that will help you build effective agents:
 :hidden:
 :maxdepth: 1

+rag
 agent
 agent_execution_loop
-rag
 tools
-telemetry
 evals
-advanced_agent_patterns
+telemetry
 safety
 ```
diff --git a/docs/source/building_applications/rag.md b/docs/source/building_applications/rag.md
index fd11d824f..39d1ba333 100644
--- a/docs/source/building_applications/rag.md
+++ b/docs/source/building_applications/rag.md
@@ -3,9 +3,9 @@
 RAG enables your applications to reference and recall information from previous interactions or external documents.

 Llama Stack organizes the APIs that enable RAG into three layers:
-- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon.)
-- next is the "Rag Tool", a first-class tool as part of the Tools API that allows you to ingest documents (from URLs, files, etc) with various chunking strategies and query them smartly.
-- finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
+1. The lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon).
+2. Next is the "RAG Tool", a first-class tool as part of the [Tools API](tools.md) that allows you to ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
+3. Finally, it all comes together with the top-level ["Agents" API](agent.md) that allows you to create agents that can use the tools to answer questions, perform tasks, and more.

 RAG System

@@ -17,14 +17,19 @@ We may add more storage types like Graph IO in the future.

 ### Setting up Vector DBs

+For this guide, we will use [Ollama](https://ollama.com/) as the inference provider.
+Ollama is an LLM runtime that allows you to run Llama models locally.
+
 Here's how to set up a vector database for RAG:

 ```python
 # Create http client
+import os
 from llama_stack_client import LlamaStackClient

 client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

+
 # Register a vector db
 vector_db_id = "my_documents"
 response = client.vector_dbs.register(
@@ -33,17 +38,27 @@ response = client.vector_dbs.register(
     embedding_dimension=384,
     provider_id="faiss",
 )
+```
+### Ingesting Documents
+You can ingest documents into the vector database using two methods: directly inserting pre-chunked
+documents or using the RAG Tool.
+```python
 # You can insert a pre-chunked document directly into the vector db
 chunks = [
     {
-        "document_id": "doc1",
         "content": "Your document text here",
         "mime_type": "text/plain",
+        "metadata": {
+            "document_id": "doc1",
+        },
     },
 ]
 client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks)
-
+```
+### Retrieval
+You can query the vector database to retrieve documents based on their embeddings.
+```python
 # You can then query for these chunks
 chunks_response = client.vector_io.query(
     vector_db_id=vector_db_id, query="What do you know about..."
 )
@@ -52,7 +67,8 @@ chunks_response = client.vector_io.query(

 ### Using the RAG Tool

-A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc. and automatically chunks them into smaller pieces.
+A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc.
+and automatically chunks them into smaller pieces.

 ```python
 from llama_stack_client import RAGDocument
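
For reference, the RAG Tool flow that this final hunk introduces typically looks like the sketch below. This is an illustration rather than part of the diff: it reuses the `llama-stack-client` APIs shown earlier in the changed docs (`LlamaStackClient`, `RAGDocument`, and `client.tool_runtime.rag_tool`), and the URLs, document IDs, and chunk size are placeholder values.

```python
# A minimal sketch of ingesting and querying documents via the RAG Tool.
# Assumes a running Llama Stack server; URLs and IDs are illustrative.
import os

from llama_stack_client import LlamaStackClient, RAGDocument

# Reuse the client and the vector db registered in the setup section above.
client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")
vector_db_id = "my_documents"

# Build RAGDocument objects; content may be a URL, which the tool fetches.
urls = ["memory_optimizations.rst", "chat.rst", "llama3.rst"]
documents = [
    RAGDocument(
        document_id=f"doc-{i}",  # placeholder IDs
        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
        mime_type="text/plain",
        metadata={},
    )
    for i, url in enumerate(urls)
]

# Ingest: the RAG Tool fetches each document and chunks it automatically
# before writing the chunks to the vector database.
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

# Query the ingested chunks through the same tool.
results = client.tool_runtime.rag_tool.query(
    vector_db_ids=[vector_db_id],
    content="What are the memory optimizations?",
)
print(results)
```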