diff --git a/docs/source/building_applications/agent.md b/docs/source/building_applications/agent.md
index 283fb45e4..6fcc46152 100644
--- a/docs/source/building_applications/agent.md
+++ b/docs/source/building_applications/agent.md
@@ -1,6 +1,9 @@
-# Llama Stack Agent Framework
+# Agents

-The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI applications. This document explains the key components and how they work together.
+An Agent in Llama Stack is a powerful abstraction that allows you to build complex AI applications.
+
+The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI
+applications. This document explains the key components and how they work together.

 ## Core Concepts

diff --git a/docs/source/building_applications/agent_execution_loop.md b/docs/source/building_applications/agent_execution_loop.md
index a180602c6..d66448449 100644
--- a/docs/source/building_applications/agent_execution_loop.md
+++ b/docs/source/building_applications/agent_execution_loop.md
@@ -1,6 +1,10 @@
 ## Agent Execution Loop

-Agents are the heart of complex AI applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.
+Agents are the heart of Llama Stack applications. They combine inference, memory, safety, and tool usage into coherent
+workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage,
+and safety checks.
+
+### Steps in the Agent Workflow

 Each agent turn follows these key steps:

@@ -64,7 +68,10 @@ sequenceDiagram
     S->>U: 5. Final Response
 ```

-Each step in this process can be monitored and controlled through configurations. Here's an example that demonstrates monitoring the agent's execution:
+Each step in this process can be monitored and controlled through configurations.
+
+### Agent Execution Loop Example
+Here's an example that demonstrates monitoring the agent's execution:

 ```python
 from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
diff --git a/docs/source/building_applications/index.md b/docs/source/building_applications/index.md
index abe548971..51dff1702 100644
--- a/docs/source/building_applications/index.md
+++ b/docs/source/building_applications/index.md
@@ -8,9 +8,9 @@ The best way to get started is to look at this notebook which walks through the

 Here are some key topics that will help you build effective agents:

+- **[RAG (Retrieval-Augmented Generation)](rag)**: Learn how to enhance your agents with external knowledge through retrieval mechanisms.
 - **[Agent](agent)**: Understand the components and design patterns of the Llama Stack agent framework.
 - **[Agent Execution Loop](agent_execution_loop)**: Understand how agents process information, make decisions, and execute actions in a continuous loop.
-- **[RAG (Retrieval-Augmented Generation)](rag)**: Learn how to enhance your agents with external knowledge through retrieval mechanisms.
 - **[Tools](tools)**: Extend your agents' capabilities by integrating with external tools and APIs.
 - **[Evals](evals)**: Evaluate your agents' effectiveness and identify areas for improvement.
 - **[Telemetry](telemetry)**: Monitor and analyze your agents' performance and behavior.
@@ -20,12 +20,11 @@ Here are some key topics that will help you build effective agents:
 :hidden:
 :maxdepth: 1

+rag
 agent
 agent_execution_loop
-rag
 tools
-telemetry
 evals
-advanced_agent_patterns
+telemetry
 safety
 ```
diff --git a/docs/source/building_applications/rag.md b/docs/source/building_applications/rag.md
index fd11d824f..39d1ba333 100644
--- a/docs/source/building_applications/rag.md
+++ b/docs/source/building_applications/rag.md
@@ -3,9 +3,9 @@
 RAG enables your applications to reference and recall information from previous interactions or external documents.

 Llama Stack organizes the APIs that enable RAG into three layers:
-- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon.)
-- next is the "Rag Tool", a first-class tool as part of the Tools API that allows you to ingest documents (from URLs, files, etc) with various chunking strategies and query them smartly.
-- finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
+1. The lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon).
+2. Next is the "RAG Tool", a first-class tool as part of the [Tools API](tools.md) that allows you to ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
+3. Finally, it all comes together with the top-level ["Agents" API](agent.md) that allows you to create agents that can use the tools to answer questions, perform tasks, and more.

 RAG System

@@ -17,14 +17,19 @@ We may add more storage types like Graph IO in the future.

 ### Setting up Vector DBs

+For this guide, we will use [Ollama](https://ollama.com/) as the inference provider.
+Ollama is an LLM runtime that allows you to run Llama models locally.
+
 Here's how to set up a vector database for RAG:

 ```python
 # Create http client
+import os
 from llama_stack_client import LlamaStackClient

 client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

+
 # Register a vector db
 vector_db_id = "my_documents"
 response = client.vector_dbs.register(
@@ -33,17 +38,27 @@ response = client.vector_dbs.register(
     embedding_dimension=384,
     provider_id="faiss",
 )
+```
+### Ingesting Documents
+You can ingest documents into the vector database using two methods: directly inserting pre-chunked
+documents or using the RAG Tool.
+```python
 # You can insert a pre-chunked document directly into the vector db
 chunks = [
     {
-        "document_id": "doc1",
         "content": "Your document text here",
         "mime_type": "text/plain",
+        "metadata": {
+            "document_id": "doc1",
+        },
     },
 ]
 client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks)
-
+```
+### Retrieval
+You can query the vector database to retrieve documents based on their embeddings.
+```python
 # You can then query for these chunks
 chunks_response = client.vector_io.query(
     vector_db_id=vector_db_id, query="What do you know about..."
 )
@@ -52,7 +67,8 @@ chunks_response = client.vector_io.query(

 ### Using the RAG Tool

-A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc. and automatically chunks them into smaller pieces.
+A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc.
+and automatically chunks them into smaller pieces.

 ```python
 from llama_stack_client import RAGDocument
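
For reference, the RAG Tool flow that this final hunk introduces typically looks like the sketch below. This is an illustration rather than part of the diff: it reuses the `llama-stack-client` APIs shown earlier in the changed docs (`LlamaStackClient`, `RAGDocument`, and `client.tool_runtime.rag_tool`), and the URLs, document IDs, and chunk size are placeholder values.

```python
# A minimal sketch of ingesting and querying documents via the RAG Tool.
# Assumes a running Llama Stack server; URLs and IDs are illustrative.
import os

from llama_stack_client import LlamaStackClient, RAGDocument

# Reuse the client and the vector db registered in the setup section above.
client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")
vector_db_id = "my_documents"

# Build RAGDocument objects; content may be a URL, which the tool fetches.
urls = ["memory_optimizations.rst", "chat.rst", "llama3.rst"]
documents = [
    RAGDocument(
        document_id=f"doc-{i}",  # placeholder IDs
        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
        mime_type="text/plain",
        metadata={},
    )
    for i, url in enumerate(urls)
]

# Ingest: the RAG Tool fetches each document and chunks it automatically
# before writing the chunks to the vector database.
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

# Query the ingested chunks through the same tool.
results = client.tool_runtime.rag_tool.query(
    vector_db_ids=[vector_db_id],
    content="What are the memory optimizations?",
)
print(results)
```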