diff --git a/docs/requirements.txt b/docs/requirements.txt
index c182f41c4..d455cf6b5 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -9,3 +9,4 @@ sphinx-tabs
 sphinx-design
 sphinxcontrib-openapi
 sphinxcontrib-redoc
+sphinxcontrib-mermaid
diff --git a/docs/source/building_applications/index.md b/docs/source/building_applications/index.md
index 1c333c4a7..6e2062204 100644
--- a/docs/source/building_applications/index.md
+++ b/docs/source/building_applications/index.md
@@ -1,17 +1,413 @@
-# Building Applications
+# Building AI Applications
 
-```{admonition} Work in Progress
-:class: warning
+Llama Stack provides all the building blocks needed to create sophisticated AI applications. This guide will walk you through how to use these components effectively.
 
-## What can you do with the Stack?
+## Basic Inference
 
-- Agents
-  - what is a turn? session?
-  - inference
-  - memory / RAG; pre-ingesting content or attaching content in a turn
-  - how does tool calling work
-  - can you do evaluation?
+The foundation of any AI application is the ability to interact with LLMs. Llama Stack provides a simple interface for both completion and chat-based inference:
+
+```python
+from llama_stack_client import LlamaStackClient
+
+client = LlamaStackClient(base_url="http://localhost:5001")
+
+# List available models
+models = client.models.list()
+
+# Simple chat completion
+response = client.inference.chat_completion(
+    model_id="Llama3.2-3B-Instruct",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Write a haiku about coding"}
+    ]
+)
+print(response.completion_message.content)
 ```
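+
+The same client also covers the plain-completion half of "both completion and chat-based inference". A minimal sketch, assuming the `completion` endpoint takes `model_id` and `content` the way `chat_completion` takes `model_id` and `messages` (check your installed SDK version for the exact signature):
+
+```python
+# Simple text completion (sketch -- parameter names may differ across SDK versions)
+response = client.inference.completion(
+    model_id="Llama3.2-3B-Instruct",
+    content="The capital of France is",
+)
+print(response.content)
+```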
+
+## Adding Memory & RAG
+
+Memory enables your applications to reference and recall information from previous interactions or external documents. Llama Stack's memory system is built around the concept of Memory Banks:
+
+1. **Vector Memory Banks**: For semantic search and retrieval
+2. **Key-Value Memory Banks**: For structured data storage
+3. **Keyword Memory Banks**: For basic text search
+4. **Graph Memory Banks**: For relationship-based retrieval
+
+Here's how to set up a vector memory bank for RAG:
+
+```python
+# Register a memory bank
+bank_id = "my_documents"
+response = client.memory_banks.register(
+    memory_bank_id=bank_id,
+    params={
+        "memory_bank_type": "vector",
+        "embedding_model": "all-MiniLM-L6-v2",
+        "chunk_size_in_tokens": 512
+    }
+)
+
+# Insert documents
+documents = [
+    {
+        "document_id": "doc1",
+        "content": "Your document text here",
+        "mime_type": "text/plain"
+    }
+]
+client.memory.insert(bank_id, documents)
+
+# Query documents
+results = client.memory.query(
+    bank_id=bank_id,
+    query="What do you know about...",
+)
+```
+
+## Implementing Safety Guardrails
+
+Safety is a critical component of any AI application. Llama Stack provides a Shield system that can be applied at multiple touchpoints:
+
+```python
+# Register a safety shield
+shield_id = "content_safety"
+client.shields.register(
+    shield_id=shield_id,
+    provider_shield_id="llama-guard-basic"
+)
+
+# Run content through shield
+response = client.safety.run_shield(
+    shield_id=shield_id,
+    messages=[{"role": "user", "content": "User message here"}]
+)
+
+if response.violation:
+    print(f"Safety violation detected: {response.violation.user_message}")
+```
+
+## Building Agents
+
+Agents are the heart of complex AI applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.
+
+### The Agent Execution Loop
+
+Each agent turn follows these key steps:
+
+1. **Initial Safety Check**: The user's input is first screened through configured safety shields
+
+2. **Context Retrieval**:
+   - If RAG is enabled, the agent queries relevant documents from memory banks
+   - New documents are first inserted into the memory bank
+   - Retrieved context is added to the user's prompt
+
+3. **Inference Loop**: The agent enters its main execution loop:
+   - The LLM receives the augmented prompt (with context and/or previous tool outputs)
+   - The LLM generates a response, potentially with tool calls
+   - If tool calls are present:
+     - Tool inputs are safety-checked
+     - Tools are executed (e.g., web search, code execution)
+     - Tool responses are fed back to the LLM for synthesis
+   - The loop continues until:
+     - The LLM provides a final response without tool calls
+     - Maximum iterations are reached
+     - The token limit is exceeded
+
+4. **Final Safety Check**: The agent's final response is screened through safety shields
+
+```{mermaid}
+sequenceDiagram
+    participant U as User
+    participant E as Executor
+    participant M as Memory Bank
+    participant L as LLM
+    participant T as Tools
+    participant S as Safety Shield
+
+    Note over U,S: Agent Turn Start
+    U->>S: 1. Submit Prompt
+    activate S
+    S->>E: Input Safety Check
+    deactivate S
+
+    E->>M: 2.1 Query Context
+    M-->>E: 2.2 Retrieved Documents
+
+    loop Inference Loop
+        E->>L: 3.1 Augment with Context
+        L-->>E: 3.2 Response (with/without tool calls)
+
+        alt Has Tool Calls
+            E->>S: Check Tool Input
+            S->>T: 4.1 Execute Tool
+            T-->>E: 4.2 Tool Response
+            E->>L: 5.1 Tool Response
+            L-->>E: 5.2 Synthesized Response
+        end
+
+        opt Stop Conditions
+            Note over E: Break if:
+            Note over E: - No tool calls
+            Note over E: - Max iterations reached
+            Note over E: - Token limit exceeded
+        end
+    end
+
+    E->>S: Output Safety Check
+    S->>U: 6. Final Response
+```
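+
+In code, the same control flow looks roughly like the sketch below. This is conceptual pseudocode of the loop described above, not the actual llama-stack internals: the `llm`, `tools`, `shields`, and `memory` objects are illustrative stand-ins, and the token-limit stop condition is omitted for brevity.
+
+```python
+# Conceptual sketch of a single agent turn (illustrative stand-ins, not real APIs)
+def run_turn(user_message, config, llm, tools, shields, memory):
+    shields.check(config.input_shields, user_message)        # 1. input safety check
+    context = memory.query(user_message)                     # 2. context retrieval
+    prompt = f"{context}\n{user_message}"
+    response = llm.generate(prompt)                          # 3. inference loop
+    for _ in range(config.max_infer_iters):
+        if not response.tool_calls:                          # stop: final answer, no tool calls
+            break
+        shields.check(config.input_shields, response.tool_calls)
+        outputs = tools.execute(response.tool_calls)         # e.g., web search, code execution
+        response = llm.generate(f"{prompt}\n{outputs}")      # feed results back for synthesis
+    shields.check(config.output_shields, response.content)   # 4. final safety check
+    return response
+```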
+
+Each step in this process can be monitored and controlled through the agent configuration. Here's an example that demonstrates monitoring the agent's execution:
+
+```python
+from llama_stack_client.lib.agents.event_logger import EventLogger
+
+agent_config = AgentConfig(
+    model="Llama3.2-3B-Instruct",
+    instructions="You are a helpful assistant",
+    # Enable both RAG and tool usage
+    tools=[
+        {
+            "type": "memory",
+            "memory_bank_configs": [{
+                "type": "vector",
+                "bank_id": "my_docs"
+            }],
+            "max_tokens_in_context": 4096
+        },
+        {
+            "type": "code_interpreter",
+            "enable_inline_code_execution": True
+        }
+    ],
+    # Configure safety
+    input_shields=["content_safety"],
+    output_shields=["content_safety"],
+    # Control the inference loop
+    max_infer_iters=5,
+    sampling_params={
+        "temperature": 0.7,
+        "max_tokens": 2048
+    }
+)
+
+agent = Agent(client, agent_config)
+session_id = agent.create_session("monitored_session")
+
+# Stream the agent's execution steps
+response = agent.create_turn(
+    messages=[{"role": "user", "content": "Analyze this code and run it"}],
+    attachments=[{
+        "content": "https://raw.githubusercontent.com/example/code.py",
+        "mime_type": "text/plain"
+    }],
+    session_id=session_id
+)
+
+# Monitor each step of execution
+for log in EventLogger().log(response):
+    if log.event.step_type == "memory_retrieval":
+        print("Retrieved context:", log.event.retrieved_context)
+    elif log.event.step_type == "inference":
+        print("LLM output:", log.event.model_response)
+    elif log.event.step_type == "tool_execution":
+        print("Tool call:", log.event.tool_call)
+        print("Tool response:", log.event.tool_response)
+    elif log.event.step_type == "shield_call":
+        if log.event.violation:
+            print("Safety violation:", log.event.violation)
+```
+
+This example shows how a single agent turn can combine RAG, tool execution, and safety shields while exposing each step for inspection. For the common case, Llama Stack provides a high-level agent framework:
+
+```python
+from llama_stack_client.lib.agents.agent import Agent
+from llama_stack_client.types.agent_create_params import AgentConfig
+
+# Configure an agent
+agent_config = AgentConfig(
+    model="Llama3.2-3B-Instruct",
+    instructions="You are a helpful assistant",
+    tools=[
+        {
+            "type": "memory",
+            "memory_bank_configs": [],
+            "query_generator_config": {
+                "type": "default",
+                "sep": " "
+            }
+        }
+    ],
+    input_shields=["content_safety"],
+    output_shields=["content_safety"],
+    enable_session_persistence=True
+)
+
+# Create an agent
+agent = Agent(client, agent_config)
+session_id = agent.create_session("my_session")
+
+# Run agent turns
+response = agent.create_turn(
+    messages=[{"role": "user", "content": "Your question here"}],
+    session_id=session_id
+)
+```
+
+### Adding Tools to Agents
+
+Agents can be enhanced with various tools:
+
+1. **Search**: Web search capabilities through providers like Brave
+2. **Code Interpreter**: Execute code snippets
+3. **RAG**: Memory and document retrieval
+4. **Function Calling**: Custom function execution
+5. **WolframAlpha**: Mathematical computations
+6. **Photogen**: Image generation
+
+Example of configuring an agent with built-in tools:
+
+```python
+agent_config = AgentConfig(
+    model="Llama3.2-3B-Instruct",
+    tools=[
+        {
+            "type": "brave_search",
+            "api_key": "YOUR_API_KEY",
+            "engine": "brave"
+        },
+        {
+            "type": "code_interpreter",
+            "enable_inline_code_execution": True
+        }
+    ],
+    tool_choice="auto",
+    tool_prompt_format="json"
+)
+```
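+
+Custom function calling (item 4 above) follows the same configuration pattern. A rough sketch, assuming a `function_call` tool type whose `parameters` use a `param_type`/`description`/`required` shape -- the function name and schema below are hypothetical, so verify the exact field names against the AgentConfig reference for your version:
+
+```python
+# Declare a custom function the LLM may call (hypothetical function and schema)
+agent_config = AgentConfig(
+    model="Llama3.2-3B-Instruct",
+    tools=[
+        {
+            "type": "function_call",
+            "function_name": "get_ticker_price",  # hypothetical example function
+            "description": "Look up the latest price for a stock ticker",
+            "parameters": {
+                "ticker": {
+                    "param_type": "str",
+                    "description": "Stock ticker symbol, e.g. META",
+                    "required": True
+                }
+            }
+        }
+    ],
+    tool_choice="auto"
+)
+```
+
+The declaration only tells the model what it may call; executing the function client-side requires wiring a matching tool implementation into the `Agent` (see the llama-stack-apps repository for complete custom-tool examples).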
+
+## Building RAG-Enhanced Agents
+
+One of the most powerful patterns is combining agents with RAG capabilities. Here's a complete example:
+
+```python
+from llama_stack_client.types import Attachment
+
+# Create attachments from documents
+attachments = [
+    Attachment(
+        content="https://raw.githubusercontent.com/example/doc.rst",
+        mime_type="text/plain"
+    )
+]
+
+# Configure agent with memory
+agent_config = AgentConfig(
+    model="Llama3.2-3B-Instruct",
+    instructions="You are a helpful assistant",
+    tools=[{
+        "type": "memory",
+        "memory_bank_configs": [],
+        "query_generator_config": {"type": "default", "sep": " "},
+        "max_tokens_in_context": 4096,
+        "max_chunks": 10
+    }],
+    enable_session_persistence=True
+)
+
+agent = Agent(client, agent_config)
+session_id = agent.create_session("rag_session")
+
+# Initial document ingestion
+response = agent.create_turn(
+    messages=[{
+        "role": "user",
+        "content": "I am providing some documents for reference."
+    }],
+    attachments=attachments,
+    session_id=session_id
+)
+
+# Query with RAG
+response = agent.create_turn(
+    messages=[{
+        "role": "user",
+        "content": "What are the key topics in the documents?"
+    }],
+    session_id=session_id
+)
+```
+
+## Testing & Evaluation
+
+Llama Stack provides built-in tools for evaluating your applications:
+
+1. **Benchmarking**: Test against standard datasets
+2. **Application Evaluation**: Score your application's outputs
+3. **Custom Metrics**: Define your own evaluation criteria
+
+Here's how to set up basic evaluation:
+
+```python
+# Create an evaluation task
+response = client.eval_tasks.register(
+    eval_task_id="my_eval",
+    dataset_id="my_dataset",
+    scoring_functions=["accuracy", "relevance"]
+)
+
+# Run evaluation
+job = client.eval.run_eval(
+    task_id="my_eval",
+    task_config={
+        "type": "app",
+        "eval_candidate": {
+            "type": "agent",
+            "config": agent_config
+        }
+    }
+)
+
+# Get results
+result = client.eval.job_result(
+    task_id="my_eval",
+    job_id=job.job_id
+)
+```
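+
+For quick, one-off scoring of outputs you already have (the "Application Evaluation" case above), the scoring API can be called directly. A minimal sketch -- the row column names and the exact `score` signature are assumptions here, so verify them against the scoring reference for your version:
+
+```python
+# Score pre-generated rows directly (sketch; column names are illustrative)
+response = client.scoring.score(
+    input_rows=[{
+        "input_query": "What is the capital of France?",
+        "generated_answer": "Paris",
+        "expected_answer": "Paris"
+    }],
+    scoring_functions=["accuracy"]
+)
+```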
+
+## Debugging & Monitoring
+
+Llama Stack includes comprehensive telemetry for debugging and monitoring your applications:
+
+1. **Tracing**: Track request flows across components
+2. **Metrics**: Measure performance and usage
+3. **Logging**: Debug issues and track behavior
+
+The telemetry system supports multiple output formats:
+
+- OpenTelemetry for visualization in tools like Jaeger
+- SQLite for local storage and querying
+- Console output for development
+
+Example of querying traces:
+
+```python
+# Query traces for a session
+traces = client.telemetry.query_traces(
+    attribute_filters=[{
+        "key": "session_id",
+        "op": "eq",
+        "value": session_id
+    }]
+)
+
+# Get detailed span information
+span_tree = client.telemetry.get_span_tree(
+    span_id=traces[0].root_span_id
+)
+```
+
+For details on how to use the telemetry system to debug your applications, export traces to a dataset, and run evaluations, see the [Telemetry](telemetry) section.
+
+```{toctree}
diff --git a/docs/source/conf.py b/docs/source/conf.py
index b657cddff..2a9e3d17c 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -28,6 +28,7 @@ extensions = [
     "sphinx_tabs.tabs",
     "sphinx_design",
     "sphinxcontrib.redoc",
+    "sphinxcontrib.mermaid",
 ]
 
 myst_enable_extensions = ["colon_fence"]
@@ -47,6 +48,7 @@ exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
 myst_enable_extensions = [
     "amsmath",
     "attrs_inline",
+    "attrs_block",
     "colon_fence",
     "deflist",
     "dollarmath",
@@ -65,6 +67,7 @@ myst_substitutions = {
     "docker_hub": "https://hub.docker.com/repository/docker/llamastack",
 }
 
+
 # Copy button settings
 copybutton_prompt_text = "$ "  # for bash prompts
 copybutton_prompt_is_regexp = True
diff --git a/docs/source/distributions/configuration.md b/docs/source/distributions/configuration.md
index abf7d16ed..6fee67936 100644
--- a/docs/source/distributions/configuration.md
+++ b/docs/source/distributions/configuration.md
@@ -81,6 +81,8 @@ A few things to note:
 - The configuration dictionary is provider-specific. Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server (via docker or via `llama stack run`), you can specify `--env OLLAMA_URL=http://my-server:11434` to override the default value.
 ## Resources
+```
+
 Finally, let's look at the `models` section:
 
 ```yaml
 models:
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index bae31e8c4..c6227db99 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -19,16 +19,17 @@ export LLAMA_STACK_PORT=5001
 ollama run $OLLAMA_INFERENCE_MODEL --keepalive 60m
 ```
 
-By default, Ollama keeps the model loaded in memory for 5 minutes which can be too short. We set the `--keepalive` flag to 60 minutes to enspagents/agenure the model remains loaded for sometime.
+By default, Ollama keeps the model loaded in memory for 5 minutes, which can be too short. We set the `--keepalive` flag to 60 minutes to ensure the model remains loaded for some time.
 
 ### 2. Start the Llama Stack server
 
 Llama Stack is based on a client-server architecture. It consists of a server which can be configured very flexibly so you can mix-and-match various providers for its individual API components -- beyond Inference, these include Memory, Agents, Telemetry, Evals and so forth.
 
+To get started quickly, we provide various Docker images for the server component that work with different inference providers out of the box. For this guide, we will use `llamastack/distribution-ollama` as the Docker image.
+
 ```bash
-docker run \
-  -it \
+docker run -it \
   -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-ollama \
@@ -42,8 +43,7 @@ Configuration for this is available at `distributions/ollama/run.yaml`.
 
 ### 3. Use the Llama Stack client SDK
 
-You can interact with the Llama Stack server using the `llama-stack-client` CLI or via the Python SDK.
-
+You can interact with the Llama Stack server using various client SDKs. We will use the Python SDK, which you can install using:
 ```bash
 pip install llama-stack-client
 ```
@@ -123,7 +123,6 @@ async def run_main():
     agent = Agent(client, agent_config)
     session_id = agent.create_session("test-session")
-    print(f"Created session_id={session_id} for Agent({agent.agent_id})")
 
     user_prompts = [
         (
            "I am attaching documentation for Torchtune. Help me answer questions I will ask next.",
@@ -154,3 +153,10 @@ if __name__ == "__main__":
 - Learn how to [Build Llama Stacks](../distributions/index.md)
 - See [References](../references/index.md) for more details about the llama CLI and Python SDK
 - For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository.
+
+
+## Thinking out loud here in terms of what to write in the docs
+
+- how to get a llama stack server running
+- what are all the different client sdks
+- what are the components of building agents