# Llama Stack Agents - Complete Guide

This notebook provides a comprehensive guide to building agents with Llama Stack, covering:

1. **Basic Agent Example** - Simple agent creation and conversation
2. **Advanced Agent Features** - RAG, multi-turn conversations
3. **MCP Tools Integration** - Model Context Protocol tools for extended capabilities

## Prerequisites

Before running this notebook, ensure:
- ✅ Llama Stack server is running: `llama stack run starter --port 8321`
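- ✅ Ollama (or vLLM) is running as the inference provider
- ✅ The model is available: `llama3.3:70b` (with Ollama: `ollama pull llama3.3:70b`); the cells refer to it as `ollama/llama3.3:70b`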

## Setup

In [119]:
# Import required libraries
import json
from typing import Any, Dict

from llama_stack_client import LlamaStackClient, Agent
from llama_stack_client.types import UserMessage

# Initialize the client against the local Llama Stack server
BASE_URL = "http://localhost:8321"
client = LlamaStackClient(base_url=BASE_URL)

print("✅ Client initialized successfully!")
print(f"   Base URL: {BASE_URL}")

✅ Client initialized successfully!
   Base URL: http://localhost:8321
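If the client cells fail, the first thing to check is whether the server is reachable. The sketch below uses only the standard library. It assumes the server answers `GET /v1/health` with HTTP 200; that path is an assumption about your Llama Stack version, so adjust it if needed. `server_is_up` is a hypothetical helper, not part of `llama_stack_client`.

```python
import urllib.error
import urllib.request


def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/v1/health answers with HTTP 200.

    The /v1/health path is an assumption; any connection error, timeout,
    malformed URL, or non-2xx status is treated as "not up".
    """
    url = base_url.rstrip("/") + "/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError is a subclass of URLError, so 4xx/5xx land here too.
        return False
```

Calling `server_is_up("http://localhost:8321")` before creating the agent gives a quick yes/no instead of a long traceback.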


In [120]:
# Create a basic agent using the Agent class
agent = Agent(
    client=client,
    model="ollama/llama3.3:70b",
    instructions="You are a helpful AI assistant that can answer questions and help with tasks.",
)

print("✅ Created agent successfully")

✅ Created agent successfully


---

# Part 1: Basic Agent Example

Let's start with a simple question-and-answer exchange using the agent created above. This demonstrates:
- Agent creation with basic configuration
- Session management
- Streaming responses

In [121]:
# Create agent session
basic_session_id = agent.create_session(session_name="basic_example_session")

print(f"✅ Created session: {basic_session_id}")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/conversations "HTTP/1.1 200 OK"


✅ Created session: conv_fc6e84e0499f522e30a4c91e6a3f13d6f03fa326386c403b


In [122]:
# Send a message to the agent with streaming
query = "What is the capital of France? Please explain briefly."

print(f"User: {query}\n")
print("Assistant: ", end='')

# Create a turn with streaming
response = agent.create_turn(
    session_id=basic_session_id,
    messages=[UserMessage(content=query, role="user")],
    stream=True,
)

# Stream the response: print text deltas as they arrive and use the final
# text only as a fallback, so the answer is not printed twice
output_text = ""
for chunk in response:
    if chunk.event.event_type == "step_progress":
        text = getattr(chunk.event.delta, 'text', None)
        if text:
            output_text += text
            print(text, end='', flush=True)
    elif chunk.event.event_type == "turn_completed":
        if not output_text:
            output_text = chunk.event.final_text or ""
            print(output_text, end='')
        break
print()

print(f"\n✅ Response captured: {len(output_text)} characters")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


User: What is the capital of France? Please explain briefly.

Assistant: The capital of France is Paris. It's a city known for its iconic landmarks like the Eiffel Tower, art museums, and historic architecture, serving as the country's political, cultural, and economic center.

✅ Response captured: 204 characters


In [123]:
# Clean up the session
client.conversations.delete(conversation_id=basic_session_id)
print("✅ Session cleaned up")

INFO:httpx:HTTP Request: DELETE http://localhost:8321/v1/conversations/conv_fc6e84e0499f522e30a4c91e6a3f13d6f03fa326386c403b "HTTP/1.1 200 OK"


✅ Session cleaned up
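Creating a session and deleting it in a separate cell leaves the conversation behind if anything in between raises. A context manager with `try`/`finally` guarantees cleanup. This sketch takes the create and delete steps as callables so it runs without a server; `managed_session` is a hypothetical helper.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def managed_session(create: Callable[[], str],
                    delete: Callable[[str], None]) -> Iterator[str]:
    """Yield a new session id and always delete it afterwards."""
    session_id = create()
    try:
        yield session_id
    finally:
        # Runs on normal exit and when the body raises.
        delete(session_id)


# Offline stand-ins for agent.create_session / client.conversations.delete.
deleted = []
with managed_session(lambda: "conv_demo", deleted.append) as sid:
    print("Using session", sid)
print("Deleted:", deleted)
```

In the notebook it could be wired as `managed_session(lambda: agent.create_session(session_name="demo"), lambda sid: client.conversations.delete(conversation_id=sid))`, reusing the two calls shown above.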


---

# Part 2: Advanced Agent Features

Now let's explore more advanced capabilities:
- Multi-turn conversations with context memory
- RAG (Retrieval-Augmented Generation) patterns

## 2.1 Multi-Turn Conversation

This section shows how an agent keeps context across multiple turns: every turn is sent within the same session, so later turns can refer to earlier ones.

In [124]:
# Create agent for multi-turn conversation
conv_agent = Agent(
    client=client,
    model="ollama/llama3.3:70b",
    instructions="You are a helpful assistant that remembers context from previous messages.",
)

print("✅ Created conversation agent")

conv_session_id = conv_agent.create_session(session_name="multi_turn_session")
print(f"✅ Created session: {conv_session_id}")

✅ Created conversation agent


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/conversations "HTTP/1.1 200 OK"


✅ Created session: conv_53b1fa277c2b51b59846a499a4a8f4fb264c86d50fe03b30


In [125]:
# Conversation turns that build on each other
conversation_turns = [
    "My name is Alice and I'm learning about AI.",
    "What are some good resources for beginners?",
    "Can you remind me what my name is?",
]

for i, query in enumerate(conversation_turns, 1):
    print(f"\n{'='*60}")
    print(f"Turn {i}")
    print(f"{'='*60}")
    print(f"User: {query}")

    response = conv_agent.create_turn(
        session_id=conv_session_id,
        messages=[UserMessage(content=query, role="user")],
        stream=True,
    )

    print("Assistant: ", end='')
    streamed = ""
    for chunk in response:
        if chunk.event.event_type == "step_progress":
            text = getattr(chunk.event.delta, 'text', None)
            if text:
                streamed += text
                print(text, end='', flush=True)
        elif chunk.event.event_type == "turn_completed":
            # Print the final text only if nothing was streamed
            if not streamed:
                print(chunk.event.final_text or "", end='')
            print()
            break

print(f"\n✅ Completed {len(conversation_turns)} conversational turns with context retention")


Turn 1
User: My name is Alice and I'm learning about AI.
Assistant: 

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


Nice to meet you, Alice! Learning about AI can be a fascinating topic. There's so much to explore, from machine learning and natural language processing to computer vision and robotics. What specific area of AI are you most interested in or would you like to start with the basics? I'm here to help and provide guidance as you learn.

Turn 2
User: What are some good resources for beginners?
Assistant: 

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


As a beginner, it's essential to start with resources that are easy to understand and provide a solid foundation in AI concepts. Here are some popular resources:

**Online Courses:**

1. **Coursera - Machine Learning by Andrew Ng**: A fantastic course that covers the basics of machine learning.
2. **edX - Introduction to Artificial Intelligence**: A broad introduction to AI, covering topics like computer vision, robotics, and natural language processing.
3. **Udemy - Artificial Intelligence for Beginners**: A beginner-friendly course that explores AI fundamentals.

**Websites and Blogs:**

1. **Towards Data Science**: A great blog that features articles on machine learning, AI, and data science.
2. **AI Alignment Forum**: A website that discusses the latest developments in AI research and applications.
3. **KDnuggets**: A popular blog that covers AI, machine learning, and data science topics.

**Books:**

1. **"Artificial Intelligence: A Modern Approach" by Stuart Russell and Peter Nor

Turn 3
User: Can you remind me what my name is?
Assistant: 

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


Your name is Alice! I remember you told me that when we started our conversation about learning AI. How's your exploration of AI resources going so far, Alice? Do you have any questions or need help with anything specific?

✅ Completed 3 conversational turns with context retention
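The agent recalls "Alice" in turn 3 because all three turns share one session, so the model sees the earlier messages. Conceptually, a session behaves like an append-only message log that is replayed on every turn. The sketch below models that in memory; `ConversationLog` and `fake_model` are hypothetical, and in the notebook the server-side conversation plays this role rather than client code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class ConversationLog:
    """Append-only message history, replayed in full on every turn."""
    messages: List[Message] = field(default_factory=list)

    def turn(self, user_text: str, model: Callable[[List[Message]], str]) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # The model sees every earlier message, which is what lets it
        # answer questions about previous turns.
        reply = model(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_model(history: List[Message]) -> str:
    """Hypothetical stand-in for an LLM: recalls a name stated earlier."""
    last = history[-1]["content"]
    if "what my name is" in last:
        for msg in history:
            if msg["role"] == "user" and msg["content"].startswith("My name is "):
                name = msg["content"][len("My name is "):].split()[0].rstrip(".,")
                return f"Your name is {name}."
        return "You haven't told me your name."
    return "Noted."


log = ConversationLog()
log.turn("My name is Alice and I'm learning about AI.", fake_model)
log.turn("What are some good resources for beginners?", fake_model)
print(log.turn("Can you remind me what my name is?", fake_model))
```

Starting a fresh `ConversationLog` (like creating a new session) loses the name, which mirrors why the notebook reuses `conv_session_id` for every turn.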


In [126]:
# Cleanup
client.conversations.delete(conversation_id=conv_session_id)
print("✅ Session cleaned up")

INFO:httpx:HTTP Request: DELETE http://localhost:8321/v1/conversations/conv_53b1fa277c2b51b59846a499a4a8f4fb264c86d50fe03b30 "HTTP/1.1 200 OK"


✅ Session cleaned up


## 2.2 RAG (Retrieval-Augmented Generation) Pattern

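This section shows a minimal RAG pattern: retrieve documents relevant to the question, then include them in the prompt so the agent answers from that context rather than from its training data alone.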

In [127]:
# Sample knowledge base documents
documents = [
    {
        "doc_id": "doc1",
        "content": "Llama Stack is Meta's comprehensive framework for building LLM applications. "
                  "It provides standardized APIs for inference, safety, agents, and more.",
        "metadata": {"category": "framework", "source": "docs"}
    },
    {
        "doc_id": "doc2",
        "content": "MCP (Model Context Protocol) is a standardized protocol that allows AI models "
                  "to interact with external tools and data sources in a consistent way.",
        "metadata": {"category": "protocol", "source": "docs"}
    },
    {
        "doc_id": "doc3",
        "content": "RAG (Retrieval-Augmented Generation) combines information retrieval with "
                  "language generation to provide more accurate and contextual responses.",
        "metadata": {"category": "technique", "source": "docs"}
    },
]

print(f"Knowledge base: {len(documents)} documents")
for doc in documents:
    print(f"  - {doc['doc_id']}: {doc['content'][:50]}...")

Knowledge base: 3 documents
  - doc1: Llama Stack is Meta's comprehensive framework for ...
  - doc2: MCP (Model Context Protocol) is a standardized pro...
  - doc3: RAG (Retrieval-Augmented Generation) combines info...


In [128]:
# Create RAG-enabled agent
rag_agent = Agent(
    client=client,
    model="ollama/llama3.3:70b",
    instructions=(
        "You are a helpful AI assistant with access to a knowledge base. "
        "When answering questions, use the provided context from the knowledge base. "
        "If the context doesn't contain relevant information, say so."
    ),
)

print("✅ Created RAG agent")

rag_session_id = rag_agent.create_session(session_name="rag_session")
print(f"✅ Created session: {rag_session_id}")

✅ Created RAG agent


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/conversations "HTTP/1.1 200 OK"


✅ Created session: conv_0361da3cbb5e8a9d9ce4ae6404e16f66724e6a18f8c01077


In [129]:
# Query with context
query = "What is Llama Stack and what does it provide?"

# Simulate retrieval with a case-insensitive substring match (in production, use vector search)
relevant_docs = [doc for doc in documents if "llama stack" in doc["content"].lower()]
context = "\n\n".join([f"Document {i+1}:\n{doc['content']}"
                       for i, doc in enumerate(relevant_docs)])

# Create prompt with retrieved context
prompt_with_context = f"""Context from knowledge base:
{context}

Question: {query}

Please answer based on the provided context."""

print(f"Query: {query}")
print(f"Retrieved {len(relevant_docs)} relevant document(s)\n")
print("Answer: ", end='')

response = rag_agent.create_turn(
    session_id=rag_session_id,
    messages=[UserMessage(content=prompt_with_context, role="user")],
    stream=True,
)

streamed = ""
for chunk in response:
    if chunk.event.event_type == "step_progress":
        text = getattr(chunk.event.delta, 'text', None)
        if text:
            streamed += text
            print(text, end='', flush=True)
    elif chunk.event.event_type == "turn_completed":
        # Print the final text only if nothing was streamed
        if not streamed:
            print(chunk.event.final_text or "", end='')
        break

print("\n")
client.conversations.delete(conversation_id=rag_session_id)
print("✅ Session cleaned up")

Query: What is Llama Stack and what does it provide?
Retrieved 1 relevant document(s)

Answer: 

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


According to the provided context from Document 1, Llama Stack is Meta's comprehensive framework for building LLM (Large Language Model) applications. It provides standardized APIs (Application Programming Interfaces) for various functions, including inference, safety, agents, and more.




INFO:httpx:HTTP Request: DELETE http://localhost:8321/v1/conversations/conv_0361da3cbb5e8a9d9ce4ae6404e16f66724e6a18f8c01077 "HTTP/1.1 200 OK"


✅ Session cleaned up
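The substring filter above only finds documents containing the exact phrase "llama stack". A step closer to vector search is to rank documents by cosine similarity and keep the top k. The sketch below uses bag-of-words term counts as a crude stand-in for embeddings; it is not Llama Stack's vector store API, and `retrieve` and `build_prompt` are hypothetical names. The prompt template mirrors the one used in the cell above.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _vectorize(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[Dict], k: int = 2) -> List[Dict]:
    """Return up to k docs with positive similarity, best first."""
    q = _vectorize(query)
    scored = [(_cosine(q, _vectorize(d["content"])), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query: str, docs: List[Dict]) -> str:
    context = "\n\n".join(f"Document {i + 1}:\n{d['content']}" for i, d in enumerate(docs))
    return (f"Context from knowledge base:\n{context}\n\n"
            f"Question: {query}\n\nPlease answer based on the provided context.")


kb = [
    {"doc_id": "doc1", "content": "Llama Stack is Meta's comprehensive framework for building LLM applications."},
    {"doc_id": "doc2", "content": "MCP (Model Context Protocol) is a standardized protocol for tools."},
    {"doc_id": "doc3", "content": "RAG combines information retrieval with language generation."},
]
hits = retrieve("What is Llama Stack?", kb, k=1)
print([d["doc_id"] for d in hits])
```

Swapping `_vectorize` for a real embedding model keeps the same ranking logic; the zero-score filter is what lets the agent's instructions ("If the context doesn't contain relevant information, say so") take effect when nothing matches.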


---

# Part 3: MCP (Model Context Protocol) Tools

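MCP provides a standardized way for AI models to interact with external tools and data sources. In this part, the agent is pointed at an MCP server, which advertises its tools; when the model decides to call one, the Llama Stack server executes it and returns the result to the model.

We'll demonstrate:
- Defining tool schemas with JSON Schema parameters
- Configuring an MCP server as an agent tool
- Agent tool selection
- Tool execution and response handling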

In [130]:
def create_mcp_tools():
    """Create illustrative tool definitions whose parameters are described with JSON Schema."""
    return [
        {
            "tool_name": "get_weather",
            "description": "Get current weather information for a specified location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state/country, e.g., 'San Francisco, CA'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                        "default": "fahrenheit"
                    }
                },
                "required": ["location"]
            }
        },
        {
            "tool_name": "execute_code",
            "description": "Execute Python code and return the result",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Python code to execute"
                    }
                },
                "required": ["code"]
            }
        },
        {
            "tool_name": "web_search",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                },
                "required": ["query"]
            }
        },
    ]

tools = create_mcp_tools()
print(f"Created {len(tools)} MCP tools:")
for tool in tools:
    print(f"  - {tool['tool_name']}: {tool['description']}")

Created 3 MCP tools:
  - get_weather: Get current weather information for a specified location
  - execute_code: Execute Python code and return the result
  - web_search: Search the web for information
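Each definition describes its arguments with a JSON Schema object (`properties`, `required`, `enum`, `default`). Before executing a tool call, a client can check the model's arguments against that schema. The sketch below handles only those keywords plus string types; it is not a full JSON Schema validator (a library such as `jsonschema` covers that), and `check_arguments` is a hypothetical helper. It also rejects unknown arguments, which is stricter than JSON Schema's default.

```python
from typing import Any, Dict, List


def check_arguments(tool: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
    """Validate args against tool["parameters"] and fill in defaults.

    Raises ValueError listing every problem found.
    """
    schema = tool["parameters"]
    props = schema.get("properties", {})
    problems: List[str] = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument '{name}'")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            problems.append(f"unexpected argument '{name}'")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"'{name}' must be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{name}' must be one of {spec['enum']}")
    if problems:
        raise ValueError(f"{tool['tool_name']}: " + "; ".join(problems))
    # Start from schema defaults, then let explicit arguments override them.
    filled = {n: s["default"] for n, s in props.items() if "default" in s}
    filled.update(args)
    return filled


weather_tool = {
    "tool_name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "fahrenheit"},
        },
        "required": ["location"],
    },
}
print(check_arguments(weather_tool, {"location": "Paris"}))
```

The same check works on any entry returned by `create_mcp_tools()`, since they share this `parameters` layout.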


In [132]:
# MCP server configuration (0.3.0 format)

# Replace with your MCP server URL and credentials.
# If you run an MCP server locally, point MCP_SERVER_URL at its SSE endpoint;
# a public alternative is commented out below.
MCP_SERVER_URL = "http://localhost:3000/sse"
# MCP_SERVER_URL = "https://mcp.deepwiki.com/sse"
MCP_ACCESS_TOKEN = "YOUR_ACCESS_TOKEN_HERE"  # Your authentication token

mcp_tools = [
    {
        "type": "mcp",
        "server_url": MCP_SERVER_URL,
        "server_label": "weather",
        "headers": {
            "Authorization": f"Bearer {MCP_ACCESS_TOKEN}",
        },
    }
]


print("MCP tool configuration ready")
print(f"   Server: {MCP_SERVER_URL}")
print("   Format: MCP server-based")
print("\n To use MCP tools:")
print("   1. Set up your MCP server")
print("   2. Update MCP_SERVER_URL and MCP_ACCESS_TOKEN above")
print("   3. Pass mcp_tools to Agent(tools=mcp_tools)")

MCP tool configuration ready
   Server: http://localhost:3000/sse
   Format: MCP server-based

 To use MCP tools:
   1. Set up your MCP server
   2. Update MCP_SERVER_URL and MCP_ACCESS_TOKEN above
   3. Pass mcp_tools to Agent(tools=mcp_tools)
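The cell above hardcodes the bearer token in the notebook. A safer pattern reads the URL and token from environment variables and adds the `Authorization` header only when a token is set. In this sketch the environment variable names and `make_mcp_tool` are assumptions; the dictionary keys mirror the `mcp_tools` entry above. Whether a server accepts an entry without headers depends on your MCP server.

```python
import os
from typing import Any, Dict, Mapping, Optional


def make_mcp_tool(server_label: str,
                  default_url: str = "http://localhost:3000/sse",
                  env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build an MCP tool entry shaped like the mcp_tools config above.

    Reads MCP_SERVER_URL and MCP_ACCESS_TOKEN (hypothetical variable
    names) from `env`, defaulting to the process environment.
    """
    env = os.environ if env is None else env
    tool: Dict[str, Any] = {
        "type": "mcp",
        "server_url": env.get("MCP_SERVER_URL", default_url),
        "server_label": server_label,
    }
    token = env.get("MCP_ACCESS_TOKEN")
    if token:
        # Only send credentials when they are actually configured.
        tool["headers"] = {"Authorization": f"Bearer {token}"}
    return tool


mcp_tools_from_env = [make_mcp_tool("weather")]
print(mcp_tools_from_env[0]["server_url"])
```

Passing an explicit `env` dictionary, as in the checks below, keeps the function easy to test without touching the real environment.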


In [133]:
def simulate_tool_execution(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Simulate tool execution (replace with real implementations)."""
    if tool_name == "get_weather":
        location = arguments.get("location", "Unknown")
        unit = arguments.get("unit", "fahrenheit")
        temp = "72°F" if unit == "fahrenheit" else "22°C"
        return json.dumps({
            "location": location,
            "temperature": temp,
            "condition": "Partly cloudy",
            "humidity": "65%",
            "wind": "10 mph NW"
        })
    elif tool_name == "execute_code":
        code = arguments.get("code", "")
        return json.dumps({
            "status": "success",
            "output": f"Code execution simulated for: {code[:50]}..."
        })
    elif tool_name == "web_search":
        query = arguments.get("query", "")
        return json.dumps({
            "status": "success",
            "results": [
                {"title": f"Result {i+1}", "url": f"https://example.com/{i+1}",
                 "snippet": f"Information about {query}"}
                for i in range(3)
            ]
        })
    return json.dumps({"error": "Unknown tool"})

print("Tool execution simulator ready")

Tool execution simulator ready
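The `if`/`elif` chain in `simulate_tool_execution` grows with every new tool. A common alternative is a registry that maps tool names to handler functions, so adding a tool means registering one function. In this sketch `TOOL_HANDLERS`, `register`, and `dispatch` are hypothetical names, and the handlers return canned data like the simulator.

```python
import json
from typing import Any, Callable, Dict

TOOL_HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def register(name: str):
    """Decorator that records a handler under the given tool name."""
    def wrap(fn):
        TOOL_HANDLERS[name] = fn
        return fn
    return wrap


@register("get_weather")
def get_weather(location: str = "Unknown", unit: str = "fahrenheit") -> Dict[str, Any]:
    # Canned data, mirroring simulate_tool_execution.
    return {"location": location, "temperature": "72°F" if unit == "fahrenheit" else "22°C"}


@register("web_search")
def web_search(query: str = "") -> Dict[str, Any]:
    return {"status": "success", "results": [f"Information about {query}"]}


def dispatch(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Run the registered handler and return its result as a JSON string."""
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    try:
        return json.dumps(handler(**arguments))
    except TypeError as exc:
        # Unexpected argument names (or a TypeError inside the handler).
        return json.dumps({"error": str(exc)})


print(dispatch("get_weather", {"location": "Paris", "unit": "celsius"}))
```

Returning a JSON error string instead of raising keeps the same contract as `simulate_tool_execution`, so either could be used wherever a tool result string is expected.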


In [145]:
mcp_agent = Agent(
    client=client,
    model="ollama/llama3.3:70b",
    instructions="You are a helpful AI assistant that can answer questions and help with various tasks.",
    tools=mcp_tools,  # MCP server config from above; to experiment with the create_mcp_tools() definitions, set this to `tools`
)

print("Created MCP agent")

mcp_session_id = mcp_agent.create_session(session_name="mcp_tools_session")
print(f"✅ Created session: {mcp_session_id}")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/conversations "HTTP/1.1 200 OK"


Created MCP agent
✅ Created session: conv_d86d8242813c4019f1176754067d0e393923fcc364429f5b


In [146]:
# Example: Weather query that should trigger tool usage
query = "What's the weather like in New York City?"

print(f"{'='*70}")
print("MCP TOOL EXAMPLE")
print(f"{'='*70}")
print(f"\n User: {query}")

response = mcp_agent.create_turn(
    session_id=mcp_session_id,
    messages=[UserMessage(content=query, role="user")],
    stream=True,
)

print("\n Assistant: ", end='')
tool_calls_made = []

streamed = ""
for chunk in response:
    event_type = chunk.event.event_type

    if event_type == "step_started":
        if chunk.event.step_type == "tool_execution":
            print("\n\n [Tool Execution Started]")

    elif event_type == "step_progress":
        # Record client-side tool calls. MCP tools run on the server, so this
        # branch may never fire for them; the simulated result below is
        # illustrative only and is not sent back to the agent.
        if getattr(chunk.event.delta, 'delta_type', None) == "tool_call_issued":
            tool_calls_made.append(chunk.event.delta)
            result = simulate_tool_execution(
                chunk.event.delta.tool_name,
                json.loads(chunk.event.delta.arguments)
            )
        text = getattr(chunk.event.delta, 'text', None)
        if text:
            streamed += text
            print(text, end='', flush=True)

    elif event_type == "turn_completed":
        # Print the final text only if nothing was streamed
        if not streamed and chunk.event.final_text:
            print(chunk.event.final_text)

print()
if tool_calls_made:
    print(f"\n Summary: Used {len(tool_calls_made)} tool(s) to answer the query")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


MCP TOOL EXAMPLE

 User: What's the weather like in New York City?

 Assistant: 

 [Tool Execution Started]


 [Tool Execution Started]
The current weather in New York City is 52°F with a slight chance of very light rain and winds of 12 mph NE. Tomorrow's forecast shows a high of 55°F with a slight chance of very light rain and winds of 14 to 17 mph NE. Over the next few days, the city can expect temperatures ranging from 49°F to 61°F with varying chances of rain and wind speeds. It's recommended to check the latest forecasts for the most up-to-date information.

In [147]:
# Cleanup
client.conversations.delete(conversation_id=mcp_session_id)
print("✅ Session cleaned up")

INFO:httpx:HTTP Request: DELETE http://localhost:8321/v1/conversations/conv_d86d8242813c4019f1176754067d0e393923fcc364429f5b "HTTP/1.1 200 OK"


✅ Session cleaned up
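In this run the MCP server appears to have executed the tools: the answer contains live weather data rather than the simulator's 72°F, and no "Summary" line was printed, so `tool_calls_made` stayed empty. With client-side tools, the client drives the loop instead: the model proposes a call, the client executes it, appends the result, and calls the model again until it produces text. The sketch below shows that loop with a scripted fake model; `run_tool_loop`, `scripted_model`, and the message shapes are hypothetical and not Llama Stack's wire format.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_tool_loop(model: Callable[[List[Message]], Message],
                  tools: Dict[str, Callable[..., Any]],
                  user_text: str,
                  max_steps: int = 5) -> str:
    """Alternate model calls and tool executions until a text answer."""
    messages: List[Message] = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]
        # Execute the requested tool locally and feed the result back.
        result = tools[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
    raise RuntimeError("model did not produce an answer within max_steps")


def scripted_model(messages: List[Message]) -> Message:
    """Hypothetical model: asks for weather once, then answers from it."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"role": "assistant", "content": "",
                "tool_call": {"name": "get_weather",
                              "arguments": {"location": "New York City"}}}
    data = json.loads(tool_msgs[-1]["content"])
    return {"role": "assistant",
            "content": f"It is {data['temperature']} in {data['location']}."}


fake_tools = {"get_weather": lambda location, unit="fahrenheit": {
    "location": location, "temperature": "72°F"}}
print(run_tool_loop(scripted_model, fake_tools, "What's the weather like in New York City?"))
```

The `max_steps` cap guards against a model that keeps requesting tools without ever answering.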


---

# Summary

This notebook demonstrated three levels of Llama Stack agent capabilities:

## 1. Basic Agent
- ✅ Simple agent creation
- ✅ Session management  
- ✅ Streaming responses

## 2. Advanced Features
- ✅ Multi-turn conversations
- ✅ RAG (Retrieval-Augmented Generation) pattern
- ✅ Custom knowledge base integration

## 3. MCP Tools Integration
- ✅ Tool definitions with JSON Schema parameters and MCP server configuration
- ✅ Automatic tool selection by the agent
- ✅ Tool execution and response handling
- ✅ Real-time streaming with tool calls


## Resources

- [Llama Stack Documentation](https://llama-stack.readthedocs.io/)
- [Llama Stack GitHub](https://github.com/meta-llama/llama-stack)
- [MCP Protocol Specification](https://modelcontextprotocol.io/)
- [Ollama Documentation](https://ollama.ai/)