[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/llamastack_agents_getting_started_examples.ipynb)

# Llama Stack Agents - Getting Started Guide

This notebook introduces building AI agents with Llama Stack. The Agent SDK is built on an open-source implementation of **OpenAI's Responses+ APIs**, which gives agent workflows a standardized interface.

## What You'll Learn

1. **Basic Agent Creation** - Simple Q&A agents with streaming
2. **Multi-Turn Conversations** - Maintaining context across conversations
3. **RAG Integration** - Adding knowledge bases to your agents  
4. **MCP Tools** - Extending agents with Model Context Protocol tools

## Prerequisites

- Llama Stack server running: `llama stack run starter --port 8321`
- A model provider configured (Ollama, Fireworks, etc.)
- Python 3.10+

## Setup

In [1]:
# Import required libraries
import json
from typing import Any, Dict

from llama_stack_client import LlamaStackClient, Agent
from llama_stack_client.types import UserMessage

# Initialize client
client = LlamaStackClient(base_url="http://localhost:8321")

print("✅ Client initialized successfully!")
print("   Base URL: http://localhost:8321")

✅ Client initialized successfully!
   Base URL: http://localhost:8321


In [2]:
# Create a basic agent using the Agent class
agent = Agent(
    client=client,
    model="ollama/llama3.3:70b",
    instructions="You are a helpful AI assistant that can answer questions and help with tasks.",
)

print("✅ Created agent successfully")

✅ Created agent successfully


---

# Part 1: Basic Agent Example

Let's start with a simple agent that can answer questions. This demonstrates:
- Agent creation with basic configuration
- Session management
- Streaming responses
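
The streaming loop used throughout this notebook can be tried without a running server. The sketch below feeds hand-built events through the same `event_type` checks. The `make_chunk` and `consume_stream` helpers and the fake events are assumptions made for illustration; they are not part of the Llama Stack client, and real streams may carry additional event types.

```python
from types import SimpleNamespace


def make_chunk(event_type, **fields):
    """Build a stand-in for a streamed chunk (hypothetical helper, not an SDK call)."""
    return SimpleNamespace(event=SimpleNamespace(event_type=event_type, **fields))


def consume_stream(chunks):
    """Print text deltas as they arrive and return the final text."""
    streamed_any = False
    final_text = ""
    for chunk in chunks:
        event = chunk.event
        if event.event_type == "step_progress":
            text = getattr(event.delta, "text", None)
            if text:
                print(text, end="", flush=True)
                streamed_any = True
        elif event.event_type == "turn_completed":
            final_text = event.final_text
            # Print the final text only if no deltas were shown, to avoid duplicating it
            if not streamed_any:
                print(final_text, end="")
            print()
            break
    return final_text


# Hand-built events standing in for a real streamed turn
fake_stream = [
    make_chunk("step_progress", delta=SimpleNamespace(text="Paris is ")),
    make_chunk("step_progress", delta=SimpleNamespace(text="the capital.")),
    make_chunk("turn_completed", final_text="Paris is the capital."),
]
result = consume_stream(fake_stream)
```

Tracking whether any delta was printed keeps the answer from appearing twice when both the deltas and the final text are available.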

In [3]:
# Create agent session
basic_session_id = agent.create_session(session_name="basic_example_session")

print(f"✅ Created session: {basic_session_id}")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/conversations "HTTP/1.1 200 OK"


✅ Created session: conv_e6afd7aaa97b49ce8f4f96a801b07893d9cb784d72e53e3c


In [4]:
# Send a message to the agent with streaming
query = "What is the capital of France? Please explain briefly."

print(f"User: {query}\n")
print("Assistant: ", end='')

# Create a turn with streaming
response = agent.create_turn(
    session_id=basic_session_id,
    messages=[UserMessage(content=query, role="user")],
    stream=True,
)

# Stream the response
output_text = ""
streamed_any = False
for chunk in response:
    if chunk.event.event_type == "turn_completed":
        output_text = chunk.event.final_text
        # Deltas were already printed; print the final text only if nothing was streamed
        if not streamed_any:
            print(output_text, end='')
        print()
        break
    elif chunk.event.event_type == "step_progress":
        # Print text deltas as they arrive
        if hasattr(chunk.event.delta, 'text'):
            print(chunk.event.delta.text, end='', flush=True)
            streamed_any = True

print(f"\n✅ Response captured: {len(output_text)} characters")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


User: What is the capital of France? Please explain briefly.

A: The capital of France is Paris. It's the country's largest city, known for iconic landmarks like the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum, serving as the center of French politics, culture, and economy.

✅ Response captured: 223 characters


In [5]:
# Clean up the session
client.conversations.delete(conversation_id=basic_session_id)
print("✅ Session cleaned up")

INFO:httpx:HTTP Request: DELETE http://localhost:8321/v1/conversations/conv_e6afd7aaa97b49ce8f4f96a801b07893d9cb784d72e53e3c "HTTP/1.1 200 OK"


✅ Session cleaned up


---

# Part 2: Advanced Agent Features

Now let's explore more advanced capabilities:
- Multi-turn conversations with context memory
- RAG (Retrieval-Augmented Generation) patterns

## 2.1 Multi-Turn Conversation

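Agents can keep context across turns within one session. Each turn below reuses the same session ID, so the third question can be answered from what was said in the first message.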

In [6]:
# Create agent for multi-turn conversation
conv_agent = Agent(
    client=client,
    model="ollama/llama3.3:70b",
    instructions="You are a helpful assistant that remembers context from previous messages.",
)

print("✅ Created conversation agent")

conv_session_id = conv_agent.create_session(session_name="multi_turn_session")
print(f"✅ Created session: {conv_session_id}")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/conversations "HTTP/1.1 200 OK"


✅ Created conversation agent
✅ Created session: conv_936121c2e27b7d1f7d3f0b6a62adce867d79268f5f9ce265


In [7]:
# Conversation turns that build on each other
conversation_turns = [
    "My name is Alice and I'm learning about AI.",
    "What are some good resources for beginners?",
    "Can you remind me what my name is?",
]

for i, query in enumerate(conversation_turns, 1):
    print(f"\n{'='*60}")
    print(f"Turn {i}")
    print(f"{'='*60}")
    print(f"User: {query}")

    response = conv_agent.create_turn(
        session_id=conv_session_id,
        messages=[UserMessage(content=query, role="user")],
        stream=True,
    )

    print("Assistant: ", end='')
    streamed_any = False
    for chunk in response:
        if chunk.event.event_type == "turn_completed":
            output = chunk.event.final_text
            # Print the final text only if no deltas were streamed, to avoid duplicating it
            if not streamed_any:
                print(output, end='')
            print()
            break
        elif chunk.event.event_type == "step_progress":
            if hasattr(chunk.event.delta, 'text'):
                print(chunk.event.delta.text, end='', flush=True)
                streamed_any = True

print(f"\n✅ Completed {len(conversation_turns)} conversational turns with context retention")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"



Turn 1
User: My name is Alice and I'm learning about AI.
Assistant: Nice to meet you, Alice! It's great that you're interested in learning about AI. What aspects of AI would you like to explore? Are you curious about machine learning, natural language processing, or something else? I'll be happy to help and provide information tailored to your interests.

Turn 2
User: What are some good resources for beginners?
Assistant: 

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


As a beginner, it's essential to start with resources that provide a solid foundation in AI concepts. Here are some recommendations:

1. **Online Courses**:
	* Andrew Ng's Machine Learning course on Coursera: A popular and comprehensive introduction to machine learning.
	* Stanford University's Natural Language Processing with Deep Learning Specialization on Coursera: Covers NLP fundamentals and deep learning techniques.
2. **Books**:
	* "Introduction to Artificial Intelligence" by Philip C. Jackson Jr.: A gentle introduction to AI concepts, including machine learning and computer vision.
	* "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville: A detailed book on deep learning techniques, although it may require some prior knowledge of linear algebra and calculus.
3. **Websites and Blogs**:
	* Machine Learning Mastery: A website offering tutorials, examples, and explanations on various machine learning topics.
	* KDnuggets: A popular blog covering AI, machine learning,

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


Your name is Alice! I remember that from our previous conversation when you introduced yourself as someone interested in learning about AI. How can I assist you further today?

✅ Completed 3 conversational turns with context retention


In [8]:
# Cleanup
client.conversations.delete(conversation_id=conv_session_id)
print("✅ Session cleaned up")

INFO:httpx:HTTP Request: DELETE http://localhost:8321/v1/conversations/conv_936121c2e27b7d1f7d3f0b6a62adce867d79268f5f9ce265 "HTTP/1.1 200 OK"


✅ Session cleaned up


## 2.2 RAG (Retrieval-Augmented Generation) Pattern

This section shows how to pass retrieved context to the agent inside the prompt, so that answers are grounded in a knowledge base rather than in the model's memory alone.

In [9]:
# Sample knowledge base: Paul Graham essay excerpts
# This is a common RAG example - using actual content from Paul Graham's essays
documents = [
    {
        "doc_id": "pg_essay_1",
        "content": """What I Worked On

        Before college the two main things I worked on, outside of school, were writing and programming.
        I didn't write essays. I wrote what beginning writers were supposed to write then, and probably
        still are: short stories. My stories were awful. They had hardly any plot, just characters with
        strong feelings, which I imagined made them deep.

        The first programs I tried writing were on the IBM 1401 that our school district used for what
        was then called 'data processing.' This was in 9th grade, so I was 13 or 14. The school district's
        1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got
        permission to use it.""",
        "metadata": {"essay": "What I Worked On", "author": "Paul Graham", "year": 2021}
    },
    {
        "doc_id": "pg_essay_2",
        "content": """How to Start a Startup

        You need three things to create a successful startup: to start with good people, to make something
        customers actually want, and to spend as little money as possible. Most startups that fail do it
        because they fail at one of these. A startup that does all three will probably succeed.

        And that's kind of exciting, when you think about it, because all three are doable. Hard, but doable.
        And since a startup that succeeds ordinarily makes its founders rich, that implies getting rich is
        doable too. Hard, but doable.""",
        "metadata": {"essay": "How to Start a Startup", "author": "Paul Graham", "year": 2005}
    },
    {
        "doc_id": "pg_essay_3",
        "content": """Maker's Schedule, Manager's Schedule

        One reason programmers dislike meetings so much is that they're on a different type of schedule
        from other people. Meetings cost them more.

        There are two types of schedule, which I'll call the manager's schedule and the maker's schedule.
        The manager's schedule is for bosses. It's embodied in the traditional appointment book, with each
        day cut into one hour intervals. When you use time that way, it's merely a practical problem to
        meet with someone. But there's another way of using time that's common among people who make things,
        like programmers and writers. They generally prefer to use time in units of half a day at least.""",
        "metadata": {"essay": "Maker's Schedule, Manager's Schedule", "author": "Paul Graham", "year": 2009}
    },
]

print(f"Knowledge base: {len(documents)} Paul Graham essay excerpts")
for doc in documents:
    print(f"  - {doc['doc_id']}: {doc['metadata']['essay']}")

Knowledge base: 3 Paul Graham essay excerpts
  - pg_essay_1: What I Worked On
  - pg_essay_2: How to Start a Startup
  - pg_essay_3: Maker's Schedule, Manager's Schedule


In [10]:
# Create RAG-enabled agent
rag_agent = Agent(
    client=client,
    model="ollama/llama3.3:70b",
    instructions=(
        "You are a helpful AI assistant with access to a knowledge base. "
        "When answering questions, use the provided context from the knowledge base. "
        "If the context doesn't contain relevant information, say so."
    ),
)

print("✅ Created RAG agent")

rag_session_id = rag_agent.create_session(session_name="rag_session")
print(f"✅ Created session: {rag_session_id}")

✅ Created RAG agent


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/conversations "HTTP/1.1 200 OK"


✅ Created session: conv_9ae94374c781501f2d712620dcc8e55961b5a226df229b1d


In [11]:
# Query with context from Paul Graham essays
query = "What did Paul Graham work on before college?"

# Simulate retrieval (in production, use vector search)
relevant_docs = [doc for doc in documents if "before college" in doc["content"].lower()]
context = "\n\n".join([f"From '{doc['metadata']['essay']}':\n{doc['content']}"
                       for doc in relevant_docs])

# Create prompt with retrieved context
prompt_with_context = f"""Context from knowledge base:
{context}

Question: {query}

Please answer based on the provided context."""

print(f"Query: {query}")
print(f"Retrieved {len(relevant_docs)} relevant document(s)\n")
print("Answer: ", end='')

response = rag_agent.create_turn(
    session_id=rag_session_id,
    messages=[UserMessage(content=prompt_with_context, role="user")],
    stream=True,
)

streamed_any = False
for chunk in response:
    if chunk.event.event_type == "turn_completed":
        output = chunk.event.final_text
        # Print the final text only if no deltas were streamed, to avoid duplicating it
        if not streamed_any:
            print(output, end='')
        print()
        break
    elif chunk.event.event_type == "step_progress":
        if hasattr(chunk.event.delta, 'text'):
            print(chunk.event.delta.text, end='', flush=True)
            streamed_any = True

print("\n")
client.conversations.delete(conversation_id=rag_session_id)
print("✅ Session cleaned up")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


Query: What did Paul Graham work on before college?
Retrieved 1 relevant document(s)

Answer: Based on the provided context from "What I Worked On", before college, Paul Graham worked on two main things outside of school: 

1. Writing (specifically short stories)
2. Programming (initially on the IBM 1401)




INFO:httpx:HTTP Request: DELETE http://localhost:8321/v1/conversations/conv_9ae94374c781501f2d712620dcc8e55961b5a226df229b1d "HTTP/1.1 200 OK"


✅ Session cleaned up
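
The substring filter in the retrieval step above only matches the literal phrase "before college". A slightly more general stand-in, still far from real vector search, is to rank documents by how many words they share with the query. The sketch below is a minimal illustration under that assumption; `tokenize`, `retrieve`, and `sample_docs` are hypothetical names, and the scoring ignores stop words, stemming, and semantics.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs, top_k=1):
    """Rank docs by the number of words shared with the query; skip docs with no overlap."""
    query_words = tokenize(query)
    scored = []
    for doc in docs:
        overlap = len(query_words & tokenize(doc["content"]))
        if overlap > 0:
            scored.append((overlap, doc))
    # Sort on the score only, so the dicts themselves are never compared
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Tiny made-up corpus for illustration
sample_docs = [
    {"doc_id": "a", "content": "Before college I worked on writing and programming."},
    {"doc_id": "b", "content": "A startup needs good people and little spending."},
]
hits = retrieve("What did he work on before college?", sample_docs)
print([doc["doc_id"] for doc in hits])
```

The returned documents can be joined into the prompt in the same way as `relevant_docs` in the cell above.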


---

# Part 3: MCP (Model Context Protocol) Tools

MCP provides a standardized way for AI models to interact with external tools and data sources.

We'll demonstrate:
- Defining MCP-compatible tools
- Agent tool selection
- Tool execution and response handling

In [12]:
def create_mcp_tools():
    """Create MCP-compatible tool definitions."""
    return [
        {
            "tool_name": "get_weather",
            "description": "Get current weather information for a specified location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state/country, e.g., 'San Francisco, CA'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                        "default": "fahrenheit"
                    }
                },
                "required": ["location"]
            }
        },
        {
            "tool_name": "execute_code",
            "description": "Execute Python code and return the result",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Python code to execute"
                    }
                },
                "required": ["code"]
            }
        },
        {
            "tool_name": "web_search",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                },
                "required": ["query"]
            }
        },
    ]

tools = create_mcp_tools()
print(f"Created {len(tools)} MCP tools:")
for tool in tools:
    print(f"  - {tool['tool_name']}: {tool['description']}")

Created 3 MCP tools:
  - get_weather: Get current weather information for a specified location
  - execute_code: Execute Python code and return the result
  - web_search: Search the web for information
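
Because each definition carries a JSON-Schema-style `parameters` object, the arguments a model proposes can be checked before any tool runs. The sketch below covers only the `required` list and `enum` constraints. `validate_arguments` is a hypothetical helper, not part of Llama Stack or MCP; a full JSON Schema validator would handle types and nested objects as well.

```python
def validate_arguments(tool, arguments):
    """Return a list of problems found; an empty list means the call looks valid."""
    schema = tool["parameters"]
    properties = schema.get("properties", {})
    problems = []
    # Every required argument must be present
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be declared and respect its enum, if any
    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"unexpected argument: {name}")
            continue
        allowed = properties[name].get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"{name} must be one of {allowed}")
    return problems


# Trimmed copy of the get_weather definition, repeated so the sketch is self-contained
weather_tool = {
    "tool_name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

ok = validate_arguments(weather_tool, {"location": "Paris, France", "unit": "celsius"})
bad = validate_arguments(weather_tool, {"unit": "kelvin"})
print(ok)
print(bad)
```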


In [13]:
# MCP server configuration (0.3.0 format)

# Replace these with your actual MCP server URL and credentials
MCP_ACCESS_TOKEN = "YOUR_ACCESS_TOKEN_HERE"  # Your authentication token
# If you run an MCP server locally, point this at its URL
MCP_SERVER_URL = "http://localhost:3000/sse"
# MCP_SERVER_URL = "https://mcp.deepwiki.com/sse"
mcp_tools = [
    {
        "type": "mcp",
        "server_url": MCP_SERVER_URL,
        "server_label": "weather",
        "headers": {
            "Authorization": f"Bearer {MCP_ACCESS_TOKEN}",
        },
    }
]


print("MCP tool configuration ready")
print(f"   Server: {MCP_SERVER_URL}")
print("   Format: MCP server-based")
print("\n To use MCP tools:")
print("   1. Set up your MCP server")
print("   2. Update MCP_SERVER_URL and MCP_ACCESS_TOKEN above")
print("   3. Pass mcp_tools to Agent(tools=mcp_tools)")

MCP tool configuration ready
   Server: http://localhost:3000/sse
   Format: MCP server-based

 To use MCP tools:
   1. Set up your MCP server
   2. Update MCP_SERVER_URL and MCP_ACCESS_TOKEN above
   3. Pass mcp_tools to Agent(tools=mcp_tools)
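
Hard-coding a token in a notebook makes it easy to leak. One option, sketched below, is to read it from an environment variable and build the tool entry in the same shape as `mcp_tools` above. The `build_mcp_tool_config` helper and the `MCP_ACCESS_TOKEN` environment variable name are assumptions for illustration, not Llama Stack conventions.

```python
import os


def build_mcp_tool_config(server_url, server_label, token=None):
    """Return an MCP tool entry; the Authorization header is added only when a token is given."""
    config = {"type": "mcp", "server_url": server_url, "server_label": server_label}
    if token:
        config["headers"] = {"Authorization": f"Bearer {token}"}
    return config


# Read the token from the environment (assumed variable name); None if it is not set
token = os.environ.get("MCP_ACCESS_TOKEN")
mcp_tools_from_env = [build_mcp_tool_config("http://localhost:3000/sse", "weather", token)]
print(mcp_tools_from_env[0]["server_url"])
```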


In [14]:
def simulate_tool_execution(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Simulate tool execution (replace with real implementations)."""
    if tool_name == "get_weather":
        location = arguments.get("location", "Unknown")
        unit = arguments.get("unit", "fahrenheit")
        temp = "72°F" if unit == "fahrenheit" else "22°C"
        return json.dumps({
            "location": location,
            "temperature": temp,
            "condition": "Partly cloudy",
            "humidity": "65%",
            "wind": "10 mph NW"
        })
    elif tool_name == "execute_code":
        code = arguments.get("code", "")
        return json.dumps({
            "status": "success",
            "output": f"Code execution simulated for: {code[:50]}..."
        })
    elif tool_name == "web_search":
        query = arguments.get("query", "")
        return json.dumps({
            "status": "success",
            "results": [
                {"title": f"Result {i+1}", "url": f"https://example.com/{i+1}",
                 "snippet": f"Information about {query}"}
                for i in range(3)
            ]
        })
    return json.dumps({"error": "Unknown tool"})

print("Tool execution simulator ready")

Tool execution simulator ready
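
As the number of tools grows, an `if`/`elif` chain like the one above gets harder to maintain. A common alternative is a lookup table from tool name to handler. The sketch below shows that pattern; `TOOL_HANDLERS`, `register_tool`, `fake_weather`, and `dispatch` are hypothetical names, and the handler returns canned values rather than real data.

```python
import json

# Maps a tool name to the function that handles it
TOOL_HANDLERS = {}


def register_tool(name):
    """Decorator that records a handler under the given tool name."""
    def decorator(func):
        TOOL_HANDLERS[name] = func
        return func
    return decorator


@register_tool("get_weather")
def fake_weather(arguments):
    # Canned values for illustration only; no real weather lookup happens here
    return {"location": arguments.get("location", "Unknown"), "condition": "Partly cloudy"}


def dispatch(tool_name, arguments):
    """Look up the handler and return its result as a JSON string."""
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    return json.dumps(handler(arguments))


print(dispatch("get_weather", {"location": "New York City"}))
print(dispatch("translate", {}))
```

Adding a tool then means writing one decorated function, with no change to the dispatch logic.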


In [15]:
mcp_agent = Agent(
    client=client,
    model="ollama/llama3.3:70b",
    instructions="You are a helpful AI assistant that can answer questions and help with various tasks.",
    tools=mcp_tools,  # Set this to `tools` to experiment with the definitions from create_mcp_tools() above
)

print("Created MCP agent")

mcp_session_id = mcp_agent.create_session(session_name="mcp_tools_session")
print(f"✅ Created session: {mcp_session_id}")

Created MCP agent


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/conversations "HTTP/1.1 200 OK"


✅ Created session: conv_5613324aa4c3193b1434bf562fe1c75dc2e0563c681738b1


In [16]:
# Example: Weather query that should trigger tool usage
query = "What's the weather like in New York City?"

print(f"{'='*70}")
print("MCP TOOL EXAMPLE")
print(f"{'='*70}")
print(f"\n User: {query}")

response = mcp_agent.create_turn(
    session_id=mcp_session_id,
    messages=[UserMessage(content=query, role="user")],
    stream=True,
)

print("\n Assistant: ", end='')
tool_calls_made = []
streamed_any = False

for chunk in response:
    event_type = chunk.event.event_type

    if event_type == "step_started":
        if chunk.event.step_type == "tool_execution":
            print("\n\n [Tool Execution Started]")

    elif event_type == "step_progress":
        # Check for tool call deltas
        if hasattr(chunk.event.delta, 'delta_type'):
            if chunk.event.delta.delta_type == "tool_call_issued":
                tool_calls_made.append(chunk.event.delta)
                # Local simulation for illustration; the result is not sent back to the agent
                result = simulate_tool_execution(
                    chunk.event.delta.tool_name,
                    json.loads(chunk.event.delta.arguments)
                )
        if hasattr(chunk.event.delta, 'text'):
            print(chunk.event.delta.text, end='', flush=True)
            streamed_any = True

    elif event_type == "turn_completed":
        output = chunk.event.final_text
        # Print the final text only if no deltas were streamed, to avoid duplicating it
        if output and not streamed_any:
            print(output, end='')
        print()

print()
if tool_calls_made:
    print(f"\n Summary: Used {len(tool_calls_made)} tool(s) to answer the query")

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/responses "HTTP/1.1 200 OK"


MCP TOOL EXAMPLE

 User: What's the weather like in New York City?

 Assistant: 

 [Tool Execution Started]


 [Tool Execution Started]
The current weather in New York City is mostly cloudy with a temperature of 49°F and a wind speed of 17 mph NE. Today, it will be partly sunny with a high of 55°F. Tonight, there's a chance of rain showers with a low of 53°F. The rest of the week will see a mix of rain, thunderstorms, and sunshine, with temperatures ranging from the mid-50s to the mid-60s. It's a good idea to check the forecast regularly for updates.


 Summary: Used 2 tool(s) to answer the query

In [17]:
# Cleanup
client.conversations.delete(conversation_id=mcp_session_id)
print("✅ Session cleaned up")

INFO:httpx:HTTP Request: DELETE http://localhost:8321/v1/conversations/conv_5613324aa4c3193b1434bf562fe1c75dc2e0563c681738b1 "HTTP/1.1 200 OK"


✅ Session cleaned up


---

# Summary

This notebook demonstrated three levels of Llama Stack agent capabilities:

## 1. Basic Agent
- ✅ Simple agent creation
- ✅ Session management  
- ✅ Streaming responses

## 2. Advanced Features
- ✅ Multi-turn conversations
- ✅ RAG (Retrieval-Augmented Generation) pattern
- ✅ Custom knowledge base integration

## 3. MCP Tools Integration
- ✅ MCP-compatible tool definitions
- ✅ Automatic tool selection by the agent
- ✅ Tool execution and response handling
- ✅ Real-time streaming with tool calls


## Resources

- [Llama Stack Documentation](https://llama-stack.readthedocs.io/)
- [Llama Stack GitHub](https://github.com/meta-llama/llama-stack)
- [MCP Protocol Specification](https://modelcontextprotocol.io/)
- [Ollama Documentation](https://ollama.ai/)