# Llama Stack Agent Framework

The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI applications. This document explains the key components and how they work together.

## Core Concepts

### 1. Agent Configuration

Agents are configured using the `AgentConfig` class, which includes:

- **Model**: The underlying LLM that powers the agent
- **Instructions**: The system prompt that defines the agent's behavior
- **Tools**: Capabilities the agent can use to interact with external systems
- **Safety Shields**: Guardrails to ensure responsible AI behavior
```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types.agent_create_params import AgentConfig

# Connect to a running Llama Stack server (adjust base_url for your deployment)
llama_stack_client = LlamaStackClient(base_url="http://localhost:8321")

# Configure an agent
agent_config = AgentConfig(
    model="meta-llama/Llama-3-70b-chat",
    instructions="You are a helpful assistant that can use tools to answer questions.",
    toolgroups=["builtin::code_interpreter", "builtin::rag/knowledge_search"],
)

# Create the agent
agent = Agent(llama_stack_client, agent_config)
```
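
Safety shields are attached through the same config. A minimal sketch, assuming a Llama Guard shield has already been registered on the server under the identifier `llama_guard` (an assumption about your deployment):

```python
# Sketch: assumes a shield registered server-side as "llama_guard"
guarded_config = AgentConfig(
    model="meta-llama/Llama-3-70b-chat",
    instructions="You are a helpful assistant that can use tools to answer questions.",
    input_shields=["llama_guard"],   # screen user input before inference
    output_shields=["llama_guard"],  # screen the agent's response before returning it
)
guarded_agent = Agent(llama_stack_client, guarded_config)
```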

### 2. Sessions

Agents maintain state through sessions, each of which represents a single conversation thread:

```python
# Create a session
session_id = agent.create_session(session_name="My conversation")
```
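
Because each session keeps its own history, one agent can serve several independent conversation threads at once; a minimal sketch:

```python
# Sessions are isolated: turns in one thread never see the other's history
support_session_id = agent.create_session(session_name="Support thread")
billing_session_id = agent.create_session(session_name="Billing thread")
```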

### 3. Turns

Each interaction with an agent is called a "turn" and consists of:

- **Input Messages**: What the user sends to the agent
- **Steps**: The agent's internal processing (inference, tool execution, etc.)
- **Output Message**: The agent's response
```python
from llama_stack_client.lib.agents.event_logger import EventLogger

# Create a turn with streaming response
turn_response = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Tell me about Llama models"}],
)
for log in EventLogger().log(turn_response):
    log.print()
```
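
Turns within a session share history, so a follow-up turn can build on the previous exchange. A sketch continuing the session above:

```python
# A follow-up turn in the same session; the agent sees the earlier exchange
followup_response = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Which of those models is the smallest?"}],
)
for log in EventLogger().log(followup_response):
    log.print()
```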

#### Non-Streaming

```python
from rich.pretty import pprint

# Non-streaming API
response = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Tell me about Llama models"}],
    stream=False,
)

print("Inputs:")
pprint(response.input_messages)
print("Output:")
pprint(response.output_message.content)
print("Steps:")
pprint(response.steps)
```

### 4. Steps

Each turn consists of multiple steps that represent the agent's thought process:

- **Inference Steps**: The agent generating text responses
- **Tool Execution Steps**: The agent using tools to gather information
- **Shield Call Steps**: Safety checks being performed
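
With the non-streaming API these steps can be inspected programmatically via `response.steps`. A sketch, assuming the field names below (`step_type`, `api_model_response`, `tool_calls`, `violation`) match your installed client version:

```python
# Inspect the steps of a completed (non-streaming) turn
for step in response.steps:
    if step.step_type == "inference":
        print("Model output:", step.api_model_response.content)
    elif step.step_type == "tool_execution":
        print("Tools called:", [call.tool_name for call in step.tool_calls])
    elif step.step_type == "shield_call":
        print("Shield violation:", step.violation)  # None when the check passes
```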

## Agent Execution Loop

Refer to the [Agent Execution Loop](agent_execution_loop) for more details on what happens within an agent turn.