docs: update Agent documentation (#1333)

Summary:
- [new] Agent concepts (session, turn)
- [new] how to write custom tools
- [new] non-streaming API and how to get outputs
- [update] remaining `memory` -> `rag` rename
- [new] note importance of `instructions`

Test Plan: read
2025-12-04 10:10:36 +00:00 · 2025-03-01 22:34:52 -08:00 · 2025-03-01 22:34:52 -08:00 · 52977e56a8
commit 52977e56a8
parent 46b0a404e8
6 changed files with 170 additions and 64 deletions
--- a/docs/source/building_applications/agent.md
+++ b/docs/source/building_applications/agent.md
@@ -0,0 +1,91 @@
+# Llama Stack Agent Framework
+
+The Llama Stack agent framework is built on a modular architecture: an agent is assembled from a model, instructions, tools, and safety shields. This document explains the key components and how they work together.
+
+## Core Concepts
+
+### 1. Agent Configuration
+
+Agents are configured using the `AgentConfig` class, which includes:
+
+- **Model**: The underlying LLM to power the agent
+- **Instructions**: System prompt that defines the agent's behavior
+- **Tools**: Capabilities the agent can use to interact with external systems
+- **Safety Shields**: Guardrails to ensure responsible AI behavior
+
+```python
+from llama_stack_client.types.agent_create_params import AgentConfig
+from llama_stack_client.lib.agents.agent import Agent
+
+# Configure an agent
+agent_config = AgentConfig(
+    model="meta-llama/Llama-3-70b-chat",
+    instructions="You are a helpful assistant that can use tools to answer questions.",
+    toolgroups=["builtin::code_interpreter", "builtin::rag/knowledge_search"],
+)
+
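+# Create the agent; `llama_stack_client` is assumed to be an
+# already-initialized client instance (its setup is not shown here)
+agent = Agent(llama_stack_client, agent_config)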
+```
+
+### 2. Sessions
+
+Agents maintain state through sessions. Each session represents one conversation thread:
+
+```python
+# Create a session
+session_id = agent.create_session(session_name="My conversation")
+```
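+
+To make the idea of a session concrete, here is a minimal toy model in plain Python. It is not the Llama Stack implementation: `ToySession` and `ToySessionStore` are hypothetical names invented for illustration, and the real framework manages session state itself. The sketch only shows the property that matters, namely that each session ID maps to its own independent message history.
+
+```python
+import uuid
+from dataclasses import dataclass, field
+
+
+@dataclass
+class ToySession:
+    """Hypothetical stand-in for one conversation thread."""
+
+    name: str
+    messages: list = field(default_factory=list)
+
+
+class ToySessionStore:
+    """Hypothetical in-memory store, for illustration only."""
+
+    def __init__(self):
+        self._sessions = {}
+
+    def create_session(self, session_name):
+        # Each session gets a fresh, unique ID
+        session_id = str(uuid.uuid4())
+        self._sessions[session_id] = ToySession(name=session_name)
+        return session_id
+
+    def add_message(self, session_id, role, content):
+        # Messages are appended only to the session they belong to
+        self._sessions[session_id].messages.append(
+            {"role": role, "content": content}
+        )
+
+    def history(self, session_id):
+        # Return a copy so callers cannot mutate stored state
+        return list(self._sessions[session_id].messages)
+
+
+store = ToySessionStore()
+first = store.create_session("My conversation")
+second = store.create_session("Another conversation")
+store.add_message(first, "user", "Tell me about Llama models")
+
+print(len(store.history(first)))   # 1
+print(len(store.history(second)))  # 0
+```
+
+Because history is keyed by session ID, a message added to one session never appears in another.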
+
+### 3. Turns
+
+Each interaction with an agent is called a "turn" and consists of:
+
+- **Input Messages**: What the user sends to the agent
+- **Steps**: The agent's internal processing (inference, tool execution, etc.)
+- **Output Message**: The agent's response
+
+```python
+from llama_stack_client.lib.agents.event_logger import EventLogger
+
+# Create a turn with streaming response
+turn_response = agent.create_turn(
+    session_id=session_id,
+    messages=[{"role": "user", "content": "Tell me about Llama models"}],
+)
+for log in EventLogger().log(turn_response):
+    log.print()
+```
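+
+### Non-Streaming
+
+Pass `stream=False` to `create_turn` to receive the completed turn as a single response object instead of a stream of events:
+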
+```python
+from rich.pretty import pprint
+
+# Non-streaming API
+response = agent.create_turn(
+    session_id=session_id,
+    messages=[{"role": "user", "content": "Tell me about Llama models"}],
+    stream=False,
+)
+print("Inputs:")
+pprint(response.input_messages)
+print("Output:")
+pprint(response.output_message.content)
+print("Steps:")
+pprint(response.steps)
+```
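+
+When several turns need the same post-processing, the three fields printed above can be collected by a small helper. The sketch below relies only on the attributes already used in this section (`input_messages`, `steps`, `output_message.content`). `summarize_turn` is a hypothetical helper, and `fake_response` is a stand-in object so the example runs without a server.
+
+```python
+from types import SimpleNamespace
+
+
+def summarize_turn(response):
+    """Collect the main parts of a completed turn into a plain dict."""
+    return {
+        "num_inputs": len(response.input_messages),
+        "num_steps": len(response.steps),
+        "output": response.output_message.content,
+    }
+
+
+# Stand-in for a real non-streaming response (assumption: same attribute names)
+fake_response = SimpleNamespace(
+    input_messages=[{"role": "user", "content": "Tell me about Llama models"}],
+    steps=["placeholder step"],
+    output_message=SimpleNamespace(content="Llama is a family of language models."),
+)
+
+summary = summarize_turn(fake_response)
+print(summary["num_inputs"], summary["num_steps"])
+print(summary["output"])
+```
+
+In real use, the value returned by `agent.create_turn(..., stream=False)` would be passed in place of `fake_response`.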
+
+### 4. Steps
+
+Each turn consists of a sequence of steps that record the agent's internal processing:
+
+- **Inference Steps**: The agent generating text responses
+- **Tool Execution Steps**: The agent using tools to gather information
+- **Shield Call Steps**: Safety checks being performed
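+
+One way to inspect a turn is to tally its steps by kind. The sketch below assumes each step exposes a `step_type` attribute with values such as `"inference"`, `"tool_execution"`, and `"shield_call"`; treat those names as assumptions and check them against the client version in use. `count_steps_by_type` is a hypothetical helper, and the steps are stand-in objects.
+
+```python
+from collections import Counter
+from types import SimpleNamespace
+
+
+def count_steps_by_type(steps):
+    """Count how many steps of each kind a turn contains."""
+    return Counter(step.step_type for step in steps)
+
+
+# Stand-in steps (assumption: real steps carry a `step_type` field)
+example_steps = [
+    SimpleNamespace(step_type="shield_call"),
+    SimpleNamespace(step_type="inference"),
+    SimpleNamespace(step_type="tool_execution"),
+    SimpleNamespace(step_type="inference"),
+    SimpleNamespace(step_type="shield_call"),
+]
+
+counts = count_steps_by_type(example_steps)
+print(counts["inference"])       # 2
+print(counts["tool_execution"])  # 1
+print(counts["shield_call"])     # 2
+```
+
+With a real non-streaming response, `response.steps` would be passed in place of `example_steps`.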
+
+## Agent Execution Loop
+
+Refer to the [Agent Execution Loop](agent_execution_loop) for more details on what happens within an agent turn.