## Agent Execution Loop

Agents are the heart of complex AI applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.

Each agent turn follows these key steps (sketched in pseudocode after the list):

1. **Initial Safety Check**: The user's input is first screened through the configured safety shields.

2. **Context Retrieval**:
   - If RAG is enabled, the agent queries relevant documents from memory banks.
   - New documents are first inserted into the memory bank.
   - The retrieved context is used to augment the user's prompt.

3. **Inference Loop**: The agent enters its main execution loop:
   - The LLM receives the augmented prompt (with context and/or previous tool outputs).
   - The LLM generates a response, potentially with tool calls.
   - If tool calls are present:
     - Tool inputs are safety-checked.
     - Tools are executed (e.g., web search, code execution).
     - Tool responses are fed back to the LLM for synthesis.
   - The loop continues until:
     - the LLM provides a final response without tool calls,
     - the maximum number of iterations is reached, or
     - the token limit is exceeded.

4. **Final Safety Check**: The agent's final response is screened through the safety shields.
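
The same turn can be expressed as a minimal Python sketch. The function and helper names below are hypothetical and only illustrate the control flow; the actual loop runs inside the Llama Stack agent runtime on the server.

```python
# Minimal sketch of the turn described above. All helper names here are
# hypothetical -- they are not part of the Llama Stack API.
def run_agent_turn(user_input, shields, memory, llm, tools, max_infer_iters=5):
    # 1. Initial safety check on the raw user input
    shields.check_input(user_input)

    # 2. Context retrieval: augment the prompt with documents from memory
    context = memory.query(user_input)
    messages = [{"role": "user", "content": user_input, "context": context}]

    # 3. Inference loop
    response = None
    for _ in range(max_infer_iters):
        response = llm.generate(messages)
        if not response.tool_calls:
            break  # final answer without tool calls -> stop
        for call in response.tool_calls:
            shields.check_tool_input(call)           # safety-check tool inputs
            result = tools[call.name].execute(call)  # e.g. web search, code execution
            messages.append({"role": "tool", "content": result})

    # 4. Final safety check on the agent's answer
    shields.check_output(response)
    return response
```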

```mermaid
sequenceDiagram
    participant U as User
    participant E as Executor
    participant M as Memory Bank
    participant L as LLM
    participant T as Tools
    participant S as Safety Shield

    Note over U,S: Agent Turn Start
    U->>S: 1. Submit Prompt
    activate S
    S->>E: Input Safety Check
    deactivate S

    E->>M: 2.1 Query Context
    M-->>E: 2.2 Retrieved Documents

    loop Inference Loop
        E->>L: 3.1 Augment with Context
        L-->>E: 3.2 Response (with/without tool calls)

        alt Has Tool Calls
            E->>S: Check Tool Input
            S->>T: 4.1 Execute Tool
            T-->>E: 4.2 Tool Response
            E->>L: 5.1 Tool Response
            L-->>E: 5.2 Synthesized Response
        end

        opt Stop Conditions
            Note over E: Break if:
            Note over E: - No tool calls
            Note over E: - Max iterations reached
            Note over E: - Token limit exceeded
        end
    end

    E->>S: Output Safety Check
    S->>U: 6. Final Response
```

Each step in this process can be monitored and controlled through configurations. Here's an example that demonstrates monitoring the agent's execution:

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

# Assumes a Llama Stack server is running locally; adjust the URL as needed.
client = LlamaStackClient(base_url="http://localhost:5000")

agent_config = AgentConfig(
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    # Enable both RAG and tool usage
    tools=[
        {
            "type": "memory",
            "memory_bank_configs": [{
                "type": "vector",
                "bank_id": "my_docs"
            }],
            "max_tokens_in_context": 4096
        },
        {
            "type": "code_interpreter",
            "enable_inline_code_execution": True
        }
    ],
    # Configure safety
    input_shields=["content_safety"],
    output_shields=["content_safety"],
    # Control the inference loop
    max_infer_iters=5,
    sampling_params={
        "strategy": {
            "type": "top_p",
            "temperature": 0.7,
            "top_p": 0.95
        },
        "max_tokens": 2048
    }
)

agent = Agent(client, agent_config)
session_id = agent.create_session("monitored_session")

# Stream the agent's execution steps
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    attachments=[{
        "content": "https://raw.githubusercontent.com/example/code.py",
        "mime_type": "text/plain"
    }],
    session_id=session_id
)

# Monitor each step of execution
for log in EventLogger().log(response):
    if log.event.step_type == "memory_retrieval":
        print("Retrieved context:", log.event.retrieved_context)
    elif log.event.step_type == "inference":
        print("LLM output:", log.event.model_response)
    elif log.event.step_type == "tool_execution":
        print("Tool call:", log.event.tool_call)
        print("Tool response:", log.event.tool_response)
    elif log.event.step_type == "shield_call":
        if log.event.violation:
            print("Safety violation:", log.event.violation)