# AutoGen + Llama Stack Integration

## Overview

This notebook demonstrates how to use **AutoGen (AG2)** with **Llama Stack** as the backend.

### What is AutoGen?
- A framework for **conversational multi-agent** systems, originally developed at Microsoft (AG2 is its community-maintained fork)
- Emphasizes **chat-based** interactions between agents
- Built-in **code execution** and **human-in-the-loop**
- Great for **interactive problem-solving**

### Why Llama Stack?
- **Unified backend** for any LLM (Ollama, Together, vLLM, etc.)
- **One integration point** instead of many
- **Production-ready** infrastructure
- **Open-source** and flexible

### Use Cases Covered:
1. **Two-Agent Conversation** - UserProxy + Assistant solving a problem
2. **Code Generation & Execution** - AutoGen generates and runs code
3. **Group Chat** - Multiple specialists collaborating
4. **Human-in-the-Loop** - Interactive problem-solving
5. **Sequential Task Solving** - Math problem → Code → Execute → Verify

---

## Prerequisites

```bash
# Install AutoGen (AG2)
pip install pyautogen

# Llama Stack should already be running
# Default: http://localhost:8321
```

In [None]:
# Imports
import os
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
from autogen.oai import OpenAIWrapper

# Check Llama Stack connectivity
import httpx

LLAMA_STACK_URL = "http://localhost:8321"

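try:
    # Short timeout so the cell fails fast when the server is down
    response = httpx.get(f"{LLAMA_STACK_URL}/health", timeout=5)
    if response.status_code == 200:
        print(f"✅ Llama Stack is running at {LLAMA_STACK_URL}")
    else:
        # The server answered, but not with a healthy status
        print(f"⚠️ Llama Stack responded with status {response.status_code}")
except Exception as e:
    print(f"❌ Llama Stack not accessible: {e}")
    print("Make sure Llama Stack is running on port 8321")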

## Configuration: AutoGen with Llama Stack

### How It Works

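AutoGen talks to models through the **OpenAI API format**, and Llama Stack exposes an OpenAI-compatible endpoint, so the standard AutoGen configuration works without a custom client.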

```python
config_list = [
    {
        "model": "ollama/llama3.3:70b",  # Your Llama Stack model
        "base_url": "http://localhost:8321/v1",  # Llama Stack endpoint
        "api_key": "not-needed",  # Llama Stack doesn't need auth
    }
]
```

**Key Points:**
- Use `/v1` suffix for OpenAI-compatible endpoint
- `api_key` can be any string (Llama Stack ignores it)
- `model` must match what's available in Llama Stack
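
The key points above can be folded into a small helper. This is a minimal sketch: `make_config_list` is a hypothetical function (not part of AutoGen or Llama Stack) that normalizes the server URL so it always ends in `/v1` and fills in a placeholder API key.

```python
def make_config_list(
    model: str,
    server_url: str = "http://localhost:8321",
    api_key: str = "not-needed",
) -> list[dict]:
    """Build an AutoGen-style config_list for a Llama Stack server.

    Hypothetical helper for illustration; not part of AutoGen or Llama Stack.
    """
    base = server_url.rstrip("/")
    # Append the OpenAI-compatible suffix only if it is missing
    if not base.endswith("/v1"):
        base = f"{base}/v1"
    return [{"model": model, "base_url": base, "api_key": api_key}]


config_list = make_config_list("ollama/llama3.3:70b")
print(config_list[0]["base_url"])  # http://localhost:8321/v1
```

Passing either `http://localhost:8321` or `http://localhost:8321/v1/` yields the same `base_url`, which avoids a doubled or missing suffix.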

In [None]:
# AutoGen configuration for Llama Stack
config_list = [
    {
        "model": "ollama/llama3.3:70b",  # Your Llama Stack model
        "base_url": "http://localhost:8321/v1",  # OpenAI-compatible endpoint
        "api_key": "not-needed",  # Llama Stack doesn't require auth
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0.7,
    "timeout": 120,
}

print("✅ AutoGen configuration ready for Llama Stack")
print(f"Model: {config_list[0]['model']}")
print(f"Base URL: {config_list[0]['base_url']}")

## Example 1: Two-Agent Conversation

### Pattern: User Proxy + Assistant

**UserProxyAgent:**
- Represents the human user
- Can execute code
- Provides feedback to assistant

**AssistantAgent:**
- AI assistant powered by Llama Stack
- Generates responses and code
- Solves problems conversationally
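
The turn-taking between the two roles can be pictured with a toy loop. This is a simplified model, not AutoGen's implementation: `run_two_agent_chat` is a hypothetical function, the replies are scripted stand-ins for the LLM and the code executor, and the `TERMINATE` marker is an assumed stop signal.

```python
from typing import Callable, List, Tuple

Transcript = List[Tuple[str, str]]


def run_two_agent_chat(
    assistant_reply: Callable[[str], str],
    proxy_reply: Callable[[str], str],
    first_message: str,
    max_auto_reply: int = 5,
) -> Transcript:
    """Alternate turns until the assistant says TERMINATE or the cap is hit."""
    transcript: Transcript = [("user_proxy", first_message)]
    message = first_message
    for _ in range(max_auto_reply):
        answer = assistant_reply(message)
        transcript.append(("assistant", answer))
        if "TERMINATE" in answer:  # assumed termination marker
            break
        message = proxy_reply(answer)  # e.g. the result of running the code
        transcript.append(("user_proxy", message))
    return transcript


# Scripted stand-ins for the LLM and the code executor
scripted = iter(["Run this: print(2 + 2)", "The answer is 4. TERMINATE"])
chat = run_two_agent_chat(
    assistant_reply=lambda _msg: next(scripted),
    proxy_reply=lambda _code: "exitcode: 0, output: 4",
    first_message="What is 2 + 2?",
)
for speaker, text in chat:
    print(f"{speaker}: {text}")
```

The cap on automatic replies plays the same role as `max_consecutive_auto_reply`: it bounds the exchange even if the assistant never signals that it is done.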

### Use Case: Solve a Math Problem

In [None]:
# Create AssistantAgent (AI assistant)
assistant = AssistantAgent(
    name="MathAssistant",
    system_message="You are a helpful AI assistant that solves math problems. Provide clear explanations.",
    llm_config=llm_config,
)

# Create UserProxyAgent (represents human)
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",  # Fully automated (no human input)
    max_consecutive_auto_reply=5,
    code_execution_config={"use_docker": False},  # Allow local code execution
)

print("✅ Agents created")
print(f"Assistant: {assistant.name}")
print(f"User Proxy: {user_proxy.name}")

In [None]:
# Start conversation
user_proxy.initiate_chat(
    assistant,
    message="What is the sum of the first 100 prime numbers? Please write Python code to calculate it."
)

print("\n" + "="*50)
print("Conversation complete!")
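
To check the agent's answer independently, the same question can be solved directly. This is a reference sketch, not the code the assistant will necessarily produce; `first_n_primes` is a name chosen here for illustration.

```python
def first_n_primes(n: int) -> list[int]:
    """Return the first n primes using trial division by smaller primes."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        # Prime if no known prime up to sqrt(candidate) divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


total = sum(first_n_primes(100))
print(total)  # 24133
```

If the conversation ends with a different total, the generated code (or its execution) is worth inspecting.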

## Example 2: Code Generation & Execution

### Pattern: Assistant generates code → UserProxy executes it

This is AutoGen's killer feature: **automatic code execution**!

### Use Case: Data Analysis Task

In [None]:
# Create a coding assistant
coding_assistant = AssistantAgent(
    name="DataScientist",
    system_message="""You are an expert data scientist.
    Write Python code to solve data analysis problems.
    Always include visualizations when appropriate.""",
    llm_config=llm_config,
)

# User proxy with code execution enabled
user_proxy_code = UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
)

# Start data analysis task
user_proxy_code.initiate_chat(
    coding_assistant,
    message="""Generate 100 random numbers from a normal distribution (mean=50, std=10).
    Calculate the mean, median, and standard deviation.
    Create a histogram to visualize the distribution."""
)
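
For orientation, the code the assistant writes for this task might resemble the sketch below. The actual output will vary from run to run; this version is an assumption that sticks to the standard library and prints a text histogram instead of using NumPy and Matplotlib.

```python
import random
import statistics
from collections import Counter

random.seed(42)  # fixed seed so the run is reproducible
samples = [random.gauss(50, 10) for _ in range(100)]

mean = statistics.mean(samples)
median = statistics.median(samples)
std = statistics.stdev(samples)  # sample standard deviation
print(f"mean={mean:.2f} median={median:.2f} std={std:.2f}")

# Text histogram: group values into bins of width 5
bins = Counter(int(x // 5) * 5 for x in samples)
for start in sorted(bins):
    print(f"{start:>4} to {start + 5:<4} {'#' * bins[start]}")
```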

## Example 3: Group Chat (Multi-Agent Collaboration)

### Pattern: Multiple Specialists Collaborating

**Scenario:** Write a technical blog post about AI

**Agents:**
1. **Researcher** - Finds information
2. **Writer** - Writes content
3. **Critic** - Reviews and suggests improvements
4. **UserProxy** - Orchestrates and provides final approval

This is similar to llamacrew's workflow but **conversational** instead of DAG-based!

In [None]:
# Create specialist agents
researcher = AssistantAgent(
    name="Researcher",
    system_message="""You are a researcher. Your job is to find accurate information
    about topics and provide facts, statistics, and recent developments.""",
    llm_config=llm_config,
)

writer = AssistantAgent(
    name="Writer",
    system_message="""You are a technical writer. Your job is to write clear,
    engaging content based on research provided. Use simple language and examples.""",
    llm_config=llm_config,
)

critic = AssistantAgent(
    name="Critic",
    system_message="""You are an editor. Review content for clarity, accuracy,
    and engagement. Suggest specific improvements.""",
    llm_config=llm_config,
)

# User proxy to orchestrate
user_proxy_group = UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config=False,
)

print("✅ Group chat agents created")

In [None]:
# Create group chat
groupchat = GroupChat(
    agents=[user_proxy_group, researcher, writer, critic],
    messages=[],
    max_round=12,  # Maximum conversation rounds
)

# Create manager to orchestrate
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

# Start group chat
user_proxy_group.initiate_chat(
    manager,
    message="""Write a 300-word blog post about the benefits of using
    Llama Stack for LLM applications. Include:
    1. What Llama Stack is
    2. Key benefits
    3. A simple use case example

    Researcher: gather information
    Writer: create the blog post
    Critic: review and suggest improvements
    """
)

## Example 4: Human-in-the-Loop

### Pattern: Interactive Problem Solving

AutoGen excels at **human-in-the-loop** workflows where you can:
- Provide feedback mid-conversation
- Approve/reject suggestions
- Guide the agent's direction

**Note:** In notebooks, this requires `human_input_mode="ALWAYS"` or `"TERMINATE"` and manual input.
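
The three `human_input_mode` values differ in when the human is asked for input. The sketch below models that decision as I understand it from AutoGen's documentation; `should_prompt_human` is a hypothetical function written for illustration, not an AutoGen API.

```python
def should_prompt_human(
    mode: str,
    is_termination_msg: bool,
    auto_replies: int,
    max_auto_replies: int,
) -> bool:
    """Decide whether to ask the human for input (simplified model)."""
    if mode == "ALWAYS":
        # Ask on every incoming message
        return True
    if mode == "TERMINATE":
        # Ask only when the chat is about to end or the auto-reply cap is hit
        return is_termination_msg or auto_replies >= max_auto_replies
    if mode == "NEVER":
        # Fully automated: never ask
        return False
    raise ValueError(f"Unknown human_input_mode: {mode!r}")


print(should_prompt_human("TERMINATE", False, 2, 5))  # False
print(should_prompt_human("TERMINATE", True, 2, 5))   # True
```

`"TERMINATE"` is a middle ground: the agents run unattended until one of them tries to finish or the reply limit is reached, at which point the human can approve or redirect.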

In [None]:
# Interactive assistant (uncomment to try)
# WARNING: This will prompt for user input!

# assistant_interactive = AssistantAgent(
#     name="InteractiveAssistant",
#     system_message="You are a helpful assistant. Ask clarifying questions when needed.",
#     llm_config=llm_config,
# )

# user_proxy_interactive = UserProxyAgent(
#     name="Human",
#     human_input_mode="TERMINATE",  # Prompt the human on a termination message or when the auto-reply limit is reached
#     max_consecutive_auto_reply=5,
# )

# user_proxy_interactive.initiate_chat(
#     assistant_interactive,
#     message="Help me plan a machine learning project for customer churn prediction."
# )

print("💡 Human-in-the-loop example (commented out to avoid blocking notebook execution)")
print("Uncomment the code above to try interactive mode!")

## Example 5: Sequential Task Solving

### Pattern: Chain of Thought Problem Solving

**Scenario:** Solve a complex problem requiring multiple steps

1. **Understand** the problem
2. **Plan** the solution approach
3. **Implement** the solution (code)
4. **Execute** and verify
5. **Explain** the results

### Use Case: Fibonacci Sequence Analysis

In [None]:
# Create a reasoning assistant
reasoning_assistant = AssistantAgent(
    name="ReasoningAssistant",
    system_message="""You are a problem-solving assistant.
    For complex problems:
    1. Break down the problem
    2. Plan the solution step-by-step
    3. Write clean, well-commented code
    4. Explain results clearly
    """,
    llm_config=llm_config,
)

user_proxy_reasoning = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,
    code_execution_config={"work_dir": "reasoning", "use_docker": False},
)

# Complex problem requiring sequential reasoning
user_proxy_reasoning.initiate_chat(
    reasoning_assistant,
    message="""Find the first 20 Fibonacci numbers where the number is also a prime number.

    Requirements:
    1. Explain the approach
    2. Write efficient Python code
    3. Display the results in a table
    4. Calculate what percentage of the first 100 Fibonacci numbers are prime
    """
)
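
A reference for requirement 4 helps when judging the agent's output. The sketch below counts primes among F(1)..F(100), taking F(1) = F(2) = 1 (an indexing assumption), and uses a deterministic Miller-Rabin test whose fixed bases are known to be sufficient for numbers below roughly 3.18e23, which covers F(100). Finding the first 20 Fibonacci primes involves much larger numbers and would need a primality test valid beyond that bound.

```python
def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin, valid for n below about 3.18e23."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def fibonacci(count: int) -> list[int]:
    """Return F(1)..F(count) with F(1) = F(2) = 1."""
    a, b = 1, 1
    out = []
    for _ in range(count):
        out.append(a)
        a, b = b, a + b
    return out


fib_100 = fibonacci(100)
prime_fibs = [f for f in fib_100 if is_prime(f)]
print(prime_fibs[:5])   # [2, 3, 5, 13, 89]
print(len(prime_fibs))  # 12, i.e. 12% of the first 100
```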

## Comparison: AutoGen vs llamacrew

### When to Use AutoGen + Llama Stack

✅ **Use AutoGen when you need:**
- **Conversational** interactions between agents
- **Human-in-the-loop** workflows (interactive approval, feedback)
- **Code generation & execution** (data analysis, scripting)
- **Group discussions** (multiple agents debating, collaborating)
- **Dynamic problem-solving** (unknown number of back-and-forth exchanges)
- **Research/prototyping** (exploratory work)

**Example Use Cases:**
- Interactive coding assistant
- Research assistant with human feedback
- Multi-agent debate/discussion
- Tutoring/educational applications
- Dynamic customer support

---

### When to Use llamacrew

✅ **Use llamacrew when you need:**
- **Production workflows** (blog writing, data pipelines)
- **Declarative DAGs** (predefined task dependencies)
- **Automatic parallelization** (framework optimizes)
- **Non-interactive automation** (scheduled jobs)
- **Minimal dependencies** (lightweight deployment)
- **Predictable workflows** (known steps)

**Example Use Cases:**
- Automated blog post generation
- Data ETL pipelines
- Report generation
- Batch processing
- Production automation

---

### They're Complementary!

- **AutoGen**: Conversational, interactive, exploratory
- **llamacrew**: Workflow, automated, production

You might use **AutoGen for prototyping** then move to **llamacrew for production**!

## Advanced: Custom Agent Behaviors

### Pattern: Specialized Agent with Custom Logic

You can create agents with custom behavior beyond just prompts.

In [None]:
from typing import Dict, List, Union
import re

class CodeReviewAgent(AssistantAgent):
    """Custom agent that reviews code for specific patterns."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.issues_found = []

    def review_code(self, code: str) -> Dict[str, Union[List[str], int]]:
        """Custom method to review code for common issues."""
        issues = []

        # Check for common issues
        if "TODO" in code or "FIXME" in code:
            issues.append("Code contains TODO/FIXME comments")

        # Also match definitions with a return annotation, e.g. "def f(x) -> int:"
        if not re.search(r'def \w+\(.*\).*:', code):
            issues.append("No function definitions found")

        if "print(" in code:
            issues.append("Contains print statements (consider logging)")

        self.issues_found.extend(issues)
        return {"issues": issues, "total": len(issues)}

# Create custom reviewer
code_reviewer = CodeReviewAgent(
    name="CodeReviewer",
    system_message="""You are a code reviewer. Analyze code for:
    - Code quality
    - Best practices
    - Potential bugs
    - Performance issues
    Provide specific, actionable feedback.
    """,
    llm_config=llm_config,
)

print("✅ Custom CodeReviewAgent created with specialized review logic")

## Performance Tips

### 1. Model Selection

```python
# Fast models for simple tasks
config_fast = {
    "model": "ollama/llama3.2:3b",  # Smaller, faster
    "temperature": 0.5,
}

# Powerful models for complex reasoning
config_powerful = {
    "model": "ollama/llama3.3:70b",  # Larger, better quality
    "temperature": 0.7,
}
```

### 2. Limit Conversation Rounds

```python
user_proxy = UserProxyAgent(
    name="User",
    max_consecutive_auto_reply=3,  # Prevent infinite loops
)
```

### 3. Set Timeouts

```python
llm_config = {
    "timeout": 60,  # 60 second timeout per request
    "config_list": config_list,
}
```

### 4. Use Work Directories

```python
code_execution_config = {
    "work_dir": "autogen_workspace",  # Isolate generated files
    "use_docker": False,
}
```

## Troubleshooting

### Common Issues

#### 1. "Could not connect to Llama Stack"

```bash
# Check if Llama Stack is running
curl http://localhost:8321/health

# Start Llama Stack if needed
llama stack run
```

#### 2. "Model not found"

```bash
# List available models
curl http://localhost:8321/v1/models

# Make sure model name matches exactly:
# ✅ "ollama/llama3.3:70b"
# ❌ "llama3.3:70b"
```
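
The exact-match rule can be checked before any agent is created. This is a minimal sketch: `resolve_model_id` is a hypothetical helper, and the `available` list is assumed to have been fetched from the server beforehand.

```python
def resolve_model_id(requested: str, available: list[str]) -> str:
    """Return the model ID if it exists, else raise with a hint.

    Hypothetical helper; `available` holds the IDs reported by the server.
    """
    if requested in available:
        return requested
    # Look for IDs whose part after the provider prefix matches the request
    matches = [m for m in available if m.split("/", 1)[-1] == requested]
    hint = f" Did you mean {matches[0]!r}?" if matches else ""
    raise ValueError(f"Model {requested!r} not found.{hint}")


available = ["ollama/llama3.3:70b", "ollama/llama3.2:3b"]
print(resolve_model_id("ollama/llama3.3:70b", available))

try:
    resolve_model_id("llama3.3:70b", available)
except ValueError as err:
    print(err)
```

Failing early with a suggestion is usually easier to diagnose than a "model not found" error surfacing in the middle of an agent conversation.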

#### 3. "Agent not responding"

- Check `max_consecutive_auto_reply` isn't set too low
- Increase `timeout` in `llm_config`
- Verify Llama Stack model is loaded and warm

#### 4. "Code execution failed"

- Make sure `code_execution_config` is set correctly
- Check file permissions on `work_dir`
- Install required Python packages

---

### Debug Mode

```python
import logging

# Enable AutoGen debug logging
logging.basicConfig(level=logging.DEBUG)
```

## Summary

### What We Covered

1. ✅ **Two-Agent Conversations** - UserProxy + Assistant pattern
2. ✅ **Code Generation & Execution** - AutoGen's killer feature
3. ✅ **Group Chat** - Multiple agents collaborating
4. ✅ **Human-in-the-Loop** - Interactive workflows
5. ✅ **Sequential Reasoning** - Complex problem solving
6. ✅ **Custom Agents** - Specialized behaviors

### Key Takeaways

**AutoGen + Llama Stack is powerful for:**
- 🗣️ **Conversational** multi-agent systems
- 👤 **Interactive** problem-solving with humans
- 💻 **Code generation** and execution
- 🤝 **Collaborative** agent discussions

**llamacrew, by contrast, is better for:**
- 🔄 **Production workflows** and pipelines
- 📊 **Declarative** task orchestration
- ⚡ **Automatic parallelization**
- 🤖 **Non-interactive** automation

---

### Next Steps

1. Experiment with different agent combinations
2. Try human-in-the-loop workflows
3. Build custom agents for your use case
4. Compare AutoGen vs llamacrew for your specific needs

### Resources

- **AutoGen Docs**: https://microsoft.github.io/autogen/
- **Llama Stack Docs**: https://llama-stack.readthedocs.io/
- **llamacrew Docs**: `/home/omara/Desktop/llamacrew/README.md`

---

**Happy multi-agent building! 🚀**