llama-stack-mirror/docs/notebooks/autogen/autogen_llama_stack_integration.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AutoGen + Llama Stack Integration\n",
"\n",
"## Overview\n",
"\n",
"This notebook demonstrates how to use **AutoGen (AG2)** with **Llama Stack** as the backend.\n",
"\n",
"### What is AutoGen?\n",
"- Microsoft's framework for **conversational multi-agent** systems\n",
"- Emphasizes **chat-based** interactions between agents\n",
"- Built-in **code execution** and **human-in-the-loop**\n",
"- Great for **interactive problem-solving**\n",
"\n",
"### Why Llama Stack?\n",
"- **Unified backend** for any LLM (Ollama, Together, vLLM, etc.)\n",
"- **One integration point** instead of many\n",
"- **Production-ready** infrastructure\n",
"- **Open-source** and flexible\n",
"\n",
"### Use Cases Covered:\n",
"1. **Two-Agent Conversation** - UserProxy + Assistant solving a problem\n",
"2. **Code Generation & Execution** - AutoGen generates and runs code\n",
"3. **Group Chat** - Multiple specialists collaborating\n",
"4. **Human-in-the-Loop** - Interactive problem-solving\n",
"5. **Sequential Task Solving** - Math problem → Code → Execute → Verify\n",
"\n",
"---\n",
"\n",
"## Prerequisites\n",
"\n",
"```bash\n",
"# Install AutoGen (AG2)\n",
"pip install pyautogen\n",
"\n",
"# Llama Stack should already be running\n",
"# Default: http://localhost:8321\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"import os\n",
"from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager\n",
"from autogen.oai import OpenAIWrapper\n",
"\n",
"# Check Llama Stack connectivity\n",
"import httpx\n",
"\n",
"LLAMA_STACK_URL = \"http://localhost:8321\"\n",
"\n",
"try:\n",
" response = httpx.get(f\"{LLAMA_STACK_URL}/health\")\n",
" print(f\"✅ Llama Stack is running at {LLAMA_STACK_URL}\")\n",
" print(f\"Status: {response.status_code}\")\n",
"except Exception as e:\n",
" print(f\"❌ Llama Stack not accessible: {e}\")\n",
" print(\"Make sure Llama Stack is running on port 8321\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuration: AutoGen with Llama Stack\n",
"\n",
"### How It Works\n",
"\n",
"AutoGen uses the **OpenAI API format**, which Llama Stack is compatible with!\n",
"\n",
"```python\n",
"config_list = [\n",
" {\n",
" \"model\": \"ollama/llama3.3:70b\", # Your Llama Stack model\n",
" \"base_url\": \"http://localhost:8321/v1\", # Llama Stack endpoint\n",
" \"api_key\": \"not-needed\", # Llama Stack doesn't need auth\n",
" }\n",
"]\n",
"```\n",
"\n",
"**Key Points:**\n",
"- Use `/v1` suffix for OpenAI-compatible endpoint\n",
"- `api_key` can be any string (Llama Stack ignores it)\n",
"- `model` must match what's available in Llama Stack"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# AutoGen configuration for Llama Stack\n",
"config_list = [\n",
" {\n",
" \"model\": \"ollama/llama3.3:70b\", # Your Llama Stack model\n",
" \"base_url\": \"http://localhost:8321/v1\", # OpenAI-compatible endpoint\n",
" \"api_key\": \"not-needed\", # Llama Stack doesn't require auth\n",
" }\n",
"]\n",
"\n",
"llm_config = {\n",
" \"config_list\": config_list,\n",
" \"temperature\": 0.7,\n",
" \"timeout\": 120,\n",
"}\n",
"\n",
"print(\"✅ AutoGen configuration ready for Llama Stack\")\n",
"print(f\"Model: {config_list[0]['model']}\")\n",
"print(f\"Base URL: {config_list[0]['base_url']}\")"
]
},
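{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optional: Verify the Model Is Available\n",
"\n",
"Before creating agents, it helps to confirm the configured model name matches what Llama Stack actually serves. The sketch below assumes the OpenAI-compatible `GET {base_url}/models` endpoint; adjust the path if your Llama Stack version exposes it differently."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: list the models served by Llama Stack and confirm the configured\n",
"# model is among them. Assumes the OpenAI-compatible GET {base_url}/models\n",
"# endpoint (adjust if your Llama Stack version uses a different path).\n",
"import httpx\n",
"\n",
"try:\n",
"    resp = httpx.get(f\"{config_list[0]['base_url']}/models\")\n",
"    resp.raise_for_status()\n",
"    available = [m[\"id\"] for m in resp.json().get(\"data\", [])]\n",
"    if config_list[0][\"model\"] in available:\n",
"        print(f\"✅ Model '{config_list[0]['model']}' is available\")\n",
"    else:\n",
"        print(f\"⚠️ Model '{config_list[0]['model']}' not listed. Available: {available}\")\n",
"except Exception as e:\n",
"    print(f\"❌ Could not list models: {e}\")"
]
},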
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 1: Two-Agent Conversation\n",
"\n",
"### Pattern: User Proxy + Assistant\n",
"\n",
"**UserProxyAgent:**\n",
"- Represents the human user\n",
"- Can execute code\n",
"- Provides feedback to assistant\n",
"\n",
"**AssistantAgent:**\n",
"- AI assistant powered by Llama Stack\n",
"- Generates responses and code\n",
"- Solves problems conversationally\n",
"\n",
"### Use Case: Solve a Math Problem"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create AssistantAgent (AI assistant)\n",
"assistant = AssistantAgent(\n",
" name=\"MathAssistant\",\n",
" system_message=\"You are a helpful AI assistant that solves math problems. Provide clear explanations.\",\n",
" llm_config=llm_config,\n",
")\n",
"\n",
"# Create UserProxyAgent (represents human)\n",
"user_proxy = UserProxyAgent(\n",
" name=\"User\",\n",
" human_input_mode=\"NEVER\", # Fully automated (no human input)\n",
" max_consecutive_auto_reply=5,\n",
" code_execution_config={\"use_docker\": False}, # Allow local code execution\n",
")\n",
"\n",
"print(\"✅ Agents created\")\n",
"print(f\"Assistant: {assistant.name}\")\n",
"print(f\"User Proxy: {user_proxy.name}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Start conversation\n",
"user_proxy.initiate_chat(\n",
" assistant,\n",
" message=\"What is the sum of the first 100 prime numbers? Please write Python code to calculate it.\"\n",
")\n",
"\n",
"print(\"\\n\" + \"=\"*50)\n",
"print(\"Conversation complete!\")"
]
},
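{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the chat finishes, the transcript is retained on the agents. The sketch below assumes pyautogen's `chat_messages` mapping (peer agent → list of message dicts); message fields may differ slightly across AutoGen versions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the finished conversation. Assumes pyautogen's\n",
"# ConversableAgent.chat_messages, a dict mapping each peer agent to the list\n",
"# of exchanged message dicts.\n",
"for msg in user_proxy.chat_messages[assistant]:\n",
"    preview = str(msg.get(\"content\", \"\"))[:80].replace(\"\\\\n\", \" \")\n",
"    print(f\"[{msg.get('role', '?')}] {preview}\")"
]
},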
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2: Code Generation & Execution\n",
"\n",
"### Pattern: Assistant generates code → UserProxy executes it\n",
"\n",
"This is AutoGen's killer feature: **automatic code execution**!\n",
"\n",
"### Use Case: Data Analysis Task"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a coding assistant\n",
"coding_assistant = AssistantAgent(\n",
" name=\"DataScientist\",\n",
" system_message=\"\"\"You are an expert data scientist.\n",
" Write Python code to solve data analysis problems.\n",
" Always include visualizations when appropriate.\"\"\",\n",
" llm_config=llm_config,\n",
")\n",
"\n",
"# User proxy with code execution enabled\n",
"user_proxy_code = UserProxyAgent(\n",
" name=\"UserProxy\",\n",
" human_input_mode=\"NEVER\",\n",
" max_consecutive_auto_reply=3,\n",
" code_execution_config={\n",
" \"work_dir\": \"coding\",\n",
" \"use_docker\": False,\n",
" },\n",
")\n",
"\n",
"# Start data analysis task\n",
"user_proxy_code.initiate_chat(\n",
" coding_assistant,\n",
" message=\"\"\"Generate 100 random numbers from a normal distribution (mean=50, std=10).\n",
" Calculate the mean, median, and standard deviation.\n",
" Create a histogram to visualize the distribution.\"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 3: Group Chat (Multi-Agent Collaboration)\n",
"\n",
"### Pattern: Multiple Specialists Collaborating\n",
"\n",
"**Scenario:** Write a technical blog post about AI\n",
"\n",
"**Agents:**\n",
"1. **Researcher** - Finds information\n",
"2. **Writer** - Writes content\n",
"3. **Critic** - Reviews and suggests improvements\n",
"4. **UserProxy** - Orchestrates and provides final approval\n",
"\n",
"This is similar to llamacrew's workflow but **conversational** instead of DAG-based!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create specialist agents\n",
"researcher = AssistantAgent(\n",
" name=\"Researcher\",\n",
" system_message=\"\"\"You are a researcher. Your job is to find accurate information\n",
" about topics and provide facts, statistics, and recent developments.\"\"\",\n",
" llm_config=llm_config,\n",
")\n",
"\n",
"writer = AssistantAgent(\n",
" name=\"Writer\",\n",
" system_message=\"\"\"You are a technical writer. Your job is to write clear,\n",
" engaging content based on research provided. Use simple language and examples.\"\"\",\n",
" llm_config=llm_config,\n",
")\n",
"\n",
"critic = AssistantAgent(\n",
" name=\"Critic\",\n",
" system_message=\"\"\"You are an editor. Review content for clarity, accuracy,\n",
" and engagement. Suggest specific improvements.\"\"\",\n",
" llm_config=llm_config,\n",
")\n",
"\n",
"# User proxy to orchestrate\n",
"user_proxy_group = UserProxyAgent(\n",
" name=\"UserProxy\",\n",
" human_input_mode=\"NEVER\",\n",
" max_consecutive_auto_reply=10,\n",
" code_execution_config=False,\n",
")\n",
"\n",
"print(\"✅ Group chat agents created\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create group chat\n",
"groupchat = GroupChat(\n",
" agents=[user_proxy_group, researcher, writer, critic],\n",
" messages=[],\n",
" max_round=12, # Maximum conversation rounds\n",
")\n",
"\n",
"# Create manager to orchestrate\n",
"manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
"\n",
"# Start group chat\n",
"user_proxy_group.initiate_chat(\n",
" manager,\n",
" message=\"\"\"Write a 300-word blog post about the benefits of using\n",
" Llama Stack for LLM applications. Include:\n",
" 1. What Llama Stack is\n",
" 2. Key benefits\n",
" 3. A simple use case example\n",
"\n",
" Researcher: gather information\n",
" Writer: create the blog post\n",
" Critic: review and suggest improvements\n",
" \"\"\"\n",
")"
]
},
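{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default the manager picks the next speaker with the LLM (`speaker_selection_method=\"auto\"`). For predictable demos you can force a fixed turn order instead. This sketch assumes pyautogen 0.2+, where `GroupChat` accepts `speaker_selection_method` values such as `\"round_robin\"`, `\"random\"`, and `\"manual\"`; check your installed version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Alternative: deterministic round-robin speaker order instead of LLM-driven\n",
"# selection. Assumes pyautogen 0.2+ (speaker_selection_method parameter).\n",
"groupchat_rr = GroupChat(\n",
"    agents=[user_proxy_group, researcher, writer, critic],\n",
"    messages=[],\n",
"    max_round=12,\n",
"    speaker_selection_method=\"round_robin\",  # fixed order, no selection LLM call\n",
")\n",
"manager_rr = GroupChatManager(groupchat=groupchat_rr, llm_config=llm_config)\n",
"\n",
"print(\"✅ Round-robin group chat ready: user_proxy_group.initiate_chat(manager_rr, ...)\")"
]
},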
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 4: Human-in-the-Loop\n",
"\n",
"### Pattern: Interactive Problem Solving\n",
"\n",
"Autogen excels at **human-in-the-loop** workflows where you can:\n",
"- Provide feedback mid-conversation\n",
"- Approve/reject suggestions\n",
"- Guide the agent's direction\n",
"\n",
"**Note:** In notebooks, this requires `human_input_mode=\"ALWAYS\"` or `\"TERMINATE\"` and manual input."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Interactive assistant (uncomment to try)\n",
"# WARNING: This will prompt for user input!\n",
"\n",
"# assistant_interactive = AssistantAgent(\n",
"# name=\"InteractiveAssistant\",\n",
"# system_message=\"You are a helpful assistant. Ask clarifying questions when needed.\",\n",
"# llm_config=llm_config,\n",
"# )\n",
"\n",
"# user_proxy_interactive = UserProxyAgent(\n",
"# name=\"Human\",\n",
"# human_input_mode=\"TERMINATE\", # Ask for human input when TERMINATE is mentioned\n",
"# max_consecutive_auto_reply=5,\n",
"# )\n",
"\n",
"# user_proxy_interactive.initiate_chat(\n",
"# assistant_interactive,\n",
"# message=\"Help me plan a machine learning project for customer churn prediction.\"\n",
"# )\n",
"\n",
"print(\"💡 Human-in-the-loop example (commented out to avoid blocking notebook execution)\")\n",
"print(\"Uncomment the code above to try interactive mode!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 5: Sequential Task Solving\n",
"\n",
"### Pattern: Chain of Thought Problem Solving\n",
"\n",
"**Scenario:** Solve a complex problem requiring multiple steps\n",
"\n",
"1. **Understand** the problem\n",
"2. **Plan** the solution approach\n",
"3. **Implement** the solution (code)\n",
"4. **Execute** and verify\n",
"5. **Explain** the results\n",
"\n",
"### Use Case: Fibonacci Sequence Analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a reasoning assistant\n",
"reasoning_assistant = AssistantAgent(\n",
" name=\"ReasoningAssistant\",\n",
" system_message=\"\"\"You are a problem-solving assistant.\n",
" For complex problems:\n",
" 1. Break down the problem\n",
" 2. Plan the solution step-by-step\n",
" 3. Write clean, well-commented code\n",
" 4. Explain results clearly\n",
" \"\"\",\n",
" llm_config=llm_config,\n",
")\n",
"\n",
"user_proxy_reasoning = UserProxyAgent(\n",
" name=\"User\",\n",
" human_input_mode=\"NEVER\",\n",
" max_consecutive_auto_reply=5,\n",
" code_execution_config={\"work_dir\": \"reasoning\", \"use_docker\": False},\n",
")\n",
"\n",
"# Complex problem requiring sequential reasoning\n",
"user_proxy_reasoning.initiate_chat(\n",
" reasoning_assistant,\n",
" message=\"\"\"Find the first 20 Fibonacci numbers where the number is also a prime number.\n",
"\n",
" Requirements:\n",
" 1. Explain the approach\n",
" 2. Write efficient Python code\n",
" 3. Display the results in a table\n",
" 4. Calculate what percentage of the first 100 Fibonacci numbers are prime\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparison: AutoGen vs llamacrew\n",
"\n",
"### When to Use AutoGen + Llama Stack\n",
"\n",
"✅ **Use AutoGen when you need:**\n",
"- **Conversational** interactions between agents\n",
"- **Human-in-the-loop** workflows (interactive approval, feedback)\n",
"- **Code generation & execution** (data analysis, scripting)\n",
"- **Group discussions** (multiple agents debating, collaborating)\n",
"- **Dynamic problem-solving** (unknown number of back-and-forth exchanges)\n",
"- **Research/prototyping** (exploratory work)\n",
"\n",
"**Example Use Cases:**\n",
"- Interactive coding assistant\n",
"- Research assistant with human feedback\n",
"- Multi-agent debate/discussion\n",
"- Tutoring/educational applications\n",
"- Dynamic customer support\n",
"\n",
"---\n",
"\n",
"### When to Use llamacrew\n",
"\n",
"✅ **Use llamacrew when you need:**\n",
"- **Production workflows** (blog writing, data pipelines)\n",
"- **Declarative DAGs** (predefined task dependencies)\n",
"- **Automatic parallelization** (framework optimizes)\n",
"- **Non-interactive automation** (scheduled jobs)\n",
"- **Minimal dependencies** (lightweight deployment)\n",
"- **Predictable workflows** (known steps)\n",
"\n",
"**Example Use Cases:**\n",
"- Automated blog post generation\n",
"- Data ETL pipelines\n",
"- Report generation\n",
"- Batch processing\n",
"- Production automation\n",
"\n",
"---\n",
"\n",
"### They're Complementary!\n",
"\n",
"- **AutoGen**: Conversational, interactive, exploratory\n",
"- **llamacrew**: Workflow, automated, production\n",
"\n",
"You might use **AutoGen for prototyping** then move to **llamacrew for production**!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced: Custom Agent Behaviors\n",
"\n",
"### Pattern: Specialized Agent with Custom Logic\n",
"\n",
"You can create agents with custom behavior beyond just prompts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Dict, List, Union\n",
"import re\n",
"\n",
"class CodeReviewAgent(AssistantAgent):\n",
" \"\"\"Custom agent that reviews code for specific patterns.\"\"\"\n",
"\n",
" def __init__(self, *args, **kwargs):\n",
" super().__init__(*args, **kwargs)\n",
" self.issues_found = []\n",
"\n",
" def review_code(self, code: str) -> Dict[str, List[str]]:\n",
" \"\"\"Custom method to review code for common issues.\"\"\"\n",
" issues = []\n",
"\n",
" # Check for common issues\n",
" if \"TODO\" in code or \"FIXME\" in code:\n",
" issues.append(\"Code contains TODO/FIXME comments\")\n",
"\n",
" if not re.search(r'def \\w+\\(.*\\):', code):\n",
" issues.append(\"No function definitions found\")\n",
"\n",
" if \"print(\" in code:\n",
" issues.append(\"Contains print statements (consider logging)\")\n",
"\n",
" self.issues_found.extend(issues)\n",
" return {\"issues\": issues, \"total\": len(issues)}\n",
"\n",
"# Create custom reviewer\n",
"code_reviewer = CodeReviewAgent(\n",
" name=\"CodeReviewer\",\n",
" system_message=\"\"\"You are a code reviewer. Analyze code for:\n",
" - Code quality\n",
" - Best practices\n",
" - Potential bugs\n",
" - Performance issues\n",
" Provide specific, actionable feedback.\n",
" \"\"\",\n",
" llm_config=llm_config,\n",
")\n",
"\n",
"print(\"✅ Custom CodeReviewAgent created with specialized review logic\")"
]
},
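{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `review_code` helper defined above runs locally (no LLM call), so we can exercise it directly on a hypothetical sample snippet:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Exercise the custom review logic on a hypothetical snippet. This calls the\n",
"# local review_code method defined above; no model request is made.\n",
"sample_code = '''\n",
"def add(a, b):\n",
"    # TODO: validate inputs\n",
"    print(a + b)\n",
"'''\n",
"\n",
"result = code_reviewer.review_code(sample_code)\n",
"print(f\"Issues found: {result['total']}\")\n",
"for issue in result[\"issues\"]:\n",
"    print(f\" - {issue}\")"
]
},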
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Performance Tips\n",
"\n",
"### 1. Model Selection\n",
"\n",
"```python\n",
"# Fast models for simple tasks\n",
"config_fast = {\n",
" \"model\": \"ollama/llama3.2:3b\", # Smaller, faster\n",
" \"temperature\": 0.5,\n",
"}\n",
"\n",
"# Powerful models for complex reasoning\n",
"config_powerful = {\n",
" \"model\": \"ollama/llama3.3:70b\", # Larger, better quality\n",
" \"temperature\": 0.7,\n",
"}\n",
"```\n",
"\n",
"### 2. Limit Conversation Rounds\n",
"\n",
"```python\n",
"user_proxy = UserProxyAgent(\n",
" name=\"User\",\n",
" max_consecutive_auto_reply=3, # Prevent infinite loops\n",
")\n",
"```\n",
"\n",
"### 3. Set Timeouts\n",
"\n",
"```python\n",
"llm_config = {\n",
" \"timeout\": 60, # 60 second timeout per request\n",
" \"config_list\": config_list,\n",
"}\n",
"```\n",
"\n",
"### 4. Use Work Directories\n",
"\n",
"```python\n",
"code_execution_config = {\n",
" \"work_dir\": \"autogen_workspace\", # Isolate generated files\n",
" \"use_docker\": False,\n",
"}\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Troubleshooting\n",
"\n",
"### Common Issues\n",
"\n",
"#### 1. \"Could not connect to Llama Stack\"\n",
"\n",
"```bash\n",
"# Check if Llama Stack is running\n",
"curl http://localhost:8321/health\n",
"\n",
"# Start Llama Stack if needed\n",
"llama stack run\n",
"```\n",
"\n",
"#### 2. \"Model not found\"\n",
"\n",
"```bash\n",
"# List available models\n",
"curl http://localhost:8321/models\n",
"\n",
"# Make sure model name matches exactly:\n",
"# ✅ \"ollama/llama3.3:70b\"\n",
"# ❌ \"llama3.3:70b\"\n",
"```\n",
"\n",
"#### 3. \"Agent not responding\"\n",
"\n",
"- Check `max_consecutive_auto_reply` isn't set too low\n",
"- Increase `timeout` in `llm_config`\n",
"- Verify Llama Stack model is loaded and warm\n",
"\n",
"#### 4. \"Code execution failed\"\n",
"\n",
"- Make sure `code_execution_config` is set correctly\n",
"- Check file permissions on `work_dir`\n",
"- Install required Python packages\n",
"\n",
"---\n",
"\n",
"### Debug Mode\n",
"\n",
"```python\n",
"import logging\n",
"\n",
"# Enable AutoGen debug logging\n",
"logging.basicConfig(level=logging.DEBUG)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"### What We Covered\n",
"\n",
"1. ✅ **Two-Agent Conversations** - UserProxy + Assistant pattern\n",
"2. ✅ **Code Generation & Execution** - AutoGen's killer feature\n",
"3. ✅ **Group Chat** - Multiple agents collaborating\n",
"4. ✅ **Human-in-the-Loop** - Interactive workflows\n",
"5. ✅ **Sequential Reasoning** - Complex problem solving\n",
"6. ✅ **Custom Agents** - Specialized behaviors\n",
"\n",
"### Key Takeaways\n",
"\n",
"**AutoGen + Llama Stack is powerful for:**\n",
"- 🗣️ **Conversational** multi-agent systems\n",
"- 👤 **Interactive** problem-solving with humans\n",
"- 💻 **Code generation** and execution\n",
"- 🤝 **Collaborative** agent discussions\n",
"\n",
"**vs llamacrew which is better for:**\n",
"- 🔄 **Production workflows** and pipelines\n",
"- 📊 **Declarative** task orchestration\n",
"- ⚡ **Automatic parallelization**\n",
"- 🤖 **Non-interactive** automation\n",
"\n",
"---\n",
"\n",
"### Next Steps\n",
"\n",
"1. Experiment with different agent combinations\n",
"2. Try human-in-the-loop workflows\n",
"3. Build custom agents for your use case\n",
"4. Compare AutoGen vs llamacrew for your specific needs\n",
"\n",
"### Resources\n",
"\n",
"- **AutoGen Docs**: https://microsoft.github.io/autogen/\n",
"- **Llama Stack Docs**: https://llama-stack.readthedocs.io/\n",
"- **llamacrew Docs**: `/home/omara/Desktop/llamacrew/README.md`\n",
"\n",
"---\n",
"\n",
"**Happy multi-agent building! 🚀**"
]
}
],
"metadata": {
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}