diff --git a/docs/notebooks/autogen/autogen_llama_stack_integration.ipynb b/docs/notebooks/autogen/autogen_llama_stack_integration.ipynb index 2e59ac0d7..36cf409e0 100644 --- a/docs/notebooks/autogen/autogen_llama_stack_integration.ipynb +++ b/docs/notebooks/autogen/autogen_llama_stack_integration.ipynb @@ -8,22 +8,20 @@ "\n", "## Overview\n", "\n", - "This notebook demonstrates how to use **AutoGen (AG2)** with **Llama Stack** as the backend.\n", + "This notebook demonstrates how to use **AutoGen v0.7.5** with **Llama Stack** as the backend.\n", "\n", "### Use Cases Covered:\n", - "1. **Two-Agent Conversation** - UserProxy + Assistant solving a problem\n", + "1. **Two-Agent Conversation** - Teams working together on tasks\n", "2. **Code Generation & Execution** - AutoGen generates and runs code\n", - "3. **Group Chat** - Multiple specialists collaborating\n", - "4. **Human-in-the-Loop** - Interactive problem-solving\n", - "5. **Sequential Task Solving** - Math problem → Code → Execute → Verify\n", + "3. **Group Chat** - Multiple specialists collaborating \n", "\n", "---\n", "\n", "## Prerequisites\n", "\n", "```bash\n", - "# Install AutoGen (AG2)\n", - "pip install pyautogen\n", + "# Install AutoGen v0.7.5 (new API)\n", + "pip install autogen-agentchat autogen-ext\n", "\n", "# Llama Stack should already be running\n", "# Default: http://localhost:8321\n", @@ -32,14 +30,34 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "execution_count": 1, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ AutoGen imports successful\n", + "Using AutoGen v0.7.5 with new team-based API\n", + "✅ Llama Stack is running at http://localhost:8321\n", + "Status: 200\n" + ] + } + ], "source": [ "# Imports\n", "import os\n", - "from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager\n", - "from autogen.oai import OpenAIWrapper\n", + "import asyncio\n", + "from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent\n", + "from autogen_agentchat.teams import RoundRobinGroupChat\n", + "from autogen_agentchat.base import TaskResult\n", + "from autogen_agentchat.messages import TextMessage\n", + "from autogen_ext.models.openai import OpenAIChatCompletionClient\n", + "\n", + "print(\"✅ AutoGen imports successful\")\n", + "print(\"Using AutoGen v0.7.5 with new team-based API\")\n", "\n", "# Check Llama Stack connectivity\n", "import httpx\n", @@ -47,7 +65,7 @@ "LLAMA_STACK_URL = \"http://localhost:8321\"\n", "\n", "try:\n", - " response = httpx.get(f\"{LLAMA_STACK_URL}/health\")\n", + " response = httpx.get(f\"{LLAMA_STACK_URL}/v1/models\")\n", " print(f\"✅ Llama Stack is running at {LLAMA_STACK_URL}\")\n", " print(f\"Status: {response.status_code}\")\n", "except Exception as e:\n", @@ -59,622 +77,640 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Configuration: AutoGen with Llama Stack\n", + "## Configuration: AutoGen v0.7.5 with Llama Stack\n", "\n", "### How It Works\n", "\n", - "AutoGen uses the **OpenAI API format**, which Llama Stack is compatible with!\n", - "\n", - "```python\n", - "config_list = [\n", - " {\n", - " \"model\": \"ollama/llama3.3:70b\", # Your Llama Stack model\n", - " \"base_url\": \"http://localhost:8321/v1\", # Llama Stack endpoint\n", - " \"api_key\": \"not-needed\", # Llama Stack doesn't need auth\n", - " }\n", - "]\n", - "```\n", - "\n", - "**Key Points:**\n", - "- Use `/v1` suffix for OpenAI-compatible endpoint\n", - "- 
`api_key` can be any string (Llama Stack ignores it)\n", - "- `model` must match what's available in Llama Stack" + "AutoGen v0.7.5 uses **OpenAIChatCompletionClient** to connect to OpenAI-compatible endpoints like Llama Stack's /v1/chat/completions." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ Model client configured for Llama Stack\n", + "Model: ollama/llama3.3:70b\n", + "Base URL: http://localhost:8321/v1\n" + ] + } + ], "source": [ - "# AutoGen configuration for Llama Stack\n", - "config_list = [\n", - " {\n", - " \"model\": \"ollama/llama3.3:70b\", # Your Llama Stack model\n", - " \"base_url\": \"http://localhost:8321/v1\", # OpenAI-compatible endpoint\n", - " \"api_key\": \"not-needed\", # Llama Stack doesn't require auth\n", + "# Create OpenAI-compatible client for Llama Stack\n", + "model_client = OpenAIChatCompletionClient(\n", + " model=\"ollama/llama3.3:70b\", # Choose any other model of your choice.\n", + " api_key=\"not-needed\",\n", + " base_url=\"http://localhost:8321/v1\", # For pointing to llama stack end points.\n", + " model_capabilities={\n", + " \"vision\": False,\n", + " \"function_calling\": True,\n", + " \"json_output\": True,\n", " }\n", - "]\n", + ")\n", "\n", - "llm_config = {\n", - " \"config_list\": config_list,\n", - " \"temperature\": 0.7,\n", - " \"timeout\": 120,\n", - "}\n", - "\n", - "print(\"✅ AutoGen configuration ready for Llama Stack\")\n", - "print(f\"Model: {config_list[0]['model']}\")\n", - "print(f\"Base URL: {config_list[0]['base_url']}\")" + "print(\"✅ Model client configured for Llama Stack\")\n", + "print(f\"Model: ollama/llama3.3:70b\")\n", + "print(f\"Base URL: http://localhost:8321/v1\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Example 1: Two-Agent Conversation\n", + "## Example 1: Simple Task with Assistant Agent\n", "\n", - "### Pattern: User Proxy + Assistant\n", + "### Pattern: Single Agent Task\n", "\n", - "**UserProxyAgent:**\n", - "- Represents the human user\n", - "- Can execute code\n", - "- Provides feedback to assistant\n", + "In v0.7.5, Autogen uses **Teams** to orchestrate agents, even for simple single-agent tasks.\n", "\n", "**AssistantAgent:**\n", "- AI assistant powered by Llama Stack\n", - "- Generates responses and code\n", - "- Solves problems conversationally\n", + "- Executes tasks and provides responses\n", "\n", "### Use Case: Solve a Math Problem" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "execution_count": 4, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ Agent created: MathAssistant\n", + "\n", + "==================================================\n", + "Task Result:\n", + "To find the sum of the first 10 prime numbers, we need to follow these steps:\n", + "\n", + "1. **Identify the first 10 prime numbers**: Prime numbers are natural numbers greater than 1 that have no divisors other than 1 and themselves.\n", + "\n", + "2. **List the first 10 prime numbers**:\n", + " - Start with 2 (the smallest prime number).\n", + " - Check each subsequent natural number to see if it is divisible by any prime number less than or equal to its square root. If not, it's a prime number.\n", + " - Continue until we have 10 prime numbers.\n", + "\n", + "3. 
**Calculate the sum** of these numbers.\n", + "\n", + "Let's list the first 10 prime numbers step by step:\n", + "\n", + "1. The smallest prime number is **2**.\n", + "2. The next prime number after 2 is **3**, since it's only divisible by 1 and itself.\n", + "3. Then comes **5**, because it has no divisors other than 1 and itself.\n", + "4. Next is **7**, for the same reason as above.\n", + "5. **11** is also a prime number, as it cannot be divided evenly by any number other than 1 and itself.\n", + "6. Following this pattern, we identify **13** as a prime number.\n", + "7. Then, **17**.\n", + "8. Next in line is **19**.\n", + "9. After that, we have **23**.\n", + "10. The tenth prime number is **29**.\n", + "\n", + "So, the first 10 prime numbers are: 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29.\n", + "\n", + "Now, let's **calculate their sum**:\n", + "\n", + "- Start with 0 (or any starting number for summation).\n", + "- Add each prime number to the total:\n", + " - 0 + 2 = 2\n", + " - 2 + 3 = 5\n", + " - 5 + 5 = 10\n", + " - 10 + 7 = 17\n", + " - 17 + 11 = 28\n", + " - 28 + 13 = 41\n", + " - 41 + 17 = 58\n", + " - 58 + 19 = 77\n", + " - 77 + 23 = 100\n", + " - 100 + 29 = 129\n", + "\n", + "Therefore, the sum of the first 10 prime numbers is **129**.\n" + ] + } + ], "source": [ - "# Create AssistantAgent (AI assistant)\n", + "import asyncio\n", + "\n", + "# Create an AssistantAgent\n", "assistant = AssistantAgent(\n", " name=\"MathAssistant\",\n", - " system_message=\"You are a helpful AI assistant that solves math problems. Provide clear explanations.\",\n", - " llm_config=llm_config,\n", + " model_client=model_client,\n", + " system_message=\"You are a helpful AI assistant that solves math problems. Provide clear explanations and show your work.\"\n", ")\n", "\n", - "# Create UserProxyAgent (represents human)\n", - "user_proxy = UserProxyAgent(\n", - " name=\"User\",\n", - " human_input_mode=\"NEVER\", # Fully automated (no human input)\n", - " max_consecutive_auto_reply=5,\n", - " code_execution_config={\"use_docker\": False}, # Allow local code execution\n", - ")\n", + "print(\"✅ Agent created:\", assistant.name)\n", "\n", - "print(\"✅ Agents created\")\n", - "print(f\"Assistant: {assistant.name}\")\n", - "print(f\"User Proxy: {user_proxy.name}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Start conversation\n", - "user_proxy.initiate_chat(\n", - " assistant,\n", - " message=\"What is the sum of the first 100 prime numbers? Please write Python code to calculate it.\"\n", - ")\n", + "# Define the task\n", + "task = \"What is the sum of the first 10 prime numbers? 
Please calculate it step by step.\"\n", + "\n", + "# Run the task (AutoGen v0.7.5 uses async)\n", + "async def run_simple_task():\n", + " # Create a simple team with just the assistant\n", + " team = RoundRobinGroupChat([assistant], max_turns=1)\n", + " result = await team.run(task=task)\n", + " return result\n", + "\n", + "# Execute in notebook\n", + "result = await run_simple_task()\n", "\n", "print(\"\\n\" + \"=\"*50)\n", - "print(\"Conversation complete!\")" + "print(\"Task Result:\")\n", + "print(result.messages[-1].content if result.messages else \"No response\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Example 2: Code Generation & Execution\n", + "## Example 2: Multi-Agent Team Collaboration\n", "\n", - "### Pattern: Assistant generates code → UserProxy executes it\n", + "### Pattern: Multiple Agents Working Together\n", "\n", - "This is AutoGen's killer feature: **automatic code execution**!\n", + "In v0.7.5, Autogen uses **RoundRobinGroupChat** to create teams where agents take turns contributing to a task.\n", "\n", - "### Use Case: Data Analysis Task" + "### Use Case: Write a Technical Blog Post" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create a coding assistant\n", - "coding_assistant = AssistantAgent(\n", - " name=\"DataScientist\",\n", - " system_message=\"\"\"You are an expert data scientist.\n", - " Write Python code to solve data analysis problems.\n", - " Always include visualizations when appropriate.\"\"\",\n", - " llm_config=llm_config,\n", - ")\n", - "\n", - "# User proxy with code execution enabled\n", - "user_proxy_code = UserProxyAgent(\n", - " name=\"UserProxy\",\n", - " human_input_mode=\"NEVER\",\n", - " max_consecutive_auto_reply=3,\n", - " code_execution_config={\n", - " \"work_dir\": \"coding\",\n", - " \"use_docker\": False,\n", - " },\n", - ")\n", - "\n", - "# Start data analysis task\n", - "user_proxy_code.initiate_chat(\n", - " coding_assistant,\n", - " message=\"\"\"Generate 100 random numbers from a normal distribution (mean=50, std=10).\n", - " Calculate the mean, median, and standard deviation.\n", - " Create a histogram to visualize the distribution.\"\"\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Example 3: Group Chat (Multi-Agent Collaboration)\n", - "\n", - "### Pattern: Multiple Specialists Collaborating\n", - "\n", - "**Scenario:** Write a technical blog post about AI\n", - "\n", - "**Agents:**\n", - "1. **Researcher** - Finds information\n", - "2. **Writer** - Writes content\n", - "3. **Critic** - Reviews and suggests improvements\n", - "4. **UserProxy** - Orchestrates and provides final approval\n", - "\n", - "This is similar to llamacrew's workflow but **conversational** instead of DAG-based!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "execution_count": 5, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ Team agents created: Researcher, Writer, Critic\n", + "\n", + "==================================================\n", + "Final Blog Post:\n", + "==================================================\n", + "Turn 1\n", + "\n", + "[user]: Write a 200-word blog post about the benefits of using Llama Stack for LLM applications.\n", + "\n", + " Steps:\n", + " 1. Researcher: Gather key information about Llama Stack\n", + " 2. 
Writer: Create the blog post\n", + " ...\n", + "Turn 2\n", + "\n", + "[Researcher]: **Unlocking Efficient LLM Applications with Llama Stack**\n", + "\n", + "The Llama Stack is a cutting-edge framework designed to optimize Large Language Model (LLM) applications, offering numerous benefits for deve...\n", + "Turn 3\n", + "\n", + "[Writer]: **Unlocking Efficient LLM Applications with Llama Stack**\n", + "\n", + "The Llama Stack is a revolutionary framework that optimizes Large Language Model (LLM) applications, offering numerous benefits for developer...\n", + "Turn 4\n", + "\n", + "[Critic]: **Reviewed Blog Post:**\n", + "\n", + "The provided blog post effectively highlights the benefits of using the Llama Stack for Large Language Model (LLM) applications. However, there are a few areas that could be i...\n", + "Turn 5\n", + "\n", + "[Researcher]: Here's a 200-word blog post about the benefits of using Llama Stack for LLM applications:\n", + "\n", + "**Unlocking Efficient LLM Applications with Llama Stack**\n", + "\n", + "The Llama Stack is a revolutionary framework that ...\n", + "Turn 6\n", + "\n", + "[Writer]: **Unlocking Efficient LLM Applications with Llama Stack**\n", + "\n", + "The Llama Stack is a game-changer for Large Language Model (LLM) applications, offering numerous benefits for developers and users. By utiliz...\n", + "Turn 7\n", + "\n", + "[Critic]: **Critic's Review:**\n", + "\n", + "The provided blog post effectively communicates the benefits of using the Llama Stack for Large Language Model (LLM) applications. Here are some key observations and suggestions ...\n", + "Turn 8\n", + "\n", + "[Researcher]: Here's a rewritten 200-word blog post about the benefits of using Llama Stack for LLM applications:\n", + "\n", + "**Unlock Efficient LLM Applications with Llama Stack**\n", + "\n", + "In the rapidly evolving landscape of Large ...\n", + "Turn 9\n", + "\n", + "[Writer]: **Unlock Efficient LLM Applications with Llama Stack**\n", + "\n", + "The Llama Stack revolutionizes Large Language Model (LLM) applications by providing a game-changing framework that optimizes development and dep...\n", + "Turn 10\n", + "\n", + "[Critic]: **Editor's Review:**\n", + "\n", + "The rewritten blog post effectively communicates the benefits of using the Llama Stack for Large Language Model (LLM) applications. Here are some key observations and suggestions...\n", + "Turn 11\n", + "\n", + "[Researcher]: Here's a rewritten 200-word blog post about the benefits of using Llama Stack for LLM applications:\n", + "\n", + "**Unlock Efficient LLM Applications with Llama Stack**\n", + "\n", + "In the rapidly evolving landscape of Large ...\n", + "Turn 12\n", + "\n", + "[Writer]: **Unlock Efficient LLM Applications with Llama Stack**\n", + "\n", + "The rapidly evolving landscape of Large Language Models (LLMs) demands efficiency and scalability for success. The Llama Stack is a game-changin...\n", + "Turn 13\n", + "\n", + "[Critic]: **Editor's Review:**\n", + "\n", + "The rewritten blog post effectively communicates the benefits of using the Llama Stack for Large Language Model (LLM) applications. Here are some key observations and suggestions...\n" + ] + } + ], "source": [ "# Create specialist agents\n", "researcher = AssistantAgent(\n", " name=\"Researcher\",\n", - " system_message=\"\"\"You are a researcher. 
Your job is to find accurate information\n", - " about topics and provide facts, statistics, and recent developments.\"\"\",\n", - " llm_config=llm_config,\n", + " model_client=model_client,\n", + " system_message=\"You are a researcher. Provide accurate information, facts, and statistics about topics.\"\n", ")\n", "\n", "writer = AssistantAgent(\n", " name=\"Writer\",\n", - " system_message=\"\"\"You are a technical writer. Your job is to write clear,\n", - " engaging content based on research provided. Use simple language and examples.\"\"\",\n", - " llm_config=llm_config,\n", + " model_client=model_client,\n", + " system_message=\"You are a technical writer. Write clear, engaging content based on research provided.\"\n", ")\n", "\n", "critic = AssistantAgent(\n", " name=\"Critic\",\n", - " system_message=\"\"\"You are an editor. Review content for clarity, accuracy,\n", - " and engagement. Suggest specific improvements.\"\"\",\n", - " llm_config=llm_config,\n", + " model_client=model_client,\n", + " system_message=\"You are an editor. Review content for clarity, accuracy, and engagement. Suggest improvements.\"\n", ")\n", "\n", - "# User proxy to orchestrate\n", - "user_proxy_group = UserProxyAgent(\n", - " name=\"UserProxy\",\n", - " human_input_mode=\"NEVER\",\n", - " max_consecutive_auto_reply=10,\n", - " code_execution_config=False,\n", - ")\n", + "print(\"✅ Team agents created: Researcher, Writer, Critic\")\n", "\n", - "print(\"✅ Group chat agents created\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create group chat\n", - "groupchat = GroupChat(\n", - " agents=[user_proxy_group, researcher, writer, critic],\n", - " messages=[],\n", - " max_round=12, # Maximum conversation rounds\n", - ")\n", + "# Create a team with round-robin collaboration\n", + "async def run_blog_team():\n", + " team = RoundRobinGroupChat([researcher, writer, critic], max_turns=12)\n", "\n", - "# Create manager to orchestrate\n", - "manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n", + " task = \"\"\"Write a 200-word blog post about the benefits of using Llama Stack for LLM applications.\n", "\n", - "# Start group chat\n", - "user_proxy_group.initiate_chat(\n", - " manager,\n", - " message=\"\"\"Write a 300-word blog post about the benefits of using\n", - " Llama Stack for LLM applications. Include:\n", - " 1. What Llama Stack is\n", - " 2. Key benefits\n", - " 3. A simple use case example\n", - "\n", - " Researcher: gather information\n", - " Writer: create the blog post\n", - " Critic: review and suggest improvements\n", + " Steps:\n", + " 1. Researcher: Gather key information about Llama Stack\n", + " 2. Writer: Create the blog post\n", + " 3. 
Critic: Review and suggest improvements\n", " \"\"\"\n", - ")" + "\n", + " result = await team.run(task=task)\n", + " return result\n", + "\n", + "# Run the team\n", + "result = await run_blog_team()\n", + "\n", + "print(\"\\n\" + \"=\"*50)\n", + "print(\"Final Blog Post:\")\n", + "print(\"=\"*50)\n", + "# Print the last message which should contain the final output\n", + "# for msg in result.messages[-3:]:\n", + "# print(f\"\\n[{msg.source}]: {msg.content[:200]}...\" if len(msg.content) > 200 else f\"\\n[{msg.source}]: {msg.content}\")\n", + "i=1\n", + "for msg in result.messages:\n", + " print (f\"Turn {i}\")\n", + " i+=1\n", + " print(f\"\\n[{msg.source}]: {msg.content[:200]}...\" if len(msg.content) > 200 else f\"\\n[{msg.source}]: {msg.content}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Example 4: Human-in-the-Loop\n", + "## Example 3: Multi-Turn Task\n", "\n", - "### Pattern: Interactive Problem Solving\n", + "### Pattern: Extended Team Collaboration\n", "\n", - "Autogen excels at **human-in-the-loop** workflows where you can:\n", - "- Provide feedback mid-conversation\n", - "- Approve/reject suggestions\n", - "- Guide the agent's direction\n", + "Use longer conversations for problem-solving where agents need multiple rounds of discussion.\n", "\n", - "**Note:** In notebooks, this requires `human_input_mode=\"ALWAYS\"` or `\"TERMINATE\"` and manual input." + "### Use Case: Technical Analysis" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "execution_count": 6, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ Analyst agent created\n", + "\n", + "==================================================\n", + "Analysis Result:\n", + "==================================================\n", + "Turn 1\n", + "Analyze the trade-offs between using local LLMs (like Llama via Llama Stack)\n", + " versus cloud-based APIs (like OpenAI) for production applications.\n", + " Consider: cost, latency, privacy, scalability, and maintenance.\n", + "==================================================\n", + "Turn 2\n", + "The debate between using local Large Language Models (LLMs) like Llama via Llama Stack and cloud-based APIs like OpenAI for production applications revolves around several key trade-offs. Here's a detailed analysis of the pros and cons of each approach considering cost, latency, privacy, scalability, and maintenance.\n", + "\n", + "### Local LLMs (e.g., Llama via Llama Stack)\n", + "\n", + "**Pros:**\n", + "1. **Privacy:** Running models locally can offer enhanced data privacy since sensitive information doesn't need to be transmitted over the internet or stored on third-party servers.\n", + "2. **Latency:** Local deployment typically results in lower latency for inference, as it eliminates the need for network requests and responses to cloud services.\n", + "3. **Customizability:** Users have full control over the model's training data, allowing for fine-tuning that is more tailored to their specific use case or industry.\n", + "4. **Dependence:** Reduced dependence on external APIs means applications are less vulnerable to service outages or changes in API terms of service.\n", + "\n", + "**Cons:**\n", + "1. **Cost:** While the cost per inference might be lower once models are set up, the initial investment for hardware and potentially personnel with expertise in machine learning can be high.\n", + "2. 
**Scalability:** Scaling local deployments to meet growing demand requires purchasing more powerful or additional servers, which can become prohibitively expensive.\n", + "3. **Maintenance:** Continuous updates to the model for maintaining performance or adapting to new data distributions require significant technical expertise and resource commitment.\n", + "\n", + "### Cloud-Based APIs (e.g., OpenAI)\n", + "\n", + "**Pros:**\n", + "1. **Scalability:** Cloud services can easily scale up or down based on demand, without requiring large upfront investments in hardware.\n", + "2. **Maintenance:** The maintenance burden, including model updates and security patches, is handled by the cloud provider.\n", + "3. **Accessibility:** Lower barrier to entry due to a lack of need for significant initial investment in hardware or ML expertise; users can start with basic development resources.\n", + "4. **Cost-Effectiveness:** Pricing models often include a free tier and are usually billed per use (e.g., per API call), making it more predictable and manageable for businesses with fluctuating demand.\n", + "\n", + "**Cons:**\n", + "1. **Privacy:** Sending data to cloud services may pose significant privacy risks, especially for sensitive information.\n", + "2. **Latency:** Network latency can impact the speed of inferences compared to local deployments.\n", + "3. **Dependence on Third Parties:** Applications relying on external APIs are at risk if those services change their pricing model, terms of service, or experience outages.\n", + "4. **Cost at Scale:** While cost-effective for small projects, as usage scales up, costs can quickly add up and become significant.\n", + "\n", + "### Recommendations\n", + "\n", + "- **Use Local LLMs:**\n", + " - When data privacy is paramount (e.g., in healthcare, finance).\n", + " - For applications requiring ultra-low latency.\n", + " - In scenarios where customizability of the model for a specific task or domain is critical.\n", + "\n", + "- **Use Cloud-Based APIs:**\n", + " - For proofs-of-concept, prototypes, or early-stage startups with variable and potentially low initial demand.\n", + " - When scalability needs are high and unpredictable, requiring rapid adjustments.\n", + " - In cases where the expertise and resources to manage local ML deployments are lacking.\n", + "\n", + "### Hybrid Approach\n", + "\n", + "A potential middle ground involves a hybrid approach: using cloud services for initial development and testing (benefiting from ease of use and scalability), and then transitioning to local deployment once the application has grown and can justify the investment in hardware and expertise. This transition point depends on various factors, including cost considerations, privacy requirements, and specific latency needs.\n", + "\n", + "In conclusion, the choice between local LLMs like Llama via Llama Stack and cloud-based APIs such as OpenAI for production applications hinges on a careful evaluation of trade-offs related to cost, latency, privacy, scalability, and maintenance. 
Each approach has its place, depending on the specific requirements and constraints of an application or project.\n", + "==================================================\n", + "Turn 3\n", + "\n", + "==================================================\n", + "Turn 4\n", + "When planning your deployment strategy:\n", + "- **Evaluate Privacy Requirements:** If data privacy is a significant concern, favor local deployments.\n", + "- **Assess Scalability Needs:** For high variability in demand, cloud services might offer more flexibility.\n", + "- **Consider Cost Predictability:** Cloud APIs provide cost predictability for variable usage patterns but can become expensive at large scales. Local deployments have higher upfront costs but potentially lower long-term costs per inference.\n", + "\n", + "Ultimately, the best approach may involve a combination of both local and cloud solutions, tailored to meet the evolving needs of your application as it grows and matures.\n", + "==================================================\n", + "Turn 5\n", + "\n", + "==================================================\n", + "Turn 6\n", + "**Final Consideration:** Regardless of which path you choose, ensure you have a deep understanding of your data's privacy implications and the legal requirements surrounding its handling. Additionally, maintaining flexibility in your architecture can allow for transitions between deployment strategies as needed.\n", + "==================================================\n" + ] + } + ], "source": [ - "# Interactive assistant (uncomment to try)\n", - "# WARNING: This will prompt for user input!\n", - "\n", - "# assistant_interactive = AssistantAgent(\n", - "# name=\"InteractiveAssistant\",\n", - "# system_message=\"You are a helpful assistant. Ask clarifying questions when needed.\",\n", - "# llm_config=llm_config,\n", - "# )\n", - "\n", - "# user_proxy_interactive = UserProxyAgent(\n", - "# name=\"Human\",\n", - "# human_input_mode=\"TERMINATE\", # Ask for human input when TERMINATE is mentioned\n", - "# max_consecutive_auto_reply=5,\n", - "# )\n", - "\n", - "# user_proxy_interactive.initiate_chat(\n", - "# assistant_interactive,\n", - "# message=\"Help me plan a machine learning project for customer churn prediction.\"\n", - "# )\n", - "\n", - "print(\"💡 Human-in-the-loop example (commented out to avoid blocking notebook execution)\")\n", - "print(\"Uncomment the code above to try interactive mode!\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Example 5: Sequential Task Solving\n", - "\n", - "### Pattern: Chain of Thought Problem Solving\n", - "\n", - "**Scenario:** Solve a complex problem requiring multiple steps\n", - "\n", - "1. **Understand** the problem\n", - "2. **Plan** the solution approach\n", - "3. **Implement** the solution (code)\n", - "4. **Execute** and verify\n", - "5. **Explain** the results\n", - "\n", - "### Use Case: Fibonacci Sequence Analysis" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create a reasoning assistant\n", - "reasoning_assistant = AssistantAgent(\n", - " name=\"ReasoningAssistant\",\n", - " system_message=\"\"\"You are a problem-solving assistant.\n", - " For complex problems:\n", - " 1. Break down the problem\n", - " 2. Plan the solution step-by-step\n", - " 3. Write clean, well-commented code\n", - " 4. 
Explain results clearly\n", - " \"\"\",\n", - " llm_config=llm_config,\n", - ")\n", - "\n", - "user_proxy_reasoning = UserProxyAgent(\n", - " name=\"User\",\n", - " human_input_mode=\"NEVER\",\n", - " max_consecutive_auto_reply=5,\n", - " code_execution_config={\"work_dir\": \"reasoning\", \"use_docker\": False},\n", - ")\n", - "\n", - "# Complex problem requiring sequential reasoning\n", - "user_proxy_reasoning.initiate_chat(\n", - " reasoning_assistant,\n", - " message=\"\"\"Find the first 20 Fibonacci numbers where the number is also a prime number.\n", - "\n", - " Requirements:\n", - " 1. Explain the approach\n", - " 2. Write efficient Python code\n", - " 3. Display the results in a table\n", - " 4. Calculate what percentage of the first 100 Fibonacci numbers are prime\n", + "# Create an analyst agent\n", + "analyst = AssistantAgent(\n", + " name=\"TechAnalyst\",\n", + " model_client=model_client,\n", + " system_message=\"\"\"You are a technical analyst. Analyze technical topics deeply:\n", + " 1. Break down complex concepts\n", + " 2. Identify pros and cons\n", + " 3. Provide recommendations\n", " \"\"\"\n", - ")" + ")\n", + "\n", + "print(\"✅ Analyst agent created\")\n", + "\n", + "# Run extended analysis\n", + "async def run_analysis():\n", + " team = RoundRobinGroupChat([analyst], max_turns=5)\n", + "\n", + " task = \"\"\"Analyze the trade-offs between using local LLMs (like Llama via Llama Stack)\n", + " versus cloud-based APIs (like OpenAI) for production applications.\n", + " Consider: cost, latency, privacy, scalability, and maintenance.\"\"\"\n", + "\n", + " result = await team.run(task=task)\n", + " return result\n", + "\n", + "result = await run_analysis()\n", + "\n", + "print(\"\\n\" + \"=\"*50)\n", + "print(\"Analysis Result:\")\n", + "print(\"=\"*50)\n", + "i=1\n", + "for message in result.messages:\n", + " print (f\"Turn {i}\")\n", + " i+=1\n", + " print(message.content)\n", + " print(\"=\"*50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Comparison: AutoGen vs llamacrew\n", + "## Example 4: Advanced Termination Conditions\n", "\n", - "### When to Use AutoGen + Llama Stack\n", + "### Pattern: Code Review Loop with Stopping Logic\n", "\n", - "✅ **Use AutoGen when you need:**\n", - "- **Conversational** interactions between agents\n", - "- **Human-in-the-loop** workflows (interactive approval, feedback)\n", - "- **Code generation & execution** (data analysis, scripting)\n", - "- **Group discussions** (multiple agents debating, collaborating)\n", - "- **Dynamic problem-solving** (unknown number of back-and-forth exchanges)\n", - "- **Research/prototyping** (exploratory work)\n", + "This example demonstrates termination using:\n", + "1. **Multiple agents** in a review loop\n", + "2. **Termination on approval** - Stops when reviewer says \"LGTM\"\n", + "3. 
**Fallback with max_turns** for safety\n", "\n", - "**Example Use Cases:**\n", - "- Interactive coding assistant\n", - "- Research assistant with human feedback\n", - "- Multi-agent debate/discussion\n", - "- Tutoring/educational applications\n", - "- Dynamic customer support\n", - "\n", - "---\n", - "\n", - "### When to Use llamacrew\n", - "\n", - "✅ **Use llamacrew when you need:**\n", - "- **Production workflows** (blog writing, data pipelines)\n", - "- **Declarative DAGs** (predefined task dependencies)\n", - "- **Automatic parallelization** (framework optimizes)\n", - "- **Non-interactive automation** (scheduled jobs)\n", - "- **Minimal dependencies** (lightweight deployment)\n", - "- **Predictable workflows** (known steps)\n", - "\n", - "**Example Use Cases:**\n", - "- Automated blog post generation\n", - "- Data ETL pipelines\n", - "- Report generation\n", - "- Batch processing\n", - "- Production automation\n", - "\n", - "---\n", - "\n", - "### They're Complementary!\n", - "\n", - "- **AutoGen**: Conversational, interactive, exploratory\n", - "- **llamacrew**: Workflow, automated, production\n", - "\n", - "You might use **AutoGen for prototyping** then move to **llamacrew for production**!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Advanced: Custom Agent Behaviors\n", - "\n", - "### Pattern: Specialized Agent with Custom Logic\n", - "\n", - "You can create agents with custom behavior beyond just prompts." + "### Use Case: Iterative Code Review Until Approved" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, - "outputs": [], + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✅ Code review team created\n" + ] + } + ], "source": [ - "from typing import Dict, List, Union\n", - "import re\n", + "from autogen_agentchat.conditions import TextMentionTermination\n", "\n", - "class CodeReviewAgent(AssistantAgent):\n", - " \"\"\"Custom agent that reviews code for specific patterns.\"\"\"\n", - "\n", - " def __init__(self, *args, **kwargs):\n", - " super().__init__(*args, **kwargs)\n", - " self.issues_found = []\n", - "\n", - " def review_code(self, code: str) -> Dict[str, List[str]]:\n", - " \"\"\"Custom method to review code for common issues.\"\"\"\n", - " issues = []\n", - "\n", - " # Check for common issues\n", - " if \"TODO\" in code or \"FIXME\" in code:\n", - " issues.append(\"Code contains TODO/FIXME comments\")\n", - "\n", - " if not re.search(r'def \\w+\\(.*\\):', code):\n", - " issues.append(\"No function definitions found\")\n", - "\n", - " if \"print(\" in code:\n", - " issues.append(\"Contains print statements (consider logging)\")\n", - "\n", - " self.issues_found.extend(issues)\n", - " return {\"issues\": issues, \"total\": len(issues)}\n", - "\n", - "# Create custom reviewer\n", - "code_reviewer = CodeReviewAgent(\n", + "# Create code review agents\n", + "code_reviewer = AssistantAgent(\n", " name=\"CodeReviewer\",\n", - " system_message=\"\"\"You are a code reviewer. Analyze code for:\n", - " - Code quality\n", - " - Best practices\n", - " - Potential bugs\n", + " model_client=model_client,\n", + " system_message=\"\"\"You are a senior code reviewer. 
Review code for:\n",
    "    - Bugs and edge cases\n",
    "    - Performance issues\n",
    "    - Security vulnerabilities\n",
    "    - Best practices\n",
    "\n",
    "    If the code looks good, say 'LGTM' (Looks Good To Me).\n",
    "    If issues are found, provide specific feedback for improvement.\"\"\"\n",
    ")\n",
    "\n",
    "code_developer = AssistantAgent(\n",
    "    name=\"Developer\",\n",
    "    model_client=model_client,\n",
    "    system_message=\"\"\"You are a developer. When you receive code review feedback:\n",
    "    - Address ALL issues mentioned\n",
    "    - Explain your changes\n",
    "    - Present the improved code\n",
    "\n",
    "    If no feedback is given, present your initial implementation.\"\"\"\n",
    ")\n",
    "\n",
    "print(\"✅ Code review team created\")\n",
    "\n",
    "# Complex termination: Stops when reviewer approves OR max iterations reached\n",
    "async def run_code_review_loop():\n",
    "    # Stop when reviewer says \"LGTM\"\n",
    "    approval_termination = TextMentionTermination(\"LGTM\")\n",
    "\n",
    "    team = RoundRobinGroupChat(\n",
    "        [code_developer, code_reviewer],\n",
    "        max_turns=16,  # Max 8 review cycles (developer + reviewer = 2 turns per cycle)\n",
    "        termination_condition=approval_termination\n",
    "    )\n",
    "\n",
    "    task = \"\"\"Implement a Python function to check if a string is a palindrome.\n",
    "\n",
    "    The Developer should implement the function first.\n",
    "    The Reviewer should then review it and provide feedback.\n",
    "    Continue iterating until the Reviewer approves the code.\n",
    "    \"\"\"\n",
    "\n",
    "    result = await team.run(task=task)\n",
    "    return result\n",
    "\n",
    "result = await run_code_review_loop()\n",
    "\n",
    "print(\"\\n\" + \"=\"*50)\n",
    "print(f\"✅ Review completed in {len(result.messages)} message(s)\")\n",
    "print(f\"Stop reason: {result.stop_reason}\")\n",
    "print(\"=\"*50)\n",
    "\n",
    "# Show the conversation flow\n",
    "print(\"\\n📝 Review Conversation Flow:\")\n",
    "for i, msg in enumerate(result.messages, 1):\n",
    "    preview = msg.content[:150].replace('\\n', ' ')\n",
    "    print(f\"{i}. 
[{msg.source}]: {preview}...\")\n",
    "\n",
    "print(\"\\n\" + \"=\"*50)\n",
    "print(\"Final Code (last message):\")\n",
    "print(\"=\"*50)\n",
    "if result.messages:\n",
    "    print(result.messages[-1].content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 5: Practical Team Use Case\n",
    "\n",
    "### Pattern: Research → Write → Review Pipeline\n",
    "\n",
    "A common pattern in content creation: research, draft, review, finalize.\n",
    "\n",
    "### Use Case: Documentation Generator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Create documentation team\n",
    "doc_researcher = AssistantAgent(\n",
    "    name=\"DocResearcher\",\n",
    "    model_client=model_client,\n",
    "    system_message=\"You research technical topics and gather key information for documentation.\"\n",
    ")\n",
    "\n",
    "doc_writer = AssistantAgent(\n",
    "    name=\"DocWriter\",\n",
    "    model_client=model_client,\n",
    "    system_message=\"You write clear, concise technical documentation with examples.\"\n",
    ")\n",
    "\n",
    "print(\"✅ Documentation team created\")\n",
    "\n",
    "# Run documentation pipeline\n",
    "async def create_documentation():\n",
    "    team = RoundRobinGroupChat([doc_researcher, doc_writer], max_turns=4)\n",
    "    task = \"\"\"Create documentation for a hypothetical food recipe:\n",
    "\n",
    "    Food: `Cheese Pizza`\n",
    "\n",
    "    Include:\n",
    "    - Description\n",
    "    - Ingredients\n",
    "    - How to make it\n",
    "    - Steps\n",
    "    \"\"\"\n",
    "\n",
    "    result = await team.run(task=task)\n",
    "    return result\n",
    "\n",
    "result = await create_documentation()\n",
    "\n",
    "print(\"\\n\" + \"=\"*50)\n",
    "print(\"Generated Documentation:\")\n",
    "print(\"=\"*50)\n",
    "i=1\n",
    "for message in result.messages:\n",
    "    print(f\"Turn {i}\")\n",
    "    i+=1\n",
    "    print(message.content)\n",
    "\n",
    "# Turn 1: `DocResearcher` receives the task → researches the topic\n",
    "# Turn 2: `DocWriter` sees the task + researcher's output → writes documentation\n",
    "# Turn 3: `DocResearcher` sees everything → can add more info\n",
    "# Turn 4: `DocWriter` sees everything → refines documentation\n",
    "# Stops at `max_turns=4`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "### What We Covered\n",
    "\n",
    "1. ✅ **Two-Agent Conversations** - UserProxy + Assistant pattern\n",
    "2. 
✅ **Code Generation & Execution** - AutoGen's killer feature\n", - "3. ✅ **Group Chat** - Multiple agents collaborating\n", - "4. ✅ **Human-in-the-Loop** - Interactive workflows\n", - "5. ✅ **Sequential Reasoning** - Complex problem solving\n", - "6. ✅ **Custom Agents** - Specialized behaviors\n", - "\n", - "### Key Takeaways\n", - "\n", - "**AutoGen + Llama Stack is powerful for:**\n", - "- 🗣️ **Conversational** multi-agent systems\n", - "- 👤 **Interactive** problem-solving with humans\n", - "- 💻 **Code generation** and execution\n", - "- 🤝 **Collaborative** agent discussions\n", - "\n", - "**vs llamacrew which is better for:**\n", - "- 🔄 **Production workflows** and pipelines\n", - "- 📊 **Declarative** task orchestration\n", - "- ⚡ **Automatic parallelization**\n", - "- 🤖 **Non-interactive** automation\n", - "\n", - "---\n", - "\n", "### Next Steps\n", "\n", - "1. Experiment with different agent combinations\n", - "2. Try human-in-the-loop workflows\n", - "3. Build custom agents for your use case\n", - "4. Compare AutoGen vs llamacrew for your specific needs\n", + "1. **Install autogen-ext**: `pip install autogen-agentchat autogen-ext`\n", + "2. **Start Llama Stack**: Ensure it's running on `http://localhost:8321`\n", + "3. **Experiment**: Try different team compositions and task types\n", + "4. **Explore**: Check out SelectorGroupChat and other team types\n", "\n", "### Resources\n", "\n", - "- **AutoGen Docs**: https://microsoft.github.io/autogen/\n", - "- **Llama Stack Docs**: https://llama-stack.readthedocs.io/\n", - "- **llamacrew Docs**: `/home/omara/Desktop/llamacrew/README.md`\n", - "\n", - "---\n", - "\n", - "**Happy multi-agent building! 🚀**" + "- **AutoGen v0.7.5 Docs**: https://microsoft.github.io/autogen/\n", + "- **Llama Stack Docs**: https://llama-stack.readthedocs.io/" ] } ],
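The Next Steps above mention `SelectorGroupChat` without showing it. Below is a minimal, untested sketch (not one of the notebook's executed cells) assuming the `SelectorGroupChat` API from `autogen_agentchat.teams`, in which the model client picks the next speaker instead of a fixed round-robin order. It reuses the `researcher`, `writer`, `critic` agents and `model_client` defined in the examples above; the `"FINAL"` stop word and the task text are arbitrary choices for illustration.

```python
# Hedged sketch: model-driven speaker selection instead of round-robin.
# Assumes SelectorGroupChat from autogen_agentchat.teams; "FINAL" is an
# arbitrary stop word chosen for this example.
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.conditions import TextMentionTermination

async def run_selector_team():
    team = SelectorGroupChat(
        [researcher, writer, critic],   # agents defined in Example 2
        model_client=model_client,      # also used to choose the next speaker
        termination_condition=TextMentionTermination("FINAL"),
        max_turns=10,                   # safety cap, as in the earlier examples
    )
    return await team.run(task="Draft a short FAQ about Llama Stack. End with FINAL when done.")

# result = await run_selector_team()
```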