Add quickstart with Ollama and tool calling using Together
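
Note on the custom tool cell: the dispatch pattern in `WeatherTool.run` (call `run_impl` with the tool call's arguments, serialize the result to JSON, and return an error string instead of raising) can be sketched without the client library. This is a minimal sketch under stated assumptions, not the library's API: `ToolCall`, `run_tool`, and `demo` are hypothetical stand-ins introduced here, and the response is a plain dict rather than a `ToolResponseMessage`.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToolCall:
    """Hypothetical stand-in for the client's tool-call object."""
    call_id: str
    tool_name: str
    arguments: Dict[str, Any]


async def get_weather(location: str, date: Optional[str] = None) -> Dict[str, Any]:
    """Mock lookup that mirrors the values returned by the notebook's run_impl."""
    if date:
        return {"temperature": 90.1, "conditions": "sunny", "humidity": 40.0}
    return {"temperature": 72.5, "conditions": "partly cloudy", "humidity": 65.0}


async def run_tool(tool_call: ToolCall) -> Dict[str, str]:
    """Run the tool and wrap the result (or the error text) as a tool response."""
    try:
        result = await get_weather(**tool_call.arguments)
        content = json.dumps(result, ensure_ascii=False)
    except Exception as e:
        # Report the failure as response content instead of raising.
        content = f"Error when running tool: {e}"
    return {
        "call_id": tool_call.call_id,
        "tool_name": tool_call.tool_name,
        "content": content,
        "role": "ipython",
    }


def demo() -> Dict[str, str]:
    """Dispatch one hypothetical tool call from a plain (non-Jupyter) script."""
    call = ToolCall("call-1", "get_weather", {"location": "Tokyo", "date": "tomorrow"})
    return asyncio.run(run_tool(call))
```

Returning the error as the response content presumably lets the model see what went wrong (for example, an unexpected argument name) rather than aborting the turn.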

2025-12-17 02:32:37 +00:00 · 2024-11-09 10:18:42 -08:00 · 2024-11-09 10:18:42 -08:00 · f40b78a941
commit f40b78a941
parent 2137b0af40
2 changed files with 476 additions and 472 deletions
--- a/zero_to_hero_guide/Tool_Calling101_Using_Together's_Llama_Stack_Server.ipynb
+++ b/zero_to_hero_guide/Tool_Calling101_Using_Together's_Llama_Stack_Server.ipynb
@@ -1,474 +1,474 @@
 {
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "LLZwsT_J6OnZ"
-      },
-      "source": [
-        "<a href=\"https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/Tool_Calling101_Using_Together's_Llama_Stack_Server.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "ME7IXK4M6Ona"
-      },
-      "source": [
-        "If you'd prefer not to set up a local server, explore this notebook on tool calling with the Together API. This guide shows you how to use Together.ai's Llama Stack Server API, so you can get started with Llama Stack without building and running a server locally.\n",
-        "\n",
-        "## Tool Calling with Together API\n"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "rWl1f1Hc6Onb"
-      },
-      "source": [
-        "In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n",
-        "1. Setting up and using the Brave Search API\n",
-        "2. Creating custom tools\n",
-        "3. Configuring tool prompts and safety settings"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "sRkJcA_O77hP",
-        "outputId": "49d33c5c-3300-4dc0-89a6-ff80bfc0bbdf"
-      },
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "Collecting llama-stack-client\n",
-            "  Downloading llama_stack_client-0.0.50-py3-none-any.whl.metadata (13 kB)\n",
-            "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (3.7.1)\n",
-            "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (1.9.0)\n",
-            "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (0.27.2)\n",
-            "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (2.9.2)\n",
-            "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (1.3.1)\n",
-            "Requirement already satisfied: tabulate>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (0.9.0)\n",
-            "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (4.12.2)\n",
-            "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->llama-stack-client) (3.10)\n",
-            "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->llama-stack-client) (1.2.2)\n",
-            "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->llama-stack-client) (2024.8.30)\n",
-            "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->llama-stack-client) (1.0.6)\n",
-            "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->llama-stack-client) (0.14.0)\n",
-            "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->llama-stack-client) (0.7.0)\n",
-            "Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->llama-stack-client) (2.23.4)\n",
-            "Downloading llama_stack_client-0.0.50-py3-none-any.whl (282 kB)\n",
-            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m283.0/283.0 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
-            "\u001b[?25hInstalling collected packages: llama-stack-client\n",
-            "Successfully installed llama-stack-client-0.0.50\n"
-          ]
-        }
-      ],
-      "source": [
-        "!pip install llama-stack-client"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "T_EW_jV81ldl"
-      },
-      "outputs": [],
-      "source": [
-        "LLAMA_STACK_API_TOGETHER_URL=\"https://llama-stack.together.ai\"\n",
-        "LLAMA31_8B_INSTRUCT = \"Llama3.1-8B-Instruct\""
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "id": "n_QHq45B6Onb"
-      },
-      "outputs": [],
-      "source": [
-        "import asyncio\n",
-        "import os\n",
-        "from typing import Dict, List, Optional\n",
-        "\n",
-        "from llama_stack_client import LlamaStackClient\n",
-        "from llama_stack_client.lib.agents.agent import Agent\n",
-        "from llama_stack_client.lib.agents.event_logger import EventLogger\n",
-        "from llama_stack_client.types.agent_create_params import (\n",
-        "    AgentConfig,\n",
-        "    AgentConfigToolSearchToolDefinition,\n",
-        ")\n",
-        "\n",
-        "# Helper function to create an agent with tools\n",
-        "async def create_tool_agent(\n",
-        "    client: LlamaStackClient,\n",
-        "    tools: List[Dict],\n",
-        "    instructions: str = \"You are a helpful assistant\",\n",
-        "    model: str = LLAMA31_8B_INSTRUCT\n",
-        ") -> Agent:\n",
-        "    \"\"\"Create an agent with specified tools.\"\"\"\n",
-        "    print(\"Using the following model: \", model)\n",
-        "    agent_config = AgentConfig(\n",
-        "        model=model,\n",
-        "        instructions=instructions,\n",
-        "        sampling_params={\n",
-        "            \"strategy\": \"greedy\",\n",
-        "            \"temperature\": 1.0,\n",
-        "            \"top_p\": 0.9,\n",
-        "        },\n",
-        "        tools=tools,\n",
-        "        tool_choice=\"auto\",\n",
-        "        tool_prompt_format=\"json\",\n",
-        "        enable_session_persistence=True,\n",
-        "    )\n",
-        "\n",
-        "    return Agent(client, agent_config)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "3Bjr891C6Onc",
-        "outputId": "85245ae4-fba4-4ddb-8775-11262ddb1c29"
-      },
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "Using the following model:  Llama3.1-8B-Instruct\n",
-            "\n",
-            "Query: What are the latest developments in quantum computing?\n",
-            "--------------------------------------------------\n",
-            "inference> FINDINGS:\n",
-            "The latest developments in quantum computing involve significant advancements in the field of quantum processors, error correction, and the development of practical applications. Some of the recent breakthroughs include:\n",
-            "\n",
-            "* Google's 53-qubit Sycamore processor, which achieved quantum supremacy in 2019 (Source: Google AI Blog, https://ai.googleblog.com/2019/10/experiment-advances-quantum-computing.html)\n",
-            "* The development of a 100-qubit quantum processor by the Chinese company, Origin Quantum (Source: Physics World, https://physicsworld.com/a/origin-quantum-scales-up-to-100-qubits/)\n",
-            "* IBM's 127-qubit Eagle processor, which has the potential to perform complex calculations that are currently unsolvable by classical computers (Source: IBM Research Blog, https://www.ibm.com/blogs/research/2020/11/ibm-advances-quantum-computing-research-with-new-127-qubit-processor/)\n",
-            "* The development of topological quantum computers, which have the potential to solve complex problems in materials science and chemistry (Source: MIT Technology Review, https://www.technologyreview.com/2020/02/24/914776/topological-quantum-computers-are-a-game-changer-for-materials-science/)\n",
-            "* The development of a new type of quantum error correction code, known as the \"surface code\", which has the potential to solve complex problems in quantum computing (Source: Nature Physics, https://www.nature.com/articles/s41567-021-01314-2)\n",
-            "\n",
-            "SOURCES:\n",
-            "- Google AI Blog: https://ai.googleblog.com/2019/10/experiment-advances-quantum-computing.html\n",
-            "- Physics World: https://physicsworld.com/a/origin-quantum-scales-up-to-100-qubits/\n",
-            "- IBM Research Blog: https://www.ibm.com/blogs/research/2020/11/ibm-advances-quantum-computing-research-with-new-127-qubit-processor/\n",
-            "- MIT Technology Review: https://www.technologyreview.com/2020/02/24/914776/topological-quantum-computers-are-a-game-changer-for-materials-science/\n",
-            "- Nature Physics: https://www.nature.com/articles/s41567-021-01314-2\n"
-          ]
-        }
-      ],
-      "source": [
-        "# comment this if you don't have a BRAVE_SEARCH_API_KEY\n",
-        "os.environ[\"BRAVE_SEARCH_API_KEY\"] = 'YOUR_BRAVE_SEARCH_API_KEY'\n",
-        "\n",
-        "async def create_search_agent(client: LlamaStackClient) -> Agent:\n",
-        "    \"\"\"Create an agent with Brave Search capability.\"\"\"\n",
-        "\n",
-        "    # comment this if you don't have a BRAVE_SEARCH_API_KEY\n",
-        "    search_tool = AgentConfigToolSearchToolDefinition(\n",
-        "        type=\"brave_search\",\n",
-        "        engine=\"brave\",\n",
-        "        api_key=os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n",
-        "    )\n",
-        "\n",
-        "    return await create_tool_agent(\n",
-        "        client=client,\n",
-        "        tools=[search_tool], # set this to [] if you don't have a BRAVE_SEARCH_API_KEY\n",
-        "        model = LLAMA31_8B_INSTRUCT,\n",
-        "        instructions=\"\"\"\n",
-        "        You are a research assistant that can search the web.\n",
-        "        Always cite your sources with URLs when providing information.\n",
-        "        Format your responses as:\n",
-        "\n",
-        "        FINDINGS:\n",
-        "        [Your summary here]\n",
-        "\n",
-        "        SOURCES:\n",
-        "        - [Source title](URL)\n",
-        "        \"\"\"\n",
-        "    )\n",
-        "\n",
-        "# Example usage\n",
-        "async def search_example():\n",
-        "    client = LlamaStackClient(base_url=LLAMA_STACK_API_TOGETHER_URL)\n",
-        "    agent = await create_search_agent(client)\n",
-        "\n",
-        "    # Create a session\n",
-        "    session_id = agent.create_session(\"search-session\")\n",
-        "\n",
-        "    # Example queries\n",
-        "    queries = [\n",
-        "        \"What are the latest developments in quantum computing?\",\n",
-        "        #\"Who won the most recent Super Bowl?\",\n",
-        "    ]\n",
-        "\n",
-        "    for query in queries:\n",
-        "        print(f\"\\nQuery: {query}\")\n",
-        "        print(\"-\" * 50)\n",
-        "\n",
-        "        response = agent.create_turn(\n",
-        "            messages=[{\"role\": \"user\", \"content\": query}],\n",
-        "            session_id=session_id,\n",
-        "        )\n",
-        "\n",
-        "        async for log in EventLogger().log(response):\n",
-        "            log.print()\n",
-        "\n",
-        "# Run the example (top-level await works in Jupyter; in a plain script, use asyncio.run(search_example()))\n",
-        "await search_example()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "r3YN6ufb6Onc"
-      },
-      "source": [
-        "## 3. Custom Tool Creation\n",
-        "\n",
-        "Let's create a custom weather tool:\n",
-        "\n",
-        "#### Key Highlights:\n",
-        "- **`WeatherTool` Class**: A custom tool that processes weather information requests, supporting location and optional date parameters.\n",
-        "- **Agent Creation**: The `create_weather_agent` function sets up an agent equipped with the `WeatherTool`, allowing for weather queries in natural language.\n",
-        "- **Simulation of API Call**: The `run_impl` method simulates fetching weather data. This method can be replaced with an actual API integration for real-world usage.\n",
-        "- **Interactive Example**: The `weather_example` function shows how to use the agent to handle user queries regarding the weather, providing step-by-step responses."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "A0bOLYGj6Onc",
-        "outputId": "023a8fb7-49ed-4ab4-e5b7-8050ded5d79a"
-      },
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "\n",
-            "Query: What's the weather like in San Francisco?\n",
-            "--------------------------------------------------\n",
-            "inference> {\n",
-            "    \"function\": \"get_weather\",\n",
-            "    \"parameters\": {\n",
-            "        \"location\": \"San Francisco\"\n",
-            "    }\n",
-            "}\n",
-            "\n",
-            "Query: Tell me the weather in Tokyo tomorrow\n",
-            "--------------------------------------------------\n",
-            "inference> {\n",
-            "    \"function\": \"get_weather\",\n",
-            "    \"parameters\": {\n",
-            "        \"location\": \"Tokyo\",\n",
-            "        \"date\": \"tomorrow\"\n",
-            "    }\n",
-            "}\n"
-          ]
-        }
-      ],
-      "source": [
-        "from typing import TypedDict, Optional, Dict, Any\n",
-        "from datetime import datetime\n",
-        "import json\n",
-        "from llama_stack_client.types.tool_param_definition_param import ToolParamDefinitionParam\n",
-        "from llama_stack_client.types import CompletionMessage,ToolResponseMessage\n",
-        "from llama_stack_client.lib.agents.custom_tool import CustomTool\n",
-        "\n",
-        "class WeatherTool(CustomTool):\n",
-        "    \"\"\"Example custom tool for weather information.\"\"\"\n",
-        "\n",
-        "    def get_name(self) -> str:\n",
-        "        return \"get_weather\"\n",
-        "\n",
-        "    def get_description(self) -> str:\n",
-        "        return \"Get weather information for a location\"\n",
-        "\n",
-        "    def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
-        "        return {\n",
-        "            \"location\": ToolParamDefinitionParam(\n",
-        "                param_type=\"str\",\n",
-        "                description=\"City or location name\",\n",
-        "                required=True\n",
-        "            ),\n",
-        "            \"date\": ToolParamDefinitionParam(\n",
-        "                param_type=\"str\",\n",
-        "                description=\"Optional date (YYYY-MM-DD)\",\n",
-        "                required=False\n",
-        "            )\n",
-        "        }\n",
-        "    async def run(self, messages: List[CompletionMessage]) -> List[ToolResponseMessage]:\n",
-        "        assert len(messages) == 1, \"Expected single message\"\n",
-        "\n",
-        "        message = messages[0]\n",
-        "\n",
-        "        tool_call = message.tool_calls[0]\n",
-        "        # location = tool_call.arguments.get(\"location\", None)\n",
-        "        # date = tool_call.arguments.get(\"date\", None)\n",
-        "        try:\n",
-        "            response = await self.run_impl(**tool_call.arguments)\n",
-        "            response_str = json.dumps(response, ensure_ascii=False)\n",
-        "        except Exception as e:\n",
-        "            response_str = f\"Error when running tool: {e}\"\n",
-        "\n",
-        "        message = ToolResponseMessage(\n",
-        "            call_id=tool_call.call_id,\n",
-        "            tool_name=tool_call.tool_name,\n",
-        "            content=response_str,\n",
-        "            role=\"ipython\",\n",
-        "        )\n",
-        "        return [message]\n",
-        "\n",
-        "    async def run_impl(self, location: str, date: Optional[str] = None) -> Dict[str, Any]:\n",
-        "        \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n",
-        "        # Mock implementation\n",
-        "        if date:\n",
-        "            return {\n",
-        "                \"temperature\": 90.1,\n",
-        "                \"conditions\": \"sunny\",\n",
-        "                \"humidity\": 40.0\n",
-        "            }\n",
-        "        return {\n",
-        "            \"temperature\": 72.5,\n",
-        "            \"conditions\": \"partly cloudy\",\n",
-        "            \"humidity\": 65.0\n",
-        "        }\n",
-        "\n",
-        "\n",
-        "async def create_weather_agent(client: LlamaStackClient) -> Agent:\n",
-        "    \"\"\"Create an agent with weather tool capability.\"\"\"\n",
-        "\n",
-        "    agent_config = AgentConfig(\n",
-        "        model=LLAMA31_8B_INSTRUCT,\n",
-        "        #model=model_name,\n",
-        "        instructions=\"\"\"\n",
-        "        You are a weather assistant that can provide weather information.\n",
-        "        Always specify the location clearly in your responses.\n",
-        "        Include both temperature and conditions in your summaries.\n",
-        "        \"\"\",\n",
-        "        sampling_params={\n",
-        "            \"strategy\": \"greedy\",\n",
-        "            \"temperature\": 1.0,\n",
-        "            \"top_p\": 0.9,\n",
-        "        },\n",
-        "        tools=[\n",
-        "            {\n",
-        "                \"function_name\": \"get_weather\",\n",
-        "                \"description\": \"Get weather information for a location\",\n",
-        "                \"parameters\": {\n",
-        "                    \"location\": {\n",
-        "                        \"param_type\": \"str\",\n",
-        "                        \"description\": \"City or location name\",\n",
-        "                        \"required\": True,\n",
-        "                    },\n",
-        "                    \"date\": {\n",
-        "                        \"param_type\": \"str\",\n",
-        "                        \"description\": \"Optional date (YYYY-MM-DD)\",\n",
-        "                        \"required\": False,\n",
-        "                    },\n",
-        "                },\n",
-        "                \"type\": \"function_call\",\n",
-        "            }\n",
-        "        ],\n",
-        "        tool_choice=\"auto\",\n",
-        "        tool_prompt_format=\"json\",\n",
-        "        input_shields=[],\n",
-        "        output_shields=[],\n",
-        "        enable_session_persistence=True\n",
-        "    )\n",
-        "\n",
-        "    # Create the agent with the tool\n",
-        "    weather_tool = WeatherTool()\n",
-        "    agent = Agent(\n",
-        "        client=client,\n",
-        "        agent_config=agent_config,\n",
-        "        custom_tools=[weather_tool]\n",
-        "    )\n",
-        "\n",
-        "    return agent\n",
-        "\n",
-        "# Example usage\n",
-        "async def weather_example():\n",
-        "    client = LlamaStackClient(base_url=LLAMA_STACK_API_TOGETHER_URL)\n",
-        "    agent = await create_weather_agent(client)\n",
-        "    session_id = agent.create_session(\"weather-session\")\n",
-        "\n",
-        "    queries = [\n",
-        "        \"What's the weather like in San Francisco?\",\n",
-        "        \"Tell me the weather in Tokyo tomorrow\",\n",
-        "    ]\n",
-        "\n",
-        "    for query in queries:\n",
-        "        print(f\"\\nQuery: {query}\")\n",
-        "        print(\"-\" * 50)\n",
-        "\n",
-        "        response = agent.create_turn(\n",
-        "            messages=[{\"role\": \"user\", \"content\": query}],\n",
-        "            session_id=session_id,\n",
-        "        )\n",
-        "\n",
-        "        async for log in EventLogger().log(response):\n",
-        "            log.print()\n",
-        "\n",
-        "# For Jupyter notebooks\n",
-        "import nest_asyncio\n",
-        "nest_asyncio.apply()\n",
-        "\n",
-        "# Run the example\n",
-        "await weather_example()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "yKhUkVNq6Onc"
-      },
-      "source": [
-        "Thanks for checking out this tutorial. Hopefully you can now automate everything with Llama! :D\n",
-        "\n",
-        "Next up, we cover another hot topic in LLMs: memory and RAG. Continue learning [here](./04_Memory101.ipynb)!"
-      ]
-    }
-  ],
-  "metadata": {
-    "colab": {
-      "provenance": []
-    },
-    "kernelspec": {
-      "display_name": "Python 3 (ipykernel)",
-      "language": "python",
-      "name": "python3"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.10.15"
-    }
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "LLZwsT_J6OnZ"
+   },
+   "source": [
+    "<a href=\"https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/Tool_Calling101_Using_Together's_Llama_Stack_Server.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+   ]
  },
-  "nbformat": 4,
-  "nbformat_minor": 0
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "ME7IXK4M6Ona"
+   },
+   "source": [
+    "If you'd prefer not to set up a local server, explore this notebook on tool calling with the Together API. This guide shows you how to use Together.ai's Llama Stack Server API, so you can get started with Llama Stack without building and running a server locally.\n",
+    "\n",
+    "## Tool Calling with Together API\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "rWl1f1Hc6Onb"
+   },
+   "source": [
+    "In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n",
+    "1. Setting up and using the Brave Search API\n",
+    "2. Creating custom tools\n",
+    "3. Configuring tool prompts and safety settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "sRkJcA_O77hP",
+    "outputId": "49d33c5c-3300-4dc0-89a6-ff80bfc0bbdf"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Collecting llama-stack-client\n",
+      "  Downloading llama_stack_client-0.0.50-py3-none-any.whl.metadata (13 kB)\n",
+      "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (3.7.1)\n",
+      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (1.9.0)\n",
+      "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (0.27.2)\n",
+      "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (2.9.2)\n",
+      "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (1.3.1)\n",
+      "Requirement already satisfied: tabulate>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (0.9.0)\n",
+      "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (4.12.2)\n",
+      "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->llama-stack-client) (3.10)\n",
+      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->llama-stack-client) (1.2.2)\n",
+      "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->llama-stack-client) (2024.8.30)\n",
+      "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->llama-stack-client) (1.0.6)\n",
+      "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->llama-stack-client) (0.14.0)\n",
+      "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->llama-stack-client) (0.7.0)\n",
+      "Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->llama-stack-client) (2.23.4)\n",
+      "Downloading llama_stack_client-0.0.50-py3-none-any.whl (282 kB)\n",
+      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m283.0/283.0 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+      "\u001b[?25hInstalling collected packages: llama-stack-client\n",
+      "Successfully installed llama-stack-client-0.0.50\n"
+     ]
+    }
+   ],
+   "source": [
+    "!pip install llama-stack-client"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "T_EW_jV81ldl"
+   },
+   "outputs": [],
+   "source": [
+    "LLAMA_STACK_API_TOGETHER_URL=\"https://llama-stack.together.ai\"\n",
+    "LLAMA31_8B_INSTRUCT = \"Llama3.1-8B-Instruct\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "n_QHq45B6Onb"
+   },
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "import os\n",
+    "from typing import Dict, List, Optional\n",
+    "\n",
+    "from llama_stack_client import LlamaStackClient\n",
+    "from llama_stack_client.lib.agents.agent import Agent\n",
+    "from llama_stack_client.lib.agents.event_logger import EventLogger\n",
+    "from llama_stack_client.types.agent_create_params import (\n",
+    "    AgentConfig,\n",
+    "    AgentConfigToolSearchToolDefinition,\n",
+    ")\n",
+    "\n",
+    "# Helper function to create an agent with tools\n",
+    "async def create_tool_agent(\n",
+    "    client: LlamaStackClient,\n",
+    "    tools: List[Dict],\n",
+    "    instructions: str = \"You are a helpful assistant\",\n",
+    "    model: str = LLAMA31_8B_INSTRUCT\n",
+    ") -> Agent:\n",
+    "    \"\"\"Create an agent with specified tools.\"\"\"\n",
+    "    print(\"Using the following model: \", model)\n",
+    "    agent_config = AgentConfig(\n",
+    "        model=model,\n",
+    "        instructions=instructions,\n",
+    "        sampling_params={\n",
+    "            \"strategy\": \"greedy\",\n",
+    "            \"temperature\": 1.0,\n",
+    "            \"top_p\": 0.9,\n",
+    "        },\n",
+    "        tools=tools,\n",
+    "        tool_choice=\"auto\",\n",
+    "        tool_prompt_format=\"json\",\n",
+    "        enable_session_persistence=True,\n",
+    "    )\n",
+    "\n",
+    "    return Agent(client, agent_config)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "3Bjr891C6Onc",
+    "outputId": "85245ae4-fba4-4ddb-8775-11262ddb1c29"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Using the following model:  Llama3.1-8B-Instruct\n",
+      "\n",
+      "Query: What are the latest developments in quantum computing?\n",
+      "--------------------------------------------------\n",
+      "inference> FINDINGS:\n",
+      "The latest developments in quantum computing involve significant advancements in the field of quantum processors, error correction, and the development of practical applications. Some of the recent breakthroughs include:\n",
+      "\n",
+      "* Google's 53-qubit Sycamore processor, which achieved quantum supremacy in 2019 (Source: Google AI Blog, https://ai.googleblog.com/2019/10/experiment-advances-quantum-computing.html)\n",
+      "* The development of a 100-qubit quantum processor by the Chinese company, Origin Quantum (Source: Physics World, https://physicsworld.com/a/origin-quantum-scales-up-to-100-qubits/)\n",
+      "* IBM's 127-qubit Eagle processor, which has the potential to perform complex calculations that are currently unsolvable by classical computers (Source: IBM Research Blog, https://www.ibm.com/blogs/research/2020/11/ibm-advances-quantum-computing-research-with-new-127-qubit-processor/)\n",
+      "* The development of topological quantum computers, which have the potential to solve complex problems in materials science and chemistry (Source: MIT Technology Review, https://www.technologyreview.com/2020/02/24/914776/topological-quantum-computers-are-a-game-changer-for-materials-science/)\n",
+      "* The development of a new type of quantum error correction code, known as the \"surface code\", which has the potential to solve complex problems in quantum computing (Source: Nature Physics, https://www.nature.com/articles/s41567-021-01314-2)\n",
+      "\n",
+      "SOURCES:\n",
+      "- Google AI Blog: https://ai.googleblog.com/2019/10/experiment-advances-quantum-computing.html\n",
+      "- Physics World: https://physicsworld.com/a/origin-quantum-scales-up-to-100-qubits/\n",
+      "- IBM Research Blog: https://www.ibm.com/blogs/research/2020/11/ibm-advances-quantum-computing-research-with-new-127-qubit-processor/\n",
+      "- MIT Technology Review: https://www.technologyreview.com/2020/02/24/914776/topological-quantum-computers-are-a-game-changer-for-materials-science/\n",
+      "- Nature Physics: https://www.nature.com/articles/s41567-021-01314-2\n"
+     ]
+    }
+   ],
+   "source": [
+    "# comment this if you don't have a BRAVE_SEARCH_API_KEY\n",
+    "os.environ[\"BRAVE_SEARCH_API_KEY\"] = 'YOUR_BRAVE_SEARCH_API_KEY'\n",
+    "\n",
+    "async def create_search_agent(client: LlamaStackClient) -> Agent:\n",
+    "    \"\"\"Create an agent with Brave Search capability.\"\"\"\n",
+    "\n",
+    "    # comment this if you don't have a BRAVE_SEARCH_API_KEY\n",
+    "    search_tool = AgentConfigToolSearchToolDefinition(\n",
+    "        type=\"brave_search\",\n",
+    "        engine=\"brave\",\n",
+    "        api_key=os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n",
+    "    )\n",
+    "\n",
+    "    return await create_tool_agent(\n",
+    "        client=client,\n",
+    "        tools=[search_tool],  # set this to [] if you don't have a BRAVE_SEARCH_API_KEY\n",
+    "        model=LLAMA31_8B_INSTRUCT,\n",
+    "        instructions=\"\"\"\n",
+    "        You are a research assistant that can search the web.\n",
+    "        Always cite your sources with URLs when providing information.\n",
+    "        Format your responses as:\n",
+    "\n",
+    "        FINDINGS:\n",
+    "        [Your summary here]\n",
+    "\n",
+    "        SOURCES:\n",
+    "        - [Source title](URL)\n",
+    "        \"\"\"\n",
+    "    )\n",
+    "\n",
+    "# Example usage\n",
+    "async def search_example():\n",
+    "    client = LlamaStackClient(base_url=LLAMA_STACK_API_TOGETHER_URL)\n",
+    "    agent = await create_search_agent(client)\n",
+    "\n",
+    "    # Create a session\n",
+    "    session_id = agent.create_session(\"search-session\")\n",
+    "\n",
+    "    # Example queries\n",
+    "    queries = [\n",
+    "        \"What are the latest developments in quantum computing?\",\n",
+    "        #\"Who won the most recent Super Bowl?\",\n",
+    "    ]\n",
+    "\n",
+    "    for query in queries:\n",
+    "        print(f\"\\nQuery: {query}\")\n",
+    "        print(\"-\" * 50)\n",
+    "\n",
+    "        response = agent.create_turn(\n",
+    "            messages=[{\"role\": \"user\", \"content\": query}],\n",
+    "            session_id=session_id,\n",
+    "        )\n",
+    "\n",
+    "        async for log in EventLogger().log(response):\n",
+    "            log.print()\n",
+    "\n",
+    "# Run the example (Jupyter supports top-level await; in a plain script, use asyncio.run(search_example()))\n",
+    "await search_example()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "r3YN6ufb6Onc"
+   },
+   "source": [
+    "## 3. Custom Tool Creation\n",
+    "\n",
+    "Let's create a custom weather tool:\n",
+    "\n",
+    "#### Key Highlights:\n",
+    "- **`WeatherTool` Class**: A custom tool that processes weather information requests, supporting location and optional date parameters.\n",
+    "- **Agent Creation**: The `create_weather_agent` function sets up an agent equipped with the `WeatherTool`, allowing for weather queries in natural language.\n",
+    "- **Simulation of API Call**: The `run_impl` method simulates fetching weather data. This method can be replaced with an actual API integration for real-world usage.\n",
+    "- **Interactive Example**: The `weather_example` function shows how to use the agent to handle user queries regarding the weather, providing step-by-step responses."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "A0bOLYGj6Onc",
+    "outputId": "023a8fb7-49ed-4ab4-e5b7-8050ded5d79a"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "Query: What's the weather like in San Francisco?\n",
+      "--------------------------------------------------\n",
+      "inference> {\n",
+      "    \"function\": \"get_weather\",\n",
+      "    \"parameters\": {\n",
+      "        \"location\": \"San Francisco\"\n",
+      "    }\n",
+      "}\n",
+      "\n",
+      "Query: Tell me the weather in Tokyo tomorrow\n",
+      "--------------------------------------------------\n",
+      "inference> {\n",
+      "    \"function\": \"get_weather\",\n",
+      "    \"parameters\": {\n",
+      "        \"location\": \"Tokyo\",\n",
+      "        \"date\": \"tomorrow\"\n",
+      "    }\n",
+      "}\n"
+     ]
+    }
+   ],
+   "source": [
+    "from typing import Any, Dict, List, Optional, TypedDict\n",
+    "from datetime import datetime\n",
+    "import json\n",
+    "from llama_stack_client.types.tool_param_definition_param import ToolParamDefinitionParam\n",
+    "from llama_stack_client.types import CompletionMessage, ToolResponseMessage\n",
+    "from llama_stack_client.lib.agents.custom_tool import CustomTool\n",
+    "\n",
+    "class WeatherTool(CustomTool):\n",
+    "    \"\"\"Example custom tool for weather information.\"\"\"\n",
+    "\n",
+    "    def get_name(self) -> str:\n",
+    "        return \"get_weather\"\n",
+    "\n",
+    "    def get_description(self) -> str:\n",
+    "        return \"Get weather information for a location\"\n",
+    "\n",
+    "    def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
+    "        return {\n",
+    "            \"location\": ToolParamDefinitionParam(\n",
+    "                param_type=\"str\",\n",
+    "                description=\"City or location name\",\n",
+    "                required=True\n",
+    "            ),\n",
+    "            \"date\": ToolParamDefinitionParam(\n",
+    "                param_type=\"str\",\n",
+    "                description=\"Optional date (YYYY-MM-DD)\",\n",
+    "                required=False\n",
+    "            )\n",
+    "        }\n",
+    "\n",
+    "    async def run(self, messages: List[CompletionMessage]) -> List[ToolResponseMessage]:\n",
+    "        assert len(messages) == 1, \"Expected single message\"\n",
+    "\n",
+    "        message = messages[0]\n",
+    "\n",
+    "        tool_call = message.tool_calls[0]\n",
+    "        # location = tool_call.arguments.get(\"location\", None)\n",
+    "        # date = tool_call.arguments.get(\"date\", None)\n",
+    "        try:\n",
+    "            response = await self.run_impl(**tool_call.arguments)\n",
+    "            response_str = json.dumps(response, ensure_ascii=False)\n",
+    "        except Exception as e:\n",
+    "            response_str = f\"Error when running tool: {e}\"\n",
+    "\n",
+    "        message = ToolResponseMessage(\n",
+    "            call_id=tool_call.call_id,\n",
+    "            tool_name=tool_call.tool_name,\n",
+    "            content=response_str,\n",
+    "            role=\"ipython\",\n",
+    "        )\n",
+    "        return [message]\n",
+    "\n",
+    "    async def run_impl(self, location: str, date: Optional[str] = None) -> Dict[str, Any]:\n",
+    "        \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n",
+    "        # Mock implementation\n",
+    "        if date:\n",
+    "            return {\n",
+    "                \"temperature\": 90.1,\n",
+    "                \"conditions\": \"sunny\",\n",
+    "                \"humidity\": 40.0\n",
+    "            }\n",
+    "        return {\n",
+    "            \"temperature\": 72.5,\n",
+    "            \"conditions\": \"partly cloudy\",\n",
+    "            \"humidity\": 65.0\n",
+    "        }\n",
+    "\n",
+    "\n",
+    "async def create_weather_agent(client: LlamaStackClient) -> Agent:\n",
+    "    \"\"\"Create an agent with weather tool capability.\"\"\"\n",
+    "\n",
+    "    agent_config = AgentConfig(\n",
+    "        model=LLAMA31_8B_INSTRUCT,\n",
+    "        #model=model_name,\n",
+    "        instructions=\"\"\"\n",
+    "        You are a weather assistant that can provide weather information.\n",
+    "        Always specify the location clearly in your responses.\n",
+    "        Include both temperature and conditions in your summaries.\n",
+    "        \"\"\",\n",
+    "        sampling_params={\n",
+    "            \"strategy\": \"greedy\",\n",
+    "            \"temperature\": 1.0,\n",
+    "            \"top_p\": 0.9,\n",
+    "        },\n",
+    "        tools=[\n",
+    "            {\n",
+    "                \"function_name\": \"get_weather\",\n",
+    "                \"description\": \"Get weather information for a location\",\n",
+    "                \"parameters\": {\n",
+    "                    \"location\": {\n",
+    "                        \"param_type\": \"str\",\n",
+    "                        \"description\": \"City or location name\",\n",
+    "                        \"required\": True,\n",
+    "                    },\n",
+    "                    \"date\": {\n",
+    "                        \"param_type\": \"str\",\n",
+    "                        \"description\": \"Optional date (YYYY-MM-DD)\",\n",
+    "                        \"required\": False,\n",
+    "                    },\n",
+    "                },\n",
+    "                \"type\": \"function_call\",\n",
+    "            }\n",
+    "        ],\n",
+    "        tool_choice=\"auto\",\n",
+    "        tool_prompt_format=\"json\",\n",
+    "        input_shields=[],\n",
+    "        output_shields=[],\n",
+    "        enable_session_persistence=True\n",
+    "    )\n",
+    "\n",
+    "    # Create the agent with the tool\n",
+    "    weather_tool = WeatherTool()\n",
+    "    agent = Agent(\n",
+    "        client=client,\n",
+    "        agent_config=agent_config,\n",
+    "        custom_tools=[weather_tool]\n",
+    "    )\n",
+    "\n",
+    "    return agent\n",
+    "\n",
+    "# Example usage\n",
+    "async def weather_example():\n",
+    "    client = LlamaStackClient(base_url=LLAMA_STACK_API_TOGETHER_URL)\n",
+    "    agent = await create_weather_agent(client)\n",
+    "    session_id = agent.create_session(\"weather-session\")\n",
+    "\n",
+    "    queries = [\n",
+    "        \"What's the weather like in San Francisco?\",\n",
+    "        \"Tell me the weather in Tokyo tomorrow\",\n",
+    "    ]\n",
+    "\n",
+    "    for query in queries:\n",
+    "        print(f\"\\nQuery: {query}\")\n",
+    "        print(\"-\" * 50)\n",
+    "\n",
+    "        response = agent.create_turn(\n",
+    "            messages=[{\"role\": \"user\", \"content\": query}],\n",
+    "            session_id=session_id,\n",
+    "        )\n",
+    "\n",
+    "        async for log in EventLogger().log(response):\n",
+    "            log.print()\n",
+    "\n",
+    "# For Jupyter notebooks\n",
+    "import nest_asyncio\n",
+    "nest_asyncio.apply()\n",
+    "\n",
+    "# Run the example\n",
+    "await weather_example()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "yKhUkVNq6Onc"
+   },
+   "source": [
+    "Thanks for checking out this tutorial. Hopefully you can now automate everything with Llama! :D\n",
+    "\n",
+    "Next up is another hot topic in LLMs: memory and RAG. Continue learning [here](./04_Memory101.ipynb)!"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.15"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
 }
--- a/zero_to_hero_guide/quickstart.md
+++ b/zero_to_hero_guide/quickstart.md
@ -38,6 +38,10 @@ If you're looking for more specific topics like tool calling or agent setup, we
     ```
     **Note**: The supported models for llama stack for now is listed in [here](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L43)

+1. **Download Ollama App**:
+   - Go to [https://ollama.com/download](https://ollama.com/download).
+   - On macOS, download and unzip `Ollama-darwin.zip`.
+   - Run the `Ollama` application.

 ---

@ -107,7 +111,7 @@ After setting up the server, open a new terminal window and verify it's working
 curl http://localhost:5050/inference/chat_completion \
 -H "Content-Type: application/json" \
 -d '{
-    "model": "Llama3.2-3B-Instruct",
+    "model": "llama3.2:1b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write me a 2-sentence poem about the moon"}