diff --git a/docs/source/comprehensive-start.md b/docs/source/comprehensive-start.md deleted file mode 100644 index 604c87568..000000000 --- a/docs/source/comprehensive-start.md +++ /dev/null @@ -1,111 +0,0 @@ - -# Getting Started with Llama Stack - -This guide will walk you through the steps to set up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our [documentation](../README.md) for more on Llama Stack's capabilities, or visit [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) for example apps. - -## Installation - -The `llama` CLI tool helps you manage the Llama toolchain & agentic systems. After installing the `llama-stack` package, the `llama` command should be available in your path. - -You can install this repository in two ways: - -1. **Install as a package**: - Install directly from [PyPI](https://pypi.org/project/llama-stack/) with: - ```bash - pip install llama-stack - ``` - -2. **Install from source**: - Follow these steps to install from the source code: - ```bash - mkdir -p ~/local - cd ~/local - git clone git@github.com:meta-llama/llama-stack.git - - conda create -n stack python=3.10 - conda activate stack - - cd llama-stack - $CONDA_PREFIX/bin/pip install -e . - ``` - -Refer to the [CLI Reference](./cli_reference.md) for details on Llama CLI commands. - -## Starting Up Llama Stack Server - -There are two ways to start the Llama Stack server: - -1. **Using Docker**: - We provide a pre-built Docker image of Llama Stack, available in the [distributions](../distributions/) folder. - - > **Note:** For GPU inference, set environment variables to specify the local directory with your model checkpoints and enable GPU inference. - ```bash - export LLAMA_CHECKPOINT_DIR=~/.llama - ``` - Download Llama models with: - ``` - llama download --model-id Llama3.1-8B-Instruct - ``` - Start a Docker container with: - ```bash - cd llama-stack/distributions/meta-reference-gpu - docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml - ``` - - **Tip:** For remote providers, use `docker compose up` with scripts in the [distributions folder](../distributions/). - -2. **Build->Configure->Run via Conda**: - For development, build a LlamaStack distribution from scratch. - - **`llama stack build`** - Enter build information interactively: - ```bash - llama stack build - ``` - - **`llama stack configure`** - Run `llama stack configure ` using the name from the build step. 
- ```bash - llama stack configure my-local-stack - ``` - - **`llama stack run`** - Start the server with: - ```bash - llama stack run my-local-stack - ``` - -## Testing with Client - -After setup, test the server with a client: -```bash -cd /path/to/llama-stack -conda activate - -python -m llama_stack.apis.inference.client localhost 5000 -``` - -You can also send a POST request: -```bash -curl http://localhost:5000/inference/chat_completion \ --H "Content-Type: application/json" \ --d '{ - "model": "Llama3.1-8B-Instruct", - "messages": [ - {"role": "system", "content": "You are a helpful assistant."}, - {"role": "user", "content": "Write me a 2-sentence poem about the moon"} - ], - "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512} -}' -``` - -For testing safety, run: -```bash -python -m llama_stack.apis.safety.client localhost 5000 -``` - -Check our client SDKs for various languages: [Python](https://github.com/meta-llama/llama-stack-client-python), [Node](https://github.com/meta-llama/llama-stack-client-node), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin). - -## Advanced Guides - -For more on custom Llama Stack distributions, refer to our [Building a Llama Stack Distribution](./building_distro.md) guide. diff --git a/docs/zero_to_hero_guide/00_Inference101.ipynb b/docs/zero_to_hero_guide/00_Inference101.ipynb new file mode 100644 index 000000000..c5efa600d --- /dev/null +++ b/docs/zero_to_hero_guide/00_Inference101.ipynb @@ -0,0 +1,247 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c1e7571c", + "metadata": {}, + "source": [ + "# Llama Stack Inference Guide\n", + "\n", + "This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/).\n", + "\n", + "### Table of Contents\n", + "1. [Quickstart](#quickstart)\n", + "2. [Building Effective Prompts](#building-effective-prompts)\n", + "3. [Conversation Loop](#conversation-loop)\n", + "4. [Conversation History](#conversation-history)\n", + "5. [Streaming Responses](#streaming-responses)\n" + ] + }, + { + "cell_type": "markdown", + "id": "414301dc", + "metadata": {}, + "source": [ + "## Quickstart\n", + "\n", + "This section walks through each step to set up and make a simple text generation request.\n", + "\n", + "### 1. Set Up the Client\n", + "\n", + "Begin by importing the necessary components from Llama Stack’s client library:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7a573752", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_stack_client import LlamaStackClient\n", + "from llama_stack_client.types import SystemMessage, UserMessage\n", + "\n", + "client = LlamaStackClient(base_url='http://localhost:5000')" + ] + }, + { + "cell_type": "markdown", + "id": "86366383", + "metadata": {}, + "source": [ + "### 2. Create a Chat Completion Request\n", + "\n", + "Use the `chat_completion` function to define the conversation context. 
Each message you include should have a specific role and content:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "77c29dba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = client.inference.chat_completion(\n",
+    "    messages=[\n",
+    "        SystemMessage(content='You are a friendly assistant.', role='system'),\n",
+    "        UserMessage(content='Write a two-sentence poem about llama.', role='user')\n",
+    "    ],\n",
+    "    model='Llama3.2-11B-Vision-Instruct',\n",
+    ")\n",
+    "\n",
+    "print(response.completion_message.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e5f16949",
+   "metadata": {},
+   "source": [
+    "## Building Effective Prompts\n",
+    "\n",
+    "Effective prompt creation (often called 'prompt engineering') is essential for quality responses. Here are best practices for structuring your prompts to get the most out of the Llama Stack model:\n",
+    "\n",
+    "1. **System Messages**: Use `SystemMessage` to set the model's behavior. This is similar to providing top-level instructions for tone, format, or specific behavior.\n",
+    "   - **Example**: `SystemMessage(content='You are a friendly assistant that explains complex topics simply.')`\n",
+    "2. **User Messages**: Define the task or question you want to ask the model with a `UserMessage`. The clearer and more direct you are, the better the response.\n",
+    "   - **Example**: `UserMessage(content='Explain recursion in programming in simple terms.')`\n",
+    "\n",
+    "### Sample Prompt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5c6812da",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = client.inference.chat_completion(\n",
+    "    messages=[\n",
+    "        SystemMessage(content='You are Shakespeare.', role='system'),\n",
+    "        UserMessage(content='Write a two-sentence poem about llama.', role='user')\n",
+    "    ],\n",
+    "    model='Llama3.2-11B-Vision-Instruct',\n",
+    ")\n",
+    "\n",
+    "print(response.completion_message.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c8690ef0",
+   "metadata": {},
+   "source": [
+    "## Conversation Loop\n",
+    "\n",
+    "To create a continuous conversation loop, where users can input multiple messages in a session, use the following structure. This example runs an asynchronous loop, ending when the user types 'exit', 'quit', or 'bye'."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "02211625",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "from llama_stack_client import LlamaStackClient\n",
+    "from llama_stack_client.types import UserMessage\n",
+    "from termcolor import cprint\n",
+    "\n",
+    "client = LlamaStackClient(base_url='http://localhost:5000')\n",
+    "\n",
+    "async def chat_loop():\n",
+    "    while True:\n",
+    "        user_input = input('User> ')\n",
+    "        if user_input.lower() in ['exit', 'quit', 'bye']:\n",
+    "            cprint('Ending conversation. Goodbye!', 'yellow')\n",
+    "            break\n",
+    "\n",
+    "        message = UserMessage(content=user_input, role='user')\n",
+    "        response = client.inference.chat_completion(\n",
+    "            messages=[message],\n",
+    "            model='Llama3.2-11B-Vision-Instruct',\n",
+    "        )\n",
+    "        cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
+    "\n",
+    "asyncio.run(chat_loop())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8cf0d555",
+   "metadata": {},
+   "source": [
+    "## Conversation History\n",
+    "\n",
+    "Maintaining a conversation history allows the model to retain context from previous interactions. Use a list to accumulate messages, enabling continuity throughout the chat session."
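+    ,
+    "\n",
+    "\n",
+    "Since the full history is re-sent on every request, a long session can eventually exceed the model's context window. A minimal sketch of one common mitigation, truncating to the most recent turns (`MAX_TURNS` is an illustrative constant, not a Llama Stack setting):\n",
+    "\n",
+    "```python\n",
+    "MAX_TURNS = 20  # assumption: tune to your model's context window\n",
+    "\n",
+    "def trim_history(history):\n",
+    "    # Keep only the most recent turns before sending a request\n",
+    "    return history[-MAX_TURNS:]\n",
+    "```"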
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9496f75c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "async def chat_loop():\n",
+    "    conversation_history = []\n",
+    "    while True:\n",
+    "        user_input = input('User> ')\n",
+    "        if user_input.lower() in ['exit', 'quit', 'bye']:\n",
+    "            cprint('Ending conversation. Goodbye!', 'yellow')\n",
+    "            break\n",
+    "\n",
+    "        user_message = UserMessage(content=user_input, role='user')\n",
+    "        conversation_history.append(user_message)\n",
+    "\n",
+    "        response = client.inference.chat_completion(\n",
+    "            messages=conversation_history,\n",
+    "            model='Llama3.2-11B-Vision-Instruct',\n",
+    "        )\n",
+    "        cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
+    "\n",
+    "        # Append the assistant's reply itself so the model sees its own turns as context\n",
+    "        conversation_history.append(response.completion_message)\n",
+    "\n",
+    "asyncio.run(chat_loop())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "03fcf5e0",
+   "metadata": {},
+   "source": [
+    "## Streaming Responses\n",
+    "\n",
+    "Llama Stack offers a `stream` parameter in the `chat_completion` function, which allows partial responses to be returned progressively as they are generated. This can enhance the user experience by providing immediate feedback without waiting for the entire response to be processed.\n",
+    "\n",
+    "### Example: Streaming Responses"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d119026e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "from llama_stack_client import LlamaStackClient\n",
+    "from llama_stack_client.lib.inference.event_logger import EventLogger\n",
+    "from llama_stack_client.types import UserMessage\n",
+    "from termcolor import cprint\n",
+    "\n",
+    "async def run_main(stream: bool = True):\n",
+    "    client = LlamaStackClient(base_url='http://localhost:5000')\n",
+    "\n",
+    "    message = UserMessage(\n",
+    "        content='hello world, write me a 2 sentence poem about the moon', role='user'\n",
+    "    )\n",
+    "    cprint(f'User> {message.content}', 'green')\n",
+    "\n",
+    "    response = client.inference.chat_completion(\n",
+    "        messages=[message],\n",
+    "        model='Llama3.2-11B-Vision-Instruct',\n",
+    "        stream=stream,\n",
+    "    )\n",
+    "\n",
+    "    if not stream:\n",
+    "        cprint(f'> Response: {response}', 'cyan')\n",
+    "    else:\n",
+    "        async for log in EventLogger().log(response):\n",
+    "            log.print()\n",
+    "\n",
+    "    models_response = client.models.list()\n",
+    "    print(models_response)\n",
+    "\n",
+    "if __name__ == '__main__':\n",
+    "    asyncio.run(run_main())"
+   ]
+  }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/zero_to_hero_guide/00_Local_Cloud_Inference101.ipynb b/docs/zero_to_hero_guide/00_Local_Cloud_Inference101.ipynb
new file mode 100644
index 000000000..8b80c2731
--- /dev/null
+++ b/docs/zero_to_hero_guide/00_Local_Cloud_Inference101.ipynb
@@ -0,0 +1,202 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a0ed972d",
+   "metadata": {},
+   "source": [
+    "# Switching between Local and Cloud Models with Llama Stack\n",
+    "\n",
+    "This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stack’s `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n",
+    "\n",
+    "### Prerequisite\n",
+    "Before you begin, please ensure Llama Stack is installed and the distribution is set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need to run two distributions, one local and one cloud, for this demo to work.\n",
+    "\n",
+    "### Implementation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df89cff7",
+   "metadata": {},
+   "source": [
+    "#### 1. Set Up Local and Cloud Clients\n",
+    "\n",
+    "Initialize both clients, specifying the `base_url` for each instance. In this case, the local distribution is running on `http://localhost:5000` and the cloud distribution on `http://localhost:5001`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7f868dfe",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_stack_client import LlamaStackClient\n",
+    "\n",
+    "# Configure local and cloud clients\n",
+    "local_client = LlamaStackClient(base_url='http://localhost:5000')\n",
+    "cloud_client = LlamaStackClient(base_url='http://localhost:5001')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "894689c1",
+   "metadata": {},
+   "source": [
+    "#### 2. Client Selection with Fallback\n",
+    "\n",
+    "The `select_client` function checks whether the local client is available using a lightweight `/health` check. If the local client is unavailable, it automatically switches to the cloud client.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff0c8277",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import httpx\n",
+    "from termcolor import cprint\n",
+    "\n",
+    "async def select_client() -> LlamaStackClient:\n",
+    "    \"\"\"Use the local client if available; otherwise, switch to the cloud client.\"\"\"\n",
+    "    try:\n",
+    "        async with httpx.AsyncClient() as http_client:\n",
+    "            response = await http_client.get(f'{local_client.base_url}/health')\n",
+    "            if response.status_code == 200:\n",
+    "                cprint('Using local client.', 'yellow')\n",
+    "                return local_client\n",
+    "    except httpx.RequestError:\n",
+    "        pass\n",
+    "    cprint('Local client unavailable. Switching to cloud client.', 'yellow')\n",
+    "    return cloud_client"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9ccfe66f",
+   "metadata": {},
+   "source": [
+    "#### 3. Generate a Response\n",
+    "\n",
+    "After selecting the client, you can generate text using `chat_completion`. This example sends a sample prompt to the model and prints the response.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e19cc20",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_stack_client.lib.inference.event_logger import EventLogger\n",
+    "from llama_stack_client.types import UserMessage\n",
+    "\n",
+    "async def get_llama_response(stream: bool = True):\n",
+    "    client = await select_client()  # Selects the available client\n",
+    "    message = UserMessage(content='hello world, write me a 2 sentence poem about the moon', role='user')\n",
+    "    cprint(f'User> {message.content}', 'green')\n",
+    "\n",
+    "    response = client.inference.chat_completion(\n",
+    "        messages=[message],\n",
+    "        model='Llama3.2-11B-Vision-Instruct',\n",
+    "        stream=stream,\n",
+    "    )\n",
+    "\n",
+    "    if not stream:\n",
+    "        cprint(f'> Response: {response}', 'cyan')\n",
+    "    else:\n",
+    "        # Stream tokens progressively\n",
+    "        async for log in EventLogger().log(response):\n",
+    "            log.print()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6edf5e57",
+   "metadata": {},
+   "source": [
+    "#### 4. 
Run the Asynchronous Response Generation\n", + "\n", + "Use `asyncio.run()` to execute `get_llama_response` in an asynchronous event loop.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c10f487e", + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "\n", + "# Initiate the response generation process\n", + "asyncio.run(get_llama_response())" + ] + }, + { + "cell_type": "markdown", + "id": "56aa9a09", + "metadata": {}, + "source": [ + "### Complete code\n", + "Summing it up, here's the complete code for local-cloud model implementation with Llama Stack:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d9fd74ff", + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "import httpx\n", + "from llama_stack_client import LlamaStackClient\n", + "from llama_stack_client.lib.inference.event_logger import EventLogger\n", + "from llama_stack_client.types import UserMessage\n", + "from termcolor import cprint\n", + "\n", + "local_client = LlamaStackClient(base_url='http://localhost:5000')\n", + "cloud_client = LlamaStackClient(base_url='http://localhost:5001')\n", + "\n", + "async def select_client() -> LlamaStackClient:\n", + " try:\n", + " async with httpx.AsyncClient() as http_client:\n", + " response = await http_client.get(f'{local_client.base_url}/health')\n", + " if response.status_code == 200:\n", + " cprint('Using local client.', 'yellow')\n", + " return local_client\n", + " except httpx.RequestError:\n", + " pass\n", + " cprint('Local client unavailable. Switching to cloud client.', 'yellow')\n", + " return cloud_client\n", + "\n", + "async def get_llama_response(stream: bool = True):\n", + " client = await select_client()\n", + " message = UserMessage(\n", + " content='hello world, write me a 2 sentence poem about the moon', role='user'\n", + " )\n", + " cprint(f'User> {message.content}', 'green')\n", + "\n", + " response = client.inference.chat_completion(\n", + " messages=[message],\n", + " model='Llama3.2-11B-Vision-Instruct',\n", + " stream=stream,\n", + " )\n", + "\n", + " if not stream:\n", + " cprint(f'> Response: {response}', 'cyan')\n", + " else:\n", + " async for log in EventLogger().log(response):\n", + " log.print()\n", + "\n", + "asyncio.run(get_llama_response())" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/Prompt_Engineering_with_Llama_3.ipynb b/docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb similarity index 100% rename from docs/Prompt_Engineering_with_Llama_3.ipynb rename to docs/zero_to_hero_guide/01_Prompt_Engineering101.ipynb diff --git a/docs/zero_to_hero_guide/01_Image_Chat101.ipynb b/docs/zero_to_hero_guide/02_Image_Chat101.ipynb similarity index 100% rename from docs/zero_to_hero_guide/01_Image_Chat101.ipynb rename to docs/zero_to_hero_guide/02_Image_Chat101.ipynb diff --git a/docs/zero_to_hero_guide/02_Tool_Calling101.ipynb b/docs/zero_to_hero_guide/02_Tool_Calling101.ipynb deleted file mode 100644 index f82482e01..000000000 --- a/docs/zero_to_hero_guide/02_Tool_Calling101.ipynb +++ /dev/null @@ -1,318 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tool Calling" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n", - "1. Setting up and using the Brave Search API\n", - "2. Creating custom tools\n", - "3. 
Configuring tool prompts and safety settings" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: llama-stack-client in ./.conda/envs/quick/lib/python3.13/site-packages (0.0.48)\n", - "Requirement already satisfied: anyio<5,>=3.5.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (4.6.2.post1)\n", - "Requirement already satisfied: distro<2,>=1.7.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (1.9.0)\n", - "Requirement already satisfied: httpx<1,>=0.23.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (0.27.2)\n", - "Requirement already satisfied: pydantic<3,>=1.9.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (2.9.2)\n", - "Requirement already satisfied: sniffio in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (1.3.1)\n", - "Requirement already satisfied: tabulate>=0.9.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (0.9.0)\n", - "Requirement already satisfied: typing-extensions<5,>=4.7 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (4.12.2)\n", - "Requirement already satisfied: idna>=2.8 in ./.conda/envs/quick/lib/python3.13/site-packages (from anyio<5,>=3.5.0->llama-stack-client) (3.10)\n", - "Requirement already satisfied: certifi in ./.conda/envs/quick/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama-stack-client) (2024.8.30)\n", - "Requirement already satisfied: httpcore==1.* in ./.conda/envs/quick/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama-stack-client) (1.0.6)\n", - "Requirement already satisfied: h11<0.15,>=0.13 in ./.conda/envs/quick/lib/python3.13/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->llama-stack-client) (0.14.0)\n", - "Requirement already satisfied: annotated-types>=0.6.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama-stack-client) (0.7.0)\n", - "Requirement already satisfied: pydantic-core==2.23.4 in ./.conda/envs/quick/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama-stack-client) (2.23.4)\n" - ] - } - ], - "source": [ - "!pip install llama-stack-client --upgrade" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "ename": "NameError", - "evalue": "name 'Agent' is not defined", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[4], line 23\u001b[0m\n\u001b[1;32m 15\u001b[0m load_dotenv()\n\u001b[1;32m 17\u001b[0m \u001b[38;5;66;03m# Helper function to create an agent with tools\u001b[39;00m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mcreate_tool_agent\u001b[39m(\n\u001b[1;32m 19\u001b[0m client: LlamaStackClient,\n\u001b[1;32m 20\u001b[0m tools: List[Dict],\n\u001b[1;32m 21\u001b[0m instructions: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mYou are a helpful assistant\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 22\u001b[0m model: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m 
\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mLlama3.1-8B-Instruct\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m---> 23\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[43mAgent\u001b[49m:\n\u001b[1;32m 24\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Create an agent with specified tools.\"\"\"\u001b[39;00m\n\u001b[1;32m 25\u001b[0m agent_config \u001b[38;5;241m=\u001b[39m AgentConfig(\n\u001b[1;32m 26\u001b[0m model\u001b[38;5;241m=\u001b[39mmodel,\n\u001b[1;32m 27\u001b[0m instructions\u001b[38;5;241m=\u001b[39minstructions,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 38\u001b[0m enable_session_persistence\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m,\n\u001b[1;32m 39\u001b[0m )\n", - "\u001b[0;31mNameError\u001b[0m: name 'Agent' is not defined" - ] - } - ], - "source": [ - "import asyncio\n", - "import os\n", - "from typing import Dict, List, Optional\n", - "from dotenv import load_dotenv\n", - "\n", - "from llama_stack_client import LlamaStackClient\n", - "#from llama_stack_client.lib.agents.agent import Agent\n", - "from llama_stack_client.lib.agents.event_logger import EventLogger\n", - "from llama_stack_client.types.agent_create_params import (\n", - " AgentConfig,\n", - " AgentConfigToolSearchToolDefinition,\n", - ")\n", - "\n", - "# Load environment variables\n", - "load_dotenv()\n", - "\n", - "# Helper function to create an agent with tools\n", - "async def create_tool_agent(\n", - " client: LlamaStackClient,\n", - " tools: List[Dict],\n", - " instructions: str = \"You are a helpful assistant\",\n", - " model: str = \"Llama3.1-8B-Instruct\",\n", - ") -> Agent:\n", - " \"\"\"Create an agent with specified tools.\"\"\"\n", - " agent_config = AgentConfig(\n", - " model=model,\n", - " instructions=instructions,\n", - " sampling_params={\n", - " \"strategy\": \"greedy\",\n", - " \"temperature\": 1.0,\n", - " \"top_p\": 0.9,\n", - " },\n", - " tools=tools,\n", - " tool_choice=\"auto\",\n", - " tool_prompt_format=\"json\",\n", - " input_shields=[\"llama_guard\"],\n", - " output_shields=[\"llama_guard\"],\n", - " enable_session_persistence=True,\n", - " )\n", - "\n", - " return Agent(client, agent_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, create a `.env` file in your notebook directory with your Brave Search API key:\n", - "\n", - "```\n", - "BRAVE_SEARCH_API_KEY=your_key_here\n", - "```\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "async def create_search_agent(client: LlamaStackClient) -> Agent:\n", - " \"\"\"Create an agent with Brave Search capability.\"\"\"\n", - " search_tool = AgentConfigToolSearchToolDefinition(\n", - " type=\"brave_search\",\n", - " engine=\"brave\",\n", - " api_key=os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n", - " )\n", - "\n", - " return await create_tool_agent(\n", - " client=client,\n", - " tools=[search_tool],\n", - " instructions=\"\"\"\n", - " You are a research assistant that can search the web.\n", - " Always cite your sources with URLs when providing information.\n", - " Format your responses as:\n", - "\n", - " FINDINGS:\n", - " [Your summary here]\n", - "\n", - " SOURCES:\n", - " - [Source title](URL)\n", - " \"\"\"\n", - " )\n", - "\n", - "# Example usage\n", - "async def search_example():\n", - " client = LlamaStackClient(base_url=\"http://localhost:8000\")\n", - " agent = await create_search_agent(client)\n", - "\n", - " # Create a session\n", - " session_id = 
agent.create_session(\"search-session\")\n", - "\n", - " # Example queries\n", - " queries = [\n", - " \"What are the latest developments in quantum computing?\",\n", - " \"Who won the most recent Super Bowl?\",\n", - " ]\n", - "\n", - " for query in queries:\n", - " print(f\"\\nQuery: {query}\")\n", - " print(\"-\" * 50)\n", - "\n", - " response = agent.create_turn(\n", - " messages=[{\"role\": \"user\", \"content\": query}],\n", - " session_id=session_id,\n", - " )\n", - "\n", - " async for log in EventLogger().log(response):\n", - " log.print()\n", - "\n", - "# Run the example (in Jupyter, use asyncio.run())\n", - "await search_example()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3. Custom Tool Creation\n", - "\n", - "Let's create a custom weather tool:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from typing import TypedDict, Optional\n", - "from datetime import datetime\n", - "\n", - "# Define tool types\n", - "class WeatherInput(TypedDict):\n", - " location: str\n", - " date: Optional[str]\n", - "\n", - "class WeatherOutput(TypedDict):\n", - " temperature: float\n", - " conditions: str\n", - " humidity: float\n", - "\n", - "class WeatherTool:\n", - " \"\"\"Example custom tool for weather information.\"\"\"\n", - "\n", - " def __init__(self, api_key: Optional[str] = None):\n", - " self.api_key = api_key\n", - "\n", - " async def get_weather(self, location: str, date: Optional[str] = None) -> WeatherOutput:\n", - " \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n", - " # Mock implementation\n", - " return {\n", - " \"temperature\": 72.5,\n", - " \"conditions\": \"partly cloudy\",\n", - " \"humidity\": 65.0\n", - " }\n", - "\n", - " async def __call__(self, input_data: WeatherInput) -> WeatherOutput:\n", - " \"\"\"Make the tool callable with structured input.\"\"\"\n", - " return await self.get_weather(\n", - " location=input_data[\"location\"],\n", - " date=input_data.get(\"date\")\n", - " )\n", - "\n", - "async def create_weather_agent(client: LlamaStackClient) -> Agent:\n", - " \"\"\"Create an agent with weather tool capability.\"\"\"\n", - " weather_tool = {\n", - " \"type\": \"function\",\n", - " \"function\": {\n", - " \"name\": \"get_weather\",\n", - " \"description\": \"Get weather information for a location\",\n", - " \"parameters\": {\n", - " \"type\": \"object\",\n", - " \"properties\": {\n", - " \"location\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"City or location name\"\n", - " },\n", - " \"date\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"Optional date (YYYY-MM-DD)\",\n", - " \"format\": \"date\"\n", - " }\n", - " },\n", - " \"required\": [\"location\"]\n", - " }\n", - " },\n", - " \"implementation\": WeatherTool()\n", - " }\n", - "\n", - " return await create_tool_agent(\n", - " client=client,\n", - " tools=[weather_tool],\n", - " instructions=\"\"\"\n", - " You are a weather assistant that can provide weather information.\n", - " Always specify the location clearly in your responses.\n", - " Include both temperature and conditions in your summaries.\n", - " \"\"\"\n", - " )\n", - "\n", - "# Example usage\n", - "async def weather_example():\n", - " client = LlamaStackClient(base_url=\"http://localhost:8000\")\n", - " agent = await create_weather_agent(client)\n", - "\n", - " session_id = agent.create_session(\"weather-session\")\n", - "\n", - " queries = [\n", - " \"What's the weather like in San 
Francisco?\",\n", - " \"Tell me the weather in Tokyo tomorrow\",\n", - " ]\n", - "\n", - " for query in queries:\n", - " print(f\"\\nQuery: {query}\")\n", - " print(\"-\" * 50)\n", - "\n", - " response = agent.create_turn(\n", - " messages=[{\"role\": \"user\", \"content\": query}],\n", - " session_id=session_id,\n", - " )\n", - "\n", - " async for log in EventLogger().log(response):\n", - " log.print()\n", - "\n", - "# Run the example\n", - "await weather_example()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.13.0" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/docs/zero_to_hero_guide/03_Tool_Calling101.ipynb b/docs/zero_to_hero_guide/03_Tool_Calling101.ipynb new file mode 100644 index 000000000..6431c3e0a --- /dev/null +++ b/docs/zero_to_hero_guide/03_Tool_Calling101.ipynb @@ -0,0 +1,349 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tool Calling" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n", + "1. Setting up and using the Brave Search API\n", + "2. Creating custom tools\n", + "3. Configuring tool prompts and safety settings" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "import os\n", + "from typing import Dict, List, Optional\n", + "from dotenv import load_dotenv\n", + "\n", + "from llama_stack_client import LlamaStackClient\n", + "from llama_stack_client.lib.agents.agent import Agent\n", + "from llama_stack_client.lib.agents.event_logger import EventLogger\n", + "from llama_stack_client.types.agent_create_params import (\n", + " AgentConfig,\n", + " AgentConfigToolSearchToolDefinition,\n", + ")\n", + "\n", + "# Load environment variables\n", + "load_dotenv()\n", + "\n", + "# Helper function to create an agent with tools\n", + "async def create_tool_agent(\n", + " client: LlamaStackClient,\n", + " tools: List[Dict],\n", + " instructions: str = \"You are a helpful assistant\",\n", + " model: str = \"Llama3.1-8B-Instruct\",\n", + ") -> Agent:\n", + " \"\"\"Create an agent with specified tools.\"\"\"\n", + " agent_config = AgentConfig(\n", + " model=model,\n", + " instructions=instructions,\n", + " sampling_params={\n", + " \"strategy\": \"greedy\",\n", + " \"temperature\": 1.0,\n", + " \"top_p\": 0.9,\n", + " },\n", + " tools=tools,\n", + " tool_choice=\"auto\",\n", + " tool_prompt_format=\"json\",\n", + " enable_session_persistence=True,\n", + " )\n", + "\n", + " return Agent(client, agent_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, create a `.env` file in your notebook directory with your Brave Search API key:\n", + "\n", + "```\n", + "BRAVE_SEARCH_API_KEY=your_key_here\n", + "```\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Query: What are the latest developments in quantum computing?\n", + "--------------------------------------------------\n", + "\u001b[30m\u001b[0m\u001b[33minference> 
\u001b[0m\u001b[33mF\u001b[0m\u001b[33mIND\u001b[0m\u001b[33mINGS\u001b[0m\u001b[33m:\n", + "\u001b[0m\u001b[33mThe\u001b[0m\u001b[33m latest\u001b[0m\u001b[33m developments\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computing\u001b[0m\u001b[33m include\u001b[0m\u001b[33m advancements\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m processors\u001b[0m\u001b[33m,\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m algorithms\u001b[0m\u001b[33m,\u001b[0m\u001b[33m and\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m error\u001b[0m\u001b[33m correction\u001b[0m\u001b[33m.\u001b[0m\u001b[33m Researchers\u001b[0m\u001b[33m have\u001b[0m\u001b[33m made\u001b[0m\u001b[33m significant\u001b[0m\u001b[33m progress\u001b[0m\u001b[33m in\u001b[0m\u001b[33m developing\u001b[0m\u001b[33m more\u001b[0m\u001b[33m powerful\u001b[0m\u001b[33m and\u001b[0m\u001b[33m reliable\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m with\u001b[0m\u001b[33m some\u001b[0m\u001b[33m companies\u001b[0m\u001b[33m already\u001b[0m\u001b[33m showcasing\u001b[0m\u001b[33m \u001b[0m\u001b[33m100\u001b[0m\u001b[33m-q\u001b[0m\u001b[33mubit\u001b[0m\u001b[33m and\u001b[0m\u001b[33m \u001b[0m\u001b[33m127\u001b[0m\u001b[33m-q\u001b[0m\u001b[33mubit\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m processors\u001b[0m\u001b[33m (\u001b[0m\u001b[33mIBM\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m;\u001b[0m\u001b[33m Google\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m These\u001b[0m\u001b[33m advancements\u001b[0m\u001b[33m have\u001b[0m\u001b[33m led\u001b[0m\u001b[33m to\u001b[0m\u001b[33m breakthrough\u001b[0m\u001b[33ms\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m simulations\u001b[0m\u001b[33m,\u001b[0m\u001b[33m machine\u001b[0m\u001b[33m learning\u001b[0m\u001b[33m,\u001b[0m\u001b[33m and\u001b[0m\u001b[33m optimization\u001b[0m\u001b[33m problems\u001b[0m\u001b[33m (\u001b[0m\u001b[33mB\u001b[0m\u001b[33mhart\u001b[0m\u001b[33mi\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m;\u001b[0m\u001b[33m Zhang\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m Additionally\u001b[0m\u001b[33m,\u001b[0m\u001b[33m there\u001b[0m\u001b[33m have\u001b[0m\u001b[33m been\u001b[0m\u001b[33m significant\u001b[0m\u001b[33m improvements\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m error\u001b[0m\u001b[33m correction\u001b[0m\u001b[33m,\u001b[0m\u001b[33m which\u001b[0m\u001b[33m is\u001b[0m\u001b[33m essential\u001b[0m\u001b[33m for\u001b[0m\u001b[33m large\u001b[0m\u001b[33m-scale\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computing\u001b[0m\u001b[33m (\u001b[0m\u001b[33mG\u001b[0m\u001b[33mottes\u001b[0m\u001b[33mman\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\n", + "\n", + "\u001b[0m\u001b[33mS\u001b[0m\u001b[33mOURCES\u001b[0m\u001b[33m:\n", + "\u001b[0m\u001b[33m-\u001b[0m\u001b[33m IBM\u001b[0m\u001b[33m Quantum\u001b[0m\u001b[33m:\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Process\u001b[0m\u001b[33mors\u001b[0m\u001b[33m\"\u001b[0m\u001b[33m 
(\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mwww\u001b[0m\u001b[33m.ibm\u001b[0m\u001b[33m.com\u001b[0m\u001b[33m/\u001b[0m\u001b[33mquant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m/com\u001b[0m\u001b[33mputer\u001b[0m\u001b[33m/)\n", + "\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Google\u001b[0m\u001b[33m Quantum\u001b[0m\u001b[33m AI\u001b[0m\u001b[33m Lab\u001b[0m\u001b[33m:\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Process\u001b[0m\u001b[33mors\u001b[0m\u001b[33m\"\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mquant\u001b[0m\u001b[33mum\u001b[0m\u001b[33mai\u001b[0m\u001b[33m.google\u001b[0m\u001b[33m/al\u001b[0m\u001b[33mphabet\u001b[0m\u001b[33m/sub\u001b[0m\u001b[33m-page\u001b[0m\u001b[33m-\u001b[0m\u001b[33m1\u001b[0m\u001b[33m/)\n", + "\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Bh\u001b[0m\u001b[33marti\u001b[0m\u001b[33m,\u001b[0m\u001b[33m K\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Computing\u001b[0m\u001b[33m:\u001b[0m\u001b[33m A\u001b[0m\u001b[33m Review\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Recent\u001b[0m\u001b[33m Advances\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m Journal\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Physics\u001b[0m\u001b[33m:\u001b[0m\u001b[33m Conference\u001b[0m\u001b[33m Series\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m218\u001b[0m\u001b[33m5\u001b[0m\u001b[33m(\u001b[0m\u001b[33m1\u001b[0m\u001b[33m),\u001b[0m\u001b[33m \u001b[0m\u001b[33m012\u001b[0m\u001b[33m001\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mi\u001b[0m\u001b[33mop\u001b[0m\u001b[33mscience\u001b[0m\u001b[33m.i\u001b[0m\u001b[33mop\u001b[0m\u001b[33m.org\u001b[0m\u001b[33m/article\u001b[0m\u001b[33m/\u001b[0m\u001b[33m10\u001b[0m\u001b[33m.\u001b[0m\u001b[33m108\u001b[0m\u001b[33m8\u001b[0m\u001b[33m/\u001b[0m\u001b[33m174\u001b[0m\u001b[33m2\u001b[0m\u001b[33m-\u001b[0m\u001b[33m659\u001b[0m\u001b[33m6\u001b[0m\u001b[33m/\u001b[0m\u001b[33m218\u001b[0m\u001b[33m5\u001b[0m\u001b[33m/\u001b[0m\u001b[33m1\u001b[0m\u001b[33m/\u001b[0m\u001b[33m012\u001b[0m\u001b[33m001\u001b[0m\u001b[33m)\n", + "\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Zhang\u001b[0m\u001b[33m,\u001b[0m\u001b[33m Y\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Algorithms\u001b[0m\u001b[33m for\u001b[0m\u001b[33m Machine\u001b[0m\u001b[33m Learning\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m Journal\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Machine\u001b[0m\u001b[33m Learning\u001b[0m\u001b[33m Research\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m23\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m1\u001b[0m\u001b[33m-\u001b[0m\u001b[33m36\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mj\u001b[0m\u001b[33mml\u001b[0m\u001b[33mr\u001b[0m\u001b[33m.org\u001b[0m\u001b[33m/p\u001b[0m\u001b[33mapers\u001b[0m\u001b[33m/v\u001b[0m\u001b[33m23\u001b[0m\u001b[33m/\u001b[0m\u001b[33m20\u001b[0m\u001b[33m-\u001b[0m\u001b[33m065\u001b[0m\u001b[33m.html\u001b[0m\u001b[33m)\n", + "\u001b[0m\u001b[33m-\u001b[0m\u001b[33m G\u001b[0m\u001b[33mottes\u001b[0m\u001b[33mman\u001b[0m\u001b[33m,\u001b[0m\u001b[33m 
D\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Error\u001b[0m\u001b[33m Correction\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m In\u001b[0m\u001b[33m Encyclopedia\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Complexity\u001b[0m\u001b[33m and\u001b[0m\u001b[33m Systems\u001b[0m\u001b[33m Science\u001b[0m\u001b[33m (\u001b[0m\u001b[33mpp\u001b[0m\u001b[33m.\u001b[0m\u001b[33m \u001b[0m\u001b[33m1\u001b[0m\u001b[33m-\u001b[0m\u001b[33m13\u001b[0m\u001b[33m).\u001b[0m\u001b[33m Springer\u001b[0m\u001b[33m,\u001b[0m\u001b[33m New\u001b[0m\u001b[33m York\u001b[0m\u001b[33m,\u001b[0m\u001b[33m NY\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mlink\u001b[0m\u001b[33m.spring\u001b[0m\u001b[33mer\u001b[0m\u001b[33m.com\u001b[0m\u001b[33m/reference\u001b[0m\u001b[33mwork\u001b[0m\u001b[33mentry\u001b[0m\u001b[33m/\u001b[0m\u001b[33m10\u001b[0m\u001b[33m.\u001b[0m\u001b[33m100\u001b[0m\u001b[33m7\u001b[0m\u001b[33m/\u001b[0m\u001b[33m978\u001b[0m\u001b[33m-\u001b[0m\u001b[33m0\u001b[0m\u001b[33m-\u001b[0m\u001b[33m387\u001b[0m\u001b[33m-\u001b[0m\u001b[33m758\u001b[0m\u001b[33m88\u001b[0m\u001b[33m-\u001b[0m\u001b[33m6\u001b[0m\u001b[33m_\u001b[0m\u001b[33m447\u001b[0m\u001b[33m)\u001b[0m\u001b[97m\u001b[0m\n", + "\u001b[30m\u001b[0m" + ] + } + ], + "source": [ + "async def create_search_agent(client: LlamaStackClient) -> Agent:\n", + " \"\"\"Create an agent with Brave Search capability.\"\"\"\n", + " search_tool = AgentConfigToolSearchToolDefinition(\n", + " type=\"brave_search\",\n", + " engine=\"brave\",\n", + " api_key=\"dummy_value\"#os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n", + " )\n", + "\n", + " return await create_tool_agent(\n", + " client=client,\n", + " tools=[search_tool],\n", + " instructions=\"\"\"\n", + " You are a research assistant that can search the web.\n", + " Always cite your sources with URLs when providing information.\n", + " Format your responses as:\n", + "\n", + " FINDINGS:\n", + " [Your summary here]\n", + "\n", + " SOURCES:\n", + " - [Source title](URL)\n", + " \"\"\"\n", + " )\n", + "\n", + "# Example usage\n", + "async def search_example():\n", + " client = LlamaStackClient(base_url=\"http://localhost:5001\")\n", + " agent = await create_search_agent(client)\n", + "\n", + " # Create a session\n", + " session_id = agent.create_session(\"search-session\")\n", + "\n", + " # Example queries\n", + " queries = [\n", + " \"What are the latest developments in quantum computing?\",\n", + " #\"Who won the most recent Super Bowl?\",\n", + " ]\n", + "\n", + " for query in queries:\n", + " print(f\"\\nQuery: {query}\")\n", + " print(\"-\" * 50)\n", + "\n", + " response = agent.create_turn(\n", + " messages=[{\"role\": \"user\", \"content\": query}],\n", + " session_id=session_id,\n", + " )\n", + "\n", + " async for log in EventLogger().log(response):\n", + " log.print()\n", + "\n", + "# Run the example (in Jupyter, use asyncio.run())\n", + "await search_example()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
Custom Tool Creation\n", + "\n", + "Let's create a custom weather tool:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Query: What's the weather like in San Francisco?\n", + "--------------------------------------------------\n", + "\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[33m{\n", + "\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mtype\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mfunction\u001b[0m\u001b[33m\",\n", + "\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mname\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mget\u001b[0m\u001b[33m_weather\u001b[0m\u001b[33m\",\n", + "\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mparameters\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m {\n", + "\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mlocation\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mSan\u001b[0m\u001b[33m Francisco\u001b[0m\u001b[33m\"\n", + "\u001b[0m\u001b[33m \u001b[0m\u001b[33m }\n", + "\u001b[0m\u001b[33m}\u001b[0m\u001b[97m\u001b[0m\n" + ] + }, + { + "ename": "AttributeError", + "evalue": "'WeatherTool' object has no attribute 'run'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[27], line 113\u001b[0m\n\u001b[1;32m 110\u001b[0m nest_asyncio\u001b[38;5;241m.\u001b[39mapply()\n\u001b[1;32m 112\u001b[0m \u001b[38;5;66;03m# Run the example\u001b[39;00m\n\u001b[0;32m--> 113\u001b[0m \u001b[38;5;28;01mawait\u001b[39;00m weather_example()\n", + "Cell \u001b[0;32mIn[27], line 105\u001b[0m, in \u001b[0;36mweather_example\u001b[0;34m()\u001b[0m\n\u001b[1;32m 98\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m-\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m*\u001b[39m \u001b[38;5;241m50\u001b[39m)\n\u001b[1;32m 100\u001b[0m response \u001b[38;5;241m=\u001b[39m agent\u001b[38;5;241m.\u001b[39mcreate_turn(\n\u001b[1;32m 101\u001b[0m messages\u001b[38;5;241m=\u001b[39m[{\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrole\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124muser\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcontent\u001b[39m\u001b[38;5;124m\"\u001b[39m: query}],\n\u001b[1;32m 102\u001b[0m session_id\u001b[38;5;241m=\u001b[39msession_id,\n\u001b[1;32m 103\u001b[0m )\n\u001b[0;32m--> 105\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mfor\u001b[39;00m log \u001b[38;5;129;01min\u001b[39;00m EventLogger()\u001b[38;5;241m.\u001b[39mlog(response):\n\u001b[1;32m 106\u001b[0m log\u001b[38;5;241m.\u001b[39mprint()\n", + "File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/event_logger.py:55\u001b[0m, in \u001b[0;36mEventLogger.log\u001b[0;34m(self, event_generator)\u001b[0m\n\u001b[1;32m 52\u001b[0m previous_event_type \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 53\u001b[0m previous_step_type \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m---> 55\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mfor\u001b[39;00m chunk \u001b[38;5;129;01min\u001b[39;00m event_generator:\n\u001b[1;32m 56\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m 
\u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(chunk, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mevent\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[1;32m 57\u001b[0m \u001b[38;5;66;03m# Need to check for custom tool first\u001b[39;00m\n\u001b[1;32m 58\u001b[0m \u001b[38;5;66;03m# since it does not produce event but instead\u001b[39;00m\n\u001b[1;32m 59\u001b[0m \u001b[38;5;66;03m# a Message\u001b[39;00m\n\u001b[1;32m 60\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(chunk, ToolResponseMessage):\n", + "File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/agent.py:76\u001b[0m, in \u001b[0;36mAgent.create_turn\u001b[0;34m(self, messages, attachments, session_id)\u001b[0m\n\u001b[1;32m 74\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 75\u001b[0m tool \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcustom_tools[tool_call\u001b[38;5;241m.\u001b[39mtool_name]\n\u001b[0;32m---> 76\u001b[0m result_messages \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mexecute_custom_tool(tool, message)\n\u001b[1;32m 77\u001b[0m next_message \u001b[38;5;241m=\u001b[39m result_messages[\u001b[38;5;241m0\u001b[39m]\n\u001b[1;32m 79\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m next_message\n", + "File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/agent.py:84\u001b[0m, in \u001b[0;36mAgent.execute_custom_tool\u001b[0;34m(self, tool, message)\u001b[0m\n\u001b[1;32m 81\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mexecute_custom_tool\u001b[39m(\n\u001b[1;32m 82\u001b[0m \u001b[38;5;28mself\u001b[39m, tool: CustomTool, message: Union[UserMessage, ToolResponseMessage]\n\u001b[1;32m 83\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m List[Union[UserMessage, ToolResponseMessage]]:\n\u001b[0;32m---> 84\u001b[0m result_messages \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m \u001b[43mtool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m([message])\n\u001b[1;32m 85\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m result_messages\n", + "\u001b[0;31mAttributeError\u001b[0m: 'WeatherTool' object has no attribute 'run'" + ] + } + ], + "source": [ + "from typing import TypedDict, Optional, Dict, Any\n", + "from datetime import datetime\n", + "class WeatherTool:\n", + " \"\"\"Example custom tool for weather information.\"\"\"\n", + " \n", + " def get_name(self) -> str:\n", + " return \"get_weather\"\n", + " \n", + " def get_description(self) -> str:\n", + " return \"Get weather information for a location\"\n", + " \n", + " def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n", + " return {\n", + " \"location\": ToolParamDefinitionParam(\n", + " param_type=\"str\",\n", + " description=\"City or location name\",\n", + " required=True\n", + " ),\n", + " \"date\": ToolParamDefinitionParam(\n", + " param_type=\"str\",\n", + " description=\"Optional date (YYYY-MM-DD)\",\n", + " required=False\n", + " )\n", + " }\n", + " \n", + " async def run_impl(self, location: str, date: Optional[str] = None) -> Dict[str, Any]:\n", + " \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n", + " # Mock implementation\n", + " return {\n", + " \"temperature\": 72.5,\n", + " \"conditions\": \"partly cloudy\",\n", + " \"humidity\": 65.0\n", + " }\n", + "\n", + "async 
def create_weather_agent(client: LlamaStackClient) -> Agent:\n", + " \"\"\"Create an agent with weather tool capability.\"\"\"\n", + " agent_config = AgentConfig(\n", + " model=\"Llama3.1-8B-Instruct\",\n", + " instructions=\"\"\"\n", + " You are a weather assistant that can provide weather information.\n", + " Always specify the location clearly in your responses.\n", + " Include both temperature and conditions in your summaries.\n", + " \"\"\",\n", + " sampling_params={\n", + " \"strategy\": \"greedy\",\n", + " \"temperature\": 1.0,\n", + " \"top_p\": 0.9,\n", + " },\n", + " tools=[\n", + " {\n", + " \"function_name\": \"get_weather\",\n", + " \"description\": \"Get weather information for a location\",\n", + " \"parameters\": {\n", + " \"location\": {\n", + " \"param_type\": \"str\",\n", + " \"description\": \"City or location name\",\n", + " \"required\": True,\n", + " },\n", + " \"date\": {\n", + " \"param_type\": \"str\",\n", + " \"description\": \"Optional date (YYYY-MM-DD)\",\n", + " \"required\": False,\n", + " },\n", + " },\n", + " \"type\": \"function_call\",\n", + " }\n", + " ],\n", + " tool_choice=\"auto\",\n", + " tool_prompt_format=\"json\",\n", + " input_shields=[],\n", + " output_shields=[],\n", + " enable_session_persistence=True\n", + " )\n", + " \n", + " # Create the agent with the tool\n", + " weather_tool = WeatherTool()\n", + " agent = Agent(\n", + " client=client,\n", + " agent_config=agent_config,\n", + " custom_tools=[weather_tool]\n", + " )\n", + " \n", + " return agent\n", + "\n", + "# Example usage\n", + "async def weather_example():\n", + " client = LlamaStackClient(base_url=\"http://localhost:5001\")\n", + " agent = await create_weather_agent(client)\n", + " session_id = agent.create_session(\"weather-session\")\n", + " \n", + " queries = [\n", + " \"What's the weather like in San Francisco?\",\n", + " \"Tell me the weather in Tokyo tomorrow\",\n", + " ]\n", + " \n", + " for query in queries:\n", + " print(f\"\\nQuery: {query}\")\n", + " print(\"-\" * 50)\n", + " \n", + " response = agent.create_turn(\n", + " messages=[{\"role\": \"user\", \"content\": query}],\n", + " session_id=session_id,\n", + " )\n", + " \n", + " async for log in EventLogger().log(response):\n", + " log.print()\n", + "\n", + "# For Jupyter notebooks\n", + "import nest_asyncio\n", + "nest_asyncio.apply()\n", + "\n", + "# Run the example\n", + "await weather_example()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/zero_to_hero_guide/03_Memory101.ipynb b/docs/zero_to_hero_guide/04_Memory101.ipynb similarity index 100% rename from docs/zero_to_hero_guide/03_Memory101.ipynb rename to docs/zero_to_hero_guide/04_Memory101.ipynb diff --git a/docs/zero_to_hero_guide/04_Safety101.ipynb b/docs/zero_to_hero_guide/05_Safety101.ipynb similarity index 100% rename from docs/zero_to_hero_guide/04_Safety101.ipynb rename to docs/zero_to_hero_guide/05_Safety101.ipynb diff --git a/docs/zero_to_hero_guide/05_Agents101.ipynb b/docs/zero_to_hero_guide/06_Agents101.ipynb similarity index 100% rename 
from docs/zero_to_hero_guide/05_Agents101.ipynb rename to docs/zero_to_hero_guide/06_Agents101.ipynb diff --git a/docs/zero_to_hero_guide/Tool_Calling101.ipynb b/docs/zero_to_hero_guide/Tool_Calling101.ipynb new file mode 100644 index 000000000..a4c57ddff --- /dev/null +++ b/docs/zero_to_hero_guide/Tool_Calling101.ipynb @@ -0,0 +1,558 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Getting Started with LlamaStack: Tool Calling Tutorial\n", + "\n", + "Welcome! This notebook will guide you through creating and using custom tools with LlamaStack.\n", + "We'll start with the basics and work our way up to more complex examples.\n", + "\n", + "Table of Contents:\n", + "1. Setup and Installation\n", + "2. Understanding Tool Basics\n", + "3. Creating Your First Tool\n", + "4. Building a Mock Weather Tool\n", + "5. Setting Up the LlamaStack Agent\n", + "6. Running Examples\n", + "7. Next Steps\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Setup\n", + "#### Before we begin, let's import all the required packages:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import asyncio\n", + "import json\n", + "from typing import Dict\n", + "from datetime import datetime" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# LlamaStack specific imports\n", + "from llama_stack_client import LlamaStackClient\n", + "from llama_stack_client.lib.agents.agent import Agent\n", + "from llama_stack_client.lib.agents.event_logger import EventLogger\n", + "from llama_stack_client.types.agent_create_params import AgentConfig\n", + "from llama_stack_client.types.tool_param_definition_param import ToolParamDefinitionParam" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Understanding Tool Basics\n", + "\n", + "In LlamaStack, a tool is like a special function that our AI assistant can use. Think of it as giving the AI a new \n", + "capability, like using a calculator or checking the weather.\n", + "\n", + "Every tool needs:\n", + "- A name: What we call the tool\n", + "- A description: What the tool does\n", + "- Parameters: What information the tool needs to work\n", + "- Implementation: The actual code that does the work\n", + "\n", + "Let's create a base class that all our tools will inherit from:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "class SingleMessageCustomTool:\n", + " \"\"\"Base class for all our custom tools\"\"\"\n", + " \n", + " async def run(self, messages=None):\n", + " \"\"\"\n", + " Main entry point for running the tool\n", + " Args:\n", + " messages: List of messages (can be None for backward compatibility)\n", + " \"\"\"\n", + " if messages and len(messages) > 0:\n", + " # Extract parameters from the message if it contains function parameters\n", + " message = messages[0]\n", + " if hasattr(message, 'function_parameters'):\n", + " return await self.run_impl(**message.function_parameters)\n", + " else:\n", + " return await self.run_impl()\n", + " return await self.run_impl()\n", + " \n", + " async def run_impl(self, **kwargs):\n", + " \"\"\"Each tool will implement this method with their specific logic\"\"\"\n", + " raise NotImplementedError()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
Creating Your First Tool: Calculator\n", + " \n", + "Let's create a simple calculator tool. This will help us understand the basic structure of a tool.\n", + "Our calculator can:\n", + "- Add\n", + "- Subtract\n", + "- Multiply\n", + "- Divide\n" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "# Calculator Tool implementation\n", + "class CalculatorTool(SingleMessageCustomTool):\n", + " \"\"\"A simple calculator tool that can perform basic math operations\"\"\"\n", + " \n", + " def get_name(self) -> str:\n", + " return \"calculator\"\n", + " \n", + " def get_description(self) -> str:\n", + " return \"Perform basic arithmetic operations (add, subtract, multiply, divide)\"\n", + " \n", + " def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n", + " return {\n", + " \"operation\": ToolParamDefinitionParam(\n", + " param_type=\"str\",\n", + " description=\"Operation to perform (add, subtract, multiply, divide)\",\n", + " required=True\n", + " ),\n", + " \"x\": ToolParamDefinitionParam(\n", + " param_type=\"float\",\n", + " description=\"First number\",\n", + " required=True\n", + " ),\n", + " \"y\": ToolParamDefinitionParam(\n", + " param_type=\"float\",\n", + " description=\"Second number\",\n", + " required=True\n", + " )\n", + " }\n", + " \n", + " async def run_impl(self, operation: str = None, x: float = None, y: float = None):\n", + " \"\"\"The actual implementation of our calculator\"\"\"\n", + " # Compare against None explicitly so that 0 is still accepted as a valid operand\n", + " if operation is None or x is None or y is None:\n", + " return json.dumps({\"error\": \"Missing required parameters\"})\n", + " \n", + " # Dictionary of math operations\n", + " operations = {\n", + " \"add\": lambda a, b: a + b,\n", + " \"subtract\": lambda a, b: a - b,\n", + " \"multiply\": lambda a, b: a * b,\n", + " \"divide\": lambda a, b: a / b if b != 0 else \"Error: Division by zero\"\n", + " }\n", + " \n", + " # Check if the operation is valid\n", + " if operation not in operations:\n", + " return json.dumps({\"error\": f\"Unknown operation '{operation}'\"})\n", + " \n", + " try:\n", + " # Convert string inputs to float if needed\n", + " x = float(x) if isinstance(x, str) else x\n", + " y = float(y) if isinstance(y, str) else y\n", + " \n", + " # Perform the calculation\n", + " result = operations[operation](x, y)\n", + " return json.dumps({\"result\": result})\n", + " except ValueError:\n", + " return json.dumps({\"error\": \"Invalid number format\"})\n", + " except Exception as e:\n", + " return json.dumps({\"error\": str(e)})" + ] + },
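+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before wiring a tool into an agent, it helps to call it directly and confirm the output looks right. The cell below is a minimal sanity check, assuming the `CalculatorTool` cell above has been run; it relies on Jupyter's support for top-level `await`, and since `run_impl` returns a JSON string we decode it with `json.loads` for display." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Sanity-check the calculator tool directly, without an agent\n", + "result = await CalculatorTool().run_impl(operation=\"add\", x=25, y=17)\n", + "print(json.loads(result)) # expected: {'result': 42}\n", + "\n", + "# Division by zero is reported inside the JSON payload rather than raised\n", + "print(json.loads(await CalculatorTool().run_impl(operation=\"divide\", x=1, y=0)))" + ] + },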
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Building a Mock Weather Tool\n", + " \n", + "Now let's create something a bit more complex: a weather tool! \n", + "While this is just a mock version (it doesn't actually fetch real weather data),\n", + "it shows how you might structure a tool that interfaces with an external API." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "class WeatherTool(SingleMessageCustomTool):\n", + " \"\"\"A mock weather tool that simulates getting weather data\"\"\"\n", + " \n", + " def get_name(self) -> str:\n", + " return \"get_weather\"\n", + " \n", + " def get_description(self) -> str:\n", + " return \"Get current weather information for major cities\"\n", + " \n", + " def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n", + " return {\n", + " \"city\": ToolParamDefinitionParam(\n", + " param_type=\"str\",\n", + " description=\"Name of the city (e.g., New York, London, Tokyo)\",\n", + " required=True\n", + " ),\n", + " \"date\": ToolParamDefinitionParam(\n", + " param_type=\"str\",\n", + " description=\"Date in YYYY-MM-DD format (optional)\",\n", + " required=False\n", + " )\n", + " }\n", + " \n", + " async def run_impl(self, city: str = None, date: str = None):\n", + " if not city:\n", + " return json.dumps({\"error\": \"City parameter is required\"})\n", + " \n", + " # Mock database of weather information\n", + " weather_data = {\n", + " \"New York\": {\"temp\": 20, \"condition\": \"sunny\"},\n", + " \"London\": {\"temp\": 15, \"condition\": \"rainy\"},\n", + " \"Tokyo\": {\"temp\": 25, \"condition\": \"cloudy\"}\n", + " }\n", + " \n", + " try:\n", + " # Check if we have data for the requested city\n", + " if city not in weather_data:\n", + " return json.dumps({\n", + " \"error\": f\"Sorry! No data available for {city}\",\n", + " \"available_cities\": list(weather_data.keys())\n", + " })\n", + " \n", + " # Return the weather information\n", + " return json.dumps({\n", + " \"city\": city,\n", + " \"date\": date or datetime.now().strftime(\"%Y-%m-%d\"),\n", + " \"data\": weather_data[city]\n", + " })\n", + " except Exception as e:\n", + " return json.dumps({\"error\": str(e)})" + ] + },
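+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The same kind of direct check works for the weather tool. This sketch assumes the `WeatherTool` cell above has been run; it exercises the optional `date` parameter and the unknown-city fallback against the mock data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Known city with an explicit date\n", + "print(json.loads(await WeatherTool().run_impl(city=\"London\", date=\"2024-01-01\")))\n", + "\n", + "# Unknown city: the tool returns an error plus the cities it does know about\n", + "print(json.loads(await WeatherTool().run_impl(city=\"Paris\")))" + ] + },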
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Setting Up the LlamaStack Agent\n", + "\n", + "Now that we have our tools, we need to create an agent that can use them.\n", + "The agent is like a smart assistant that knows how to use our tools when needed." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "async def setup_agent(host: str = \"localhost\", port: int = 5001):\n", + " \"\"\"Creates and configures our LlamaStack agent\"\"\"\n", + " \n", + " # Create a client to connect to the LlamaStack server\n", + " client = LlamaStackClient(\n", + " base_url=f\"http://{host}:{port}\",\n", + " )\n", + " \n", + " # Configure how we want our agent to behave\n", + " agent_config = AgentConfig(\n", + " model=\"Llama3.1-8B-Instruct\",\n", + " instructions=\"\"\"You are a helpful assistant that can:\n", + " 1. Perform mathematical calculations\n", + " 2. Check weather information\n", + " Always explain your thinking before using a tool.\"\"\",\n", + " \n", + " sampling_params={\n", + " \"strategy\": \"greedy\",\n", + " \"temperature\": 1.0,\n", + " \"top_p\": 0.9,\n", + " },\n", + " \n", + " # List of tools available to the agent\n", + " tools=[\n", + " {\n", + " \"function_name\": \"calculator\",\n", + " \"description\": \"Perform basic arithmetic operations\",\n", + " \"parameters\": {\n", + " \"operation\": {\n", + " \"param_type\": \"str\",\n", + " \"description\": \"Operation to perform (add, subtract, multiply, divide)\",\n", + " \"required\": True,\n", + " },\n", + " \"x\": {\n", + " \"param_type\": \"float\",\n", + " \"description\": \"First number\",\n", + " \"required\": True,\n", + " },\n", + " \"y\": {\n", + " \"param_type\": \"float\",\n", + " \"description\": \"Second number\",\n", + " \"required\": True,\n", + " },\n", + " },\n", + " \"type\": \"function_call\",\n", + " },\n", + " {\n", + " \"function_name\": \"get_weather\",\n", + " \"description\": \"Get weather information for a given city\",\n", + " \"parameters\": {\n", + " \"city\": {\n", + " \"param_type\": \"str\",\n", + " \"description\": \"Name of the city\",\n", + " \"required\": True,\n", + " },\n", + " \"date\": {\n", + " \"param_type\": \"str\",\n", + " \"description\": \"Date in YYYY-MM-DD format\",\n", + " \"required\": False,\n", + " },\n", + " },\n", + " \"type\": \"function_call\",\n", + " },\n", + " ],\n", + " tool_choice=\"auto\",\n", + " # Using standard JSON format for tools\n", + " tool_prompt_format=\"json\", \n", + " input_shields=[],\n", + " output_shields=[],\n", + " enable_session_persistence=False,\n", + " )\n", + " \n", + " # Create our tools\n", + " custom_tools = [CalculatorTool(), WeatherTool()]\n", + " \n", + " # Create the agent\n", + " agent = Agent(client, agent_config, custom_tools)\n", + " session_id = agent.create_session(\"tutorial-session\")\n", + " print(f\"🎉 Created session_id={session_id} for Agent({agent.agent_id})\")\n", + " \n", + " return agent, session_id" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, +
"source": [ + "# ## 6. Running Examples\n", + "# \n", + "# Let's try out our agent with some example questions!\n", + "\n", + "# %%" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [], + "source": [ + "import nest_asyncio\n", + "nest_asyncio.apply() # This allows async operations to work in Jupyter\n", + "\n", + "# %%\n", + "# Initialize the agent\n", + "async def init_agent():\n", + " \"\"\"Initialize our agent - run this first!\"\"\"\n", + " agent, session_id = await setup_agent()\n", + " print(f\"✨ Agent initialized with session {session_id}\")\n", + " return agent, session_id\n", + "\n", + "# %%\n", + "# Function to run a single query\n", + "async def run_single_query(agent, session_id, query: str):\n", + " \"\"\"Run a single query through our agent\"\"\"\n", + " print(\"\\n\" + \"=\"*50)\n", + " print(f\"🤔 User asks: {query}\")\n", + " print(\"=\"*50)\n", + " \n", + " response = agent.create_turn(\n", + " messages=[\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": query,\n", + " }\n", + " ],\n", + " session_id=session_id,\n", + " )\n", + " \n", + " async for log in EventLogger().log(response):\n", + " log.print()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's run everything and see it in action!\n", + "\n", + "Create and run our agent" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "🎉 Created session_id=fbe83bb6-bdfd-497c-b920-d7307482d8ba for Agent(3997eeda-4ffd-4b05-9026-28b4da206a11)\n", + "✨ Agent initialized with session fbe83bb6-bdfd-497c-b920-d7307482d8ba\n" + ] + } + ], + "source": [ + "agent, session_id = await init_agent()" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "==================================================\n", + "🤔 User asks: What's 25 plus 17?\n", + "==================================================\n", + "\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[36m\u001b[0m\u001b[36m{\"\u001b[0m\u001b[36mtype\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mfunction\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mname\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mcalculator\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mparameters\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m {\"\u001b[0m\u001b[36moperation\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36madd\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36my\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36m17\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mx\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36m25\u001b[0m\u001b[36m\"}}\u001b[0m\u001b[97m\u001b[0m\n" + ] + } + ], + "source": [ + "await run_single_query(agent, session_id, \"What's 25 plus 17?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "==================================================\n", + "🤔 User asks: What's the weather like in Tokyo?\n", + "==================================================\n", + "\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[36m\u001b[0m\u001b[36m{\"\u001b[0m\u001b[36mtype\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m 
\"\u001b[0m\u001b[36mfunction\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mname\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mget\u001b[0m\u001b[36m_weather\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mparameters\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m {\"\u001b[0m\u001b[36mcity\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mTok\u001b[0m\u001b[36myo\u001b[0m\u001b[36m\"}}\u001b[0m\u001b[97m\u001b[0m\n" + ] + } + ], + "source": [ + "await run_single_query(agent, session_id, \"What's the weather like in Tokyo?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [], + "source": [ + "#fin" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/source/chat_completion_guide.md b/docs/zero_to_hero_guide/chat_completion_guide.md similarity index 98% rename from docs/source/chat_completion_guide.md rename to docs/zero_to_hero_guide/chat_completion_guide.md index 9ec6edfab..3fcdbfc1d 100644 --- a/docs/source/chat_completion_guide.md +++ b/docs/zero_to_hero_guide/chat_completion_guide.md @@ -1,7 +1,7 @@ -# Llama Stack Text Generation Guide +# Llama Stack Inference Guide -This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). +This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). ### Table of Contents 1. 
[Quickstart](#quickstart) diff --git a/docs/source/chat_few_shot_guide.md b/docs/zero_to_hero_guide/chat_few_shot_guide.md similarity index 100% rename from docs/source/chat_few_shot_guide.md rename to docs/zero_to_hero_guide/chat_few_shot_guide.md diff --git a/docs/source/chat_local_cloud_guide.md b/docs/zero_to_hero_guide/chat_local_cloud_guide.md similarity index 100% rename from docs/source/chat_local_cloud_guide.md rename to docs/zero_to_hero_guide/chat_local_cloud_guide.md diff --git a/docs/source/quickstart.md b/docs/zero_to_hero_guide/quickstart.md similarity index 91% rename from docs/source/quickstart.md rename to docs/zero_to_hero_guide/quickstart.md index a96d35c3b..3bc11285e 100644 --- a/docs/source/quickstart.md +++ b/docs/zero_to_hero_guide/quickstart.md @@ -157,16 +157,15 @@ With these steps, you should have a functional Llama Stack setup capable of gene ## Next Steps - **Explore Other Guides**: Dive deeper into specific topics by following these guides: - - [Understanding Distributions](#) - - [Configure your Distro](#) - - [Doing Inference API Call and Fetching a Response from Endpoints](#) - - [Creating a Conversation Loop](#) - - [Sending Image to the Model](#) - - [Tool Calling: How to and Details](#) - - [Memory API: Show Simple In-Memory Retrieval](#) - - [Agents API: Explain Components](#) - - [Using Safety API in Conversation](#) - - [Prompt Engineering Guide](#) +- [Inference 101](00_Inference101.ipynb) +- [Simple switch between local and cloud model](00_Local_Cloud_Inference101.ipynb) +- [Prompt Engineering](01_Prompt_Engineering101.ipynb) +- [Chat with Image - LlamaStack Vision API](02_Image_Chat101.ipynb) +- [Tool Calling: How to and Details](03_Tool_Calling101.ipynb) +- [Memory API: Show Simple In-Memory Retrieval](04_Memory101.ipynb) +- [Using Safety API in Conversation](05_Safety101.ipynb) +- [Agents API: Explain Components](06_Agents101.ipynb) + - **Explore Client SDKs**: Utilize our client SDKs for various languages to integrate Llama Stack into your applications: - [Python SDK](https://github.com/meta-llama/llama-stack-client-python) @@ -180,5 +179,3 @@ With these steps, you should have a functional Llama Stack setup capable of gene --- - -