Merge branch 'docs_improvement' of github.com:meta-llama/llama-stack into docs_improvement

Kai Wu 2024-11-05 15:07:28 -08:00
commit ca95afb449
15 changed files with 1366 additions and 443 deletions


@@ -1,111 +0,0 @@
# Getting Started with Llama Stack
This guide will walk you through the steps to set up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our [documentation](../README.md) for more on Llama Stack's capabilities, or visit [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) for example apps.
## Installation
The `llama` CLI tool helps you manage the Llama toolchain & agentic systems. After installing the `llama-stack` package, the `llama` command should be available in your path.
You can install this repository in two ways:
1. **Install as a package**:
Install directly from [PyPI](https://pypi.org/project/llama-stack/) with:
```bash
pip install llama-stack
```
2. **Install from source**:
Follow these steps to install from the source code:
```bash
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
conda create -n stack python=3.10
conda activate stack
cd llama-stack
$CONDA_PREFIX/bin/pip install -e .
```
Refer to the [CLI Reference](./cli_reference.md) for details on Llama CLI commands.
## Starting Up Llama Stack Server
There are two ways to start the Llama Stack server:
1. **Using Docker**:
We provide pre-built Docker images of Llama Stack; the corresponding run configurations live in the [distributions](../distributions/) folder.
> **Note:** For GPU inference, set environment variables to specify the local directory with your model checkpoints and enable GPU inference.
```bash
export LLAMA_CHECKPOINT_DIR=~/.llama
```
Download Llama models with:
```bash
llama download --model-id Llama3.1-8B-Instruct
```
Start a Docker container with:
```bash
cd llama-stack/distributions/meta-reference-gpu
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
```
**Tip:** For remote providers, use `docker compose up` with scripts in the [distributions folder](../distributions/).
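For example (a sketch; replace `<provider>` with the distribution directory you want):
```bash
cd llama-stack/distributions/<provider>
docker compose up
```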
2. **Build->Configure->Run via Conda**:
For development, build a LlamaStack distribution from scratch.
**`llama stack build`**
Enter build information interactively:
```bash
llama stack build
```
**`llama stack configure`**
Run `llama stack configure <name>` using the name from the build step.
```bash
llama stack configure my-local-stack
```
**`llama stack run`**
Start the server with:
```bash
llama stack run my-local-stack
```
## Testing with Client
After setup, test the server with a client:
```bash
cd /path/to/llama-stack
conda activate <env>
python -m llama_stack.apis.inference.client localhost 5000
```
You can also send a POST request:
```bash
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2-sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
```
For testing safety, run:
```bash
python -m llama_stack.apis.safety.client localhost 5000
```
Check our client SDKs for various languages: [Python](https://github.com/meta-llama/llama-stack-client-python), [Node](https://github.com/meta-llama/llama-stack-client-node), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin).
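For example, a minimal chat completion with the Python client (mirroring the curl request above) looks like this:
```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    messages=[UserMessage(content="Write me a 2-sentence poem about the moon", role="user")],
    model="Llama3.1-8B-Instruct",
)
print(response.completion_message.content)
```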
## Advanced Guides
For more on custom Llama Stack distributions, refer to our [Building a Llama Stack Distribution](./building_distro.md) guide.


@@ -0,0 +1,247 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c1e7571c",
"metadata": {},
"source": [
"# Llama Stack Inference Guide\n",
"\n",
"This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/).\n",
"\n",
"### Table of Contents\n",
"1. [Quickstart](#quickstart)\n",
"2. [Building Effective Prompts](#building-effective-prompts)\n",
"3. [Conversation Loop](#conversation-loop)\n",
"4. [Conversation History](#conversation-history)\n",
"5. [Streaming Responses](#streaming-responses)\n"
]
},
{
"cell_type": "markdown",
"id": "414301dc",
"metadata": {},
"source": [
"## Quickstart\n",
"\n",
"This section walks through each step to set up and make a simple text generation request.\n",
"\n",
"### 1. Set Up the Client\n",
"\n",
"Begin by importing the necessary components from Llama Stacks client library:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a573752",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.types import SystemMessage, UserMessage\n",
"\n",
"client = LlamaStackClient(base_url='http://localhost:5000')"
]
},
{
"cell_type": "markdown",
"id": "86366383",
"metadata": {},
"source": [
"### 2. Create a Chat Completion Request\n",
"\n",
"Use the `chat_completion` function to define the conversation context. Each message you include should have a specific role and content:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77c29dba",
"metadata": {},
"outputs": [],
"source": [
"response = client.inference.chat_completion(\n",
" messages=[\n",
" SystemMessage(content='You are a friendly assistant.', role='system'),\n",
" UserMessage(content='Write a two-sentence poem about llama.', role='user')\n",
" ],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
")\n",
"\n",
"print(response.completion_message.content)"
]
},
{
"cell_type": "markdown",
"id": "e5f16949",
"metadata": {},
"source": [
"## Building Effective Prompts\n",
"\n",
"Effective prompt creation (often called 'prompt engineering') is essential for quality responses. Here are best practices for structuring your prompts to get the most out of the Llama Stack model:\n",
"\n",
"1. **System Messages**: Use `SystemMessage` to set the model's behavior. This is similar to providing top-level instructions for tone, format, or specific behavior.\n",
" - **Example**: `SystemMessage(content='You are a friendly assistant that explains complex topics simply.')`\n",
"2. **User Messages**: Define the task or question you want to ask the model with a `UserMessage`. The clearer and more direct you are, the better the response.\n",
" - **Example**: `UserMessage(content='Explain recursion in programming in simple terms.')`\n",
"\n",
"### Sample Prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c6812da",
"metadata": {},
"outputs": [],
"source": [
"response = client.inference.chat_completion(\n",
" messages=[\n",
" SystemMessage(content='You are shakespeare.', role='system'),\n",
" UserMessage(content='Write a two-sentence poem about llama.', role='user')\n",
" ],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
")\n",
"\n",
"print(response.completion_message.content)"
]
},
{
"cell_type": "markdown",
"id": "c8690ef0",
"metadata": {},
"source": [
"## Conversation Loop\n",
"\n",
"To create a continuous conversation loop, where users can input multiple messages in a session, use the following structure. This example runs an asynchronous loop, ending when the user types 'exit,' 'quit,' or 'bye.'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02211625",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.types import UserMessage\n",
"from termcolor import cprint\n",
"\n",
"client = LlamaStackClient(base_url='http://localhost:5000')\n",
"\n",
"async def chat_loop():\n",
" while True:\n",
" user_input = input('User> ')\n",
" if user_input.lower() in ['exit', 'quit', 'bye']:\n",
" cprint('Ending conversation. Goodbye!', 'yellow')\n",
" break\n",
"\n",
" message = UserMessage(content=user_input, role='user')\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" )\n",
" cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
"\n",
"asyncio.run(chat_loop())"
]
},
{
"cell_type": "markdown",
"id": "8cf0d555",
"metadata": {},
"source": [
"## Conversation History\n",
"\n",
"Maintaining a conversation history allows the model to retain context from previous interactions. Use a list to accumulate messages, enabling continuity throughout the chat session."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9496f75c",
"metadata": {},
"outputs": [],
"source": [
"async def chat_loop():\n",
" conversation_history = []\n",
" while True:\n",
" user_input = input('User> ')\n",
" if user_input.lower() in ['exit', 'quit', 'bye']:\n",
" cprint('Ending conversation. Goodbye!', 'yellow')\n",
" break\n",
"\n",
" user_message = UserMessage(content=user_input, role='user')\n",
" conversation_history.append(user_message)\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=conversation_history,\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" )\n",
" cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
"\n",
" assistant_message = UserMessage(content=response.completion_message.content, role='user')\n",
" conversation_history.append(assistant_message)\n",
"\n",
"asyncio.run(chat_loop())"
]
},
{
"cell_type": "markdown",
"id": "03fcf5e0",
"metadata": {},
"source": [
"## Streaming Responses\n",
"\n",
"Llama Stack offers a `stream` parameter in the `chat_completion` function, which allows partial responses to be returned progressively as they are generated. This can enhance user experience by providing immediate feedback without waiting for the entire response to be processed.\n",
"\n",
"### Example: Streaming Responses"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d119026e",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.lib.inference.event_logger import EventLogger\n",
"from llama_stack_client.types import UserMessage\n",
"from termcolor import cprint\n",
"\n",
"async def run_main(stream: bool = True):\n",
" client = LlamaStackClient(base_url='http://localhost:5000')\n",
"\n",
" message = UserMessage(\n",
" content='hello world, write me a 2 sentence poem about the moon', role='user'\n",
" )\n",
" print(f'User>{message.content}', 'green')\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" stream=stream,\n",
" )\n",
"\n",
" if not stream:\n",
" cprint(f'> Response: {response}', 'cyan')\n",
" else:\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
" models_response = client.models.list()\n",
" print(models_response)\n",
"\n",
"if __name__ == '__main__':\n",
" asyncio.run(run_main())"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a0ed972d",
"metadata": {},
"source": [
"# Switching between Local and Cloud Model with Llama Stack\n",
"\n",
"This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stacks `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n",
"\n",
"### Pre-requisite\n",
"Before you begin, please ensure Llama Stack is installed and the distribution is set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need to run two distributions, a local and a cloud distribution, for this demo to work.\n",
"\n",
"### Implementation"
]
},
{
"cell_type": "markdown",
"id": "df89cff7",
"metadata": {},
"source": [
"#### 1. Set Up Local and Cloud Clients\n",
"\n",
"Initialize both clients, specifying the `base_url` for each instance. In this case, we have the local distribution running on `http://localhost:5000` and the cloud distribution running on `http://localhost:5001`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f868dfe",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client import LlamaStackClient\n",
"\n",
"# Configure local and cloud clients\n",
"local_client = LlamaStackClient(base_url='http://localhost:5000')\n",
"cloud_client = LlamaStackClient(base_url='http://localhost:5001')"
]
},
{
"cell_type": "markdown",
"id": "894689c1",
"metadata": {},
"source": [
"#### 2. Client Selection with Fallback\n",
"\n",
"The `select_client` function checks if the local client is available using a lightweight `/health` check. If the local client is unavailable, it automatically switches to the cloud client.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff0c8277",
"metadata": {},
"outputs": [],
"source": [
"import httpx\n",
"from termcolor import cprint\n",
"\n",
"async def select_client() -> LlamaStackClient:\n",
" \"\"\"Use local client if available; otherwise, switch to cloud client.\"\"\"\n",
" try:\n",
" async with httpx.AsyncClient() as http_client:\n",
" response = await http_client.get(f'{local_client.base_url}/health')\n",
" if response.status_code == 200:\n",
" cprint('Using local client.', 'yellow')\n",
" return local_client\n",
" except httpx.RequestError:\n",
" pass\n",
" cprint('Local client unavailable. Switching to cloud client.', 'yellow')\n",
" return cloud_client"
]
},
{
"cell_type": "markdown",
"id": "9ccfe66f",
"metadata": {},
"source": [
"#### 3. Generate a Response\n",
"\n",
"After selecting the client, you can generate text using `chat_completion`. This example sends a sample prompt to the model and prints the response.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e19cc20",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client.types import UserMessage\n",
"\n",
"async def get_llama_response(stream: bool = True):\n",
" client = await select_client() # Selects the available client\n",
" message = UserMessage(content='hello world, write me a 2 sentence poem about the moon', role='user')\n",
" cprint(f'User> {message.content}', 'green')\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" stream=stream,\n",
" )\n",
"\n",
" if not stream:\n",
" cprint(f'> Response: {response}', 'cyan')\n",
" else:\n",
" # Stream tokens progressively\n",
" async for log in EventLogger().log(response):\n",
" log.print()"
]
},
{
"cell_type": "markdown",
"id": "6edf5e57",
"metadata": {},
"source": [
"#### 4. Run the Asynchronous Response Generation\n",
"\n",
"Use `asyncio.run()` to execute `get_llama_response` in an asynchronous event loop.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c10f487e",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"\n",
"# Initiate the response generation process\n",
"asyncio.run(get_llama_response())"
]
},
{
"cell_type": "markdown",
"id": "56aa9a09",
"metadata": {},
"source": [
"### Complete code\n",
"Summing it up, here's the complete code for local-cloud model implementation with Llama Stack:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9fd74ff",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"import httpx\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.lib.inference.event_logger import EventLogger\n",
"from llama_stack_client.types import UserMessage\n",
"from termcolor import cprint\n",
"\n",
"local_client = LlamaStackClient(base_url='http://localhost:5000')\n",
"cloud_client = LlamaStackClient(base_url='http://localhost:5001')\n",
"\n",
"async def select_client() -> LlamaStackClient:\n",
" try:\n",
" async with httpx.AsyncClient() as http_client:\n",
" response = await http_client.get(f'{local_client.base_url}/health')\n",
" if response.status_code == 200:\n",
" cprint('Using local client.', 'yellow')\n",
" return local_client\n",
" except httpx.RequestError:\n",
" pass\n",
" cprint('Local client unavailable. Switching to cloud client.', 'yellow')\n",
" return cloud_client\n",
"\n",
"async def get_llama_response(stream: bool = True):\n",
" client = await select_client()\n",
" message = UserMessage(\n",
" content='hello world, write me a 2 sentence poem about the moon', role='user'\n",
" )\n",
" cprint(f'User> {message.content}', 'green')\n",
"\n",
" response = client.inference.chat_completion(\n",
" messages=[message],\n",
" model='Llama3.2-11B-Vision-Instruct',\n",
" stream=stream,\n",
" )\n",
"\n",
" if not stream:\n",
" cprint(f'> Response: {response}', 'cyan')\n",
" else:\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"asyncio.run(get_llama_response())"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1,318 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tool Calling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n",
"1. Setting up and using the Brave Search API\n",
"2. Creating custom tools\n",
"3. Configuring tool prompts and safety settings"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: llama-stack-client in ./.conda/envs/quick/lib/python3.13/site-packages (0.0.48)\n",
"Requirement already satisfied: anyio<5,>=3.5.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (4.6.2.post1)\n",
"Requirement already satisfied: distro<2,>=1.7.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (1.9.0)\n",
"Requirement already satisfied: httpx<1,>=0.23.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (0.27.2)\n",
"Requirement already satisfied: pydantic<3,>=1.9.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (2.9.2)\n",
"Requirement already satisfied: sniffio in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (1.3.1)\n",
"Requirement already satisfied: tabulate>=0.9.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (0.9.0)\n",
"Requirement already satisfied: typing-extensions<5,>=4.7 in ./.conda/envs/quick/lib/python3.13/site-packages (from llama-stack-client) (4.12.2)\n",
"Requirement already satisfied: idna>=2.8 in ./.conda/envs/quick/lib/python3.13/site-packages (from anyio<5,>=3.5.0->llama-stack-client) (3.10)\n",
"Requirement already satisfied: certifi in ./.conda/envs/quick/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama-stack-client) (2024.8.30)\n",
"Requirement already satisfied: httpcore==1.* in ./.conda/envs/quick/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama-stack-client) (1.0.6)\n",
"Requirement already satisfied: h11<0.15,>=0.13 in ./.conda/envs/quick/lib/python3.13/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->llama-stack-client) (0.14.0)\n",
"Requirement already satisfied: annotated-types>=0.6.0 in ./.conda/envs/quick/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama-stack-client) (0.7.0)\n",
"Requirement already satisfied: pydantic-core==2.23.4 in ./.conda/envs/quick/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama-stack-client) (2.23.4)\n"
]
}
],
"source": [
"!pip install llama-stack-client --upgrade"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'Agent' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[4], line 23\u001b[0m\n\u001b[1;32m 15\u001b[0m load_dotenv()\n\u001b[1;32m 17\u001b[0m \u001b[38;5;66;03m# Helper function to create an agent with tools\u001b[39;00m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mcreate_tool_agent\u001b[39m(\n\u001b[1;32m 19\u001b[0m client: LlamaStackClient,\n\u001b[1;32m 20\u001b[0m tools: List[Dict],\n\u001b[1;32m 21\u001b[0m instructions: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mYou are a helpful assistant\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 22\u001b[0m model: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mLlama3.1-8B-Instruct\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m---> 23\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[43mAgent\u001b[49m:\n\u001b[1;32m 24\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Create an agent with specified tools.\"\"\"\u001b[39;00m\n\u001b[1;32m 25\u001b[0m agent_config \u001b[38;5;241m=\u001b[39m AgentConfig(\n\u001b[1;32m 26\u001b[0m model\u001b[38;5;241m=\u001b[39mmodel,\n\u001b[1;32m 27\u001b[0m instructions\u001b[38;5;241m=\u001b[39minstructions,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 38\u001b[0m enable_session_persistence\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m,\n\u001b[1;32m 39\u001b[0m )\n",
"\u001b[0;31mNameError\u001b[0m: name 'Agent' is not defined"
]
}
],
"source": [
"import asyncio\n",
"import os\n",
"from typing import Dict, List, Optional\n",
"from dotenv import load_dotenv\n",
"\n",
"from llama_stack_client import LlamaStackClient\n",
"#from llama_stack_client.lib.agents.agent import Agent\n",
"from llama_stack_client.lib.agents.event_logger import EventLogger\n",
"from llama_stack_client.types.agent_create_params import (\n",
" AgentConfig,\n",
" AgentConfigToolSearchToolDefinition,\n",
")\n",
"\n",
"# Load environment variables\n",
"load_dotenv()\n",
"\n",
"# Helper function to create an agent with tools\n",
"async def create_tool_agent(\n",
" client: LlamaStackClient,\n",
" tools: List[Dict],\n",
" instructions: str = \"You are a helpful assistant\",\n",
" model: str = \"Llama3.1-8B-Instruct\",\n",
") -> Agent:\n",
" \"\"\"Create an agent with specified tools.\"\"\"\n",
" agent_config = AgentConfig(\n",
" model=model,\n",
" instructions=instructions,\n",
" sampling_params={\n",
" \"strategy\": \"greedy\",\n",
" \"temperature\": 1.0,\n",
" \"top_p\": 0.9,\n",
" },\n",
" tools=tools,\n",
" tool_choice=\"auto\",\n",
" tool_prompt_format=\"json\",\n",
" input_shields=[\"llama_guard\"],\n",
" output_shields=[\"llama_guard\"],\n",
" enable_session_persistence=True,\n",
" )\n",
"\n",
" return Agent(client, agent_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, create a `.env` file in your notebook directory with your Brave Search API key:\n",
"\n",
"```\n",
"BRAVE_SEARCH_API_KEY=your_key_here\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"async def create_search_agent(client: LlamaStackClient) -> Agent:\n",
" \"\"\"Create an agent with Brave Search capability.\"\"\"\n",
" search_tool = AgentConfigToolSearchToolDefinition(\n",
" type=\"brave_search\",\n",
" engine=\"brave\",\n",
" api_key=os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n",
" )\n",
"\n",
" return await create_tool_agent(\n",
" client=client,\n",
" tools=[search_tool],\n",
" instructions=\"\"\"\n",
" You are a research assistant that can search the web.\n",
" Always cite your sources with URLs when providing information.\n",
" Format your responses as:\n",
"\n",
" FINDINGS:\n",
" [Your summary here]\n",
"\n",
" SOURCES:\n",
" - [Source title](URL)\n",
" \"\"\"\n",
" )\n",
"\n",
"# Example usage\n",
"async def search_example():\n",
" client = LlamaStackClient(base_url=\"http://localhost:8000\")\n",
" agent = await create_search_agent(client)\n",
"\n",
" # Create a session\n",
" session_id = agent.create_session(\"search-session\")\n",
"\n",
" # Example queries\n",
" queries = [\n",
" \"What are the latest developments in quantum computing?\",\n",
" \"Who won the most recent Super Bowl?\",\n",
" ]\n",
"\n",
" for query in queries:\n",
" print(f\"\\nQuery: {query}\")\n",
" print(\"-\" * 50)\n",
"\n",
" response = agent.create_turn(\n",
" messages=[{\"role\": \"user\", \"content\": query}],\n",
" session_id=session_id,\n",
" )\n",
"\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# Run the example (in Jupyter, use asyncio.run())\n",
"await search_example()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Custom Tool Creation\n",
"\n",
"Let's create a custom weather tool:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import TypedDict, Optional\n",
"from datetime import datetime\n",
"\n",
"# Define tool types\n",
"class WeatherInput(TypedDict):\n",
" location: str\n",
" date: Optional[str]\n",
"\n",
"class WeatherOutput(TypedDict):\n",
" temperature: float\n",
" conditions: str\n",
" humidity: float\n",
"\n",
"class WeatherTool:\n",
" \"\"\"Example custom tool for weather information.\"\"\"\n",
"\n",
" def __init__(self, api_key: Optional[str] = None):\n",
" self.api_key = api_key\n",
"\n",
" async def get_weather(self, location: str, date: Optional[str] = None) -> WeatherOutput:\n",
" \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n",
" # Mock implementation\n",
" return {\n",
" \"temperature\": 72.5,\n",
" \"conditions\": \"partly cloudy\",\n",
" \"humidity\": 65.0\n",
" }\n",
"\n",
" async def __call__(self, input_data: WeatherInput) -> WeatherOutput:\n",
" \"\"\"Make the tool callable with structured input.\"\"\"\n",
" return await self.get_weather(\n",
" location=input_data[\"location\"],\n",
" date=input_data.get(\"date\")\n",
" )\n",
"\n",
"async def create_weather_agent(client: LlamaStackClient) -> Agent:\n",
" \"\"\"Create an agent with weather tool capability.\"\"\"\n",
" weather_tool = {\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"get_weather\",\n",
" \"description\": \"Get weather information for a location\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"location\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"City or location name\"\n",
" },\n",
" \"date\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"Optional date (YYYY-MM-DD)\",\n",
" \"format\": \"date\"\n",
" }\n",
" },\n",
" \"required\": [\"location\"]\n",
" }\n",
" },\n",
" \"implementation\": WeatherTool()\n",
" }\n",
"\n",
" return await create_tool_agent(\n",
" client=client,\n",
" tools=[weather_tool],\n",
" instructions=\"\"\"\n",
" You are a weather assistant that can provide weather information.\n",
" Always specify the location clearly in your responses.\n",
" Include both temperature and conditions in your summaries.\n",
" \"\"\"\n",
" )\n",
"\n",
"# Example usage\n",
"async def weather_example():\n",
" client = LlamaStackClient(base_url=\"http://localhost:8000\")\n",
" agent = await create_weather_agent(client)\n",
"\n",
" session_id = agent.create_session(\"weather-session\")\n",
"\n",
" queries = [\n",
" \"What's the weather like in San Francisco?\",\n",
" \"Tell me the weather in Tokyo tomorrow\",\n",
" ]\n",
"\n",
" for query in queries:\n",
" print(f\"\\nQuery: {query}\")\n",
" print(\"-\" * 50)\n",
"\n",
" response = agent.create_turn(\n",
" messages=[{\"role\": \"user\", \"content\": query}],\n",
" session_id=session_id,\n",
" )\n",
"\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# Run the example\n",
"await weather_example()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,349 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tool Calling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n",
"1. Setting up and using the Brave Search API\n",
"2. Creating custom tools\n",
"3. Configuring tool prompts and safety settings"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"import os\n",
"from typing import Dict, List, Optional\n",
"from dotenv import load_dotenv\n",
"\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.lib.agents.agent import Agent\n",
"from llama_stack_client.lib.agents.event_logger import EventLogger\n",
"from llama_stack_client.types.agent_create_params import (\n",
" AgentConfig,\n",
" AgentConfigToolSearchToolDefinition,\n",
")\n",
"\n",
"# Load environment variables\n",
"load_dotenv()\n",
"\n",
"# Helper function to create an agent with tools\n",
"async def create_tool_agent(\n",
" client: LlamaStackClient,\n",
" tools: List[Dict],\n",
" instructions: str = \"You are a helpful assistant\",\n",
" model: str = \"Llama3.1-8B-Instruct\",\n",
") -> Agent:\n",
" \"\"\"Create an agent with specified tools.\"\"\"\n",
" agent_config = AgentConfig(\n",
" model=model,\n",
" instructions=instructions,\n",
" sampling_params={\n",
" \"strategy\": \"greedy\",\n",
" \"temperature\": 1.0,\n",
" \"top_p\": 0.9,\n",
" },\n",
" tools=tools,\n",
" tool_choice=\"auto\",\n",
" tool_prompt_format=\"json\",\n",
" enable_session_persistence=True,\n",
" )\n",
"\n",
" return Agent(client, agent_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, create a `.env` file in your notebook directory with your Brave Search API key:\n",
"\n",
"```\n",
"BRAVE_SEARCH_API_KEY=your_key_here\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Query: What are the latest developments in quantum computing?\n",
"--------------------------------------------------\n",
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[33mF\u001b[0m\u001b[33mIND\u001b[0m\u001b[33mINGS\u001b[0m\u001b[33m:\n",
"\u001b[0m\u001b[33mThe\u001b[0m\u001b[33m latest\u001b[0m\u001b[33m developments\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computing\u001b[0m\u001b[33m include\u001b[0m\u001b[33m advancements\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m processors\u001b[0m\u001b[33m,\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m algorithms\u001b[0m\u001b[33m,\u001b[0m\u001b[33m and\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m error\u001b[0m\u001b[33m correction\u001b[0m\u001b[33m.\u001b[0m\u001b[33m Researchers\u001b[0m\u001b[33m have\u001b[0m\u001b[33m made\u001b[0m\u001b[33m significant\u001b[0m\u001b[33m progress\u001b[0m\u001b[33m in\u001b[0m\u001b[33m developing\u001b[0m\u001b[33m more\u001b[0m\u001b[33m powerful\u001b[0m\u001b[33m and\u001b[0m\u001b[33m reliable\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computers\u001b[0m\u001b[33m,\u001b[0m\u001b[33m with\u001b[0m\u001b[33m some\u001b[0m\u001b[33m companies\u001b[0m\u001b[33m already\u001b[0m\u001b[33m showcasing\u001b[0m\u001b[33m \u001b[0m\u001b[33m100\u001b[0m\u001b[33m-q\u001b[0m\u001b[33mubit\u001b[0m\u001b[33m and\u001b[0m\u001b[33m \u001b[0m\u001b[33m127\u001b[0m\u001b[33m-q\u001b[0m\u001b[33mubit\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m processors\u001b[0m\u001b[33m (\u001b[0m\u001b[33mIBM\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m;\u001b[0m\u001b[33m Google\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m These\u001b[0m\u001b[33m advancements\u001b[0m\u001b[33m have\u001b[0m\u001b[33m led\u001b[0m\u001b[33m to\u001b[0m\u001b[33m breakthrough\u001b[0m\u001b[33ms\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m simulations\u001b[0m\u001b[33m,\u001b[0m\u001b[33m machine\u001b[0m\u001b[33m learning\u001b[0m\u001b[33m,\u001b[0m\u001b[33m and\u001b[0m\u001b[33m optimization\u001b[0m\u001b[33m problems\u001b[0m\u001b[33m (\u001b[0m\u001b[33mB\u001b[0m\u001b[33mhart\u001b[0m\u001b[33mi\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m;\u001b[0m\u001b[33m Zhang\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m Additionally\u001b[0m\u001b[33m,\u001b[0m\u001b[33m there\u001b[0m\u001b[33m have\u001b[0m\u001b[33m been\u001b[0m\u001b[33m significant\u001b[0m\u001b[33m improvements\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m error\u001b[0m\u001b[33m correction\u001b[0m\u001b[33m,\u001b[0m\u001b[33m which\u001b[0m\u001b[33m is\u001b[0m\u001b[33m essential\u001b[0m\u001b[33m for\u001b[0m\u001b[33m large\u001b[0m\u001b[33m-scale\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computing\u001b[0m\u001b[33m (\u001b[0m\u001b[33mG\u001b[0m\u001b[33mottes\u001b[0m\u001b[33mman\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\n",
"\n",
"\u001b[0m\u001b[33mS\u001b[0m\u001b[33mOURCES\u001b[0m\u001b[33m:\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m IBM\u001b[0m\u001b[33m Quantum\u001b[0m\u001b[33m:\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Process\u001b[0m\u001b[33mors\u001b[0m\u001b[33m\"\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mwww\u001b[0m\u001b[33m.ibm\u001b[0m\u001b[33m.com\u001b[0m\u001b[33m/\u001b[0m\u001b[33mquant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m/com\u001b[0m\u001b[33mputer\u001b[0m\u001b[33m/)\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Google\u001b[0m\u001b[33m Quantum\u001b[0m\u001b[33m AI\u001b[0m\u001b[33m Lab\u001b[0m\u001b[33m:\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Process\u001b[0m\u001b[33mors\u001b[0m\u001b[33m\"\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mquant\u001b[0m\u001b[33mum\u001b[0m\u001b[33mai\u001b[0m\u001b[33m.google\u001b[0m\u001b[33m/al\u001b[0m\u001b[33mphabet\u001b[0m\u001b[33m/sub\u001b[0m\u001b[33m-page\u001b[0m\u001b[33m-\u001b[0m\u001b[33m1\u001b[0m\u001b[33m/)\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Bh\u001b[0m\u001b[33marti\u001b[0m\u001b[33m,\u001b[0m\u001b[33m K\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Computing\u001b[0m\u001b[33m:\u001b[0m\u001b[33m A\u001b[0m\u001b[33m Review\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Recent\u001b[0m\u001b[33m Advances\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m Journal\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Physics\u001b[0m\u001b[33m:\u001b[0m\u001b[33m Conference\u001b[0m\u001b[33m Series\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m218\u001b[0m\u001b[33m5\u001b[0m\u001b[33m(\u001b[0m\u001b[33m1\u001b[0m\u001b[33m),\u001b[0m\u001b[33m \u001b[0m\u001b[33m012\u001b[0m\u001b[33m001\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mi\u001b[0m\u001b[33mop\u001b[0m\u001b[33mscience\u001b[0m\u001b[33m.i\u001b[0m\u001b[33mop\u001b[0m\u001b[33m.org\u001b[0m\u001b[33m/article\u001b[0m\u001b[33m/\u001b[0m\u001b[33m10\u001b[0m\u001b[33m.\u001b[0m\u001b[33m108\u001b[0m\u001b[33m8\u001b[0m\u001b[33m/\u001b[0m\u001b[33m174\u001b[0m\u001b[33m2\u001b[0m\u001b[33m-\u001b[0m\u001b[33m659\u001b[0m\u001b[33m6\u001b[0m\u001b[33m/\u001b[0m\u001b[33m218\u001b[0m\u001b[33m5\u001b[0m\u001b[33m/\u001b[0m\u001b[33m1\u001b[0m\u001b[33m/\u001b[0m\u001b[33m012\u001b[0m\u001b[33m001\u001b[0m\u001b[33m)\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m Zhang\u001b[0m\u001b[33m,\u001b[0m\u001b[33m Y\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Algorithms\u001b[0m\u001b[33m for\u001b[0m\u001b[33m Machine\u001b[0m\u001b[33m Learning\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m Journal\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Machine\u001b[0m\u001b[33m Learning\u001b[0m\u001b[33m Research\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m23\u001b[0m\u001b[33m,\u001b[0m\u001b[33m \u001b[0m\u001b[33m1\u001b[0m\u001b[33m-\u001b[0m\u001b[33m36\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mj\u001b[0m\u001b[33mml\u001b[0m\u001b[33mr\u001b[0m\u001b[33m.org\u001b[0m\u001b[33m/p\u001b[0m\u001b[33mapers\u001b[0m\u001b[33m/v\u001b[0m\u001b[33m23\u001b[0m\u001b[33m/\u001b[0m\u001b[33m20\u001b[0m\u001b[33m-\u001b[0m\u001b[33m065\u001b[0m\u001b[33m.html\u001b[0m\u001b[33m)\n",
"\u001b[0m\u001b[33m-\u001b[0m\u001b[33m G\u001b[0m\u001b[33mottes\u001b[0m\u001b[33mman\u001b[0m\u001b[33m,\u001b[0m\u001b[33m D\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33m202\u001b[0m\u001b[33m2\u001b[0m\u001b[33m).\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mQuant\u001b[0m\u001b[33mum\u001b[0m\u001b[33m Error\u001b[0m\u001b[33m Correction\u001b[0m\u001b[33m.\"\u001b[0m\u001b[33m In\u001b[0m\u001b[33m Encyclopedia\u001b[0m\u001b[33m of\u001b[0m\u001b[33m Complexity\u001b[0m\u001b[33m and\u001b[0m\u001b[33m Systems\u001b[0m\u001b[33m Science\u001b[0m\u001b[33m (\u001b[0m\u001b[33mpp\u001b[0m\u001b[33m.\u001b[0m\u001b[33m \u001b[0m\u001b[33m1\u001b[0m\u001b[33m-\u001b[0m\u001b[33m13\u001b[0m\u001b[33m).\u001b[0m\u001b[33m Springer\u001b[0m\u001b[33m,\u001b[0m\u001b[33m New\u001b[0m\u001b[33m York\u001b[0m\u001b[33m,\u001b[0m\u001b[33m NY\u001b[0m\u001b[33m.\u001b[0m\u001b[33m (\u001b[0m\u001b[33mhttps\u001b[0m\u001b[33m://\u001b[0m\u001b[33mlink\u001b[0m\u001b[33m.spring\u001b[0m\u001b[33mer\u001b[0m\u001b[33m.com\u001b[0m\u001b[33m/reference\u001b[0m\u001b[33mwork\u001b[0m\u001b[33mentry\u001b[0m\u001b[33m/\u001b[0m\u001b[33m10\u001b[0m\u001b[33m.\u001b[0m\u001b[33m100\u001b[0m\u001b[33m7\u001b[0m\u001b[33m/\u001b[0m\u001b[33m978\u001b[0m\u001b[33m-\u001b[0m\u001b[33m0\u001b[0m\u001b[33m-\u001b[0m\u001b[33m387\u001b[0m\u001b[33m-\u001b[0m\u001b[33m758\u001b[0m\u001b[33m88\u001b[0m\u001b[33m-\u001b[0m\u001b[33m6\u001b[0m\u001b[33m_\u001b[0m\u001b[33m447\u001b[0m\u001b[33m)\u001b[0m\u001b[97m\u001b[0m\n",
"\u001b[30m\u001b[0m"
]
}
],
"source": [
"async def create_search_agent(client: LlamaStackClient) -> Agent:\n",
" \"\"\"Create an agent with Brave Search capability.\"\"\"\n",
" search_tool = AgentConfigToolSearchToolDefinition(\n",
" type=\"brave_search\",\n",
" engine=\"brave\",\n",
" api_key=\"dummy_value\"#os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n",
" )\n",
"\n",
" return await create_tool_agent(\n",
" client=client,\n",
" tools=[search_tool],\n",
" instructions=\"\"\"\n",
" You are a research assistant that can search the web.\n",
" Always cite your sources with URLs when providing information.\n",
" Format your responses as:\n",
"\n",
" FINDINGS:\n",
" [Your summary here]\n",
"\n",
" SOURCES:\n",
" - [Source title](URL)\n",
" \"\"\"\n",
" )\n",
"\n",
"# Example usage\n",
"async def search_example():\n",
" client = LlamaStackClient(base_url=\"http://localhost:5001\")\n",
" agent = await create_search_agent(client)\n",
"\n",
" # Create a session\n",
" session_id = agent.create_session(\"search-session\")\n",
"\n",
" # Example queries\n",
" queries = [\n",
" \"What are the latest developments in quantum computing?\",\n",
" #\"Who won the most recent Super Bowl?\",\n",
" ]\n",
"\n",
" for query in queries:\n",
" print(f\"\\nQuery: {query}\")\n",
" print(\"-\" * 50)\n",
"\n",
" response = agent.create_turn(\n",
" messages=[{\"role\": \"user\", \"content\": query}],\n",
" session_id=session_id,\n",
" )\n",
"\n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# Run the example (in Jupyter, use asyncio.run())\n",
"await search_example()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Custom Tool Creation\n",
"\n",
"Let's create a custom weather tool:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Query: What's the weather like in San Francisco?\n",
"--------------------------------------------------\n",
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[33m{\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mtype\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mfunction\u001b[0m\u001b[33m\",\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mname\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mget\u001b[0m\u001b[33m_weather\u001b[0m\u001b[33m\",\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mparameters\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m {\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m \"\u001b[0m\u001b[33mlocation\u001b[0m\u001b[33m\":\u001b[0m\u001b[33m \"\u001b[0m\u001b[33mSan\u001b[0m\u001b[33m Francisco\u001b[0m\u001b[33m\"\n",
"\u001b[0m\u001b[33m \u001b[0m\u001b[33m }\n",
"\u001b[0m\u001b[33m}\u001b[0m\u001b[97m\u001b[0m\n"
]
},
{
"ename": "AttributeError",
"evalue": "'WeatherTool' object has no attribute 'run'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[27], line 113\u001b[0m\n\u001b[1;32m 110\u001b[0m nest_asyncio\u001b[38;5;241m.\u001b[39mapply()\n\u001b[1;32m 112\u001b[0m \u001b[38;5;66;03m# Run the example\u001b[39;00m\n\u001b[0;32m--> 113\u001b[0m \u001b[38;5;28;01mawait\u001b[39;00m weather_example()\n",
"Cell \u001b[0;32mIn[27], line 105\u001b[0m, in \u001b[0;36mweather_example\u001b[0;34m()\u001b[0m\n\u001b[1;32m 98\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m-\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m*\u001b[39m \u001b[38;5;241m50\u001b[39m)\n\u001b[1;32m 100\u001b[0m response \u001b[38;5;241m=\u001b[39m agent\u001b[38;5;241m.\u001b[39mcreate_turn(\n\u001b[1;32m 101\u001b[0m messages\u001b[38;5;241m=\u001b[39m[{\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrole\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124muser\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcontent\u001b[39m\u001b[38;5;124m\"\u001b[39m: query}],\n\u001b[1;32m 102\u001b[0m session_id\u001b[38;5;241m=\u001b[39msession_id,\n\u001b[1;32m 103\u001b[0m )\n\u001b[0;32m--> 105\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mfor\u001b[39;00m log \u001b[38;5;129;01min\u001b[39;00m EventLogger()\u001b[38;5;241m.\u001b[39mlog(response):\n\u001b[1;32m 106\u001b[0m log\u001b[38;5;241m.\u001b[39mprint()\n",
"File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/event_logger.py:55\u001b[0m, in \u001b[0;36mEventLogger.log\u001b[0;34m(self, event_generator)\u001b[0m\n\u001b[1;32m 52\u001b[0m previous_event_type \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 53\u001b[0m previous_step_type \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m---> 55\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mfor\u001b[39;00m chunk \u001b[38;5;129;01min\u001b[39;00m event_generator:\n\u001b[1;32m 56\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(chunk, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mevent\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[1;32m 57\u001b[0m \u001b[38;5;66;03m# Need to check for custom tool first\u001b[39;00m\n\u001b[1;32m 58\u001b[0m \u001b[38;5;66;03m# since it does not produce event but instead\u001b[39;00m\n\u001b[1;32m 59\u001b[0m \u001b[38;5;66;03m# a Message\u001b[39;00m\n\u001b[1;32m 60\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(chunk, ToolResponseMessage):\n",
"File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/agent.py:76\u001b[0m, in \u001b[0;36mAgent.create_turn\u001b[0;34m(self, messages, attachments, session_id)\u001b[0m\n\u001b[1;32m 74\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 75\u001b[0m tool \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcustom_tools[tool_call\u001b[38;5;241m.\u001b[39mtool_name]\n\u001b[0;32m---> 76\u001b[0m result_messages \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mexecute_custom_tool(tool, message)\n\u001b[1;32m 77\u001b[0m next_message \u001b[38;5;241m=\u001b[39m result_messages[\u001b[38;5;241m0\u001b[39m]\n\u001b[1;32m 79\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m next_message\n",
"File \u001b[0;32m~/new_task/llama-stack-client-python/src/llama_stack_client/lib/agents/agent.py:84\u001b[0m, in \u001b[0;36mAgent.execute_custom_tool\u001b[0;34m(self, tool, message)\u001b[0m\n\u001b[1;32m 81\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mexecute_custom_tool\u001b[39m(\n\u001b[1;32m 82\u001b[0m \u001b[38;5;28mself\u001b[39m, tool: CustomTool, message: Union[UserMessage, ToolResponseMessage]\n\u001b[1;32m 83\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m List[Union[UserMessage, ToolResponseMessage]]:\n\u001b[0;32m---> 84\u001b[0m result_messages \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m \u001b[43mtool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m([message])\n\u001b[1;32m 85\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m result_messages\n",
"\u001b[0;31mAttributeError\u001b[0m: 'WeatherTool' object has no attribute 'run'"
]
}
],
"source": [
"from typing import TypedDict, Optional, Dict, Any\n",
"from datetime import datetime\n",
"class WeatherTool:\n",
" \"\"\"Example custom tool for weather information.\"\"\"\n",
" \n",
" def get_name(self) -> str:\n",
" return \"get_weather\"\n",
" \n",
" def get_description(self) -> str:\n",
" return \"Get weather information for a location\"\n",
" \n",
" def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
" return {\n",
" \"location\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"City or location name\",\n",
" required=True\n",
" ),\n",
" \"date\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"Optional date (YYYY-MM-DD)\",\n",
" required=False\n",
" )\n",
" }\n",
" \n",
" async def run_impl(self, location: str, date: Optional[str] = None) -> Dict[str, Any]:\n",
" \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n",
" # Mock implementation\n",
" return {\n",
" \"temperature\": 72.5,\n",
" \"conditions\": \"partly cloudy\",\n",
" \"humidity\": 65.0\n",
" }\n",
"\n",
"async def create_weather_agent(client: LlamaStackClient) -> Agent:\n",
" \"\"\"Create an agent with weather tool capability.\"\"\"\n",
" agent_config = AgentConfig(\n",
" model=\"Llama3.1-8B-Instruct\",\n",
" instructions=\"\"\"\n",
" You are a weather assistant that can provide weather information.\n",
" Always specify the location clearly in your responses.\n",
" Include both temperature and conditions in your summaries.\n",
" \"\"\",\n",
" sampling_params={\n",
" \"strategy\": \"greedy\",\n",
" \"temperature\": 1.0,\n",
" \"top_p\": 0.9,\n",
" },\n",
" tools=[\n",
" {\n",
" \"function_name\": \"get_weather\",\n",
" \"description\": \"Get weather information for a location\",\n",
" \"parameters\": {\n",
" \"location\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"City or location name\",\n",
" \"required\": True,\n",
" },\n",
" \"date\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"Optional date (YYYY-MM-DD)\",\n",
" \"required\": False,\n",
" },\n",
" },\n",
" \"type\": \"function_call\",\n",
" }\n",
" ],\n",
" tool_choice=\"auto\",\n",
" tool_prompt_format=\"json\",\n",
" input_shields=[],\n",
" output_shields=[],\n",
" enable_session_persistence=True\n",
" )\n",
" \n",
" # Create the agent with the tool\n",
" weather_tool = WeatherTool()\n",
" agent = Agent(\n",
" client=client,\n",
" agent_config=agent_config,\n",
" custom_tools=[weather_tool]\n",
" )\n",
" \n",
" return agent\n",
"\n",
"# Example usage\n",
"async def weather_example():\n",
" client = LlamaStackClient(base_url=\"http://localhost:5001\")\n",
" agent = await create_weather_agent(client)\n",
" session_id = agent.create_session(\"weather-session\")\n",
" \n",
" queries = [\n",
" \"What's the weather like in San Francisco?\",\n",
" \"Tell me the weather in Tokyo tomorrow\",\n",
" ]\n",
" \n",
" for query in queries:\n",
" print(f\"\\nQuery: {query}\")\n",
" print(\"-\" * 50)\n",
" \n",
" response = agent.create_turn(\n",
" messages=[{\"role\": \"user\", \"content\": query}],\n",
" session_id=session_id,\n",
" )\n",
" \n",
" async for log in EventLogger().log(response):\n",
" log.print()\n",
"\n",
"# For Jupyter notebooks\n",
"import nest_asyncio\n",
"nest_asyncio.apply()\n",
"\n",
"# Run the example\n",
"await weather_example()"
]
},
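{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `AttributeError` above shows that the agent drives each custom tool through an async `run(messages)` entry point. A minimal sketch of that method, mirroring the `SingleMessageCustomTool` pattern from the tutorial notebook (the exact message shape is an assumption and may vary by client version):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class RunnableWeatherTool(WeatherTool):\n",
"    \"\"\"WeatherTool plus the run() entry point the agent expects (sketch).\"\"\"\n",
"\n",
"    async def run(self, messages=None):\n",
"        # Pull the parsed tool-call arguments off the first message, if present,\n",
"        # and dispatch to run_impl (message shape is an assumption)\n",
"        if messages and hasattr(messages[0], 'function_parameters'):\n",
"            return await self.run_impl(**messages[0].function_parameters)\n",
"        return await self.run_impl()"
]
},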
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,558 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with LlamaStack: Tool Calling Tutorial\n",
"\n",
"Welcome! This notebook will guide you through creating and using custom tools with LlamaStack.\n",
"We'll start with the basics and work our way up to more complex examples.\n",
"\n",
"Table of Contents:\n",
"1. Setup and Installation\n",
"2. Understanding Tool Basics\n",
"3. Creating Your First Tool\n",
"4. Building a Mock Weather Tool\n",
"5. Setting Up the LlamaStack Agent\n",
"6. Running Examples\n",
"7. Next Steps\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"#### Before we begin, let's import all the required packages:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import asyncio\n",
"import json\n",
"from typing import Dict\n",
"from datetime import datetime"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# LlamaStack specific imports\n",
"from llama_stack_client import LlamaStackClient\n",
"from llama_stack_client.lib.agents.agent import Agent\n",
"from llama_stack_client.lib.agents.event_logger import EventLogger\n",
"from llama_stack_client.types.agent_create_params import AgentConfig\n",
"from llama_stack_client.types.tool_param_definition_param import ToolParamDefinitionParam"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Understanding Tool Basics\n",
"\n",
"In LlamaStack, a tool is like a special function that our AI assistant can use. Think of it as giving the AI a new \n",
"capability, like using a calculator or checking the weather.\n",
"\n",
"Every tool needs:\n",
"- A name: What we call the tool\n",
"- A description: What the tool does\n",
"- Parameters: What information the tool needs to work\n",
"- Implementation: The actual code that does the work\n",
"\n",
"Let's create a base class that all our tools will inherit from:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"class SingleMessageCustomTool:\n",
" \"\"\"Base class for all our custom tools\"\"\"\n",
" \n",
" async def run(self, messages=None):\n",
" \"\"\"\n",
" Main entry point for running the tool\n",
" Args:\n",
" messages: List of messages (can be None for backward compatibility)\n",
" \"\"\"\n",
" if messages and len(messages) > 0:\n",
" # Extract parameters from the message if it contains function parameters\n",
" message = messages[0]\n",
" if hasattr(message, 'function_parameters'):\n",
" return await self.run_impl(**message.function_parameters)\n",
" else:\n",
" return await self.run_impl()\n",
" return await self.run_impl()\n",
" \n",
" async def run_impl(self, **kwargs):\n",
" \"\"\"Each tool will implement this method with their specific logic\"\"\"\n",
" raise NotImplementedError()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Creating Your First Tool: Calculator\n",
" \n",
"Let's create a simple calculator tool. This will help us understand the basic structure of a tool.\n",
"Our calculator can:\n",
"- Add\n",
"- Subtract\n",
"- Multiply\n",
"- Divide\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# Calculator Tool implementation\n",
"class CalculatorTool(SingleMessageCustomTool):\n",
" \"\"\"A simple calculator tool that can perform basic math operations\"\"\"\n",
" \n",
" def get_name(self) -> str:\n",
" return \"calculator\"\n",
" \n",
" def get_description(self) -> str:\n",
" return \"Perform basic arithmetic operations (add, subtract, multiply, divide)\"\n",
" \n",
" def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
" return {\n",
" \"operation\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"Operation to perform (add, subtract, multiply, divide)\",\n",
" required=True\n",
" ),\n",
" \"x\": ToolParamDefinitionParam(\n",
" param_type=\"float\",\n",
" description=\"First number\",\n",
" required=True\n",
" ),\n",
" \"y\": ToolParamDefinitionParam(\n",
" param_type=\"float\",\n",
" description=\"Second number\",\n",
" required=True\n",
" )\n",
" }\n",
" \n",
" async def run_impl(self, operation: str = None, x: float = None, y: float = None):\n",
" \"\"\"The actual implementation of our calculator\"\"\"\n",
" if not all([operation, x, y]):\n",
" return json.dumps({\"error\": \"Missing required parameters\"})\n",
" \n",
" # Dictionary of math operations\n",
" operations = {\n",
" \"add\": lambda a, b: a + b,\n",
" \"subtract\": lambda a, b: a - b,\n",
" \"multiply\": lambda a, b: a * b,\n",
" \"divide\": lambda a, b: a / b if b != 0 else \"Error: Division by zero\"\n",
" }\n",
" \n",
" # Check if the operation is valid\n",
" if operation not in operations:\n",
" return json.dumps({\"error\": f\"Unknown operation '{operation}'\"})\n",
" \n",
" try:\n",
" # Convert string inputs to float if needed\n",
" x = float(x) if isinstance(x, str) else x\n",
" y = float(y) if isinstance(y, str) else y\n",
" \n",
" # Perform the calculation\n",
" result = operations[operation](x, y)\n",
" return json.dumps({\"result\": result})\n",
" except ValueError:\n",
" return json.dumps({\"error\": \"Invalid number format\"})\n",
" except Exception as e:\n",
" return json.dumps({\"error\": str(e)})"
]
},
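{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (outside the agent flow), you can call the tool implementation directly; Jupyter supports top-level `await`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"calc = CalculatorTool()\n",
"# Call the implementation directly with keyword arguments\n",
"print(await calc.run_impl(operation='multiply', x=6, y=7))"
]
},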
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Building a Mock Weather Tool\n",
" \n",
"Now let's create something a bit more complex: a weather tool! \n",
"While this is just a mock version (it doesn't actually fetch real weather data),\n",
"it shows how you might structure a tool that interfaces with an external API."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"class WeatherTool(SingleMessageCustomTool):\n",
" \"async def run_single_query(agent, session_id, query: str):\n",
" \"\"\"Run a single query through our agent with complete interaction cycle\"\"\"\n",
" print(\"\\n\" + \"=\"*50)\n",
" print(f\"🤔 User asks: {query}\")\n",
" print(\"=\"*50)\n",
" \n",
" # Get the initial response and tool call\n",
" response = agent.create_turn(\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": query,\n",
" }\n",
" ],\n",
" session_id=session_id,\n",
" )\n",
" \n",
" # Process all events including tool calls and final response\n",
" async for event in EventLogger().log(response):\n",
" event.print()\n",
" \n",
" # If this was a tool call, we need to create another turn with the result\n",
" if hasattr(event, 'tool_calls') and event.tool_calls:\n",
" tool_call = event.tool_calls[0] # Get the first tool call\n",
" \n",
" # Execute the custom tool\n",
" if tool_call.tool_name in [t.get_name() for t in agent.custom_tools]:\n",
" tool = [t for t in agent.custom_tools if t.get_name() == tool_call.tool_name][0]\n",
" result = await tool.run_impl(**tool_call.arguments)\n",
" \n",
" # Create a follow-up turn with the tool result\n",
" follow_up = agent.create_turn(\n",
" messages=[\n",
" {\n",
" \"role\": \"tool\",\n",
" \"content\": result,\n",
" \"tool_call_id\": tool_call.call_id,\n",
" \"name\": tool_call.tool_name\n",
" }\n",
" ],\n",
" session_id=session_id,\n",
" )\n",
" \n",
" # Process the follow-up response\n",
" async for follow_up_event in EventLogger().log(follow_up):\n",
" follow_up_event.print()\"\"A mock weather tool that simulates getting weather data\"\"\"\n",
" \n",
" def get_name(self) -> str:\n",
" return \"get_weather\"\n",
" \n",
" def get_description(self) -> str:\n",
" return \"Get current weather information for major cities\"\n",
" \n",
" def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
" return {\n",
" \"city\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"Name of the city (e.g., New York, London, Tokyo)\",\n",
" required=True\n",
" ),\n",
" \"date\": ToolParamDefinitionParam(\n",
" param_type=\"str\",\n",
" description=\"Date in YYYY-MM-DD format (optional)\",\n",
" required=False\n",
" )\n",
" }\n",
" \n",
" async def run_impl(self, city: str = None, date: str = None):\n",
" if not city:\n",
" return json.dumps({\"error\": \"City parameter is required\"})\n",
" \n",
" # Mock database of weather information\n",
" weather_data = {\n",
" \"New York\": {\"temp\": 20, \"condition\": \"sunny\"},\n",
" \"London\": {\"temp\": 15, \"condition\": \"rainy\"},\n",
" \"Tokyo\": {\"temp\": 25, \"condition\": \"cloudy\"}\n",
" }\n",
" \n",
" try:\n",
" # Check if we have data for the requested city\n",
" if city not in weather_data:\n",
" return json.dumps({\n",
" \"error\": f\"Sorry! No data available for {city}\",\n",
" \"available_cities\": list(weather_data.keys())\n",
" })\n",
" \n",
" # Return the weather information\n",
" return json.dumps({\n",
" \"city\": city,\n",
" \"date\": date or datetime.now().strftime(\"%Y-%m-%d\"),\n",
" \"data\": weather_data[city]\n",
" })\n",
" except Exception as e:\n",
" return json.dumps({\"error\": str(e)})"
]
},
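{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, a quick direct call confirms the tool behaves as expected before an agent uses it. This check (an addition to the original flow) assumes the `WeatherTool` class above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Smoke-test the mock weather tool directly\n",
"weather = WeatherTool()\n",
"print(await weather.run_impl(city=\"Tokyo\"))   # uses today's date by default\n",
"print(await weather.run_impl(city=\"Paris\"))   # not in the mock data -> error with available cities"
]
},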
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"# ## 5. Setting Up the LlamaStack Agent\n",
"# \n",
"# Now that we have our tools, we need to create an agent that can use them.\n",
"# The agent is like a smart assistant that knows how to use our tools when needed."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"async def setup_agent(host: str = \"localhost\", port: int = 5001):\n",
" \"\"\"Creates and configures our LlamaStack agent\"\"\"\n",
" \n",
" # Create a client to connect to the LlamaStack server\n",
" client = LlamaStackClient(\n",
" base_url=f\"http://{host}:{port}\",\n",
" )\n",
" \n",
" # Configure how we want our agent to behave\n",
" agent_config = AgentConfig(\n",
" model=\"Llama3.1-8B-Instruct\",\n",
" instructions=\"\"\"You are a helpful assistant that can:\n",
" 1. Perform mathematical calculations\n",
" 2. Check weather information\n",
" Always explain your thinking before using a tool.\"\"\",\n",
" \n",
" sampling_params={\n",
" \"strategy\": \"greedy\",\n",
" \"temperature\": 1.0,\n",
" \"top_p\": 0.9,\n",
" },\n",
" \n",
" # List of tools available to the agent\n",
" tools=[\n",
" {\n",
" \"function_name\": \"calculator\",\n",
" \"description\": \"Perform basic arithmetic operations\",\n",
" \"parameters\": {\n",
" \"operation\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"Operation to perform (add, subtract, multiply, divide)\",\n",
" \"required\": True,\n",
" },\n",
" \"x\": {\n",
" \"param_type\": \"float\",\n",
" \"description\": \"First number\",\n",
" \"required\": True,\n",
" },\n",
" \"y\": {\n",
" \"param_type\": \"float\",\n",
" \"description\": \"Second number\",\n",
" \"required\": True,\n",
" },\n",
" },\n",
" \"type\": \"function_call\",\n",
" },\n",
" {\n",
" \"function_name\": \"get_weather\",\n",
" \"description\": \"Get weather information for a given city\",\n",
" \"parameters\": {\n",
" \"city\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"Name of the city\",\n",
" \"required\": True,\n",
" },\n",
" \"date\": {\n",
" \"param_type\": \"str\",\n",
" \"description\": \"Date in YYYY-MM-DD format\",\n",
" \"required\": False,\n",
" },\n",
" },\n",
" \"type\": \"function_call\",\n",
" },\n",
" ],\n",
" tool_choice=\"auto\",\n",
" # Using standard JSON format for tools\n",
" tool_prompt_format=\"json\", \n",
" input_shields=[],\n",
" output_shields=[],\n",
" enable_session_persistence=False,\n",
" )\n",
" \n",
" # Create our tools\n",
" custom_tools = [CalculatorTool(), WeatherTool()]\n",
" \n",
" # Create the agent\n",
" agent = Agent(client, agent_config, custom_tools)\n",
" session_id = agent.create_session(\"tutorial-session\")\n",
" print(f\"🎉 Created session_id={session_id} for Agent({agent.agent_id})\")\n",
" \n",
" return agent, session_id"
]
},
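{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the `tools` list above repeats metadata the tool classes already expose through `get_name`, `get_description`, and `get_params_definition`. As an alternative sketch (not how this tutorial wires it up), you could derive the definitions from the tool objects themselves, which keeps the agent config and the tools from drifting apart:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: build the agent's tool definitions from the tool objects.\n",
"# Assumes each param definition can be converted to a plain dict;\n",
"# adjust the conversion if the client type differs.\n",
"def tool_definition(tool) -> dict:\n",
"    return {\n",
"        \"function_name\": tool.get_name(),\n",
"        \"description\": tool.get_description(),\n",
"        \"parameters\": {\n",
"            name: dict(param)\n",
"            for name, param in tool.get_params_definition().items()\n",
"        },\n",
"        \"type\": \"function_call\",\n",
"    }\n",
"\n",
"tool_defs = [tool_definition(t) for t in [CalculatorTool(), WeatherTool()]]\n",
"print(tool_defs[0][\"function_name\"], list(tool_defs[0][\"parameters\"]))"
]
},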
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"# ## 6. Running Examples\n",
"# \n",
"# Let's try out our agent with some example questions!\n",
"\n",
"# %%"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"import nest_asyncio\n",
"nest_asyncio.apply() # This allows async operations to work in Jupyter\n",
"\n",
"# %%\n",
"# Initialize the agent\n",
"async def init_agent():\n",
" \"\"\"Initialize our agent - run this first!\"\"\"\n",
" agent, session_id = await setup_agent()\n",
" print(f\"✨ Agent initialized with session {session_id}\")\n",
" return agent, session_id\n",
"\n",
"# %%\n",
"# Function to run a single query\n",
"async def run_single_query(agent, session_id, query: str):\n",
" \"\"\"Run a single query through our agent\"\"\"\n",
" print(\"\\n\" + \"=\"*50)\n",
" print(f\"🤔 User asks: {query}\")\n",
" print(\"=\"*50)\n",
" \n",
" response = agent.create_turn(\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": query,\n",
" }\n",
" ],\n",
" session_id=session_id,\n",
" )\n",
" \n",
" async for log in EventLogger().log(response):\n",
" log.print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's run everything and see it in action!\n",
"\n",
"Create and run our agent"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"🎉 Created session_id=fbe83bb6-bdfd-497c-b920-d7307482d8ba for Agent(3997eeda-4ffd-4b05-9026-28b4da206a11)\n",
"✨ Agent initialized with session fbe83bb6-bdfd-497c-b920-d7307482d8ba\n"
]
}
],
"source": [
"agent, session_id = await init_agent()"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"==================================================\n",
"🤔 User asks: What's 25 plus 17?\n",
"==================================================\n",
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[36m\u001b[0m\u001b[36m{\"\u001b[0m\u001b[36mtype\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mfunction\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mname\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mcalculator\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mparameters\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m {\"\u001b[0m\u001b[36moperation\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36madd\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36my\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36m17\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mx\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36m25\u001b[0m\u001b[36m\"}}\u001b[0m\u001b[97m\u001b[0m\n"
]
}
],
"source": [
"await run_single_query(agent, session_id, \"What's 25 plus 17?\")"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"==================================================\n",
"🤔 User asks: What's the weather like in Tokyo?\n",
"==================================================\n",
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[36m\u001b[0m\u001b[36m{\"\u001b[0m\u001b[36mtype\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mfunction\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mname\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mget\u001b[0m\u001b[36m_weather\u001b[0m\u001b[36m\",\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mparameters\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m {\"\u001b[0m\u001b[36mcity\u001b[0m\u001b[36m\":\u001b[0m\u001b[36m \"\u001b[0m\u001b[36mTok\u001b[0m\u001b[36myo\u001b[0m\u001b[36m\"}}\u001b[0m\u001b[97m\u001b[0m\n"
]
}
],
"source": [
"await run_single_query(agent, session_id, \"What's the weather like in Tokyo?\")"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"#fin"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View file

@ -1,7 +1,7 @@
# Llama Stack Text Generation Guide
# Llama Stack Inference Guide
This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/).
### Table of Contents
1. [Quickstart](#quickstart)

View file

@ -157,16 +157,15 @@ With these steps, you should have a functional Llama Stack setup capable of gene
## Next Steps
- **Explore Other Guides**: Dive deeper into specific topics by following these guides:
- [Understanding Distributions](#)
- [Configure your Distro](#)
- [Doing Inference API Call and Fetching a Response from Endpoints](#)
- [Creating a Conversation Loop](#)
- [Sending Image to the Model](#)
- [Tool Calling: How to and Details](#)
- [Memory API: Show Simple In-Memory Retrieval](#)
- [Agents API: Explain Components](#)
- [Using Safety API in Conversation](#)
- [Prompt Engineering Guide](#)
- [Inference 101](00_Inference101.ipynb)
- [Simple switch between local and cloud model](00_Local_Cloud_Inference101.ipynb)
- [Prompt Engineering](01_Prompt_Engineering101.ipynb)
- [Chat with Image - LlamaStack Vision API](02_Image_Chat101.ipynb)
- [Tool Calling: How to and Details](03_Tool_Calling101.ipynb)
- [Memory API: Show Simple In-Memory Retrieval](04_Memory101.ipynb)
- [Using Safety API in Conversation](05_Safety101.ipynb)
- [Agents API: Explain Components](06_Agents101.ipynb)
- **Explore Client SDKs**: Utilize our client SDKs for various languages to integrate Llama Stack into your applications:
- [Python SDK](https://github.com/meta-llama/llama-stack-client-python)
@ -180,5 +179,3 @@ With these steps, you should have a functional Llama Stack setup capable of gene
---