forked from phoenix-oss/llama-stack-mirror
# What does this PR do? Cleans up how we provide sampling params. Earlier, strategy was an enum and all params (top_p, temperature, top_k) across all strategies were grouped. We now have a strategy union object with each strategy (greedy, top_p, top_k) having its corresponding params. Earlier, ``` class SamplingParams: strategy: enum () top_p, temperature, top_k and other params ``` However, the `strategy` field was not being used in any providers making it confusing to know the exact sampling behavior purely based on the params since you could pass temperature, top_p, top_k and how the provider would interpret those would not be clear. Hence we introduced -- a union where the strategy and relevant params are all clubbed together to avoid this confusion. Have updated all providers, tests, notebooks, readme and otehr places where sampling params was being used to use the new format. ## Test Plan `pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py` // inference on ollama, fireworks and together `with-proxy pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/inference/test_text_inference.py ` // agents on fireworks `pytest -v -s -k 'fireworks and create_agent' --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/agents/test_agents.py --safety-shield="meta-llama/Llama-Guard-3-8B"` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [X] Ran pre-commit to handle lint / formatting issues. - [X] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [X] Updated relevant documentation. - [X] Wrote necessary unit or integration tests. --------- Co-authored-by: Hardik Shah <hjshah@fb.com>
362 lines
14 KiB
Text
362 lines
14 KiB
Text
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7a1ac883",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Tool Calling\n",
|
|
"\n",
|
|
"\n",
|
|
"## Creating a Custom Tool and Agent Tool Calling\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d3d3ec91",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Step 1: Import Necessary Packages and Api Keys"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "2fbe7011",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import asyncio\n",
|
|
"import json\n",
|
|
"import os\n",
|
|
"from typing import Dict, List\n",
|
|
"\n",
|
|
"import nest_asyncio\n",
|
|
"import requests\n",
|
|
"from dotenv import load_dotenv\n",
|
|
"from llama_stack_client import LlamaStackClient\n",
|
|
"from llama_stack_client.lib.agents.agent import Agent\n",
|
|
"from llama_stack_client.lib.agents.custom_tool import CustomTool\n",
|
|
"from llama_stack_client.lib.agents.event_logger import EventLogger\n",
|
|
"from llama_stack_client.types import CompletionMessage\n",
|
|
"from llama_stack_client.types.agent_create_params import AgentConfig\n",
|
|
"from llama_stack_client.types.shared.tool_response_message import ToolResponseMessage\n",
|
|
"\n",
|
|
"# Allow asyncio to run in Jupyter Notebook\n",
|
|
"nest_asyncio.apply()\n",
|
|
"\n",
|
|
"HOST = \"localhost\"\n",
|
|
"PORT = 5001\n",
|
|
"MODEL_NAME = \"meta-llama/Llama-3.2-3B-Instruct\"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ac6042d8",
|
|
"metadata": {},
|
|
"source": [
|
|
"Create a `.env` file and add you brave api key\n",
|
|
"\n",
|
|
"`BRAVE_SEARCH_API_KEY = \"YOUR_BRAVE_API_KEY_HERE\"`\n",
|
|
"\n",
|
|
"Now load the `.env` file into your jupyter notebook."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "b4b3300c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"load_dotenv()\n",
|
|
"BRAVE_SEARCH_API_KEY = os.environ[\"BRAVE_SEARCH_API_KEY\"]\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c838bb40",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Step 2: Create a class for the Brave Search API integration\n",
|
|
"\n",
|
|
"Let's create the `BraveSearch` class, which encapsulates the logic for making web search queries using the Brave Search API and formatting the response. The class includes methods for sending requests, processing results, and extracting relevant data to support the integration with an AI toolchain."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "62271ed2",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"class BraveSearch:\n",
|
|
" def __init__(self, api_key: str) -> None:\n",
|
|
" self.api_key = api_key\n",
|
|
"\n",
|
|
" async def search(self, query: str) -> str:\n",
|
|
" url = \"https://api.search.brave.com/res/v1/web/search\"\n",
|
|
" headers = {\n",
|
|
" \"X-Subscription-Token\": self.api_key,\n",
|
|
" \"Accept-Encoding\": \"gzip\",\n",
|
|
" \"Accept\": \"application/json\",\n",
|
|
" }\n",
|
|
" payload = {\"q\": query}\n",
|
|
" response = requests.get(url=url, params=payload, headers=headers)\n",
|
|
" return json.dumps(self._clean_brave_response(response.json()))\n",
|
|
"\n",
|
|
" def _clean_brave_response(self, search_response, top_k=3):\n",
|
|
" query = search_response.get(\"query\", {}).get(\"original\", None)\n",
|
|
" clean_response = []\n",
|
|
" mixed_results = search_response.get(\"mixed\", {}).get(\"main\", [])[:top_k]\n",
|
|
"\n",
|
|
" for m in mixed_results:\n",
|
|
" r_type = m[\"type\"]\n",
|
|
" results = search_response.get(r_type, {}).get(\"results\", [])\n",
|
|
" if r_type == \"web\" and results:\n",
|
|
" idx = m[\"index\"]\n",
|
|
" selected_keys = [\"title\", \"url\", \"description\"]\n",
|
|
" cleaned = {k: v for k, v in results[idx].items() if k in selected_keys}\n",
|
|
" clean_response.append(cleaned)\n",
|
|
"\n",
|
|
" return {\"query\": query, \"top_k\": clean_response}\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d987d48f",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Step 3: Create a Custom Tool Class\n",
|
|
"\n",
|
|
"Here, we defines the `WebSearchTool` class, which extends `CustomTool` to integrate the Brave Search API with Llama Stack, enabling web search capabilities within AI workflows. The class handles incoming user queries, interacts with the `BraveSearch` class for data retrieval, and formats results for effective response generation."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "92e75cf8",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"class WebSearchTool(CustomTool):\n",
|
|
" def __init__(self, api_key: str):\n",
|
|
" self.api_key = api_key\n",
|
|
" self.engine = BraveSearch(api_key)\n",
|
|
"\n",
|
|
" def get_name(self) -> str:\n",
|
|
" return \"web_search\"\n",
|
|
"\n",
|
|
" def get_description(self) -> str:\n",
|
|
" return \"Search the web for a given query\"\n",
|
|
"\n",
|
|
" async def run_impl(self, query: str):\n",
|
|
" return await self.engine.search(query)\n",
|
|
"\n",
|
|
" async def run(self, messages):\n",
|
|
" query = None\n",
|
|
" for message in messages:\n",
|
|
" if isinstance(message, CompletionMessage) and message.tool_calls:\n",
|
|
" for tool_call in message.tool_calls:\n",
|
|
" if \"query\" in tool_call.arguments:\n",
|
|
" query = tool_call.arguments[\"query\"]\n",
|
|
" call_id = tool_call.call_id\n",
|
|
"\n",
|
|
" if query:\n",
|
|
" search_result = await self.run_impl(query)\n",
|
|
" return [\n",
|
|
" ToolResponseMessage(\n",
|
|
" call_id=call_id,\n",
|
|
" role=\"ipython\",\n",
|
|
" content=self._format_response_for_agent(search_result),\n",
|
|
" tool_name=\"brave_search\",\n",
|
|
" )\n",
|
|
" ]\n",
|
|
"\n",
|
|
" return [\n",
|
|
" ToolResponseMessage(\n",
|
|
" call_id=\"no_call_id\",\n",
|
|
" role=\"ipython\",\n",
|
|
" content=\"No query provided.\",\n",
|
|
" tool_name=\"brave_search\",\n",
|
|
" )\n",
|
|
" ]\n",
|
|
"\n",
|
|
" def _format_response_for_agent(self, search_result):\n",
|
|
" parsed_result = json.loads(search_result)\n",
|
|
" formatted_result = \"Search Results with Citations:\\n\\n\"\n",
|
|
" for i, result in enumerate(parsed_result.get(\"top_k\", []), start=1):\n",
|
|
" formatted_result += (\n",
|
|
" f\"{i}. {result.get('title', 'No Title')}\\n\"\n",
|
|
" f\" URL: {result.get('url', 'No URL')}\\n\"\n",
|
|
" f\" Description: {result.get('description', 'No Description')}\\n\\n\"\n",
|
|
" )\n",
|
|
" return formatted_result\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f282a9bd",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Step 4: Create a function to execute a search query and print the results\n",
|
|
"\n",
|
|
"Now let's create the `execute_search` function, which initializes the `WebSearchTool`, runs a query asynchronously, and prints the formatted search results for easy viewing."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "aaf5664f",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"async def execute_search(query: str):\n",
|
|
" web_search_tool = WebSearchTool(api_key=BRAVE_SEARCH_API_KEY)\n",
|
|
" result = await web_search_tool.run_impl(query)\n",
|
|
" print(\"Search Results:\", result)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7cc3a039",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Step 5: Run the search with an example query"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "5f22c4e2",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Search Results: {\"query\": \"Latest developments in quantum computing\", \"top_k\": [{\"title\": \"Quantum Computing | Latest News, Photos & Videos | WIRED\", \"url\": \"https://www.wired.com/tag/quantum-computing/\", \"description\": \"Find the <strong>latest</strong> <strong>Quantum</strong> <strong>Computing</strong> news from WIRED. See related science and technology articles, photos, slideshows and videos.\"}, {\"title\": \"Quantum Computing News -- ScienceDaily\", \"url\": \"https://www.sciencedaily.com/news/matter_energy/quantum_computing/\", \"description\": \"<strong>Quantum</strong> <strong>Computing</strong> News. Read the <strong>latest</strong> about the <strong>development</strong> <strong>of</strong> <strong>quantum</strong> <strong>computers</strong>.\"}]}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"query = \"Latest developments in quantum computing\"\n",
|
|
"asyncio.run(execute_search(query))\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ea58f265-dfd7-4935-ae5e-6f3a6d74d805",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Step 6: Run the search tool using an agent\n",
|
|
"\n",
|
|
"Here, we setup and execute the `WebSearchTool` within an agent configuration in Llama Stack to handle user queries and generate responses. This involves initializing the client, configuring the agent with tool capabilities, and processing user prompts asynchronously to display results."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"id": "9e704b01-f410-492f-8baf-992589b82803",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Created session_id=34d2978d-e299-4a2a-9219-4ffe2fb124a2 for Agent(8a68f2c3-2b2a-4f67-a355-c6d5b2451d6a)\n",
|
|
"\u001b[30m\u001b[0m\u001b[33minference> \u001b[0m\u001b[33m[\u001b[0m\u001b[33mweb\u001b[0m\u001b[33m_search\u001b[0m\u001b[33m(query\u001b[0m\u001b[33m=\"\u001b[0m\u001b[33mlatest\u001b[0m\u001b[33m developments\u001b[0m\u001b[33m in\u001b[0m\u001b[33m quantum\u001b[0m\u001b[33m computing\u001b[0m\u001b[33m\")]\u001b[0m\u001b[97m\u001b[0m\n",
|
|
"\u001b[32mCustomTool> Search Results with Citations:\n",
|
|
"\n",
|
|
"1. Quantum Computing | Latest News, Photos & Videos | WIRED\n",
|
|
" URL: https://www.wired.com/tag/quantum-computing/\n",
|
|
" Description: Find the <strong>latest</strong> <strong>Quantum</strong> <strong>Computing</strong> news from WIRED. See related science and technology articles, photos, slideshows and videos.\n",
|
|
"\n",
|
|
"2. Quantum Computing News -- ScienceDaily\n",
|
|
" URL: https://www.sciencedaily.com/news/matter_energy/quantum_computing/\n",
|
|
" Description: <strong>Quantum</strong> <strong>Computing</strong> News. Read the <strong>latest</strong> about the <strong>development</strong> <strong>of</strong> <strong>quantum</strong> <strong>computers</strong>.\n",
|
|
"\n",
|
|
"\u001b[0m\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"async def run_main(disable_safety: bool = False):\n",
|
|
" # Initialize the Llama Stack client with the specified base URL\n",
|
|
" client = LlamaStackClient(\n",
|
|
" base_url=f\"http://{HOST}:{PORT}\",\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Configure input and output shields for safety (use \"llama_guard\" by default)\n",
|
|
" input_shields = [] if disable_safety else [\"llama_guard\"]\n",
|
|
" output_shields = [] if disable_safety else [\"llama_guard\"]\n",
|
|
"\n",
|
|
" # Initialize custom tool (ensure `WebSearchTool` is defined earlier in the notebook)\n",
|
|
" webSearchTool = WebSearchTool(api_key=BRAVE_SEARCH_API_KEY)\n",
|
|
"\n",
|
|
" # Define the agent configuration, including the model and tool setup\n",
|
|
" agent_config = AgentConfig(\n",
|
|
" model=MODEL_NAME,\n",
|
|
" instructions=\"\"\"You are a helpful assistant that responds to user queries with relevant information and cites sources when available.\"\"\",\n",
|
|
" sampling_params={\n",
|
|
" \"strategy\": {\n",
|
|
" \"type\": \"greedy\",\n",
|
|
" },\n",
|
|
" },\n",
|
|
" tools=[webSearchTool.get_tool_definition()],\n",
|
|
" tool_choice=\"auto\",\n",
|
|
" tool_prompt_format=\"python_list\",\n",
|
|
" input_shields=input_shields,\n",
|
|
" output_shields=output_shields,\n",
|
|
" enable_session_persistence=False,\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Create an agent instance with the client and configuration\n",
|
|
" agent = Agent(client, agent_config, [webSearchTool])\n",
|
|
"\n",
|
|
" # Create a session for interaction and print the session ID\n",
|
|
" session_id = agent.create_session(\"test-session\")\n",
|
|
" print(f\"Created session_id={session_id} for Agent({agent.agent_id})\")\n",
|
|
"\n",
|
|
" response = agent.create_turn(\n",
|
|
" messages=[\n",
|
|
" {\n",
|
|
" \"role\": \"user\",\n",
|
|
" \"content\": \"\"\"What are the latest developments in quantum computing?\"\"\",\n",
|
|
" }\n",
|
|
" ],\n",
|
|
" session_id=session_id, # Use the created session ID\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Log and print the response from the agent asynchronously\n",
|
|
" async for log in EventLogger().log(response):\n",
|
|
" log.print()\n",
|
|
"\n",
|
|
"\n",
|
|
"# Run the function asynchronously in a Jupyter Notebook cell\n",
|
|
"await run_main(disable_safety=True)\n"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.15"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|