forked from phoenix-oss/llama-stack-mirror
added quickstart w ollama and toolcalling using together (#413)
* added quickstart w ollama and toolcalling using together * corrected url for colab --------- Co-authored-by: Justin Lee <justinai@fb.com>
This commit is contained in:
parent
b0b9c905b3
commit
6d38b1690b
2 changed files with 554 additions and 57 deletions
|
@ -0,0 +1,483 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "LLZwsT_J6OnZ"
|
||||
},
|
||||
"source": [
|
||||
"<a href=\"https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/Tool_Calling101_Using_Together's_Llama_Stack_Server.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "ME7IXK4M6Ona"
|
||||
},
|
||||
"source": [
|
||||
"If you'd prefer not to set up a local server, explore this on tool calling with the Together API. This guide will show you how to leverage Together.ai's Llama Stack Server API, allowing you to get started with Llama Stack without the need for a locally built and running server.\n",
|
||||
"\n",
|
||||
"## Tool Calling w Together API\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "rWl1f1Hc6Onb"
|
||||
},
|
||||
"source": [
|
||||
"In this section, we'll explore how to enhance your applications with tool calling capabilities. We'll cover:\n",
|
||||
"1. Setting up and using the Brave Search API\n",
|
||||
"2. Creating custom tools\n",
|
||||
"3. Configuring tool prompts and safety settings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "sRkJcA_O77hP",
|
||||
"outputId": "49d33c5c-3300-4dc0-89a6-ff80bfc0bbdf"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Collecting llama-stack-client\n",
|
||||
" Downloading llama_stack_client-0.0.50-py3-none-any.whl.metadata (13 kB)\n",
|
||||
"Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (3.7.1)\n",
|
||||
"Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (1.9.0)\n",
|
||||
"Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (0.27.2)\n",
|
||||
"Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (2.9.2)\n",
|
||||
"Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (1.3.1)\n",
|
||||
"Requirement already satisfied: tabulate>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (0.9.0)\n",
|
||||
"Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from llama-stack-client) (4.12.2)\n",
|
||||
"Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->llama-stack-client) (3.10)\n",
|
||||
"Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->llama-stack-client) (1.2.2)\n",
|
||||
"Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->llama-stack-client) (2024.8.30)\n",
|
||||
"Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->llama-stack-client) (1.0.6)\n",
|
||||
"Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->llama-stack-client) (0.14.0)\n",
|
||||
"Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->llama-stack-client) (0.7.0)\n",
|
||||
"Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->llama-stack-client) (2.23.4)\n",
|
||||
"Downloading llama_stack_client-0.0.50-py3-none-any.whl (282 kB)\n",
|
||||
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m283.0/283.0 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
|
||||
"\u001b[?25hInstalling collected packages: llama-stack-client\n",
|
||||
"Successfully installed llama-stack-client-0.0.50\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!pip install llama-stack-client"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "T_EW_jV81ldl"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"LLAMA_STACK_API_TOGETHER_URL=\"https://llama-stack.together.ai\"\n",
|
||||
"LLAMA31_8B_INSTRUCT = \"Llama3.1-8B-Instruct\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "n_QHq45B6Onb"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import asyncio\n",
|
||||
"import os\n",
|
||||
"from typing import Dict, List, Optional\n",
|
||||
"\n",
|
||||
"from llama_stack_client import LlamaStackClient\n",
|
||||
"from llama_stack_client.lib.agents.agent import Agent\n",
|
||||
"from llama_stack_client.lib.agents.event_logger import EventLogger\n",
|
||||
"from llama_stack_client.types.agent_create_params import (\n",
|
||||
" AgentConfig,\n",
|
||||
" AgentConfigToolSearchToolDefinition,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Helper function to create an agent with tools\n",
|
||||
"async def create_tool_agent(\n",
|
||||
" client: LlamaStackClient,\n",
|
||||
" tools: List[Dict],\n",
|
||||
" instructions: str = \"You are a helpful assistant\",\n",
|
||||
" model: str = LLAMA31_8B_INSTRUCT\n",
|
||||
") -> Agent:\n",
|
||||
" \"\"\"Create an agent with specified tools.\"\"\"\n",
|
||||
" print(\"Using the following model: \", model)\n",
|
||||
" agent_config = AgentConfig(\n",
|
||||
" model=model,\n",
|
||||
" instructions=instructions,\n",
|
||||
" sampling_params={\n",
|
||||
" \"strategy\": \"greedy\",\n",
|
||||
" \"temperature\": 1.0,\n",
|
||||
" \"top_p\": 0.9,\n",
|
||||
" },\n",
|
||||
" tools=tools,\n",
|
||||
" tool_choice=\"auto\",\n",
|
||||
" tool_prompt_format=\"json\",\n",
|
||||
" enable_session_persistence=True,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" return Agent(client, agent_config)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "iMVYso6_xoDV"
|
||||
},
|
||||
"source": [
|
||||
"Quickly and easily get a free Together.ai API key [here](https://api.together.ai) and replace \"YOUR_TOGETHER_API_KEY\" below with it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "3Bjr891C6Onc",
|
||||
"outputId": "85245ae4-fba4-4ddb-8775-11262ddb1c29"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Using the following model: Llama3.1-8B-Instruct\n",
|
||||
"\n",
|
||||
"Query: What are the latest developments in quantum computing?\n",
|
||||
"--------------------------------------------------\n",
|
||||
"inference> FINDINGS:\n",
|
||||
"The latest developments in quantum computing involve significant advancements in the field of quantum processors, error correction, and the development of practical applications. Some of the recent breakthroughs include:\n",
|
||||
"\n",
|
||||
"* Google's 53-qubit Sycamore processor, which achieved quantum supremacy in 2019 (Source: Google AI Blog, https://ai.googleblog.com/2019/10/experiment-advances-quantum-computing.html)\n",
|
||||
"* The development of a 100-qubit quantum processor by the Chinese company, Origin Quantum (Source: Physics World, https://physicsworld.com/a/origin-quantum-scales-up-to-100-qubits/)\n",
|
||||
"* IBM's 127-qubit Eagle processor, which has the potential to perform complex calculations that are currently unsolvable by classical computers (Source: IBM Research Blog, https://www.ibm.com/blogs/research/2020/11/ibm-advances-quantum-computing-research-with-new-127-qubit-processor/)\n",
|
||||
"* The development of topological quantum computers, which have the potential to solve complex problems in materials science and chemistry (Source: MIT Technology Review, https://www.technologyreview.com/2020/02/24/914776/topological-quantum-computers-are-a-game-changer-for-materials-science/)\n",
|
||||
"* The development of a new type of quantum error correction code, known as the \"surface code\", which has the potential to solve complex problems in quantum computing (Source: Nature Physics, https://www.nature.com/articles/s41567-021-01314-2)\n",
|
||||
"\n",
|
||||
"SOURCES:\n",
|
||||
"- Google AI Blog: https://ai.googleblog.com/2019/10/experiment-advances-quantum-computing.html\n",
|
||||
"- Physics World: https://physicsworld.com/a/origin-quantum-scales-up-to-100-qubits/\n",
|
||||
"- IBM Research Blog: https://www.ibm.com/blogs/research/2020/11/ibm-advances-quantum-computing-research-with-new-127-qubit-processor/\n",
|
||||
"- MIT Technology Review: https://www.technologyreview.com/2020/02/24/914776/topological-quantum-computers-are-a-game-changer-for-materials-science/\n",
|
||||
"- Nature Physics: https://www.nature.com/articles/s41567-021-01314-2\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# comment this if you don't have a BRAVE_SEARCH_API_KEY\n",
|
||||
"os.environ[\"BRAVE_SEARCH_API_KEY\"] = 'YOUR_BRAVE_SEARCH_API_KEY'\n",
|
||||
"\n",
|
||||
"async def create_search_agent(client: LlamaStackClient) -> Agent:\n",
|
||||
" \"\"\"Create an agent with Brave Search capability.\"\"\"\n",
|
||||
"\n",
|
||||
" # comment this if you don't have a BRAVE_SEARCH_API_KEY\n",
|
||||
" search_tool = AgentConfigToolSearchToolDefinition(\n",
|
||||
" type=\"brave_search\",\n",
|
||||
" engine=\"brave\",\n",
|
||||
" api_key=os.getenv(\"BRAVE_SEARCH_API_KEY\"),\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" return await create_tool_agent(\n",
|
||||
" client=client,\n",
|
||||
" tools=[search_tool], # set this to [] if you don't have a BRAVE_SEARCH_API_KEY\n",
|
||||
" model = LLAMA31_8B_INSTRUCT,\n",
|
||||
" instructions=\"\"\"\n",
|
||||
" You are a research assistant that can search the web.\n",
|
||||
" Always cite your sources with URLs when providing information.\n",
|
||||
" Format your responses as:\n",
|
||||
"\n",
|
||||
" FINDINGS:\n",
|
||||
" [Your summary here]\n",
|
||||
"\n",
|
||||
" SOURCES:\n",
|
||||
" - [Source title](URL)\n",
|
||||
" \"\"\"\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"# Example usage\n",
|
||||
"async def search_example():\n",
|
||||
" client = LlamaStackClient(base_url=LLAMA_STACK_API_TOGETHER_URL)\n",
|
||||
" agent = await create_search_agent(client)\n",
|
||||
"\n",
|
||||
" # Create a session\n",
|
||||
" session_id = agent.create_session(\"search-session\")\n",
|
||||
"\n",
|
||||
" # Example queries\n",
|
||||
" queries = [\n",
|
||||
" \"What are the latest developments in quantum computing?\",\n",
|
||||
" #\"Who won the most recent Super Bowl?\",\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
" for query in queries:\n",
|
||||
" print(f\"\\nQuery: {query}\")\n",
|
||||
" print(\"-\" * 50)\n",
|
||||
"\n",
|
||||
" response = agent.create_turn(\n",
|
||||
" messages=[{\"role\": \"user\", \"content\": query}],\n",
|
||||
" session_id=session_id,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" async for log in EventLogger().log(response):\n",
|
||||
" log.print()\n",
|
||||
"\n",
|
||||
"# Run the example (in Jupyter, use asyncio.run())\n",
|
||||
"await search_example()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "r3YN6ufb6Onc"
|
||||
},
|
||||
"source": [
|
||||
"## 3. Custom Tool Creation\n",
|
||||
"\n",
|
||||
"Let's create a custom weather tool:\n",
|
||||
"\n",
|
||||
"#### Key Highlights:\n",
|
||||
"- **`WeatherTool` Class**: A custom tool that processes weather information requests, supporting location and optional date parameters.\n",
|
||||
"- **Agent Creation**: The `create_weather_agent` function sets up an agent equipped with the `WeatherTool`, allowing for weather queries in natural language.\n",
|
||||
"- **Simulation of API Call**: The `run_impl` method simulates fetching weather data. This method can be replaced with an actual API integration for real-world usage.\n",
|
||||
"- **Interactive Example**: The `weather_example` function shows how to use the agent to handle user queries regarding the weather, providing step-by-step responses."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "A0bOLYGj6Onc",
|
||||
"outputId": "023a8fb7-49ed-4ab4-e5b7-8050ded5d79a"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Query: What's the weather like in San Francisco?\n",
|
||||
"--------------------------------------------------\n",
|
||||
"inference> {\n",
|
||||
" \"function\": \"get_weather\",\n",
|
||||
" \"parameters\": {\n",
|
||||
" \"location\": \"San Francisco\"\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"Query: Tell me the weather in Tokyo tomorrow\n",
|
||||
"--------------------------------------------------\n",
|
||||
"inference> {\n",
|
||||
" \"function\": \"get_weather\",\n",
|
||||
" \"parameters\": {\n",
|
||||
" \"location\": \"Tokyo\",\n",
|
||||
" \"date\": \"tomorrow\"\n",
|
||||
" }\n",
|
||||
"}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from typing import TypedDict, Optional, Dict, Any\n",
|
||||
"from datetime import datetime\n",
|
||||
"import json\n",
|
||||
"from llama_stack_client.types.tool_param_definition_param import ToolParamDefinitionParam\n",
|
||||
"from llama_stack_client.types import CompletionMessage,ToolResponseMessage\n",
|
||||
"from llama_stack_client.lib.agents.custom_tool import CustomTool\n",
|
||||
"\n",
|
||||
"class WeatherTool(CustomTool):\n",
|
||||
" \"\"\"Example custom tool for weather information.\"\"\"\n",
|
||||
"\n",
|
||||
" def get_name(self) -> str:\n",
|
||||
" return \"get_weather\"\n",
|
||||
"\n",
|
||||
" def get_description(self) -> str:\n",
|
||||
" return \"Get weather information for a location\"\n",
|
||||
"\n",
|
||||
" def get_params_definition(self) -> Dict[str, ToolParamDefinitionParam]:\n",
|
||||
" return {\n",
|
||||
" \"location\": ToolParamDefinitionParam(\n",
|
||||
" param_type=\"str\",\n",
|
||||
" description=\"City or location name\",\n",
|
||||
" required=True\n",
|
||||
" ),\n",
|
||||
" \"date\": ToolParamDefinitionParam(\n",
|
||||
" param_type=\"str\",\n",
|
||||
" description=\"Optional date (YYYY-MM-DD)\",\n",
|
||||
" required=False\n",
|
||||
" )\n",
|
||||
" }\n",
|
||||
" async def run(self, messages: List[CompletionMessage]) -> List[ToolResponseMessage]:\n",
|
||||
" assert len(messages) == 1, \"Expected single message\"\n",
|
||||
"\n",
|
||||
" message = messages[0]\n",
|
||||
"\n",
|
||||
" tool_call = message.tool_calls[0]\n",
|
||||
" # location = tool_call.arguments.get(\"location\", None)\n",
|
||||
" # date = tool_call.arguments.get(\"date\", None)\n",
|
||||
" try:\n",
|
||||
" response = await self.run_impl(**tool_call.arguments)\n",
|
||||
" response_str = json.dumps(response, ensure_ascii=False)\n",
|
||||
" except Exception as e:\n",
|
||||
" response_str = f\"Error when running tool: {e}\"\n",
|
||||
"\n",
|
||||
" message = ToolResponseMessage(\n",
|
||||
" call_id=tool_call.call_id,\n",
|
||||
" tool_name=tool_call.tool_name,\n",
|
||||
" content=response_str,\n",
|
||||
" role=\"ipython\",\n",
|
||||
" )\n",
|
||||
" return [message]\n",
|
||||
"\n",
|
||||
" async def run_impl(self, location: str, date: Optional[str] = None) -> Dict[str, Any]:\n",
|
||||
" \"\"\"Simulate getting weather data (replace with actual API call).\"\"\"\n",
|
||||
" # Mock implementation\n",
|
||||
" if date:\n",
|
||||
" return {\n",
|
||||
" \"temperature\": 90.1,\n",
|
||||
" \"conditions\": \"sunny\",\n",
|
||||
" \"humidity\": 40.0\n",
|
||||
" }\n",
|
||||
" return {\n",
|
||||
" \"temperature\": 72.5,\n",
|
||||
" \"conditions\": \"partly cloudy\",\n",
|
||||
" \"humidity\": 65.0\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"async def create_weather_agent(client: LlamaStackClient) -> Agent:\n",
|
||||
" \"\"\"Create an agent with weather tool capability.\"\"\"\n",
|
||||
"\n",
|
||||
" agent_config = AgentConfig(\n",
|
||||
" model=LLAMA31_8B_INSTRUCT,\n",
|
||||
" #model=model_name,\n",
|
||||
" instructions=\"\"\"\n",
|
||||
" You are a weather assistant that can provide weather information.\n",
|
||||
" Always specify the location clearly in your responses.\n",
|
||||
" Include both temperature and conditions in your summaries.\n",
|
||||
" \"\"\",\n",
|
||||
" sampling_params={\n",
|
||||
" \"strategy\": \"greedy\",\n",
|
||||
" \"temperature\": 1.0,\n",
|
||||
" \"top_p\": 0.9,\n",
|
||||
" },\n",
|
||||
" tools=[\n",
|
||||
" {\n",
|
||||
" \"function_name\": \"get_weather\",\n",
|
||||
" \"description\": \"Get weather information for a location\",\n",
|
||||
" \"parameters\": {\n",
|
||||
" \"location\": {\n",
|
||||
" \"param_type\": \"str\",\n",
|
||||
" \"description\": \"City or location name\",\n",
|
||||
" \"required\": True,\n",
|
||||
" },\n",
|
||||
" \"date\": {\n",
|
||||
" \"param_type\": \"str\",\n",
|
||||
" \"description\": \"Optional date (YYYY-MM-DD)\",\n",
|
||||
" \"required\": False,\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"type\": \"function_call\",\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" tool_choice=\"auto\",\n",
|
||||
" tool_prompt_format=\"json\",\n",
|
||||
" input_shields=[],\n",
|
||||
" output_shields=[],\n",
|
||||
" enable_session_persistence=True\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Create the agent with the tool\n",
|
||||
" weather_tool = WeatherTool()\n",
|
||||
" agent = Agent(\n",
|
||||
" client=client,\n",
|
||||
" agent_config=agent_config,\n",
|
||||
" custom_tools=[weather_tool]\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" return agent\n",
|
||||
"\n",
|
||||
"# Example usage\n",
|
||||
"async def weather_example():\n",
|
||||
" client = LlamaStackClient(base_url=LLAMA_STACK_API_TOGETHER_URL)\n",
|
||||
" agent = await create_weather_agent(client)\n",
|
||||
" session_id = agent.create_session(\"weather-session\")\n",
|
||||
"\n",
|
||||
" queries = [\n",
|
||||
" \"What's the weather like in San Francisco?\",\n",
|
||||
" \"Tell me the weather in Tokyo tomorrow\",\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
" for query in queries:\n",
|
||||
" print(f\"\\nQuery: {query}\")\n",
|
||||
" print(\"-\" * 50)\n",
|
||||
"\n",
|
||||
" response = agent.create_turn(\n",
|
||||
" messages=[{\"role\": \"user\", \"content\": query}],\n",
|
||||
" session_id=session_id,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" async for log in EventLogger().log(response):\n",
|
||||
" log.print()\n",
|
||||
"\n",
|
||||
"# For Jupyter notebooks\n",
|
||||
"import nest_asyncio\n",
|
||||
"nest_asyncio.apply()\n",
|
||||
"\n",
|
||||
"# Run the example\n",
|
||||
"await weather_example()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "yKhUkVNq6Onc"
|
||||
},
|
||||
"source": [
|
||||
"Thanks for checking out this tutorial, hopefully you can now automate everything with Llama! :D\n",
|
||||
"\n",
|
||||
"Next up, we learn another hot topic of LLMs: Memory and Rag. Continue learning [here](./04_Memory101.ipynb)!"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.15"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
|
@ -1,91 +1,103 @@
|
|||
# Llama Stack Quickstart Guide
|
||||
# Ollama Quickstart Guide
|
||||
|
||||
This guide will walk you through setting up an end-to-end workflow with Llama Stack, enabling you to perform text generation using the `Llama3.2-3B-Instruct` model. Follow these steps to get started quickly.
|
||||
This guide will walk you through setting up an end-to-end workflow with Llama Stack with ollama, enabling you to perform text generation using the `Llama3.2-1B-Instruct` model. Follow these steps to get started quickly.
|
||||
|
||||
If you're looking for more specific topics like tool calling or agent setup, we have a [Zero to Hero Guide](#next-steps) that covers everything from Tool Calling to Agents in detail. Feel free to skip to the end to explore the advanced topics you're interested in.
|
||||
|
||||
> If you'd prefer not to set up a local server, explore our notebook on [tool calling with the Together API](Tool_Calling101_Using_Together's_Llama_Stack_Server.ipynb). This guide will show you how to leverage Together.ai's Llama Stack Server API, allowing you to get started with Llama Stack without the need for a locally built and running server.
|
||||
|
||||
## Table of Contents
|
||||
1. [Setup](#Setup)
|
||||
2. [Build, Configure, and Run Llama Stack](#build-configure-and-run-llama-stack)
|
||||
3. [Testing with `curl`](#testing-with-curl)
|
||||
4. [Testing with Python](#testing-with-python)
|
||||
1. [Setup ollama](#setup-ollama)
|
||||
2. [Install Dependencies and Set Up Environment](#install-dependencies-and-set-up-environment)
|
||||
3. [Build, Configure, and Run Llama Stack](#build-configure-and-run-llama-stack)
|
||||
4. [Run Ollama Model](#run-ollama-model)
|
||||
5. [Next Steps](#next-steps)
|
||||
|
||||
---
|
||||
|
||||
## Setup ollama
|
||||
|
||||
1. **Download Ollama App**:
|
||||
- Go to [https://ollama.com/download](https://ollama.com/download).
|
||||
- Download and unzip `Ollama-darwin.zip`.
|
||||
- Run the `Ollama` application.
|
||||
|
||||
## Setup
|
||||
2. **Download the Ollama CLI**:
|
||||
- Ensure you have the `ollama` command line tool by downloading and installing it from the same website.
|
||||
|
||||
### 1. Prerequisite
|
||||
3. **Verify Installation**:
|
||||
- Open the terminal and run:
|
||||
```bash
|
||||
ollama run llama3.2:1b
|
||||
```
|
||||
|
||||
Ensure you have the following installed on your system:
|
||||
---
|
||||
|
||||
- **Conda**: A package, dependency, and environment management tool.
|
||||
## Install Dependencies and Set Up Environment
|
||||
|
||||
1. **Create a Conda Environment**:
|
||||
- Create a new Conda environment with Python 3.11:
|
||||
```bash
|
||||
conda create -n hack python=3.11
|
||||
```
|
||||
- Activate the environment:
|
||||
```bash
|
||||
conda activate hack
|
||||
```
|
||||
|
||||
### 2. Installation
|
||||
The `llama` CLI tool helps you manage the Llama Stack toolchain and agent systems. Follow these step to install
|
||||
2. **Install ChromaDB**:
|
||||
- Install `chromadb` using `pip`:
|
||||
```bash
|
||||
pip install chromadb
|
||||
```
|
||||
|
||||
First activate and activate your conda environment
|
||||
```
|
||||
conda create --name my-env
|
||||
conda activate my-env
|
||||
```
|
||||
Then install llama-stack with pip, you could also check out other installation methods [here](https://llama-stack.readthedocs.io/en/latest/cli_reference/index.html).
|
||||
3. **Run ChromaDB**:
|
||||
- Start the ChromaDB server:
|
||||
```bash
|
||||
chroma run --host localhost --port 8000 --path ./my_chroma_data
|
||||
```
|
||||
|
||||
```bash
|
||||
pip install llama-stack
|
||||
```
|
||||
|
||||
After installation, the `llama` command should be available in your PATH.
|
||||
|
||||
### 3. Download Llama Models
|
||||
|
||||
Download the necessary Llama model checkpoints using the `llama` CLI:
|
||||
|
||||
```bash
|
||||
llama download --model-id Llama3.2-3B-Instruct
|
||||
```
|
||||
|
||||
Follow the CLI prompts to complete the download. You may need to accept a license agreement. Obtain an instant license [here](https://www.llama.com/llama-downloads/).
|
||||
4. **Install Llama Stack**:
|
||||
- Open a new terminal and install `llama-stack`:
|
||||
```bash
|
||||
conda activate hack
|
||||
pip install llama-stack
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Build, Configure, and Run Llama Stack
|
||||
|
||||
### 1. Build the Llama Stack Distribution
|
||||
1. **Build the Llama Stack**:
|
||||
- Build the Llama Stack using the `ollama` template:
|
||||
```bash
|
||||
llama stack build --template ollama --image-type conda
|
||||
```
|
||||
|
||||
We will default to building the `meta-reference-gpu` distribution due to its optimized configuration tailored for inference tasks that utilize local GPU capabilities effectively. If you have limited GPU resources, prefer using a cloud-based instance or plan to run on a CPU, you can explore other distribution options [here](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#decide-your-inference-provider).
|
||||
2. **Edit Configuration**:
|
||||
- Modify the `ollama-run.yaml` file located at `/Users/yourusername/.llama/distributions/llamastack-ollama/ollama-run.yaml`:
|
||||
- Change the `chromadb` port to `8000`.
|
||||
- Remove the `pgvector` section if present.
|
||||
|
||||
```bash
|
||||
llama stack build --template meta-reference-gpu --image-type conda
|
||||
```
|
||||
3. **Run the Llama Stack**:
|
||||
- Run the stack with the configured YAML file:
|
||||
```bash
|
||||
llama stack run /path/to/your/distro/llamastack-ollama/ollama-run.yaml --port 5050
|
||||
```
|
||||
|
||||
|
||||
### 2. Run the Llama Stack Distribution
|
||||
> Launching a distribution initializes and configures the necessary APIs and Providers, enabling seamless interaction with the underlying model.
|
||||
|
||||
Start the server with the configured stack:
|
||||
|
||||
```bash
|
||||
cd llama-stack/distributions/meta-reference-gpu
|
||||
llama stack run ./run.yaml
|
||||
```
|
||||
|
||||
The server will start and listen on `http://localhost:5000` by default.
|
||||
The server will start and listen on `http://localhost:5050`.
|
||||
|
||||
---
|
||||
|
||||
## Testing with `curl`
|
||||
|
||||
After setting up the server, verify it's working by sending a `POST` request using `curl`:
|
||||
After setting up the server, open a new terminal window and verify it's working by sending a `POST` request using `curl`:
|
||||
|
||||
```bash
|
||||
curl http://localhost:5000/inference/chat_completion \
|
||||
curl http://localhost:5050/inference/chat_completion \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "Llama3.2-3B-Instruct",
|
||||
"model": "llama3.2:1b",
|
||||
"messages": [
|
||||
{"role": "system", "content": "You are a helpful assistant."},
|
||||
{"role": "user", "content": "Write me a 2-sentence poem about the moon"}
|
||||
|
@ -113,10 +125,11 @@ curl http://localhost:5000/inference/chat_completion \
|
|||
|
||||
You can also interact with the Llama Stack server using a simple Python script. Below is an example:
|
||||
|
||||
### 1. Install Required Python Packages
|
||||
### 1. Active Conda Environment and Install Required Python Packages
|
||||
The `llama-stack-client` library offers a robust and efficient python methods for interacting with the Llama Stack server.
|
||||
|
||||
```bash
|
||||
conda activate your-llama-stack-conda-env
|
||||
pip install llama-stack-client
|
||||
```
|
||||
|
||||
|
@ -129,10 +142,9 @@ touch test_llama_stack.py
|
|||
|
||||
```python
|
||||
from llama_stack_client import LlamaStackClient
|
||||
from llama_stack_client.types import SystemMessage, UserMessage
|
||||
|
||||
# Initialize the client
|
||||
client = LlamaStackClient(base_url="http://localhost:5000")
|
||||
client = LlamaStackClient(base_url="http://localhost:5050")
|
||||
|
||||
# Create a chat completion request
|
||||
response = client.inference.chat_completion(
|
||||
|
@ -140,7 +152,7 @@ response = client.inference.chat_completion(
|
|||
{"role": "system", "content": "You are a helpful assistant."},
|
||||
{"role": "user", "content": "Write a two-sentence poem about llama."}
|
||||
],
|
||||
model="Llama3.2-3B-Instruct",
|
||||
model="llama3.2:1b",
|
||||
)
|
||||
|
||||
# Print the response
|
||||
|
@ -161,6 +173,8 @@ A beacon of wonder, as it catches the eye.
|
|||
|
||||
With these steps, you should have a functional Llama Stack setup capable of generating text using the specified model. For more detailed information and advanced configurations, refer to some of our documentation below.
|
||||
|
||||
This command initializes the model to interact with your local Llama Stack instance.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue