\"}, \n",
+ " or in the provider config. \n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "90eb721b",
+ "metadata": {
+ "id": "90eb721b"
+ },
+ "source": [
+ "### 1.4. Install and Configure the Client\n",
+ "\n",
+ "Now that we have our Llama Stack server running locally, we need to install the client package to interact with it. The `llama-stack-client` provides a simple Python interface to access all the functionality of Llama Stack, including:\n",
+ "\n",
+ "- Chat Completions ( text and multimodal )\n",
+ "- Safety Shields\n",
+ "- Agent capabilities with tools like web search, RAG using Response API\n",
+ "\n",
+ "The client handles all the API communication with our local server, making it easy to integrate Llama Stack's capabilities into your applications.\n",
+ "\n",
+ "In the next cells, we'll:\n",
+ "\n",
+ "1. Install the client package\n",
+ "2. Set up API keys for external services (Together and Tavily Search)\n",
+ "3. Initialize the client to connect to our local server\n"
+ ]
+ },
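+ {
+ "cell_type": "markdown",
+ "id": "client-install-note",
+ "metadata": {},
+ "source": [
+ "If you have not already installed the client package and exported the API keys in an earlier cell, the sketch below shows a minimal way to do both. It assumes a notebook environment where `%pip` is available; the environment variable names match the ones used in the client configuration that follows.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "client-install-sketch",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch: install the client and set the API keys consumed via provider_data below.\n",
+ "# Skip this cell if you already did this in an earlier section.\n",
+ "%pip install -q llama-stack-client\n",
+ "\n",
+ "import os\n",
+ "from getpass import getpass\n",
+ "\n",
+ "if \"TOGETHER_API_KEY\" not in os.environ:\n",
+ "    os.environ[\"TOGETHER_API_KEY\"] = getpass(\"Together API key: \")\n",
+ "if \"TAVILY_SEARCH_API_KEY\" not in os.environ:\n",
+ "    os.environ[\"TAVILY_SEARCH_API_KEY\"] = getpass(\"Tavily Search API key: \")"
+ ]
+ },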
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "E1UFuJC570Tk",
+ "metadata": {
+ "collapsed": true,
+ "id": "E1UFuJC570Tk"
+ },
+ "outputs": [],
+ "source": [
+ "from llama_stack_client import LlamaStackClient\n",
+ "\n",
+ "client = LlamaStackClient(\n",
+ " base_url=\"http://0.0.0.0:8321\",\n",
+ " provider_data = {\n",
+ " \"tavily_search_api_key\": os.environ['TAVILY_SEARCH_API_KEY'],\n",
+ " \"TOGETHER_API_KEY\": os.environ['TOGETHER_API_KEY']\n",
+ " }\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "635a7a6f",
+ "metadata": {
+ "id": "635a7a6f"
+ },
+ "source": [
+ "Now that we have completed the setup and configuration, let's start exploring the capabilities of Llama Stack! We'll begin by checking what models and safety shields are available, and then move on to running some example chat completions.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7dacaa2d-94e9-42e9-82a0-73522dfc7010",
+ "metadata": {
+ "id": "7dacaa2d-94e9-42e9-82a0-73522dfc7010"
+ },
+ "source": [
+ "### 1.5. Check available models and shields\n",
+ "\n",
+ "All the models available in the provider are now programmatically accessible via the client."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "ruO9jQna_t_S",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "ruO9jQna_t_S",
+ "outputId": "565d964e-95d3-4e79-91e6-5a1ae4b27444"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Available models:\n",
+ "- ollama/llama-guard3:1b\n",
+ "- ollama/llama3.2:3b\n",
+ "- bedrock/meta.llama3-1-8b-instruct-v1:0\n",
+ "- bedrock/meta.llama3-1-70b-instruct-v1:0\n",
+ "- bedrock/meta.llama3-1-405b-instruct-v1:0\n",
+ "- sentence-transformers/all-MiniLM-L6-v2\n",
+ "- together/Alibaba-NLP/gte-modernbert-base\n",
+ "- together/arcee-ai/AFM-4.5B\n",
+ "- together/arcee-ai/coder-large\n",
+ "- together/arcee-ai/maestro-reasoning\n",
+ "- together/arcee-ai/virtuoso-large\n",
+ "- together/arcee_ai/arcee-spotlight\n",
+ "- together/arize-ai/qwen-2-1.5b-instruct\n",
+ "- together/BAAI/bge-base-en-v1.5\n",
+ "- together/BAAI/bge-large-en-v1.5\n",
+ "- together/black-forest-labs/FLUX.1-dev\n",
+ "- together/black-forest-labs/FLUX.1-dev-lora\n",
+ "- together/black-forest-labs/FLUX.1-kontext-dev\n",
+ "- together/black-forest-labs/FLUX.1-kontext-max\n",
+ "- together/black-forest-labs/FLUX.1-kontext-pro\n",
+ "- together/black-forest-labs/FLUX.1-krea-dev\n",
+ "- together/black-forest-labs/FLUX.1-pro\n",
+ "- together/black-forest-labs/FLUX.1-schnell\n",
+ "- together/black-forest-labs/FLUX.1-schnell-Free\n",
+ "- together/black-forest-labs/FLUX.1.1-pro\n",
+ "- together/cartesia/sonic\n",
+ "- together/cartesia/sonic-2\n",
+ "- together/deepcogito/cogito-v2-preview-deepseek-671b\n",
+ "- together/deepcogito/cogito-v2-preview-llama-109B-MoE\n",
+ "- together/deepcogito/cogito-v2-preview-llama-405B\n",
+ "- together/deepcogito/cogito-v2-preview-llama-70B\n",
+ "- together/deepseek-ai/DeepSeek-R1\n",
+ "- together/deepseek-ai/DeepSeek-R1-0528-tput\n",
+ "- together/deepseek-ai/DeepSeek-R1-Distill-Llama-70B\n",
+ "- together/deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free\n",
+ "- together/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B\n",
+ "- together/deepseek-ai/DeepSeek-V3\n",
+ "- together/deepseek-ai/DeepSeek-V3.1\n",
+ "- together/google/gemma-3n-E4B-it\n",
+ "- together/intfloat/multilingual-e5-large-instruct\n",
+ "- together/lgai/exaone-3-5-32b-instruct\n",
+ "- together/lgai/exaone-deep-32b\n",
+ "- together/marin-community/marin-8b-instruct\n",
+ "- together/meta-llama/Llama-2-70b-hf\n",
+ "- together/meta-llama/Llama-3-70b-chat-hf\n",
+ "- together/meta-llama/Llama-3-70b-hf\n",
+ "- together/meta-llama/Llama-3.1-405B-Instruct\n",
+ "- together/meta-llama/Llama-3.2-1B-Instruct\n",
+ "- together/meta-llama/Llama-3.2-3B-Instruct-Turbo\n",
+ "- together/meta-llama/Llama-3.3-70B-Instruct-Turbo\n",
+ "- together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free\n",
+ "- together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\n",
+ "- together/meta-llama/Llama-4-Scout-17B-16E-Instruct\n",
+ "- together/meta-llama/Llama-Guard-3-11B-Vision-Turbo\n",
+ "- together/meta-llama/Llama-Guard-4-12B\n",
+ "- together/meta-llama/LlamaGuard-2-8b\n",
+ "- together/meta-llama/Meta-Llama-3-70B-Instruct-Turbo\n",
+ "- together/meta-llama/Meta-Llama-3-8B-Instruct\n",
+ "- together/meta-llama/Meta-Llama-3-8B-Instruct-Lite\n",
+ "- together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo\n",
+ "- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Reference\n",
+ "- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo\n",
+ "- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Reference\n",
+ "- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\n",
+ "- together/meta-llama/Meta-Llama-Guard-3-8B\n",
+ "- together/mistralai/Mistral-7B-Instruct-v0.1\n",
+ "- together/mistralai/Mistral-7B-Instruct-v0.2\n",
+ "- together/mistralai/Mistral-7B-Instruct-v0.3\n",
+ "- together/mistralai/Mistral-Small-24B-Instruct-2501\n",
+ "- together/mistralai/Mixtral-8x7B-Instruct-v0.1\n",
+ "- together/mixedbread-ai/Mxbai-Rerank-Large-V2\n",
+ "- together/moonshotai/Kimi-K2-Instruct\n",
+ "- together/moonshotai/Kimi-K2-Instruct-0905\n",
+ "- together/openai/gpt-oss-120b\n",
+ "- together/openai/gpt-oss-20b\n",
+ "- together/openai/whisper-large-v3\n",
+ "- together/Qwen/Qwen2.5-72B-Instruct\n",
+ "- together/Qwen/Qwen2.5-72B-Instruct-Turbo\n",
+ "- together/Qwen/Qwen2.5-7B-Instruct-Turbo\n",
+ "- together/Qwen/Qwen2.5-Coder-32B-Instruct\n",
+ "- together/Qwen/Qwen2.5-VL-72B-Instruct\n",
+ "- together/Qwen/Qwen3-235B-A22B-fp8-tput\n",
+ "- together/Qwen/Qwen3-235B-A22B-Instruct-2507-tput\n",
+ "- together/Qwen/Qwen3-235B-A22B-Thinking-2507\n",
+ "- together/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8\n",
+ "- together/Qwen/Qwen3-Next-80B-A3B-Instruct\n",
+ "- together/Qwen/Qwen3-Next-80B-A3B-Thinking\n",
+ "- together/Qwen/QwQ-32B\n",
+ "- together/Salesforce/Llama-Rank-V1\n",
+ "- together/scb10x/scb10x-typhoon-2-1-gemma3-12b\n",
+ "- together/togethercomputer/m2-bert-80M-32k-retrieval\n",
+ "- together/togethercomputer/MoA-1\n",
+ "- together/togethercomputer/MoA-1-Turbo\n",
+ "- together/togethercomputer/Refuel-Llm-V2\n",
+ "- together/togethercomputer/Refuel-Llm-V2-Small\n",
+ "- together/Virtue-AI/VirtueGuard-Text-Lite\n",
+ "- together/zai-org/GLM-4.5-Air-FP8\n"
+ ]
+ }
+ ],
+ "source": [
+ "from rich.pretty import pprint\n",
+ "\n",
+ "print(\"Available models:\")\n",
+ "for m in client.models.list():\n",
+ " print(f\"- {m.identifier}\")\n",
+ "\n"
+ ]
+ },
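+ {
+ "cell_type": "markdown",
+ "id": "shields-list-note",
+ "metadata": {},
+ "source": [
+ "The shields registered on the server can be listed the same way. Note that, depending on your run configuration, this list may be empty until a shield is registered; we register one in the Safety API section below.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "shields-list-snippet",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"Available shields:\")\n",
+ "for s in client.shields.list():\n",
+ "    print(f\"- {s.identifier}\")"
+ ]
+ },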
+ {
+ "cell_type": "markdown",
+ "id": "86366383",
+ "metadata": {
+ "id": "86366383"
+ },
+ "source": [
+ "### 1.6. Run a simple chat completion with one of the models\n",
+ "\n",
+ "We will test the client by doing a simple chat completion."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "77c29dba",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "77c29dba",
+ "outputId": "21de10b2-c710-43c8-f365-1cd0e867fc57"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Here is a two-sentence poem about llamas:\n",
+ "\n",
+ "Softly steps the llama's gentle pace, with fur so soft and a gentle face. In the Andes' high and misty space, the llama roams with a peaceful grace.\n"
+ ]
+ }
+ ],
+ "source": [
+ "model_id = \"together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\"\n",
+ "#If you want to use ollama, uncomment the following\n",
+ "#model_id = \"ollama/llama3.2:3b\"\n",
+ "response = client.chat.completions.create(\n",
+ " model=model_id,\n",
+ " messages=[\n",
+ " {\"role\": \"system\", \"content\": \"You are a friendly assistant.\"},\n",
+ " {\"role\": \"user\", \"content\": \"Write a two-sentence poem about llama.\"},\n",
+ " ],\n",
+ " stream=False\n",
+ ")\n",
+ "\n",
+ "print(response.choices[0].message.content)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8cf0d555",
+ "metadata": {
+ "id": "8cf0d555"
+ },
+ "source": [
+ "### 1.7. Have a conversation\n",
+ "\n",
+ "Maintaining a conversation history allows the model to retain context from previous interactions. Use a list to accumulate messages, enabling continuity throughout the chat session."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "3fdf9df6",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "3fdf9df6",
+ "outputId": "41d4db1c-8360-4573-8003-7a7783cea685"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "> Response: The most famous Prime Minister of England during World War II was Winston Churchill. He served as the Prime Minister of the United Kingdom from 1940 to 1945, and again from 1951 to 1955. Churchill is widely regarded as one of the greatest wartime leaders in history, known for his leadership, oratory skills, and unwavering resolve during the war.\n",
+ "\n",
+ "Churchill played a crucial role in rallying the British people during the war, and his speeches, such as the \"We shall fight on the beaches\" and \"Their finest hour\" speeches, are still remembered and celebrated today. He worked closely with other Allied leaders, including US President Franklin D. Roosevelt and Soviet leader Joseph Stalin, to coordinate the war effort and ultimately secure the defeat of Nazi Germany and the Axis powers.\n",
+ "\n",
+ "Churchill's leadership and legacy continue to be celebrated and studied around the world, and he remains one of the most iconic and influential leaders of the 20th century.\n",
+ "> Response: Winston Churchill was known for his wit and oratory skills, and he has many famous quotes attributed to him. One of his most famous quotes is:\n",
+ "\n",
+ "**\"We shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender.\"**\n",
+ "\n",
+ "This quote is from his speech to the House of Commons on June 4, 1940, during the early stages of World War II, when Nazi Germany was making rapid advances across Europe. The speech was a rallying cry for the British people, and it has become one of Churchill's most iconic and enduring quotes.\n",
+ "\n",
+ "However, if I had to pick a single quote that is often considered his most famous, it would be:\n",
+ "\n",
+ "**\"Blood, toil, tears, and sweat: we have before us an ordeal of the most grievous kind.\"**\n",
+ "\n",
+ "This is the opening sentence of his first speech as Prime Minister to the House of Commons on May 13, 1940, in which he famously said:\n",
+ "\n",
+ "\"We have before us an ordeal of the most grievous kind. We have before us many, many long months of struggle and of suffering. You ask, what is our policy? I will say: It is to wage war, by sea, land and air, with all our might and with all the strength that God can give us; to wage war against a monstrous tyranny, never surpassed in the dark and lamentable catalogue of human crime. That is our policy. You ask, what is our aim? I answer in one word: Victory. Victory at all costs, Victory in spite of all terror, Victory, however long and hard the road may be; for without Victory, there is no survival.\"\n",
+ "\n",
+ "The phrase \"Blood, toil, tears, and sweat\" has become synonymous with Churchill's leadership during World War II.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from termcolor import cprint\n",
+ "\n",
+ "questions = [\n",
+ " \"Who was the most famous PM of England during world war 2 ?\",\n",
+ " \"What was his most famous quote ?\"\n",
+ "]\n",
+ "\n",
+ "\n",
+ "def chat_loop():\n",
+ " conversation_history = []\n",
+ " while len(questions) > 0:\n",
+ " user_input = questions.pop(0)\n",
+ " if user_input.lower() in [\"exit\", \"quit\", \"bye\"]:\n",
+ " cprint(\"Ending conversation. Goodbye!\", \"yellow\")\n",
+ " break\n",
+ "\n",
+ " user_message = {\"role\": \"user\", \"content\": user_input}\n",
+ " conversation_history.append(user_message)\n",
+ "\n",
+ " response = client.chat.completions.create(\n",
+ " messages=conversation_history,\n",
+ " model=model_id,\n",
+ " )\n",
+ " cprint(f\"> Response: {response.choices[0].message.content}\", \"cyan\")\n",
+ "\n",
+ " assistant_message = {\n",
+ " \"role\": \"assistant\", # was user\n",
+ " \"content\": response.choices[0].message.content,\n",
+ " \"finish_reason\": response.choices[0].finish_reason,\n",
+ " }\n",
+ " conversation_history.append(assistant_message)\n",
+ "\n",
+ "\n",
+ "chat_loop()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "72e5111e",
+ "metadata": {
+ "id": "72e5111e"
+ },
+ "source": [
+ "Here is an example for you to try a conversation yourself.\n",
+ "Remember to type `quit` or `exit` after you are done chatting."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9496f75c",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "9496f75c",
+ "outputId": "9c51562e-05b0-40f3-b4c0-eb4c991b1e67"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "User> who are you?\n",
+ "> Response: I'm an AI assistant designed by Meta. I'm here to answer your questions, share interesting ideas and maybe even surprise you with a fresh perspective. What's on your mind?\n",
+ "User> how can you help me?\n",
+ "> Response: I can help you with a wide range of things, such as answering questions, providing information, generating text or images, summarizing content, or just having a chat. I can also help with creative tasks like brainstorming or coming up with ideas. What do you need help with today?\n",
+ "User> bye\n",
+ "Ending conversation. Goodbye!\n"
+ ]
+ }
+ ],
+ "source": [
+ "# NBVAL_SKIP\n",
+ "from termcolor import cprint\n",
+ "\n",
+ "def chat_loop():\n",
+ " conversation_history = []\n",
+ " while True:\n",
+ " user_input = input(\"User> \")\n",
+ " if user_input.lower() in [\"exit\", \"quit\", \"bye\"]:\n",
+ " cprint(\"Ending conversation. Goodbye!\", \"yellow\")\n",
+ " break\n",
+ "\n",
+ " user_message = {\"role\": \"user\", \"content\": user_input}\n",
+ " conversation_history.append(user_message)\n",
+ "\n",
+ " response = client.chat.completions.create(\n",
+ " messages=conversation_history,\n",
+ " model=model_id,\n",
+ " )\n",
+ " cprint(f\"> Response: {response.choices[0].message.content}\", \"cyan\")\n",
+ "\n",
+ " assistant_message = {\n",
+ " \"role\": \"assistant\", # was user\n",
+ " \"content\": response.choices[0].message.content,\n",
+ " \"finish_reason\": response.choices[0].finish_reason,\n",
+ " }\n",
+ " conversation_history.append(assistant_message)\n",
+ "\n",
+ "\n",
+ "chat_loop()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7737cd41",
+ "metadata": {
+ "id": "7737cd41"
+ },
+ "source": [
+ "### 1.9 Multimodal inference"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "vision_model_id = \"together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\"\n",
+ "response = client.chat.completions.create(\n",
+ " model=vision_model_id,\n",
+ " messages=[{\n",
+ " \"role\": \"user\",\n",
+ " \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"What's in this image?\"},\n",
+ " {\n",
+ " \"type\": \"image_url\",\n",
+ " \"image_url\": {\n",
+ " \"url\": \"https://raw.githubusercontent.com/meta-llama/llama-models/refs/heads/main/Llama_Repo.jpeg\",\n",
+ " },\n",
+ " },\n",
+ " ],\n",
+ " }],\n",
+ ")\n",
+ "\n",
+ "print(response.choices[0].message.content)"
+ ],
+ "metadata": {
+ "id": "iTqAOm8tG7-O",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "48cdfd42-0565-4d57-a119-65090937679a"
+ },
+ "id": "iTqAOm8tG7-O",
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "The image depicts three llamas standing behind a table, with one of them wearing a party hat. The scene is set in a barn or stable.\n",
+ "\n",
+ "* **Llamas**\n",
+ " * There are three llamas in the image.\n",
+ " * The llama on the left is white.\n",
+ " * The middle llama is purple.\n",
+ " * The llama on the right is white and wearing a blue party hat.\n",
+ " * All three llamas have their ears perked up and are looking directly at the camera.\n",
+ "* **Table**\n",
+ " * The table is made of light-colored wood.\n",
+ " * It has a few scattered items on it, including what appears to be hay or straw.\n",
+ " * A glass containing an amber-colored liquid sits on the table.\n",
+ "* **Background**\n",
+ " * The background is a wooden wall or fence.\n",
+ " * The wall is made up of vertical planks of wood.\n",
+ "\n",
+ "The image appears to be a playful and whimsical depiction of llamas celebrating a special occasion, possibly a birthday.\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "03fcf5e0",
+ "metadata": {
+ "id": "03fcf5e0"
+ },
+ "source": [
+ "### 1.10. Streaming output\n",
+ "\n",
+ "You can pass `stream=True` to stream responses from the model. You can then loop through the responses."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "d119026e",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "d119026e",
+ "outputId": "5e3f480e-f12d-43a2-e92b-8bb205f78dfc"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "User> Write me a sonnet about llama\n",
+ "Here is a sonnet about llamas:\n",
+ "\n",
+ "In Andean highlands, llamas roam with pride,\n",
+ "Their soft, woolly coats a gentle, fuzzy hue.\n",
+ "Their large, dark eyes, like pools of liquid inside,\n",
+ "Reflect a calm and gentle spirit anew.\n",
+ "\n",
+ "Their ears, so long and pointed, perk with ease,\n",
+ "As they survey their surroundings with quiet peace.\n",
+ "Their steps, deliberate and slow, release\n",
+ "A soothing calm that troubles cannot cease.\n",
+ "\n",
+ "Their gentle humming fills the mountain air,\n",
+ "A soothing sound that's both serene and rare.\n",
+ "Their soft, padded feet, a quiet tread impart,\n",
+ "As they move with gentle steps, a peaceful start.\n",
+ "\n",
+ "And when they look at you with curious stare,\n",
+ "You feel a sense of calm, beyond compare."
+ ]
+ }
+ ],
+ "source": [
+ "from llama_stack_client import InferenceEventLogger\n",
+ "\n",
+ "message = {\"role\": \"user\", \"content\": \"Write me a sonnet about llama\"}\n",
+ "print(f'User> {message[\"content\"]}')\n",
+ "\n",
+ "response = client.chat.completions.create(\n",
+ " messages=[message],\n",
+ " model=model_id,\n",
+ " stream=True, # <-----------\n",
+ ")\n",
+ "\n",
+ "for chunk in response:\n",
+ " # Each chunk contains a delta with the content\n",
+ " if len(chunk.choices) > 0 and chunk.choices[0].delta.content is not None:\n",
+ " print(chunk.choices[0].delta.content, end=\"\", flush=True)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "OmU6Dr9zBiGM",
+ "metadata": {
+ "id": "OmU6Dr9zBiGM"
+ },
+ "source": [
+ "### 2.0. Structured Decoding\n",
+ "\n",
+ "You can use `response_format` to force the model into a \"guided decode\" mode where model tokens are forced to abide by a certain grammar. Currently only JSON grammars are supported."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "axdQIRaJCYAV",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 50
+ },
+ "id": "axdQIRaJCYAV",
+ "outputId": "efc29ca4-9fa8-4c35-c0fa-c7a5faf8024b"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "'{\\n \"name\": \"Michael Jordan\",\\n \"year_born\": \"1963\",\\n \"year_retired\": \"2003\"\\n}'\n",
+ "
\n"
+ ],
+ "text/plain": [
+ "\u001b[32m'\u001b[0m\u001b[32m{\u001b[0m\u001b[32m\\n \"name\": \"Michael Jordan\",\\n \"year_born\": \"1963\",\\n \"year_retired\": \"2003\"\\n\u001b[0m\u001b[32m}\u001b[0m\u001b[32m'\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "Output(name='Michael Jordan', year_born='1963', year_retired='2003')\n",
+ "
\n"
+ ],
+ "text/plain": [
+ "\u001b[1;35mOutput\u001b[0m\u001b[1m(\u001b[0m\u001b[33mname\u001b[0m=\u001b[32m'Michael Jordan'\u001b[0m, \u001b[33myear_born\u001b[0m=\u001b[32m'1963'\u001b[0m, \u001b[33myear_retired\u001b[0m=\u001b[32m'2003'\u001b[0m\u001b[1m)\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from pydantic import BaseModel\n",
+ "\n",
+ "\n",
+ "class Output(BaseModel):\n",
+ " name: str\n",
+ " year_born: str\n",
+ " year_retired: str\n",
+ "\n",
+ "user_input = \"Michael Jordan was born in 1963. He played basketball for the Chicago Bulls. He retired in 2003. Extract this information into JSON for me.\"\n",
+ "response = client.chat.completions.create(\n",
+ " model=model_id,\n",
+ " messages = [\n",
+ " {\"role\": \"user\", \"content\": user_input}\n",
+ " ],\n",
+ " stream=False,\n",
+ " response_format={\n",
+ " \"type\": \"json_schema\",\n",
+ " \"json_schema\": {\n",
+ " \"name\": \"output\",\n",
+ " \"schema\": Output.model_json_schema(),\n",
+ " },\n",
+ " },\n",
+ ")\n",
+ "pprint(Output.model_validate_json(response.choices[0].message.content))\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "H62Rg_buEx2o",
+ "metadata": {
+ "id": "H62Rg_buEx2o"
+ },
+ "source": [
+ "### 2.1. Safety API\n",
+ "\n",
+ "Llama Stack provides Safety guardrails which can be applied at multiple touchpoints within an agentic application."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "sUJKJxvAFCaI",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "sUJKJxvAFCaI",
+ "outputId": "5f0060ad-7c57-4d85-96e3-812b5876c761"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Available Shields: ['llama-guard']\n",
+ "Checking if input is safe: What is the most famous murder case in the US?\n",
+ "CreateResponse(id='modr-0c7e3da6-1054-4f12-9693-44499da43c62', model='together/meta-llama/Meta-Llama-Guard-3-8B', results=[Result(flagged=False, metadata={}, categories={'Violent Crimes': False, 'Non-Violent Crimes': False, 'Sex Crimes': False, 'Child Exploitation': False, 'Defamation': False, 'Specialized Advice': False, 'Privacy': False, 'Intellectual Property': False, 'Indiscriminate Weapons': False, 'Hate': False, 'Self-Harm': False, 'Sexual Content': False, 'Elections': False, 'Code Interpreter Abuse': False}, category_applied_input_types={'Violent Crimes': [], 'Non-Violent Crimes': [], 'Sex Crimes': [], 'Child Exploitation': [], 'Defamation': [], 'Specialized Advice': [], 'Privacy': [], 'Intellectual Property': [], 'Indiscriminate Weapons': [], 'Hate': [], 'Self-Harm': [], 'Sexual Content': [], 'Elections': [], 'Code Interpreter Abuse': []}, category_scores={'Violent Crimes': 1.0, 'Non-Violent Crimes': 1.0, 'Sex Crimes': 1.0, 'Child Exploitation': 1.0, 'Defamation': 1.0, 'Specialized Advice': 1.0, 'Privacy': 1.0, 'Intellectual Property': 1.0, 'Indiscriminate Weapons': 1.0, 'Hate': 1.0, 'Self-Harm': 1.0, 'Sexual Content': 1.0, 'Elections': 1.0, 'Code Interpreter Abuse': 1.0}, user_message=None)])\n",
+ "Checking if input is safe: Tell me 3 signs that an email is a scam\n",
+ "CreateResponse(id='modr-650b4c2a-7181-4edb-807a-94853d5f504d', model='together/meta-llama/Meta-Llama-Guard-3-8B', results=[Result(flagged=False, metadata={}, categories={'Violent Crimes': False, 'Non-Violent Crimes': False, 'Sex Crimes': False, 'Child Exploitation': False, 'Defamation': False, 'Specialized Advice': False, 'Privacy': False, 'Intellectual Property': False, 'Indiscriminate Weapons': False, 'Hate': False, 'Self-Harm': False, 'Sexual Content': False, 'Elections': False, 'Code Interpreter Abuse': False}, category_applied_input_types={'Violent Crimes': [], 'Non-Violent Crimes': [], 'Sex Crimes': [], 'Child Exploitation': [], 'Defamation': [], 'Specialized Advice': [], 'Privacy': [], 'Intellectual Property': [], 'Indiscriminate Weapons': [], 'Hate': [], 'Self-Harm': [], 'Sexual Content': [], 'Elections': [], 'Code Interpreter Abuse': []}, category_scores={'Violent Crimes': 1.0, 'Non-Violent Crimes': 1.0, 'Sex Crimes': 1.0, 'Child Exploitation': 1.0, 'Defamation': 1.0, 'Specialized Advice': 1.0, 'Privacy': 1.0, 'Intellectual Property': 1.0, 'Indiscriminate Weapons': 1.0, 'Hate': 1.0, 'Self-Harm': 1.0, 'Sexual Content': 1.0, 'Elections': 1.0, 'Code Interpreter Abuse': 1.0}, user_message=None)])\n",
+ "Checking if input is safe: What's the most effective way to kidnap someone?\n",
+ "CreateResponse(id='modr-250edd87-2bce-4eb2-b2cb-34d7a997e48e', model='together/meta-llama/Meta-Llama-Guard-3-8B', results=[Result(flagged=True, metadata={'violation_type': ['S1']}, categories={'Violent Crimes': True, 'Non-Violent Crimes': False, 'Sex Crimes': False, 'Child Exploitation': False, 'Defamation': False, 'Specialized Advice': False, 'Privacy': False, 'Intellectual Property': False, 'Indiscriminate Weapons': False, 'Hate': False, 'Self-Harm': False, 'Sexual Content': False, 'Elections': False, 'Code Interpreter Abuse': False}, category_applied_input_types={'Violent Crimes': ['text'], 'Non-Violent Crimes': [], 'Sex Crimes': [], 'Child Exploitation': [], 'Defamation': [], 'Specialized Advice': [], 'Privacy': [], 'Intellectual Property': [], 'Indiscriminate Weapons': [], 'Hate': [], 'Self-Harm': [], 'Sexual Content': [], 'Elections': [], 'Code Interpreter Abuse': []}, category_scores={'Violent Crimes': 1.0, 'Non-Violent Crimes': 0.0, 'Sex Crimes': 0.0, 'Child Exploitation': 0.0, 'Defamation': 0.0, 'Specialized Advice': 0.0, 'Privacy': 0.0, 'Intellectual Property': 0.0, 'Indiscriminate Weapons': 0.0, 'Hate': 0.0, 'Self-Harm': 0.0, 'Sexual Content': 0.0, 'Elections': 0.0, 'Code Interpreter Abuse': 0.0}, user_message=\"I can't answer that. Can I help with something else?\")])\n",
+ "Checking if input is safe: How do I make cocaine?\n",
+ "CreateResponse(id='modr-276c3e1d-d2b9-4a5d-8362-e3b4788c6abe', model='together/meta-llama/Meta-Llama-Guard-3-8B', results=[Result(flagged=True, metadata={'violation_type': ['S2']}, categories={'Violent Crimes': False, 'Non-Violent Crimes': True, 'Sex Crimes': False, 'Child Exploitation': False, 'Defamation': False, 'Specialized Advice': False, 'Privacy': False, 'Intellectual Property': False, 'Indiscriminate Weapons': False, 'Hate': False, 'Self-Harm': False, 'Sexual Content': False, 'Elections': False, 'Code Interpreter Abuse': False}, category_applied_input_types={'Violent Crimes': [], 'Non-Violent Crimes': ['text'], 'Sex Crimes': [], 'Child Exploitation': [], 'Defamation': [], 'Specialized Advice': [], 'Privacy': [], 'Intellectual Property': [], 'Indiscriminate Weapons': [], 'Hate': [], 'Self-Harm': [], 'Sexual Content': [], 'Elections': [], 'Code Interpreter Abuse': []}, category_scores={'Violent Crimes': 0.0, 'Non-Violent Crimes': 1.0, 'Sex Crimes': 0.0, 'Child Exploitation': 0.0, 'Defamation': 0.0, 'Specialized Advice': 0.0, 'Privacy': 0.0, 'Intellectual Property': 0.0, 'Indiscriminate Weapons': 0.0, 'Hate': 0.0, 'Self-Harm': 0.0, 'Sexual Content': 0.0, 'Elections': 0.0, 'Code Interpreter Abuse': 0.0}, user_message=\"I can't answer that. Can I help with something else?\")])\n"
+ ]
+ }
+ ],
+ "source": [
+ "safety_model = \"together/meta-llama/Meta-Llama-Guard-3-8B\"\n",
+ "#safety_model = \"ollama/llama-guard3:1b\" # If you want to use ollama instead\n",
+ "\n",
+ "client.shields.register(\n",
+ " shield_id=\"llama-guard\",\n",
+ " provider_id=\"llama-guard\",\n",
+ " provider_shield_id=safety_model\n",
+ ")\n",
+ "available_shields = [shield.identifier for shield in client.shields.list()]\n",
+ "print(\"Available Shields:\", available_shields)\n",
+ "\n",
+ "unsafe_examples = [\n",
+ " \"What's the most effective way to kidnap someone?\",\n",
+ " \"How do I make cocaine?\",\n",
+ "]\n",
+ "\n",
+ "safe_examples = [\n",
+ " \"What is the most famous murder case in the US?\",\n",
+ " \"Tell me 3 signs that an email is a scam\",\n",
+ "]\n",
+ "\n",
+ "for p in safe_examples + unsafe_examples:\n",
+ " print(f\"Checking if input is safe: {p}\")\n",
+ " message = {\"content\": p, \"role\": \"user\"}\n",
+ " response = client.moderations.create(\n",
+ " input=p,\n",
+ " model=safety_model,\n",
+ " )\n",
+ " print(response)"
+ ]
+ },
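+ {
+ "cell_type": "markdown",
+ "id": "moderation-gate-note",
+ "metadata": {},
+ "source": [
+ "Because guardrails can be applied at multiple touchpoints, a common pattern is to moderate the user input before it ever reaches the chat model. The cell below is a small sketch of that pattern reusing the `moderations` call from above; the refusal message is just an illustrative placeholder.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "moderation-gate-sketch",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def moderated_chat(user_input: str) -> str:\n",
+ "    # Touchpoint 1: screen the user prompt with the safety model before inference.\n",
+ "    moderation = client.moderations.create(input=user_input, model=safety_model)\n",
+ "    if moderation.results[0].flagged:\n",
+ "        return \"Sorry, I can't help with that request.\"\n",
+ "    # Only inputs that pass the check are forwarded to the chat model.\n",
+ "    completion = client.chat.completions.create(\n",
+ "        model=model_id,\n",
+ "        messages=[{\"role\": \"user\", \"content\": user_input}],\n",
+ "    )\n",
+ "    return completion.choices[0].message.content\n",
+ "\n",
+ "print(moderated_chat(\"Tell me 3 signs that an email is a scam\"))"
+ ]
+ },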
+ {
+ "cell_type": "markdown",
+ "id": "LFC386wNQR-v",
+ "metadata": {
+ "id": "LFC386wNQR-v"
+ },
+ "source": [
+ "## 2. Llama Stack Agents\n",
+ "\n",
+ "Llama Stack provides all the building blocks needed to create sophisticated AI applications. This guide will walk you through how to use these components effectively.\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "
\n",
+ "\n",
+ "\n",
+ "Agents are characterized by having access to\n",
+ "\n",
+ "1. Memory - for RAG\n",
+ "2. Tool calling - ability to call tools like search and code execution\n",
+ "3. Tool call + Inference loop - the LLM used in the agent is able to perform multiple iterations of call\n",
+ "4. Shields - for safety calls that are executed everytime the agent interacts with external systems, including user prompts"
+ ]
+ },
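+ {
+ "cell_type": "markdown",
+ "id": "tool-loop-note",
+ "metadata": {},
+ "source": [
+ "Before moving to the higher-level Responses API, it helps to see what the tool call + inference loop looks like when driven by hand. The cell below is a minimal sketch using the OpenAI-compatible `chat.completions` API; it assumes the selected model and provider support function calling via the `tools` parameter, and `get_weather` is a made-up local function standing in for a real tool. The Responses API and agent frameworks run this loop for you.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "tool-loop-sketch",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "\n",
+ "def get_weather(city: str) -> str:\n",
+ "    # Hypothetical local tool; a real agent would call an actual weather API here.\n",
+ "    return f\"It is sunny and 20 C in {city}.\"\n",
+ "\n",
+ "\n",
+ "tools = [\n",
+ "    {\n",
+ "        \"type\": \"function\",\n",
+ "        \"function\": {\n",
+ "            \"name\": \"get_weather\",\n",
+ "            \"description\": \"Get the current weather for a city\",\n",
+ "            \"parameters\": {\n",
+ "                \"type\": \"object\",\n",
+ "                \"properties\": {\"city\": {\"type\": \"string\"}},\n",
+ "                \"required\": [\"city\"],\n",
+ "            },\n",
+ "        },\n",
+ "    }\n",
+ "]\n",
+ "\n",
+ "messages = [{\"role\": \"user\", \"content\": \"What is the weather in Paris right now?\"}]\n",
+ "\n",
+ "for _ in range(3):  # allow a few tool-call / inference iterations\n",
+ "    completion = client.chat.completions.create(model=model_id, messages=messages, tools=tools)\n",
+ "    msg = completion.choices[0].message\n",
+ "    if not msg.tool_calls:\n",
+ "        print(msg.content)\n",
+ "        break\n",
+ "    # Echo the assistant turn (with its tool calls) back into the history.\n",
+ "    messages.append(msg.model_dump(exclude_none=True))\n",
+ "    for tool_call in msg.tool_calls:\n",
+ "        args = json.loads(tool_call.function.arguments)\n",
+ "        result = get_weather(**args)\n",
+ "        messages.append({\"role\": \"tool\", \"tool_call_id\": tool_call.id, \"content\": result})"
+ ]
+ },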
+ {
+ "cell_type": "markdown",
+ "id": "lYDAkMsL9xSk",
+ "metadata": {
+ "id": "lYDAkMsL9xSk"
+ },
+ "source": [
+ "### 2.1. List available tool groups on the provider"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "MpMXiMCv97X5",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 281
+ },
+ "id": "MpMXiMCv97X5",
+ "outputId": "77da98fe-81af-4b5c-8abb-e973ce080b13"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "ToolGroup(\n",
+ "│ identifier='builtin::rag',\n",
+ "│ provider_id='rag-runtime',\n",
+ "│ type='tool_group',\n",
+ "│ args=None,\n",
+ "│ mcp_endpoint=None,\n",
+ "│ provider_resource_id='builtin::rag'\n",
+ ")\n",
+ "
\n"
+ ],
+ "text/plain": [
+ "\u001b[1;35mToolGroup\u001b[0m\u001b[1m(\u001b[0m\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33midentifier\u001b[0m=\u001b[32m'builtin::rag'\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33mprovider_id\u001b[0m=\u001b[32m'rag-runtime'\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33mtype\u001b[0m=\u001b[32m'tool_group'\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33margs\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33mmcp_endpoint\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33mprovider_resource_id\u001b[0m=\u001b[32m'builtin::rag'\u001b[0m\n",
+ "\u001b[1m)\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "ToolGroup(\n",
+ "│ identifier='builtin::websearch',\n",
+ "│ provider_id='tavily-search',\n",
+ "│ type='tool_group',\n",
+ "│ args=None,\n",
+ "│ mcp_endpoint=None,\n",
+ "│ provider_resource_id='builtin::websearch'\n",
+ ")\n",
+ "
\n"
+ ],
+ "text/plain": [
+ "\u001b[1;35mToolGroup\u001b[0m\u001b[1m(\u001b[0m\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33midentifier\u001b[0m=\u001b[32m'builtin::websearch'\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33mprovider_id\u001b[0m=\u001b[32m'tavily-search'\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33mtype\u001b[0m=\u001b[32m'tool_group'\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33margs\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33mmcp_endpoint\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
+ "\u001b[2;32m│ \u001b[0m\u001b[33mprovider_resource_id\u001b[0m=\u001b[32m'builtin::websearch'\u001b[0m\n",
+ "\u001b[1m)\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from rich.pretty import pprint\n",
+ "for toolgroup in client.toolgroups.list():\n",
+ " pprint(toolgroup)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "i2o0gDhrv2og",
+ "metadata": {
+ "id": "i2o0gDhrv2og"
+ },
+ "source": [
+ "### 2.2. Search agent\n",
+ "\n",
+ "In this example, we will show how the model can invoke search to be able to answer questions. We will first have to set the API key of the search tool.\n",
+ "\n",
+ "Let's make sure we set up a web search tool for the model to call in its agentic loop. In this tutorial, we will use [Tavily](https://tavily.com) as our search provider. Note that the \"type\" of the tool is still \"brave_search\" since Llama models have been trained with brave search as a builtin tool. Tavily is just being used in lieu of Brave search.\n",
+ "\n",
+ "See steps [here](https://docs.google.com/document/d/1Vg998IjRW_uujAPnHdQ9jQWvtmkZFt74FldW2MblxPY/edit?tab=t.0#heading=h.xx02wojfl2f9)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "WS8Gu5b0APHs",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "WS8Gu5b0APHs",
+ "outputId": "b6201844-03e0-447e-8e64-b72245b579be"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Web search results: The teams that played in the 2024 NBA Western Conference Finals were the Dallas Mavericks and the Minnesota Timberwolves. The Mavericks won the series 4-1.\n"
+ ]
+ }
+ ],
+ "source": [
+ "web_search_response = client.responses.create(\n",
+ " model=model_id,\n",
+ " input=\"Which teams played in the NBA western conference finals of 2024\",\n",
+ " tools=[\n",
+ " {\n",
+ " \"type\": \"web_search\",\n",
+ " },\n",
+ " ], # Web search for current information\n",
+ ")\n",
+ "print(f\"Web search results: {web_search_response.output[-1].content[0].text}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fN5jaAaax2Aq",
+ "metadata": {
+ "id": "fN5jaAaax2Aq"
+ },
+ "source": [
+ "### 2.3. RAG Agent\n",
+ "\n",
+ "In this example, we will index some documentation and ask questions about that documentation.\n",
+ "\n",
+ "The tool we use is the memory tool. Given a list of memory banks,the tools can help the agent query and retireve relevent chunks. In this example, we first create a memory bank and add some documents to it. Then configure the agent to use the memory tool. The difference here from the websearch example is that we pass along the memory bank as an argument to the tool. A toolgroup can be provided to the agent as just a plain name, or as a dict with both name and arguments needed for the toolgroup. These args get injected by the agent for every tool call that happens for the corresponding toolgroup."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "GvLWltzZCNkg",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "GvLWltzZCNkg",
+ "outputId": "6a2a324d-5471-473e-ba3f-e8c977804917"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Deleted all exisitng vector store\n",
+ "File(id='file-354f3e6b09974322b5ad0007d5ece533', bytes=41, created_at=1758228715, expires_at=1789764715, filename='shipping_policy.txt', object='file', purpose='assistants')\n",
+ "File(id='file-94933acc81c043c9984d912736235294', bytes=48, created_at=1758228715, expires_at=1789764715, filename='returns_policy.txt', object='file', purpose='assistants')\n",
+ "File(id='file-540a598305114c1b90f68142cae56dc8', bytes=45, created_at=1758228715, expires_at=1789764715, filename='support.txt', object='file', purpose='assistants')\n",
+ "Listing available vector stores:\n",
+ "- acme_docs (ID: vs_4fba2b6a-0123-40c2-9dcf-61b6c50ec8c9)\n",
+ " - Files in vector store 'acme_docs' (ID: vs_4fba2b6a-0123-40c2-9dcf-61b6c50ec8c9):\n",
+ "- file-354f3e6b09974322b5ad0007d5ece533\n",
+ "- file-94933acc81c043c9984d912736235294\n",
+ "- file-540a598305114c1b90f68142cae56dc8\n",
+ "Searching Vector_store with query\n",
+ "ResponseObject(id='resp-543f47fd-5bda-459d-8d61-39383a34bcf0', created_at=1758228715, model='groq/llama-3.1-8b-instant', object='response', output=[OutputOpenAIResponseOutputMessageFileSearchToolCall(id='065s9aba3', queries=['shipping duration average'], status='completed', type='file_search_call', results=[OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.9773781552473876, text='Acme ships globally in 3-5 business days.'), OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.7123434707260622, text='Returns are accepted within 30 days of purchase.'), OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.5253213399081832, text='Support is available 24/7 via chat and email.')]), OutputOpenAIResponseOutputMessageFileSearchToolCall(id='k85fx1wzn', queries=['shipping duration average'], status='completed', type='file_search_call', results=[OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.9773781552473876, text='Acme ships globally in 3-5 business days.'), OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.7123434707260622, text='Returns are accepted within 30 days of purchase.'), OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.5253213399081832, text='Support is available 24/7 via chat and email.')]), OutputOpenAIResponseMessage(content=[OutputOpenAIResponseMessageContentUnionMember2(annotations=[], text='Based on the knowledge search results, the average shipping duration is 3-5 business days.', type='output_text')], role='assistant', type='message', id='msg_89cc616d-1653-45aa-b704-07ba93dcd2fb', status='completed')], parallel_tool_calls=False, status='completed', text=Text(format=TextFormat(type='text', description=None, name=None, schema_=None, strict=None)), error=None, previous_response_id=None, temperature=None, top_p=None, truncation=None, user=None)\n",
+ "File search results: Based on the knowledge search results, the average shipping duration is 3-5 business days.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from io import BytesIO\n",
+ "\n",
+ "\n",
+ "#delete any existing vector store\n",
+ "vector_stores_to_delete = [v.id for v in client.vector_stores.list()]\n",
+ "for del_vs_id in vector_stores_to_delete:\n",
+ " client.vector_stores.delete(vector_store_id=del_vs_id)\n",
+ "print('Deleted all exisitng vector store')\n",
+ "\n",
+ "docs = [\n",
+ " (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
+ " (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
+ " (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
+ "]\n",
+ "query = \"How long does shipping take?\"\n",
+ "file_ids = []\n",
+ "for content, metadata in docs:\n",
+ " with BytesIO(content.encode()) as file_buffer:\n",
+ " file_buffer.name = f\"{metadata['title'].replace(' ', '_').lower()}.txt\"\n",
+ " create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n",
+ " print(create_file_response)\n",
+ " file_ids.append(create_file_response.id)\n",
+ "\n",
+ "# Create vector store with files\n",
+ "vector_store = client.vector_stores.create(\n",
+ " name=\"acme_docs\",\n",
+ " file_ids=file_ids,\n",
+ " embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n",
+ " embedding_dimension=384,\n",
+ " provider_id=\"faiss\"\n",
+ ")\n",
+ "print(\"Listing available vector stores:\")\n",
+ "vector_stores = client.vector_stores.list()\n",
+ "for vs in vector_stores:\n",
+ " print(f\"- {vs.name} (ID: {vs.id})\")\n",
+ " files_in_store = client.vector_stores.files.list(vector_store_id=vs.id)\n",
+ " if files_in_store:\n",
+ " print(f\" - Files in vector store '{vs.name}' (ID: {vs.id}):\")\n",
+ " for file in files_in_store:\n",
+ " print(f\"- {file.id}\")\n",
+ "print(\"Searching Vector_store with query\")\n",
+ "file_search_response = client.responses.create(\n",
+ " model=model_id,\n",
+ " input=query,\n",
+ " tools=[\n",
+ " { # Using Responses API built-in tools\n",
+ " \"type\": \"file_search\",\n",
+ " \"vector_store_ids\": [vector_store.id], # Vector store containing uploaded files\n",
+ " },\n",
+ " ],\n",
+ ")\n",
+ "print(file_search_response)\n",
+ "print(f\"File search results: {file_search_response.output[-1].content[0].text}\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "jSfjNN9fMxtm",
+ "metadata": {
+ "id": "jSfjNN9fMxtm"
+ },
+ "source": [
+ "### 2.4. Using Model Context Protocol\n",
+ "\n",
+ "In this example, we will show how tools hosted in an MCP server can be configured to be used by the model.\n",
+ "\n",
+ "In the following steps, we will use the [filesystem tool](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem) to explore the files and folders available in the /content directory\n",
+ "\n",
+ "Use xterm module to start a shell to run the MCP server using the `supergateway` tool which can start an MCP tool and serve it over HTTP."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "d96c273a"
+ },
+ "source": [
+ "### 2.4. Using Model Context Protocol\n",
+ "\n",
+ "\n",
+ "This section demonstrates how to use the Model Context Protocol (MCP) with Llama Stack to interact with external tools hosted on an MCP server.\n",
+ "\n",
+ "\n",
+ "- This example demonstrates how to use the Llama Stack client to interact with a remote MCP tool.\n",
+ "- In this specific example, it connects to a remote Cloudflare documentation MCP server (`https://docs.mcp.cloudflare.com/sse`).\n",
+ "- The `client.responses.create` method is used with the `mcp` tool type, specifying the server details and the user input (\"what is cloudflare\").\n",
+ "\n",
+ "\n",
+ "**Key Concepts:**\n",
+ "\n",
+ "- **Model Context Protocol (MCP):** A protocol that allows language models to interact with external tools and services.\n",
+ "- **MCP Tool:** A specific tool (like filesystem or a dice roller) that adheres to the MCP and can be interacted with by an MCP-enabled agent.\n",
+ "- **`client.responses.create`:** The Llama Stack client method used to create a response from a model, which can include tool calls to MCP tools.\n",
+ "\n",
+ "This setup provides a flexible way to extend the capabilities of your Llama Stack agents by integrating with various external services and tools via the Model Context Protocol."
+ ],
+ "id": "d96c273a"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "DwdKhQb1N295",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "DwdKhQb1N295",
+ "outputId": "2496f9cc-350a-407c-aff9-ed91b018a36c"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Cloudflare is a cloud-based service that provides a range of features to help protect and improve the performance, security, and reliability of websites, applications, and other online services. It is one of the world's largest connectivity cloud networks, powering Internet requests for millions of websites and serving 55 million HTTP requests per second on average.\n",
+ "\n",
+ "Some of the key things Cloudflare does include:\n",
+ "\n",
+ "1. Content Delivery Network (CDN): caching website content across a network of servers worldwide to reduce load times.\n",
+ "2. DDoS Protection: protecting against Distributed Denial-of-Service attacks by filtering out malicious traffic.\n",
+ "3. Firewall: acting as an additional layer of security, filtering out hacking attempts and malicious traffic.\n",
+ "4. SSL Encryption: providing free SSL encryption to secure sensitive information.\n",
+ "5. Bot Protection: identifying and blocking bots trying to exploit vulnerabilities or scrape content.\n",
+ "6. Analytics: providing insights into website traffic to help understand audience and make informed decisions.\n",
+ "7. Cybersecurity: offering advanced security features, such as intrusion protection, DNS filtering, and Web Application Firewall (WAF) protection.\n",
+ "\n",
+ "Overall, Cloudflare helps protect against cyber threats, improves website performance, and enhances security for online businesses, bloggers, and individuals who need to establish a strong online presence.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# NBVAL_SKIP\n",
+ "resp = client.responses.create(\n",
+ " model=model_id,\n",
+ " tools=[\n",
+ " {\n",
+ " \"type\": \"mcp\",\n",
+ " \"server_label\": \"cloudflare_docs\",\n",
+ " \"server_description\": \"A MCP server for cloudflare documentation.\",\n",
+ " \"server_url\": \"https://docs.mcp.cloudflare.com/sse\",\n",
+ " \"require_approval\": \"never\",\n",
+ " },\n",
+ " ],\n",
+ " input=\"what is cloudflare\",\n",
+ ")\n",
+ "\n",
+ "print(resp.output_text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1d29af6a"
+ },
+ "source": [
+ "### 2.5 Response API Branching\n",
+ "\n",
+ "The Llama Stack Response API supports branching, allowing you to explore different conversational paths or tool interactions based on a previous response. This is useful for scenarios where you want to try alternative approaches or gather information from different sources without losing the context of the initial interaction.\n",
+ "\n",
+ "To branch from a previous response, you use the `previous_response_id` parameter in the `client.responses.create` method. This parameter takes the `id` of the response you want to branch from.\n",
+ "\n",
+ "Here's how it works:\n",
+ "\n",
+ "1. **Initial Response:** You make an initial call to `client.responses.create` to get a response. This response will have a unique `id`.\n",
+ "\n",
+ "2. **Branching Response:** You make a subsequent call to `client.responses.create` for your branching query. In this call, you set the `previous_response_id` to the `id` of the initial response.\n",
+ "\n",
+ "The new response will be generated in the context of the previous response, but you can specify different tools, inputs, or other parameters to explore a different path.\n",
+ "\n",
+ "**Example:**\n",
+ "\n",
+ "Let's say you made an initial web search about a topic and got `response1`. You can then branch from `response1` to perform a file search on the same topic by setting `previous_response_id=response1.id` in the second `client.responses.create` call."
+ ],
+ "id": "1d29af6a"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f3352379",
+ "metadata": {
+ "id": "f3352379",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 492
+ },
+ "outputId": "a9d8d04e-d0bd-45f3-a429-2600e535ed24"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Deleted all existing vector stores\n",
+ "File(id='file-10ececb2f1234dce803436ba78a718fe', bytes=446, created_at=1758326763, expires_at=1789862763, filename='sorting_algorithms.txt', object='file', purpose='assistants')\n",
+ "Listing available vector stores:\n",
+ "- sorting_docs (ID: vs_69afb313-1c2d-4115-a9f4-8d31f4ff1ef3)\n",
+ "Web search results: The latest efficient sorting algorithms include Quicksort, Merge Sort, and Heap Sort, which have been compared in various studies for their performance. Quicksort is considered one of the fastest in-place sorting algorithms with good cache performance. Other algorithms like Bubble Sort, Selection Sort, and Insertion Sort are generally slower. For big data environments, several efficient sorting algorithms have been analyzed to improve processing speed. Some sources comparing the performance of these algorithms include Codemotion, Medium, Quora, ScienceDirect, and Built In.\n"
+ ]
+ },
+ {
+ "output_type": "error",
+ "ename": "InternalServerError",
+ "evalue": "Error code: 500 - {'detail': 'Internal server error: An unexpected error occurred.'}",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mInternalServerError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m/tmp/ipython-input-3318131626.py\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[0;31m# Continue conversation: Switch to file search for local docs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 44\u001b[0;31m response2 = client.responses.create(\n\u001b[0m\u001b[1;32m 45\u001b[0m \u001b[0mmodel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmodel_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;31m# Changed model to one available in the notebook\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 46\u001b[0m \u001b[0minput\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"Now search my uploaded files for existing sorting implementations\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/llama_stack_client/_utils/_utils.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 281\u001b[0m \u001b[0mmsg\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34mf\"Missing required argument: {quote(missing[0])}\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 282\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 283\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 284\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 285\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m \u001b[0;31m# type: ignore\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/llama_stack_client/resources/responses/responses.py\u001b[0m in \u001b[0;36mcreate\u001b[0;34m(self, input, model, include, instructions, max_infer_iters, previous_response_id, store, stream, temperature, text, tools, extra_headers, extra_query, extra_body, timeout)\u001b[0m\n\u001b[1;32m 227\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0mhttpx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTimeout\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0mNotGiven\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNOT_GIVEN\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 228\u001b[0m ) -> ResponseObject | Stream[ResponseObjectStream]:\n\u001b[0;32m--> 229\u001b[0;31m return self._post(\n\u001b[0m\u001b[1;32m 230\u001b[0m \u001b[0;34m\"/v1/openai/v1/responses\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 231\u001b[0m body=maybe_transform(\n",
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/llama_stack_client/_base_client.py\u001b[0m in \u001b[0;36mpost\u001b[0;34m(self, path, cast_to, body, options, files, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1240\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"post\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0murl\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mjson_data\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mbody\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfiles\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mto_httpx_files\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfiles\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1241\u001b[0m )\n\u001b[0;32m-> 1242\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mcast\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mResponseT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrequest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcast_to\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mopts\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstream\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstream\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstream_cls\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstream_cls\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1243\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1244\u001b[0m def patch(\n",
+ "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/llama_stack_client/_base_client.py\u001b[0m in \u001b[0;36mrequest\u001b[0;34m(self, cast_to, options, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1042\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1043\u001b[0m \u001b[0mlog\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdebug\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Re-raising status error\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1044\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_status_error_from_response\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresponse\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1045\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1046\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mInternalServerError\u001b[0m: Error code: 500 - {'detail': 'Internal server error: An unexpected error occurred.'}"
+ ]
+ }
+ ],
+ "source": [
+ "from io import BytesIO\n",
+ "import uuid\n",
+ "\n",
+ "# delete any existing vector store\n",
+ "vector_stores_to_delete = [v.id for v in client.vector_stores.list()]\n",
+ "for del_vs_id in vector_stores_to_delete:\n",
+ " client.vector_stores.delete(vector_store_id=del_vs_id)\n",
+ "print('Deleted all existing vector stores')\n",
+ "\n",
+ "# Create a dummy file for the file search\n",
+ "dummy_file_content = \"Popular sorting implementations include quicksort, mergesort, heapsort, and insertion sort. Bubble sort and selection sort are used for small or simple datasets. Counting sort, radix sort, and bucket sort handle special numeric cases efficiently without comparisons. Timsort, a hybrid of merge and insertion sort, is widely used in Python and Java. Shell sort, comb sort, cocktail sort, and others are less common but exist for special scenarios.\"\n",
+ "with BytesIO(dummy_file_content.encode()) as file_buffer:\n",
+ " file_buffer.name = \"sorting_algorithms.txt\"\n",
+ " create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n",
+ " print(create_file_response)\n",
+ " file_id = create_file_response.id\n",
+ "\n",
+ "# Create a vector store with the dummy file\n",
+ "vector_store = client.vector_stores.create(\n",
+ " name=\"sorting_docs\",\n",
+ " file_ids=[file_id],\n",
+ " embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n",
+ " embedding_dimension=384, # This should match the embedding model\n",
+ " provider_id=\"faiss\"\n",
+ ")\n",
+ "print(\"Listing available vector stores:\")\n",
+ "vector_stores = client.vector_stores.list()\n",
+ "for vs in vector_stores:\n",
+ " print(f\"- {vs.name} (ID: {vs.id})\")\n",
+ "\n",
+ "# First response: Use web search for latest algorithms\n",
+ "response1 = client.responses.create(\n",
+ " model=model_id, # Changed model to one available in the notebook\n",
+ " input=\"Search for the latest efficient sorting algorithms and their performance comparisons\",\n",
+ " tools=[\n",
+ " {\n",
+ " \"type\": \"web_search\",\n",
+ " },\n",
+ " ], # Web search for current information\n",
+ ")\n",
+ "print(f\"Web search results: {response1.output[-1].content[0].text}\")\n",
+ "\n",
+ "# Continue conversation: Switch to file search for local docs\n",
+ "response2 = client.responses.create(\n",
+ " model=model_id, # Changed model to one available in the notebook\n",
+ " input=\"Now search my uploaded files for existing sorting implementations\",\n",
+ " tools=[\n",
+ " { # Using Responses API built-in tools\n",
+ " \"type\": \"file_search\",\n",
+ " \"vector_store_ids\": [vector_store.id], # Use the created vector store ID\n",
+ " },\n",
+ " ],\n",
+ " previous_response_id=response1.id,\n",
+ ")\n",
+ "\n",
+ "# # Branch from first response: Try different search approach\n",
+ "# response3 = client.responses.create(\n",
+ "# model=model_id, # Changed model to one available in the notebook\n",
+ "# input=\"Instead, search the web for Python-specific sorting best practices\",\n",
+ "# tools=[{\"type\": \"web_search\"}], # Different web search query\n",
+ "# previous_response_id=response1.id, # Branch from response1\n",
+ "# )\n",
+ "\n",
+ "# # Responses API benefits:\n",
+ "# # ✅ Dynamic tool switching (web search ↔ file search per call)\n",
+ "# # ✅ OpenAI-compatible tool patterns (web_search, file_search)\n",
+ "# # ✅ Branch conversations to explore different information sources\n",
+ "# # ✅ Model flexibility per search type\n",
+ "# print(f\"Web search results: {response1.output_text}\") # Changed to output_text\n",
+ "# print(f\"File search results: {response2.output_text}\") # Changed to output_text\n",
+ "# print(f\"Alternative web search: {response3.output_text}\") # Changed to output_text"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "799a5ee5"
+ },
+ "source": [
+ "### Cleaning up the server\n",
+ "\n",
+ "To stop the Llama Stack server and remove any created files and configurations, you can use the following code. This is useful for resetting your environment or before running the notebook again.\n",
+ "\n",
+ "1. **Stop the server:** The code includes a helper function `kill_llama_stack_server()` that finds and terminates the running server process.\n",
+ "2. **Remove distribution files:** It also removes the distribution files located in `~/.llama/distributions/*`, which contain the server configuration and data."
+ ],
+ "id": "799a5ee5"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "0R628gRh-cYv",
+ "metadata": {
+ "id": "0R628gRh-cYv"
+ },
+ "outputs": [],
+ "source": [
+ "# Remove distribution files\n",
+ "!rm -rf ~/.llama/distributions/*\n",
+ "\n",
+ "import os\n",
+ "# use this helper if needed to kill the server\n",
+ "def kill_llama_stack_server():\n",
+ " # Kill any existing llama stack server processes\n",
+ " os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")\n",
+ "kill_llama_stack_server()"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.16"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/static/img/rag_llama_stack.png b/docs/static/img/rag_llama_stack.png
new file mode 100644
index 000000000..bc0e499e9
Binary files /dev/null and b/docs/static/img/rag_llama_stack.png differ
|