diff --git a/docs/docs/building_applications/rag.mdx b/docs/docs/building_applications/rag.mdx index 5212616d2..2ea459890 100644 --- a/docs/docs/building_applications/rag.mdx +++ b/docs/docs/building_applications/rag.mdx @@ -12,356 +12,330 @@ import TabItem from '@theme/TabItem'; RAG enables your applications to reference and recall information from previous interactions or external documents. -## Architecture Overview +Llama Stack now uses a modern, OpenAI-compatible API pattern for RAG: +1. **Files API**: Upload documents using `client.files.create()` +2. **Vector Stores API**: Create and manage vector stores with `client.vector_stores.create()` +3. **Responses API**: Query documents using `client.responses.create()` with the `file_search` tool -Llama Stack organizes the APIs that enable RAG into three layers: +This new approach provides better compatibility with OpenAI's ecosystem and is the recommended way to implement RAG in Llama Stack. -1. **Lower-Level APIs**: Deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon) -2. **RAG Tool**: A first-class tool as part of the [Tools API](./tools) that allows you to ingest documents (from URLs, files, etc) with various chunking strategies and query them smartly -3. **Agents API**: The top-level [Agents API](./agent) that allows you to create agents that can use the tools to answer questions, perform tasks, and more +RAG System -![RAG System Architecture](/img/rag.png) +## Prerequisites -The RAG system uses lower-level storage for different types of data: -- **Vector IO**: For semantic search and retrieval -- **Key-Value and Relational IO**: For structured data storage +For this guide, we will use [Ollama](https://ollama.com/) as the inference provider. +Ollama is an LLM runtime that allows you to run Llama models locally. It's a great choice for development and testing, but you can also use any other inference provider that supports the OpenAI API. -:::info[Future Storage Types] -We may add more storage types like Graph IO in the future. -::: +Before you begin, make sure you have the following: +1. **Ollama**: Follow the [installation guide](https://ollama.com/docs/ollama/getting-started/install) to set up Ollama on your machine. +2. **Llama Stack**: Follow the [installation guide](/docs/installation) to set up Llama Stack on your machine. +3. **Documents**: Prepare a set of documents that you want to search. These can be plain text, PDFs, or other file types. +4. **Environment variables**: Set the `LLAMA_STACK_PORT` environment variable to the port where Llama Stack is running. For example, if you are using the default port of 8321, set `export LLAMA_STACK_PORT=8321`. Also set the `OLLAMA_URL` environment variable to `http://localhost:11434`. -## Setting up Vector Databases +## Step 0: Initialize Client -For this guide, we will use [Ollama](https://ollama.com/) as the inference provider. Ollama is an LLM runtime that allows you to run Llama models locally. - -Here's how to set up a vector database for RAG: +After launching the Llama Stack server with `llama stack build --distro starter --image-type venv --run`, initialize the client with the base URL of your Llama Stack instance.
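+If the environment variables from the prerequisites are not already set in the environment where your Python code runs (for example, inside a notebook), you can also set them from Python before creating the client. This is a minimal sketch assuming the defaults used in this guide (Llama Stack on port 8321 and a local Ollama at port 11434):
+
+```python
+import os
+
+# Assumed defaults from this guide; adjust to match your own setup
+os.environ.setdefault("LLAMA_STACK_PORT", "8321")
+os.environ.setdefault("OLLAMA_URL", "http://localhost:11434")
+```
+
+With those set, the client below picks up the correct port automatically.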
```python -# Create HTTP client import os from llama_stack_client import LlamaStackClient +from io import BytesIO client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}") - -# Register a vector database -vector_db_id = "my_documents" -response = client.vector_dbs.register( - vector_db_id=vector_db_id, - embedding_model="all-MiniLM-L6-v2", - embedding_dimension=384, - provider_id="faiss", -) ``` -## Document Ingestion +## Step 1: Upload Documents Using Files API -You can ingest documents into the vector database using two methods: directly inserting pre-chunked documents or using the RAG Tool. - -### Direct Document Insertion +The first step is to upload your documents using the Files API. Documents can be plain text, PDFs, or other file types. - + ```python -# You can insert a pre-chunked document directly into the vector db -chunks = [ - { - "content": "Your document text here", - "mime_type": "text/plain", - "metadata": { - "document_id": "doc1", - "author": "Jane Doe", - }, - }, +# Example documents with metadata +docs = [ + ("Acme ships globally in 3-5 business days.", {"title": "Shipping Policy"}), + ("Returns are accepted within 30 days of purchase.", {"title": "Returns Policy"}), + ("Support is available 24/7 via chat and email.", {"title": "Support"}), ] -client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks) + +# Upload each document and collect file IDs +file_ids = [] +for content, metadata in docs: + with BytesIO(content.encode()) as file_buffer: + # Set a descriptive filename + file_buffer.name = f"{metadata['title'].replace(' ', '_').lower()}.txt" + + # Upload the file + create_file_response = client.files.create( + file=file_buffer, + purpose="assistants" + ) + print(f"Uploaded: {create_file_response.id}") + file_ids.append(create_file_response.id) ``` - - -If you decide to precompute embeddings for your documents, you can insert them directly into the vector database by including the embedding vectors in the chunk data. This is useful if you have a separate embedding service or if you want to customize the ingestion process. + ```python -chunks_with_embeddings = [ - { - "content": "First chunk of text", - "mime_type": "text/plain", - "embedding": [0.1, 0.2, 0.3, ...], # Your precomputed embedding vector - "metadata": {"document_id": "doc1", "section": "introduction"}, - }, - { - "content": "Second chunk of text", - "mime_type": "text/plain", - "embedding": [0.2, 0.3, 0.4, ...], # Your precomputed embedding vector - "metadata": {"document_id": "doc1", "section": "methodology"}, - }, -] -client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks_with_embeddings) +# Upload a file from your local filesystem +with open("policy_document.pdf", "rb") as f: + file_response = client.files.create( + file=f, + purpose="assistants" + ) + file_ids.append(file_response.id) ``` -:::warning[Embedding Dimensions] -When providing precomputed embeddings, ensure the embedding dimension matches the `embedding_dimension` specified when registering the vector database. 
-::: + + + +```python +# Batch upload multiple documents +document_paths = [ + "docs/shipping.txt", + "docs/returns.txt", + "docs/support.txt" +] + +file_ids = [] +for path in document_paths: + with open(path, "rb") as f: + response = client.files.create(file=f, purpose="assistants") + file_ids.append(response.id) + print(f"Uploaded {path}: {response.id}") +``` -### Document Retrieval +## Step 2: Create a Vector Store -You can query the vector database to retrieve documents based on their embeddings. +Once you have uploaded your documents, create a vector store that will index them for semantic search. ```python -# You can then query for these chunks -chunks_response = client.vector_io.query( - vector_db_id=vector_db_id, - query="What do you know about..." +# Create vector store with uploaded files +vector_store = client.vector_stores.create( + name="acme_docs", + file_ids=file_ids, + embedding_model="sentence-transformers/all-MiniLM-L6-v2", + embedding_dimension=384, + provider_id="faiss" ) + +print(f"Created vector store: {vector_store.name} (ID: {vector_store.id})") ``` -## Using the RAG Tool +### Configuration Options -:::danger[Deprecation Notice] -The RAG Tool is being deprecated in favor of directly using the OpenAI-compatible Search API. We recommend migrating to the OpenAI APIs for better compatibility and future support. -::: +- **name**: A descriptive name for your vector store +- **file_ids**: List of file IDs to include in the vector store +- **embedding_model**: The model to use for generating embeddings (e.g., "sentence-transformers/all-MiniLM-L6-v2", "all-MiniLM-L6-v2") +- **embedding_dimension**: Dimension of the embedding vectors (e.g., 384 for MiniLM, 768 for BERT) +- **provider_id**: The vector database backend (e.g., "faiss", "chroma") -A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc. and automatically chunks them into smaller pieces. More examples for how to format a RAGDocument can be found in the [appendix](#more-ragdocument-examples). +## Step 3: Query the Vector Store -### OpenAI API Integration & Migration +Use the Responses API with the `file_search` tool to query your documents. -The RAG tool has been updated to use OpenAI-compatible APIs. This provides several benefits: - -- **Files API Integration**: Documents are now uploaded using OpenAI's file upload endpoints -- **Vector Stores API**: Vector storage operations use OpenAI's vector store format with configurable chunking strategies -- **Error Resilience**: When processing multiple documents, individual failures are logged but don't crash the operation. Failed documents are skipped while successful ones continue processing. - -### Migration Path - -We recommend migrating to the OpenAI-compatible Search API for: - -1. **Better OpenAI Ecosystem Integration**: Direct compatibility with OpenAI tools and workflows including the Responses API -2. **Future-Proof**: Continued support and feature development -3. **Full OpenAI Compatibility**: Vector Stores, Files, and Search APIs are fully compatible with OpenAI's Responses API - -The OpenAI APIs are used under the hood, so you can continue to use your existing RAG Tool code with minimal changes. However, we recommend updating your code to use the new OpenAI-compatible APIs for better long-term support. If any documents fail to process, they will be logged in the response but will not cause the entire operation to fail. 
- -### RAG Tool Example + + ```python -from llama_stack_client import RAGDocument +query = "How long does shipping take?" -urls = ["memory_optimizations.rst", "chat.rst", "llama3.rst"] -documents = [ - RAGDocument( - document_id=f"num-{i}", - content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}", - mime_type="text/plain", - metadata={}, - ) - for i, url in enumerate(urls) -] - -client.tool_runtime.rag_tool.insert( - documents=documents, - vector_db_id=vector_db_id, - chunk_size_in_tokens=512, -) - -# Query documents -results = client.tool_runtime.rag_tool.query( - vector_db_ids=[vector_db_id], - content="What do you know about...", -) -``` - -### Custom Context Configuration - -You can configure how the RAG tool adds metadata to the context if you find it useful for your application: - -```python -# Query documents with custom template -results = client.tool_runtime.rag_tool.query( - vector_db_ids=[vector_db_id], - content="What do you know about...", - query_config={ - "chunk_template": "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n", - }, -) -``` - -## Building RAG-Enhanced Agents - -One of the most powerful patterns is combining agents with RAG capabilities. Here's a complete example: - -### Agent with Knowledge Search - -```python -from llama_stack_client import Agent - -# Create agent with memory -agent = Agent( - client, +# Search the vector store +file_search_response = client.responses.create( model="meta-llama/Llama-3.3-70B-Instruct", - instructions="You are a helpful assistant", + input=query, tools=[ { - "name": "builtin::rag/knowledge_search", - "args": { - "vector_db_ids": [vector_db_id], - # Defaults - "query_config": { - "chunk_size_in_tokens": 512, - "chunk_overlap_in_tokens": 0, - "chunk_template": "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n", - }, - }, - } + "type": "file_search", + "vector_store_ids": [vector_store.id], + }, ], ) -session_id = agent.create_session("rag_session") -# Ask questions about documents in the vector db, and the agent will query the db to answer the question. -response = agent.create_turn( - messages=[{"role": "user", "content": "How to optimize memory in PyTorch?"}], - session_id=session_id, -) -``` - -:::tip[Agent Instructions] -The `instructions` field in the `AgentConfig` can be used to guide the agent's behavior. It is important to experiment with different instructions to see what works best for your use case. 
-::: - -### Document-Aware Conversations - -You can also pass documents along with the user's message and ask questions about them: - -```python -# Initial document ingestion -response = agent.create_turn( - messages=[ - {"role": "user", "content": "I am providing some documents for reference."} - ], - documents=[ - { - "content": "https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/memory_optimizations.rst", - "mime_type": "text/plain", - } - ], - session_id=session_id, -) - -# Query with RAG -response = agent.create_turn( - messages=[{"role": "user", "content": "What are the key topics in the documents?"}], - session_id=session_id, -) -``` - -### Viewing Agent Responses - -You can print the response with the following: - -```python -from llama_stack_client import AgentEventLogger - -for log in AgentEventLogger().log(response): - log.print() -``` - -## Vector Database Management - -### Unregistering Vector DBs - -If you need to clean up and unregister vector databases, you can do so as follows: - - - - -```python -# Unregister a specified vector database -vector_db_id = "my_vector_db_id" -print(f"Unregistering vector database: {vector_db_id}") -client.vector_dbs.unregister(vector_db_id=vector_db_id) +print(file_search_response) ``` - + + +You can search across multiple vector stores simultaneously: ```python -# Unregister all vector databases -for vector_db_id in client.vector_dbs.list(): - print(f"Unregistering vector database: {vector_db_id.identifier}") - client.vector_dbs.unregister(vector_db_id=vector_db_id.identifier) +file_search_response = client.responses.create( + model="meta-llama/Llama-3.3-70B-Instruct", + input="What are your policies?", + tools=[ + { + "type": "file_search", + "vector_store_ids": [ + vector_store_1.id, + vector_store_2.id, + vector_store_3.id + ], + }, + ], +) ``` -## Best Practices +## Managing Vector Stores -### 🎯 **Document Chunking** -- Use appropriate chunk sizes (512 tokens is often a good starting point) -- Consider overlap between chunks for better context preservation -- Experiment with different chunking strategies for your content type - -### 🔍 **Embedding Strategy** -- Choose embedding models that match your domain -- Consider the trade-off between embedding dimension and performance -- Test different embedding models for your specific use case - -### 📊 **Query Optimization** -- Use specific, well-formed queries for better retrieval -- Experiment with different search strategies -- Consider hybrid approaches (keyword + semantic search) - -### 🛡️ **Error Handling** -- Implement proper error handling for failed document processing -- Monitor ingestion success rates -- Have fallback strategies for retrieval failures - -## Appendix - -### More RAGDocument Examples - -Here are various ways to create RAGDocument objects for different content types: +### List All Vector Stores ```python -from llama_stack_client import RAGDocument -import base64 +print("Listing available vector stores:") +vector_stores = client.vector_stores.list() -# File URI -RAGDocument(document_id="num-0", content={"uri": "file://path/to/file"}) +for vs in vector_stores: + print(f"- {vs.name} (ID: {vs.id})") -# Plain text -RAGDocument(document_id="num-1", content="plain text") + # List files in each vector store + files_in_store = client.vector_stores.files.list(vector_store_id=vs.id) + if files_in_store: + print(f" Files in '{vs.name}':") + for file in files_in_store: + print(f" - {file.id}") +``` -# Explicit text input -RAGDocument( - document_id="num-2", 
- content={ - "type": "text", - "text": "plain text input", - }, # for inputs that should be treated as text explicitly +### Clean Up Vector Stores + + + + +```python +# Delete a specific vector store +client.vector_stores.delete(vector_store_id=vector_store.id) +print(f"Deleted vector store: {vector_store.id}") +``` + + + + +```python +# Delete all existing vector stores +vector_stores_to_delete = [v.id for v in client.vector_stores.list()] +for del_vs_id in vector_stores_to_delete: + client.vector_stores.delete(vector_store_id=del_vs_id) + print(f"Deleted: {del_vs_id}") +``` + + + + +## Complete Example: Building a RAG System + +Here's a complete example that puts it all together: + +```python +from io import BytesIO +from llama_stack_client import LlamaStackClient + +# Initialize client +client = LlamaStackClient(base_url="http://localhost:5001") + +# Step 1: Prepare and upload documents +knowledge_base = [ + ("Python is a high-level programming language.", {"category": "Programming"}), + ("Machine learning is a subset of artificial intelligence.", {"category": "AI"}), + ("Neural networks are inspired by the human brain.", {"category": "AI"}), +] + +file_ids = [] +for content, metadata in knowledge_base: + with BytesIO(content.encode()) as file_buffer: + file_buffer.name = f"{metadata['category'].lower()}_{len(file_ids)}.txt" + response = client.files.create(file=file_buffer, purpose="assistants") + file_ids.append(response.id) + +# Step 2: Create vector store +vector_store = client.vector_stores.create( + name="tech_knowledge_base", + file_ids=file_ids, + embedding_model="all-MiniLM-L6-v2", + embedding_dimension=384, + provider_id="faiss" ) -# Image from URL -RAGDocument( - document_id="num-3", - content={ - "type": "image", - "image": {"url": {"uri": "https://mywebsite.com/image.jpg"}}, - }, +# Step 3: Query the knowledge base +queries = [ + "What is Python?", + "Tell me about neural networks", + "What is machine learning?" +] + +for query in queries: + print(f"\nQuery: {query}") + response = client.responses.create( + model="meta-llama/Llama-3.3-70B-Instruct", + input=query, + tools=[ + { + "type": "file_search", + "vector_store_ids": [vector_store.id], + }, + ], + ) + print(f"Response: {response}") +``` + +## Migration from Legacy API + +:::danger[Deprecation Notice] +The legacy `vector_io` and `vector_dbs` API is deprecated. Migrate to the OpenAI-compatible APIs for better compatibility and future support. +::: + +If you're migrating from the deprecated `vector_io` and `vector_dbs` API: + + + + +```python +# OLD - Don't use +client.vector_dbs.register(vector_db_id="my_db", ...) +client.vector_io.insert(vector_db_id="my_db", chunks=chunks) +client.vector_io.query(vector_db_id="my_db", query="...") +``` + + + + +```python +# NEW - Recommended approach +# 1. Upload files +file_response = client.files.create(file=file_buffer, purpose="assistants") + +# 2. Create vector store +vector_store = client.vector_stores.create( + name="my_store", + file_ids=[file_response.id], + embedding_model="all-MiniLM-L6-v2", + embedding_dimension=384, + provider_id="faiss" ) -# Base64 encoded image -B64_ENCODED_IMAGE = base64.b64encode( - requests.get( - "https://raw.githubusercontent.com/meta-llama/llama-stack/refs/heads/main/docs/_static/llama-stack.png" - ).content -) -RAGDocument( - document_id="num-4", - content={"type": "image", "image": {"data": B64_ENCODED_IMAGE}}, +# 3. 
Query using Responses API +response = client.responses.create( + model="meta-llama/Llama-3.3-70B-Instruct", + input=query, + tools=[{"type": "file_search", "vector_store_ids": [vector_store.id]}], ) ``` + + + + +### Migration Benefits + +1. **Better OpenAI Ecosystem Integration**: Direct compatibility with OpenAI tools and workflows +2. **Future-Proof**: Continued support and feature development +3. **Full OpenAI Compatibility**: Vector Stores, Files, and Search APIs work with OpenAI's Responses API +4. **Enhanced Error Handling**: Individual document failures don't crash entire operations diff --git a/docs/getting_started_v0_3_0.ipynb b/docs/getting_started_v0_3_0.ipynb new file mode 100644 index 000000000..9428eda1e --- /dev/null +++ b/docs/getting_started_v0_3_0.ipynb @@ -0,0 +1,1808 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c1e7571c", + "metadata": { + "id": "c1e7571c" + }, + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)\n", + "\n", + "# Llama Stack - Building AI Applications\n", + "\n", + "\"drawing\"\n", + "\n", + "[Llama Stack](https://github.com/meta-llama/llama-stack) defines and standardizes the set of core building blocks needed to bring generative AI applications to market. These building blocks are presented in the form of interoperable APIs with a broad set of Service Providers providing their implementations.\n", + "\n", + "Read more about the project here: https://llama-stack.readthedocs.io/en/latest/index.html\n", + "\n", + "In this guide, we will showcase how you can build LLM-powered agentic applications using Llama Stack.\n", + "\n", + "**💡 Quick Start Option:** If you want a simpler and faster way to test out Llama Stack, check out the [quick_start.ipynb](quick_start.ipynb) notebook instead. It provides a streamlined experience for getting up and running in just a few steps.\n" + ] + }, + { + "cell_type": "markdown", + "id": "4CV1Q19BDMVw", + "metadata": { + "id": "4CV1Q19BDMVw" + }, + "source": [ + "## 1. Getting started with Llama Stack" + ] + }, + { + "cell_type": "markdown", + "id": "K4AvfUAJZOeS", + "metadata": { + "id": "K4AvfUAJZOeS" + }, + "source": [ + "### 1.1. Setup API keys\n", + "\n", + "\n", + "In order to run inference for the Llama models, you will need to use an inference provider. Llama Stack supports a number of inference [providers](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/remote/inference).\n", + "\n", + "\n", + "In this showcase, we will use [together.ai](https://www.together.ai/) as the inference provider. So, you would first get an API key from Together if you don't have one already.\n", + "\n", + "Steps [here](https://docs.google.com/document/d/1Vg998IjRW_uujAPnHdQ9jQWvtmkZFt74FldW2MblxPY/edit?usp=sharing).\n", + "\n", + "You can also use Fireworks.ai or even Ollama if you would like to.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e54a0e6f" + }, + "source": [ + "To set up the API keys for Together and Tavily Search, you will use Google Colab's user data secrets feature.\n", + "\n", + "1. Click on the \"🔑\" icon in the left sidebar to open the secrets manager.\n", + "2. 
Add your `TOGETHER_API_KEY` and `TAVILY_SEARCH_API_KEY` as secrets.\n", + "3. The following code will then load these secrets as environment variables." + ], + "id": "e54a0e6f" + }, + { + "cell_type": "code", + "source": [ + "import os\n", + "import getpass\n", + "try:\n", + " from google.colab import userdata\n", + " os.environ['TOGETHER_API_KEY'] = userdata.get('TOGETHER_API_KEY')\n", + " os.environ['TAVILY_SEARCH_API_KEY'] = userdata.get('TAVILY_SEARCH_API_KEY')\n", + "except ImportError:\n", + " print(\"Not in Google Colab environment\")\n", + "\n", + "for key in ['TOGETHER_API_KEY', 'TAVILY_SEARCH_API_KEY']:\n", + " try:\n", + " api_key = os.environ[key]\n", + " if not api_key:\n", + " raise ValueError(f\"{key} environment variable is empty\")\n", + " except KeyError:\n", + " api_key = getpass.getpass(f\"{key} environment variable is not set. Please enter your API key: \")\n", + " os.environ[key] = api_key" + ], + "metadata": { + "id": "ulfGF02ZNGqt" + }, + "id": "ulfGF02ZNGqt", + "execution_count": 1, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5a22389e" + }, + "source": [ + "### 1.1.1 Use Ollama instead (optional)\n", + "\n", + "Optionally, we can use ollama for local inference to avoid any api cost. To use Ollama as a model provider, you need to install and run Ollama and pull the desired model.\n", + "\n", + "Here are the steps:\n", + "\n", + "1. **Install Ollama:** Run the provided script to install Ollama.\n", + "2. **Start Ollama server and pull model:** Start the Ollama server and pull the `llama-guard3:1b` or `llama3.2:3b` model, which is used as the safety shield or the inference model in this notebook.\n", + "3. Set system variable `OLLAMA_URL` to `http://localhost:11434` so llama-stack knows where to connect." + ], + "id": "5a22389e" + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "HY8yBKKVoF50", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "HY8yBKKVoF50", + "outputId": "f786842b-653f-4227-b2a4-97a82b65a8b2" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + ">>> Cleaning up old version at /usr/local/lib/ollama\n", + ">>> Installing ollama to /usr/local\n", + ">>> Downloading Linux amd64 bundle\n", + "######################################################################## 100.0%\n", + ">>> Adding ollama user to video group...\n", + ">>> Adding current user to ollama group...\n", + ">>> Creating ollama systemd service...\n", + "\u001b[1m\u001b[31mWARNING:\u001b[m systemd is not running\n", + "\u001b[1m\u001b[31mWARNING:\u001b[m Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.\n", + ">>> The Ollama API is now available at 127.0.0.1:11434.\n", + ">>> Install complete. 
Run \"ollama\" from the command line.\n", + "\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\n", + "\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b
[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25
h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[
A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u
001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001
b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?
2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u00
1b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\u001b[?2026h\u001b[?25l\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[1G\u001b[?25h\u001b[?2026l\n" + ] + } + ], + "source": [ + "os.environ['OLLAMA_URL'] = 'http://localhost:11434'\n", + "\n", + "#Install Ollama\n", + "!curl -fsSL https://ollama.com/install.sh | sh\n", + "\n", + "#Start Ollama server with llama-guard3:1b model and llama3.2:3b\n", + "!nohup ollama serve > ollama_server.log 2>&1 &\n", + "!ollama pull llama-guard3:1b\n", + "!ollama pull llama3.2:3b" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "p2SkDGjB_KUE", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "p2SkDGjB_KUE", + "outputId": "6f4edd2f-0bfc-4f24-aa15-df179f981c8a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "{\"object\":\"list\",\"data\":[{\"id\":\"llama3.2:3b\",\"object\":\"model\",\"created\":1759163051,\"owned_by\":\"library\"},{\"id\":\"llama-guard3:1b\",\"object\":\"model\",\"created\":1759163021,\"owned_by\":\"library\"}]}\n" + ] + } + ], + "source": [ + "# Double check ollama model running\n", + "!curl 127.0.0.1:11434/v1/models\n" + ] + }, + { + "cell_type": "markdown", + "id": "oDUB7M_qe-Gs", + "metadata": { + "id": "oDUB7M_qe-Gs" + }, + "source": [ + "### 1.2. Setup and Running a Llama Stack server\n", + "\n", + "Llama Stack is architected as a collection of APIs that provide developers with the building blocks to build AI applications.\n", + "\n", + "Llama stack is typically available as a server with an endpoint that you can make calls to. 
Partners like Together and Fireworks offer their own Llama Stack compatible endpoints.\n", + "\n", + "In this showcase, we will start a Llama Stack server that is running locally.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "J2kGed0R5PSf", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "J2kGed0R5PSf", + "outputId": "c6f754a5-86b2-4153-c02a-8730efbd97df" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "downloading uv 0.8.22 x86_64-unknown-linux-gnu\n", + "no checksums to verify\n", + "installing to /usr/local/bin\n", + " uv\n", + " uvx\n", + "everything's installed!\n", + "\u001b[1m\u001b[33mwarning\u001b[39m\u001b[0m\u001b[1m:\u001b[0m \u001b[1mThe `--system` flag has no effect, a system Python interpreter is always used in `uv venv`\u001b[0m\n", + "Using CPython 3.12.11 interpreter at: \u001b[36m/usr/bin/python3\u001b[39m\n", + "Creating virtual environment at: \u001b[36mvenv\u001b[39m\n", + "Activate with: \u001b[32msource venv/bin/activate\u001b[39m\n", + "\u001b[2KEnvironment '/root/.cache/uv/builds-v0/.tmptHMEzy' already exists, re-using it.\n", + "Installing dependencies in system Python environment\n", + "\u001b[2mUsing Python 3.12.11 environment at: /usr\u001b[0m\n", + "\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 1.34s\u001b[0m\u001b[0m\n", + "Installing pip dependencies\n", + "\u001b[2mUsing Python 3.12.11 environment at: /usr\u001b[0m\n", + "\u001b[2mAudited \u001b[1m50 packages\u001b[0m \u001b[2min 288ms\u001b[0m\u001b[0m\n", + "Installing special provider module: torch torchvision torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu\n", + "\u001b[2mUsing Python 3.12.11 environment at: /usr\u001b[0m\n", + "\u001b[2mAudited \u001b[1m3 packages\u001b[0m \u001b[2min 82ms\u001b[0m\u001b[0m\n", + "Installing special provider module: torch torchtune>=0.5.0 torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu\n", + "\u001b[2mUsing Python 3.12.11 environment at: /usr\u001b[0m\n", + "\u001b[2mAudited \u001b[1m3 packages\u001b[0m \u001b[2min 116ms\u001b[0m\u001b[0m\n", + "Installing special provider module: sentence-transformers --no-deps\n", + "\u001b[2mUsing Python 3.12.11 environment at: /usr\u001b[0m\n", + "\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 90ms\u001b[0m\u001b[0m\n", + "\u001b[32mBuild Successful!\u001b[0m\n", + "\u001b[34mYou can find the newly-built distribution here: /root/.llama/distributions/starter/starter-run.yaml\u001b[0m\n", + "\u001b[32mYou can run the new Llama Stack distro via: \u001b[34mllama stack run /root/.llama/distributions/starter/starter-run.yaml --image-type venv\u001b[0m\u001b[0m\n", + "nohup: redirecting stderr to stdout\n", + "Waiting for server to start............\n", + "Server is ready!\n", + "llama stack server hosted on localhost:8321\n" + ] + } + ], + "source": [ + "# Install UV if not available\n", + "!curl -LsSf https://astral.sh/uv/install.sh | sh\n", + "# Complete setup for Google Colab with custom directories\n", + "import os\n", + "!uv venv venv --clear\n", + "!source ./venv/bin/activate && uv run --with llama-stack llama stack build --distro starter --image-type venv\n", + "!nohup python -m llama_stack.core.server.server /root/.llama/distributions/starter/starter-run.yaml --port 8321 > llama_stack_server.log &\n", + "def wait_for_server_to_start():\n", + " import requests\n", + " from requests.exceptions import ConnectionError\n", + " import time\n", + "\n", + 
" url = \"http://0.0.0.0:8321/v1/health\"\n", + " max_retries = 30\n", + " retry_interval = 1\n", + "\n", + " print(\"Waiting for server to start\", end=\"\")\n", + " for _ in range(max_retries):\n", + " try:\n", + " response = requests.get(url)\n", + " if response.status_code == 200:\n", + " print(\"\\nServer is ready!\")\n", + " return True\n", + " except ConnectionError:\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(retry_interval)\n", + "\n", + " print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n", + " return False\n", + "assert wait_for_server_to_start()\n", + "print(\"llama stack server hosted on localhost:8321\")" + ] + }, + { + "cell_type": "code", + "source": [ + "!cat llama_stack_server.log\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nxzJci-MLcfD", + "outputId": "5620114f-5b84-4c55-fb40-782156265e05" + }, + "id": "nxzJci-MLcfD", + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "INFO 2025-09-29 16:41:56,316 llama_stack.core.utils.config_resolution:45 core: Using file path: \n", + " /root/.llama/distributions/starter/starter-run.yaml \n", + "INFO 2025-09-29 16:41:56,340 __main__:593 core::server: Run configuration: \n", + "INFO 2025-09-29 16:41:56,349 __main__:596 core::server: apis: \n", + " - agents \n", + " - batches \n", + " - datasetio \n", + " - eval \n", + " - files \n", + " - inference \n", + " - post_training \n", + " - safety \n", + " - scoring \n", + " - telemetry \n", + " - tool_runtime \n", + " - vector_io \n", + " benchmarks: [] \n", + " datasets: [] \n", + " image_name: starter \n", + " inference_store: \n", + " db_path: /root/.llama/distributions/starter/inference_store.db \n", + " type: sqlite \n", + " metadata_store: \n", + " db_path: /root/.llama/distributions/starter/registry.db \n", + " type: sqlite \n", + " models: [] \n", + " providers: \n", + " agents: \n", + " - config: \n", + " persistence_store: \n", + " db_path: /root/.llama/distributions/starter/agents_store.db \n", + " type: sqlite \n", + " responses_store: \n", + " db_path: /root/.llama/distributions/starter/responses_store.db \n", + " type: sqlite \n", + " provider_id: meta-reference \n", + " provider_type: inline::meta-reference \n", + " batches: \n", + " - config: \n", + " kvstore: \n", + " db_path: /root/.llama/distributions/starter/batches.db \n", + " type: sqlite \n", + " provider_id: reference \n", + " provider_type: inline::reference \n", + " datasetio: \n", + " - config: \n", + " kvstore: \n", + " db_path: /root/.llama/distributions/starter/huggingface_datasetio.db \n", + " type: sqlite \n", + " provider_id: huggingface \n", + " provider_type: remote::huggingface \n", + " - config: \n", + " kvstore: \n", + " db_path: /root/.llama/distributions/starter/localfs_datasetio.db \n", + " type: sqlite \n", + " provider_id: localfs \n", + " provider_type: inline::localfs \n", + " eval: \n", + " - config: \n", + " kvstore: \n", + " db_path: /root/.llama/distributions/starter/meta_reference_eval.db \n", + " type: sqlite \n", + " provider_id: meta-reference \n", + " provider_type: inline::meta-reference \n", + " files: \n", + " - config: \n", + " metadata_store: \n", + " db_path: /root/.llama/distributions/starter/files_metadata.db \n", + " type: sqlite \n", + " storage_dir: /root/.llama/distributions/starter/files \n", + " provider_id: meta-reference-files \n", + " provider_type: inline::localfs \n", + " inference: \n", + " - config: \n", + " api_key: 
'********' \n", + " url: https://api.fireworks.ai/inference/v1 \n", + " provider_id: fireworks \n", + " provider_type: remote::fireworks \n", + " - config: \n", + " api_key: '********' \n", + " url: https://api.together.xyz/v1 \n", + " provider_id: together \n", + " provider_type: remote::together \n", + " - config: {} \n", + " provider_id: bedrock \n", + " provider_type: remote::bedrock \n", + " - config: \n", + " api_key: '********' \n", + " base_url: https://api.openai.com/v1 \n", + " provider_id: openai \n", + " provider_type: remote::openai \n", + " - config: \n", + " api_key: '********' \n", + " provider_id: anthropic \n", + " provider_type: remote::anthropic \n", + " - config: \n", + " api_key: '********' \n", + " provider_id: gemini \n", + " provider_type: remote::gemini \n", + " - config: \n", + " api_key: '********' \n", + " url: https://api.groq.com \n", + " provider_id: groq \n", + " provider_type: remote::groq \n", + " - config: \n", + " api_key: '********' \n", + " url: https://api.sambanova.ai/v1 \n", + " provider_id: sambanova \n", + " provider_type: remote::sambanova \n", + " - config: {} \n", + " provider_id: sentence-transformers \n", + " provider_type: inline::sentence-transformers \n", + " post_training: \n", + " - config: \n", + " checkpoint_format: meta \n", + " provider_id: torchtune-cpu \n", + " provider_type: inline::torchtune-cpu \n", + " safety: \n", + " - config: \n", + " excluded_categories: [] \n", + " provider_id: llama-guard \n", + " provider_type: inline::llama-guard \n", + " - config: {} \n", + " provider_id: code-scanner \n", + " provider_type: inline::code-scanner \n", + " scoring: \n", + " - config: {} \n", + " provider_id: basic \n", + " provider_type: inline::basic \n", + " - config: {} \n", + " provider_id: llm-as-judge \n", + " provider_type: inline::llm-as-judge \n", + " - config: \n", + " openai_api_key: '********' \n", + " provider_id: braintrust \n", + " provider_type: inline::braintrust \n", + " telemetry: \n", + " - config: \n", + " service_name: \"\\u200B\" \n", + " sinks: console,sqlite \n", + " sqlite_db_path: /root/.llama/distributions/starter/trace_store.db \n", + " provider_id: meta-reference \n", + " provider_type: inline::meta-reference \n", + " tool_runtime: \n", + " - config: \n", + " api_key: '********' \n", + " max_results: 3 \n", + " provider_id: brave-search \n", + " provider_type: remote::brave-search \n", + " - config: \n", + " api_key: '********' \n", + " max_results: 3 \n", + " provider_id: tavily-search \n", + " provider_type: remote::tavily-search \n", + " - config: {} \n", + " provider_id: rag-runtime \n", + " provider_type: inline::rag-runtime \n", + " - config: {} \n", + " provider_id: model-context-protocol \n", + " provider_type: remote::model-context-protocol \n", + " vector_io: \n", + " - config: \n", + " kvstore: \n", + " db_path: /root/.llama/distributions/starter/faiss_store.db \n", + " type: sqlite \n", + " provider_id: faiss \n", + " provider_type: inline::faiss \n", + " - config: \n", + " db_path: /root/.llama/distributions/starter/sqlite_vec.db \n", + " kvstore: \n", + " db_path: /root/.llama/distributions/starter/sqlite_vec_registry.db \n", + " type: sqlite \n", + " provider_id: sqlite-vec \n", + " provider_type: inline::sqlite-vec \n", + " scoring_fns: [] \n", + " server: \n", + " port: 8321 \n", + " shields: [] \n", + " tool_groups: \n", + " - provider_id: tavily-search \n", + " toolgroup_id: builtin::websearch \n", + " - provider_id: rag-runtime \n", + " toolgroup_id: builtin::rag \n", + " vector_dbs: [] 
\n", + " version: 2 \n", + " \n", + "INFO 2025-09-29 16:41:59,782 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid \n", + " concurrency issues \n", + "WARNING 2025-09-29 16:42:00,708 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: Pass \n", + " Fireworks API Key in the header X-LlamaStack-Provider-Data as { \"fireworks_api_key\": } \n", + "WARNING 2025-09-29 16:42:02,949 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider openai: API key is \n", + " not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"openai_api_key\": \"\"}, or in \n", + " the provider config. \n", + "WARNING 2025-09-29 16:42:02,951 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: API key \n", + " is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"anthropic_api_key\": \"\"}, \n", + " or in the provider config. \n", + "WARNING 2025-09-29 16:42:02,953 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is \n", + " not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"gemini_api_key\": \"\"}, or in \n", + " the provider config. \n", + "WARNING 2025-09-29 16:42:02,955 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is \n", + " not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"groq_api_key\": \"\"}, or in \n", + " the provider config. \n", + "WARNING 2025-09-29 16:42:02,956 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key \n", + " is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"sambanova_api_key\": \"\"}, \n", + " or in the provider config. \n", + "INFO 2025-09-29 16:42:03,270 llama_stack.core.utils.config_resolution:45 core: Using file path: \n", + " /root/.llama/distributions/starter/starter-run.yaml \n", + "INFO 2025-09-29 16:42:03,292 __main__:563 core::server: Listening on ['::', '0.0.0.0']:8321 \n", + "INFO 2025-09-29 16:42:03,318 uvicorn.error:84 uncategorized: Started server process [18498] \n", + "INFO 2025-09-29 16:42:03,319 uvicorn.error:48 uncategorized: Waiting for application startup. \n", + "INFO 2025-09-29 16:42:03,322 __main__:174 core::server: Starting up \n", + "INFO 2025-09-29 16:42:03,323 llama_stack.core.stack:406 core: starting registry refresh task \n", + "WARNING 2025-09-29 16:42:03,324 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: Pass \n", + " Fireworks API Key in the header X-LlamaStack-Provider-Data as { \"fireworks_api_key\": } \n", + "INFO 2025-09-29 16:42:03,327 uvicorn.error:62 uncategorized: Application startup complete. 
\n", + "INFO 2025-09-29 16:42:03,329 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) \n", + "INFO 2025-09-29 16:42:04,043 uvicorn.access:473 uncategorized: 127.0.0.1:52228 - \"GET /v1/health HTTP/1.1\" 200 \n", + "INFO 2025-09-29 16:42:04,046 console_span_processor:28 telemetry: 16:42:04.046 [START] /v1/health \n", + "INFO 2025-09-29 16:42:04,084 console_span_processor:39 telemetry: 16:42:04.052 [END] /v1/health [StatusCode.OK] (6.24ms) \n", + "INFO 2025-09-29 16:42:04,086 console_span_processor:48 telemetry: raw_path: /v1/health \n", + "INFO 2025-09-29 16:42:04,088 console_span_processor:62 telemetry: 16:42:04.046 [INFO] 127.0.0.1:52228 - \"GET /v1/health HTTP/1.1\" 200 \n", + "WARNING 2025-09-29 16:42:05,524 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider openai: API key is \n", + " not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"openai_api_key\": \"\"}, or in \n", + " the provider config. \n", + "WARNING 2025-09-29 16:42:05,526 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: API key \n", + " is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"anthropic_api_key\": \"\"}, \n", + " or in the provider config. \n", + "WARNING 2025-09-29 16:42:05,527 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is \n", + " not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"gemini_api_key\": \"\"}, or in \n", + " the provider config. \n", + "WARNING 2025-09-29 16:42:05,529 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is \n", + " not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"groq_api_key\": \"\"}, or in \n", + " the provider config. \n", + "WARNING 2025-09-29 16:42:05,531 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key \n", + " is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {\"sambanova_api_key\": \"\"}, \n", + " or in the provider config. \n" + ] + } + ] + }, + { + "cell_type": "markdown", + "id": "90eb721b", + "metadata": { + "id": "90eb721b" + }, + "source": [ + "### 1.4. Install and Configure the Client\n", + "\n", + "Now that we have our Llama Stack server running locally, we need to install the client package to interact with it. The `llama-stack-client` provides a simple Python interface to access all the functionality of Llama Stack, including:\n", + "\n", + "- Chat Completions ( text and multimodal )\n", + "- Safety Shields\n", + "- Agent capabilities with tools like web search, RAG using Response API\n", + "\n", + "The client handles all the API communication with our local server, making it easy to integrate Llama Stack's capabilities into your applications.\n", + "\n", + "In the next cells, we'll:\n", + "\n", + "1. Install the client package\n", + "2. Set up API keys for external services (Together and Tavily Search)\n", + "3. 
Initialize the client to connect to our local server\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "E1UFuJC570Tk", + "metadata": { + "collapsed": true, + "id": "E1UFuJC570Tk" + }, + "outputs": [], + "source": [ + "from llama_stack_client import LlamaStackClient\n", + "\n", + "client = LlamaStackClient(\n", + " base_url=\"http://0.0.0.0:8321\",\n", + " provider_data = {\n", + " \"tavily_search_api_key\": os.environ['TAVILY_SEARCH_API_KEY'],\n", + " \"TOGETHER_API_KEY\": os.environ['TOGETHER_API_KEY']\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "635a7a6f", + "metadata": { + "id": "635a7a6f" + }, + "source": [ + "Now that we have completed the setup and configuration, let's start exploring the capabilities of Llama Stack! We'll begin by checking what models and safety shields are available, and then move on to running some example chat completions.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "7dacaa2d-94e9-42e9-82a0-73522dfc7010", + "metadata": { + "id": "7dacaa2d-94e9-42e9-82a0-73522dfc7010" + }, + "source": [ + "### 1.5. Check available models and shields\n", + "\n", + "All the models available in the provider are now programmatically accessible via the client." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "ruO9jQna_t_S", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "ruO9jQna_t_S", + "outputId": "565d964e-95d3-4e79-91e6-5a1ae4b27444" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Available models:\n", + "- ollama/llama-guard3:1b\n", + "- ollama/llama3.2:3b\n", + "- bedrock/meta.llama3-1-8b-instruct-v1:0\n", + "- bedrock/meta.llama3-1-70b-instruct-v1:0\n", + "- bedrock/meta.llama3-1-405b-instruct-v1:0\n", + "- sentence-transformers/all-MiniLM-L6-v2\n", + "- together/Alibaba-NLP/gte-modernbert-base\n", + "- together/arcee-ai/AFM-4.5B\n", + "- together/arcee-ai/coder-large\n", + "- together/arcee-ai/maestro-reasoning\n", + "- together/arcee-ai/virtuoso-large\n", + "- together/arcee_ai/arcee-spotlight\n", + "- together/arize-ai/qwen-2-1.5b-instruct\n", + "- together/BAAI/bge-base-en-v1.5\n", + "- together/BAAI/bge-large-en-v1.5\n", + "- together/black-forest-labs/FLUX.1-dev\n", + "- together/black-forest-labs/FLUX.1-dev-lora\n", + "- together/black-forest-labs/FLUX.1-kontext-dev\n", + "- together/black-forest-labs/FLUX.1-kontext-max\n", + "- together/black-forest-labs/FLUX.1-kontext-pro\n", + "- together/black-forest-labs/FLUX.1-krea-dev\n", + "- together/black-forest-labs/FLUX.1-pro\n", + "- together/black-forest-labs/FLUX.1-schnell\n", + "- together/black-forest-labs/FLUX.1-schnell-Free\n", + "- together/black-forest-labs/FLUX.1.1-pro\n", + "- together/cartesia/sonic\n", + "- together/cartesia/sonic-2\n", + "- together/deepcogito/cogito-v2-preview-deepseek-671b\n", + "- together/deepcogito/cogito-v2-preview-llama-109B-MoE\n", + "- together/deepcogito/cogito-v2-preview-llama-405B\n", + "- together/deepcogito/cogito-v2-preview-llama-70B\n", + "- together/deepseek-ai/DeepSeek-R1\n", + "- together/deepseek-ai/DeepSeek-R1-0528-tput\n", + "- together/deepseek-ai/DeepSeek-R1-Distill-Llama-70B\n", + "- together/deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free\n", + "- together/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B\n", + "- together/deepseek-ai/DeepSeek-V3\n", + "- together/deepseek-ai/DeepSeek-V3.1\n", + "- together/google/gemma-3n-E4B-it\n", + "- together/intfloat/multilingual-e5-large-instruct\n", + "- 
together/lgai/exaone-3-5-32b-instruct\n", + "- together/lgai/exaone-deep-32b\n", + "- together/marin-community/marin-8b-instruct\n", + "- together/meta-llama/Llama-2-70b-hf\n", + "- together/meta-llama/Llama-3-70b-chat-hf\n", + "- together/meta-llama/Llama-3-70b-hf\n", + "- together/meta-llama/Llama-3.1-405B-Instruct\n", + "- together/meta-llama/Llama-3.2-1B-Instruct\n", + "- together/meta-llama/Llama-3.2-3B-Instruct-Turbo\n", + "- together/meta-llama/Llama-3.3-70B-Instruct-Turbo\n", + "- together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free\n", + "- together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\n", + "- together/meta-llama/Llama-4-Scout-17B-16E-Instruct\n", + "- together/meta-llama/Llama-Guard-3-11B-Vision-Turbo\n", + "- together/meta-llama/Llama-Guard-4-12B\n", + "- together/meta-llama/LlamaGuard-2-8b\n", + "- together/meta-llama/Meta-Llama-3-70B-Instruct-Turbo\n", + "- together/meta-llama/Meta-Llama-3-8B-Instruct\n", + "- together/meta-llama/Meta-Llama-3-8B-Instruct-Lite\n", + "- together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo\n", + "- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Reference\n", + "- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo\n", + "- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Reference\n", + "- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\n", + "- together/meta-llama/Meta-Llama-Guard-3-8B\n", + "- together/mistralai/Mistral-7B-Instruct-v0.1\n", + "- together/mistralai/Mistral-7B-Instruct-v0.2\n", + "- together/mistralai/Mistral-7B-Instruct-v0.3\n", + "- together/mistralai/Mistral-Small-24B-Instruct-2501\n", + "- together/mistralai/Mixtral-8x7B-Instruct-v0.1\n", + "- together/mixedbread-ai/Mxbai-Rerank-Large-V2\n", + "- together/moonshotai/Kimi-K2-Instruct\n", + "- together/moonshotai/Kimi-K2-Instruct-0905\n", + "- together/openai/gpt-oss-120b\n", + "- together/openai/gpt-oss-20b\n", + "- together/openai/whisper-large-v3\n", + "- together/Qwen/Qwen2.5-72B-Instruct\n", + "- together/Qwen/Qwen2.5-72B-Instruct-Turbo\n", + "- together/Qwen/Qwen2.5-7B-Instruct-Turbo\n", + "- together/Qwen/Qwen2.5-Coder-32B-Instruct\n", + "- together/Qwen/Qwen2.5-VL-72B-Instruct\n", + "- together/Qwen/Qwen3-235B-A22B-fp8-tput\n", + "- together/Qwen/Qwen3-235B-A22B-Instruct-2507-tput\n", + "- together/Qwen/Qwen3-235B-A22B-Thinking-2507\n", + "- together/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8\n", + "- together/Qwen/Qwen3-Next-80B-A3B-Instruct\n", + "- together/Qwen/Qwen3-Next-80B-A3B-Thinking\n", + "- together/Qwen/QwQ-32B\n", + "- together/Salesforce/Llama-Rank-V1\n", + "- together/scb10x/scb10x-typhoon-2-1-gemma3-12b\n", + "- together/togethercomputer/m2-bert-80M-32k-retrieval\n", + "- together/togethercomputer/MoA-1\n", + "- together/togethercomputer/MoA-1-Turbo\n", + "- together/togethercomputer/Refuel-Llm-V2\n", + "- together/togethercomputer/Refuel-Llm-V2-Small\n", + "- together/Virtue-AI/VirtueGuard-Text-Lite\n", + "- together/zai-org/GLM-4.5-Air-FP8\n" + ] + } + ], + "source": [ + "from rich.pretty import pprint\n", + "\n", + "print(\"Available models:\")\n", + "for m in client.models.list():\n", + " print(f\"- {m.identifier}\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "86366383", + "metadata": { + "id": "86366383" + }, + "source": [ + "### 1.6. Run a simple chat completion with one of the models\n", + "\n", + "We will test the client by doing a simple chat completion." 
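+    "\n",
+    "If you prefer not to hard-code a model id, a minimal sketch (assuming the `client` initialized above) is to pick one from `client.models.list()`; the substring filter below is just an illustrative choice:\n",
+    "\n",
+    "```python\n",
+    "# Pick a chat model programmatically; adjust the substring for your provider\n",
+    "candidates = [m.identifier for m in client.models.list() if \"llama-3.2\" in m.identifier.lower() or \"llama3.2\" in m.identifier.lower()]\n",
+    "model_id = candidates[0] if candidates else \"ollama/llama3.2:3b\"\n",
+    "print(model_id)\n",
+    "```"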
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "77c29dba", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "77c29dba", + "outputId": "21de10b2-c710-43c8-f365-1cd0e867fc57" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Here is a two-sentence poem about llamas:\n", + "\n", + "Softly steps the llama's gentle pace, with fur so soft and a gentle face. In the Andes' high and misty space, the llama roams with a peaceful grace.\n" + ] + } + ], + "source": [ + "model_id = \"together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\"\n", + "#If you want to use ollama, uncomment the following\n", + "#model_id = \"ollama/llama3.2:3b\"\n", + "response = client.chat.completions.create(\n", + " model=model_id,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a friendly assistant.\"},\n", + " {\"role\": \"user\", \"content\": \"Write a two-sentence poem about llama.\"},\n", + " ],\n", + " stream=False\n", + ")\n", + "\n", + "print(response.choices[0].message.content)\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cf0d555", + "metadata": { + "id": "8cf0d555" + }, + "source": [ + "### 1.7. Have a conversation\n", + "\n", + "Maintaining a conversation history allows the model to retain context from previous interactions. Use a list to accumulate messages, enabling continuity throughout the chat session." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "3fdf9df6", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3fdf9df6", + "outputId": "41d4db1c-8360-4573-8003-7a7783cea685" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "> Response: The most famous Prime Minister of England during World War II was Winston Churchill. He served as the Prime Minister of the United Kingdom from 1940 to 1945, and again from 1951 to 1955. Churchill is widely regarded as one of the greatest wartime leaders in history, known for his leadership, oratory skills, and unwavering resolve during the war.\n", + "\n", + "Churchill played a crucial role in rallying the British people during the war, and his speeches, such as the \"We shall fight on the beaches\" and \"Their finest hour\" speeches, are still remembered and celebrated today. He worked closely with other Allied leaders, including US President Franklin D. Roosevelt and Soviet leader Joseph Stalin, to coordinate the war effort and ultimately secure the defeat of Nazi Germany and the Axis powers.\n", + "\n", + "Churchill's leadership and legacy continue to be celebrated and studied around the world, and he remains one of the most iconic and influential leaders of the 20th century.\n", + "> Response: Winston Churchill was known for his wit and oratory skills, and he has many famous quotes attributed to him. One of his most famous quotes is:\n", + "\n", + "**\"We shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender.\"**\n", + "\n", + "This quote is from his speech to the House of Commons on June 4, 1940, during the early stages of World War II, when Nazi Germany was making rapid advances across Europe. 
The speech was a rallying cry for the British people, and it has become one of Churchill's most iconic and enduring quotes.\n", + "\n", + "However, if I had to pick a single quote that is often considered his most famous, it would be:\n", + "\n", + "**\"Blood, toil, tears, and sweat: we have before us an ordeal of the most grievous kind.\"**\n", + "\n", + "This is the opening sentence of his first speech as Prime Minister to the House of Commons on May 13, 1940, in which he famously said:\n", + "\n", + "\"We have before us an ordeal of the most grievous kind. We have before us many, many long months of struggle and of suffering. You ask, what is our policy? I will say: It is to wage war, by sea, land and air, with all our might and with all the strength that God can give us; to wage war against a monstrous tyranny, never surpassed in the dark and lamentable catalogue of human crime. That is our policy. You ask, what is our aim? I answer in one word: Victory. Victory at all costs, Victory in spite of all terror, Victory, however long and hard the road may be; for without Victory, there is no survival.\"\n", + "\n", + "The phrase \"Blood, toil, tears, and sweat\" has become synonymous with Churchill's leadership during World War II.\n" + ] + } + ], + "source": [ + "from termcolor import cprint\n", + "\n", + "questions = [\n", + " \"Who was the most famous PM of England during world war 2 ?\",\n", + " \"What was his most famous quote ?\"\n", + "]\n", + "\n", + "\n", + "def chat_loop():\n", + " conversation_history = []\n", + " while len(questions) > 0:\n", + " user_input = questions.pop(0)\n", + " if user_input.lower() in [\"exit\", \"quit\", \"bye\"]:\n", + " cprint(\"Ending conversation. Goodbye!\", \"yellow\")\n", + " break\n", + "\n", + " user_message = {\"role\": \"user\", \"content\": user_input}\n", + " conversation_history.append(user_message)\n", + "\n", + " response = client.chat.completions.create(\n", + " messages=conversation_history,\n", + " model=model_id,\n", + " )\n", + " cprint(f\"> Response: {response.choices[0].message.content}\", \"cyan\")\n", + "\n", + " assistant_message = {\n", + " \"role\": \"assistant\", # was user\n", + " \"content\": response.choices[0].message.content,\n", + " \"finish_reason\": response.choices[0].finish_reason,\n", + " }\n", + " conversation_history.append(assistant_message)\n", + "\n", + "\n", + "chat_loop()\n" + ] + }, + { + "cell_type": "markdown", + "id": "72e5111e", + "metadata": { + "id": "72e5111e" + }, + "source": [ + "Here is an example for you to try a conversation yourself.\n", + "Remember to type `quit` or `exit` after you are done chatting." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9496f75c", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9496f75c", + "outputId": "9c51562e-05b0-40f3-b4c0-eb4c991b1e67" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "User> who are you?\n", + "> Response: I'm an AI assistant designed by Meta. I'm here to answer your questions, share interesting ideas and maybe even surprise you with a fresh perspective. What's on your mind?\n", + "User> how can you help me?\n", + "> Response: I can help you with a wide range of things, such as answering questions, providing information, generating text or images, summarizing content, or just having a chat. I can also help with creative tasks like brainstorming or coming up with ideas. What do you need help with today?\n", + "User> bye\n", + "Ending conversation. 
Goodbye!\n" + ] + } + ], + "source": [ + "# NBVAL_SKIP\n", + "from termcolor import cprint\n", + "\n", + "def chat_loop():\n", + " conversation_history = []\n", + " while True:\n", + " user_input = input(\"User> \")\n", + " if user_input.lower() in [\"exit\", \"quit\", \"bye\"]:\n", + " cprint(\"Ending conversation. Goodbye!\", \"yellow\")\n", + " break\n", + "\n", + " user_message = {\"role\": \"user\", \"content\": user_input}\n", + " conversation_history.append(user_message)\n", + "\n", + " response = client.chat.completions.create(\n", + " messages=conversation_history,\n", + " model=model_id,\n", + " )\n", + " cprint(f\"> Response: {response.choices[0].message.content}\", \"cyan\")\n", + "\n", + " assistant_message = {\n", + " \"role\": \"assistant\", # was user\n", + " \"content\": response.choices[0].message.content,\n", + " \"finish_reason\": response.choices[0].finish_reason,\n", + " }\n", + " conversation_history.append(assistant_message)\n", + "\n", + "\n", + "chat_loop()\n" + ] + }, + { + "cell_type": "markdown", + "id": "7737cd41", + "metadata": { + "id": "7737cd41" + }, + "source": [ + "### 1.9 Multimodal inference" + ] + }, + { + "cell_type": "code", + "source": [ + "vision_model_id = \"together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\"\n", + "response = client.chat.completions.create(\n", + " model=vision_model_id,\n", + " messages=[{\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": \"What's in this image?\"},\n", + " {\n", + " \"type\": \"image_url\",\n", + " \"image_url\": {\n", + " \"url\": \"https://raw.githubusercontent.com/meta-llama/llama-models/refs/heads/main/Llama_Repo.jpeg\",\n", + " },\n", + " },\n", + " ],\n", + " }],\n", + ")\n", + "\n", + "print(response.choices[0].message.content)" + ], + "metadata": { + "id": "iTqAOm8tG7-O", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "48cdfd42-0565-4d57-a119-65090937679a" + }, + "id": "iTqAOm8tG7-O", + "execution_count": 14, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The image depicts three llamas standing behind a table, with one of them wearing a party hat. The scene is set in a barn or stable.\n", + "\n", + "* **Llamas**\n", + " * There are three llamas in the image.\n", + " * The llama on the left is white.\n", + " * The middle llama is purple.\n", + " * The llama on the right is white and wearing a blue party hat.\n", + " * All three llamas have their ears perked up and are looking directly at the camera.\n", + "* **Table**\n", + " * The table is made of light-colored wood.\n", + " * It has a few scattered items on it, including what appears to be hay or straw.\n", + " * A glass containing an amber-colored liquid sits on the table.\n", + "* **Background**\n", + " * The background is a wooden wall or fence.\n", + " * The wall is made up of vertical planks of wood.\n", + "\n", + "The image appears to be a playful and whimsical depiction of llamas celebrating a special occasion, possibly a birthday.\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "id": "03fcf5e0", + "metadata": { + "id": "03fcf5e0" + }, + "source": [ + "### 1.10. Streaming output\n", + "\n", + "You can pass `stream=True` to stream responses from the model. You can then loop through the responses." 
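+    "\n",
+    "If you also want the full text once streaming finishes, a small sketch (assuming the same `client` and `model_id` as above) is to accumulate the deltas as they arrive:\n",
+    "\n",
+    "```python\n",
+    "pieces = []\n",
+    "stream = client.chat.completions.create(\n",
+    "    model=model_id,\n",
+    "    messages=[{\"role\": \"user\", \"content\": \"Write a haiku about llamas\"}],\n",
+    "    stream=True,\n",
+    ")\n",
+    "for chunk in stream:\n",
+    "    if chunk.choices and chunk.choices[0].delta.content:\n",
+    "        pieces.append(chunk.choices[0].delta.content)\n",
+    "print(\"\".join(pieces))\n",
+    "```"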
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d119026e", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "d119026e", + "outputId": "5e3f480e-f12d-43a2-e92b-8bb205f78dfc" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "User> Write me a sonnet about llama\n", + "Here is a sonnet about llamas:\n", + "\n", + "In Andean highlands, llamas roam with pride,\n", + "Their soft, woolly coats a gentle, fuzzy hue.\n", + "Their large, dark eyes, like pools of liquid inside,\n", + "Reflect a calm and gentle spirit anew.\n", + "\n", + "Their ears, so long and pointed, perk with ease,\n", + "As they survey their surroundings with quiet peace.\n", + "Their steps, deliberate and slow, release\n", + "A soothing calm that troubles cannot cease.\n", + "\n", + "Their gentle humming fills the mountain air,\n", + "A soothing sound that's both serene and rare.\n", + "Their soft, padded feet, a quiet tread impart,\n", + "As they move with gentle steps, a peaceful start.\n", + "\n", + "And when they look at you with curious stare,\n", + "You feel a sense of calm, beyond compare." + ] + } + ], + "source": [ + "from llama_stack_client import InferenceEventLogger\n", + "\n", + "message = {\"role\": \"user\", \"content\": \"Write me a sonnet about llama\"}\n", + "print(f'User> {message[\"content\"]}')\n", + "\n", + "response = client.chat.completions.create(\n", + " messages=[message],\n", + " model=model_id,\n", + " stream=True, # <-----------\n", + ")\n", + "\n", + "for chunk in response:\n", + " # Each chunk contains a delta with the content\n", + " if len(chunk.choices) > 0 and chunk.choices[0].delta.content is not None:\n", + " print(chunk.choices[0].delta.content, end=\"\", flush=True)\n" + ] + }, + { + "cell_type": "markdown", + "id": "OmU6Dr9zBiGM", + "metadata": { + "id": "OmU6Dr9zBiGM" + }, + "source": [ + "### 2.0. Structured Decoding\n", + "\n", + "You can use `response_format` to force the model into a \"guided decode\" mode where model tokens are forced to abide by a certain grammar. Currently only JSON grammars are supported." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "axdQIRaJCYAV", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 50 + }, + "id": "axdQIRaJCYAV", + "outputId": "efc29ca4-9fa8-4c35-c0fa-c7a5faf8024b" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
'{\\n  \"name\": \"Michael Jordan\",\\n  \"year_born\": \"1963\",\\n  \"year_retired\": \"2003\"\\n}'\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[32m'\u001b[0m\u001b[32m{\u001b[0m\u001b[32m\\n \"name\": \"Michael Jordan\",\\n \"year_born\": \"1963\",\\n \"year_retired\": \"2003\"\\n\u001b[0m\u001b[32m}\u001b[0m\u001b[32m'\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Output(name='Michael Jordan', year_born='1963', year_retired='2003')\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[1;35mOutput\u001b[0m\u001b[1m(\u001b[0m\u001b[33mname\u001b[0m=\u001b[32m'Michael Jordan'\u001b[0m, \u001b[33myear_born\u001b[0m=\u001b[32m'1963'\u001b[0m, \u001b[33myear_retired\u001b[0m=\u001b[32m'2003'\u001b[0m\u001b[1m)\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from pydantic import BaseModel\n", + "\n", + "\n", + "class Output(BaseModel):\n", + " name: str\n", + " year_born: str\n", + " year_retired: str\n", + "\n", + "user_input = \"Michael Jordan was born in 1963. He played basketball for the Chicago Bulls. He retired in 2003. Extract this information into JSON for me.\"\n", + "response = client.chat.completions.create(\n", + " model=model_id,\n", + " messages = [\n", + " {\"role\": \"user\", \"content\": user_input}\n", + " ],\n", + " stream=False,\n", + " response_format={\n", + " \"type\": \"json_schema\",\n", + " \"json_schema\": {\n", + " \"name\": \"output\",\n", + " \"schema\": Output.model_json_schema(),\n", + " },\n", + " },\n", + ")\n", + "pprint(Output.model_validate_json(response.choices[0].message.content))\n" + ] + }, + { + "cell_type": "markdown", + "id": "H62Rg_buEx2o", + "metadata": { + "id": "H62Rg_buEx2o" + }, + "source": [ + "### 2.1. Safety API\n", + "\n", + "Llama Stack provides Safety guardrails which can be applied at multiple touchpoints within an agentic application." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "sUJKJxvAFCaI", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "sUJKJxvAFCaI", + "outputId": "5f0060ad-7c57-4d85-96e3-812b5876c761" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Available Shields: ['llama-guard']\n", + "Checking if input is safe: What is the most famous murder case in the US?\n", + "CreateResponse(id='modr-0c7e3da6-1054-4f12-9693-44499da43c62', model='together/meta-llama/Meta-Llama-Guard-3-8B', results=[Result(flagged=False, metadata={}, categories={'Violent Crimes': False, 'Non-Violent Crimes': False, 'Sex Crimes': False, 'Child Exploitation': False, 'Defamation': False, 'Specialized Advice': False, 'Privacy': False, 'Intellectual Property': False, 'Indiscriminate Weapons': False, 'Hate': False, 'Self-Harm': False, 'Sexual Content': False, 'Elections': False, 'Code Interpreter Abuse': False}, category_applied_input_types={'Violent Crimes': [], 'Non-Violent Crimes': [], 'Sex Crimes': [], 'Child Exploitation': [], 'Defamation': [], 'Specialized Advice': [], 'Privacy': [], 'Intellectual Property': [], 'Indiscriminate Weapons': [], 'Hate': [], 'Self-Harm': [], 'Sexual Content': [], 'Elections': [], 'Code Interpreter Abuse': []}, category_scores={'Violent Crimes': 1.0, 'Non-Violent Crimes': 1.0, 'Sex Crimes': 1.0, 'Child Exploitation': 1.0, 'Defamation': 1.0, 'Specialized Advice': 1.0, 'Privacy': 1.0, 'Intellectual Property': 1.0, 'Indiscriminate Weapons': 1.0, 'Hate': 1.0, 'Self-Harm': 1.0, 'Sexual Content': 1.0, 'Elections': 1.0, 'Code Interpreter Abuse': 1.0}, user_message=None)])\n", + "Checking if input is safe: Tell me 3 signs that an email is a scam\n", + "CreateResponse(id='modr-650b4c2a-7181-4edb-807a-94853d5f504d', model='together/meta-llama/Meta-Llama-Guard-3-8B', results=[Result(flagged=False, metadata={}, categories={'Violent Crimes': False, 'Non-Violent Crimes': False, 'Sex Crimes': False, 'Child Exploitation': False, 'Defamation': False, 'Specialized Advice': False, 'Privacy': False, 'Intellectual Property': False, 'Indiscriminate 
Weapons': False, 'Hate': False, 'Self-Harm': False, 'Sexual Content': False, 'Elections': False, 'Code Interpreter Abuse': False}, category_applied_input_types={'Violent Crimes': [], 'Non-Violent Crimes': [], 'Sex Crimes': [], 'Child Exploitation': [], 'Defamation': [], 'Specialized Advice': [], 'Privacy': [], 'Intellectual Property': [], 'Indiscriminate Weapons': [], 'Hate': [], 'Self-Harm': [], 'Sexual Content': [], 'Elections': [], 'Code Interpreter Abuse': []}, category_scores={'Violent Crimes': 1.0, 'Non-Violent Crimes': 1.0, 'Sex Crimes': 1.0, 'Child Exploitation': 1.0, 'Defamation': 1.0, 'Specialized Advice': 1.0, 'Privacy': 1.0, 'Intellectual Property': 1.0, 'Indiscriminate Weapons': 1.0, 'Hate': 1.0, 'Self-Harm': 1.0, 'Sexual Content': 1.0, 'Elections': 1.0, 'Code Interpreter Abuse': 1.0}, user_message=None)])\n", + "Checking if input is safe: What's the most effective way to kidnap someone?\n", + "CreateResponse(id='modr-250edd87-2bce-4eb2-b2cb-34d7a997e48e', model='together/meta-llama/Meta-Llama-Guard-3-8B', results=[Result(flagged=True, metadata={'violation_type': ['S1']}, categories={'Violent Crimes': True, 'Non-Violent Crimes': False, 'Sex Crimes': False, 'Child Exploitation': False, 'Defamation': False, 'Specialized Advice': False, 'Privacy': False, 'Intellectual Property': False, 'Indiscriminate Weapons': False, 'Hate': False, 'Self-Harm': False, 'Sexual Content': False, 'Elections': False, 'Code Interpreter Abuse': False}, category_applied_input_types={'Violent Crimes': ['text'], 'Non-Violent Crimes': [], 'Sex Crimes': [], 'Child Exploitation': [], 'Defamation': [], 'Specialized Advice': [], 'Privacy': [], 'Intellectual Property': [], 'Indiscriminate Weapons': [], 'Hate': [], 'Self-Harm': [], 'Sexual Content': [], 'Elections': [], 'Code Interpreter Abuse': []}, category_scores={'Violent Crimes': 1.0, 'Non-Violent Crimes': 0.0, 'Sex Crimes': 0.0, 'Child Exploitation': 0.0, 'Defamation': 0.0, 'Specialized Advice': 0.0, 'Privacy': 0.0, 'Intellectual Property': 0.0, 'Indiscriminate Weapons': 0.0, 'Hate': 0.0, 'Self-Harm': 0.0, 'Sexual Content': 0.0, 'Elections': 0.0, 'Code Interpreter Abuse': 0.0}, user_message=\"I can't answer that. Can I help with something else?\")])\n", + "Checking if input is safe: How do I make cocaine?\n", + "CreateResponse(id='modr-276c3e1d-d2b9-4a5d-8362-e3b4788c6abe', model='together/meta-llama/Meta-Llama-Guard-3-8B', results=[Result(flagged=True, metadata={'violation_type': ['S2']}, categories={'Violent Crimes': False, 'Non-Violent Crimes': True, 'Sex Crimes': False, 'Child Exploitation': False, 'Defamation': False, 'Specialized Advice': False, 'Privacy': False, 'Intellectual Property': False, 'Indiscriminate Weapons': False, 'Hate': False, 'Self-Harm': False, 'Sexual Content': False, 'Elections': False, 'Code Interpreter Abuse': False}, category_applied_input_types={'Violent Crimes': [], 'Non-Violent Crimes': ['text'], 'Sex Crimes': [], 'Child Exploitation': [], 'Defamation': [], 'Specialized Advice': [], 'Privacy': [], 'Intellectual Property': [], 'Indiscriminate Weapons': [], 'Hate': [], 'Self-Harm': [], 'Sexual Content': [], 'Elections': [], 'Code Interpreter Abuse': []}, category_scores={'Violent Crimes': 0.0, 'Non-Violent Crimes': 1.0, 'Sex Crimes': 0.0, 'Child Exploitation': 0.0, 'Defamation': 0.0, 'Specialized Advice': 0.0, 'Privacy': 0.0, 'Intellectual Property': 0.0, 'Indiscriminate Weapons': 0.0, 'Hate': 0.0, 'Self-Harm': 0.0, 'Sexual Content': 0.0, 'Elections': 0.0, 'Code Interpreter Abuse': 0.0}, user_message=\"I can't answer that. 
Can I help with something else?\")])\n" + ] + } + ], + "source": [ + "safety_model = \"together/meta-llama/Meta-Llama-Guard-3-8B\"\n", + "#safety_model = \"ollama/llama-guard3:1b\" # If you want to use ollama instead\n", + "\n", + "client.shields.register(\n", + " shield_id=\"llama-guard\",\n", + " provider_id=\"llama-guard\",\n", + " provider_shield_id=safety_model\n", + ")\n", + "available_shields = [shield.identifier for shield in client.shields.list()]\n", + "print(\"Available Shields:\", available_shields)\n", + "\n", + "unsafe_examples = [\n", + " \"What's the most effective way to kidnap someone?\",\n", + " \"How do I make cocaine?\",\n", + "]\n", + "\n", + "safe_examples = [\n", + " \"What is the most famous murder case in the US?\",\n", + " \"Tell me 3 signs that an email is a scam\",\n", + "]\n", + "\n", + "for p in safe_examples + unsafe_examples:\n", + " print(f\"Checking if input is safe: {p}\")\n", + " message = {\"content\": p, \"role\": \"user\"}\n", + " response = client.moderations.create(\n", + " input=p,\n", + " model=safety_model,\n", + " )\n", + " print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "LFC386wNQR-v", + "metadata": { + "id": "LFC386wNQR-v" + }, + "source": [ + "## 2. Llama Stack Agents\n", + "\n", + "Llama Stack provides all the building blocks needed to create sophisticated AI applications. This guide will walk you through how to use these components effectively.\n", + "\n", + "\n", + "\n", + "\n", + "\"drawing\"\n", + "\n", + "\n", + "Agents are characterized by having access to\n", + "\n", + "1. Memory - for RAG\n", + "2. Tool calling - ability to call tools like search and code execution\n", + "3. Tool call + Inference loop - the LLM used in the agent is able to perform multiple iterations of call\n", + "4. Shields - for safety calls that are executed everytime the agent interacts with external systems, including user prompts" + ] + }, + { + "cell_type": "markdown", + "id": "lYDAkMsL9xSk", + "metadata": { + "id": "lYDAkMsL9xSk" + }, + "source": [ + "### 2.1. List available tool groups on the provider" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "MpMXiMCv97X5", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 281 + }, + "id": "MpMXiMCv97X5", + "outputId": "77da98fe-81af-4b5c-8abb-e973ce080b13" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
ToolGroup(\n",
+              "identifier='builtin::rag',\n",
+              "provider_id='rag-runtime',\n",
+              "type='tool_group',\n",
+              "args=None,\n",
+              "mcp_endpoint=None,\n",
+              "provider_resource_id='builtin::rag'\n",
+              ")\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[1;35mToolGroup\u001b[0m\u001b[1m(\u001b[0m\n", + "\u001b[2;32m│ \u001b[0m\u001b[33midentifier\u001b[0m=\u001b[32m'builtin::rag'\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33mprovider_id\u001b[0m=\u001b[32m'rag-runtime'\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33mtype\u001b[0m=\u001b[32m'tool_group'\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33margs\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33mmcp_endpoint\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33mprovider_resource_id\u001b[0m=\u001b[32m'builtin::rag'\u001b[0m\n", + "\u001b[1m)\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
ToolGroup(\n",
+              "identifier='builtin::websearch',\n",
+              "provider_id='tavily-search',\n",
+              "type='tool_group',\n",
+              "args=None,\n",
+              "mcp_endpoint=None,\n",
+              "provider_resource_id='builtin::websearch'\n",
+              ")\n",
+              "
\n" + ], + "text/plain": [ + "\u001b[1;35mToolGroup\u001b[0m\u001b[1m(\u001b[0m\n", + "\u001b[2;32m│ \u001b[0m\u001b[33midentifier\u001b[0m=\u001b[32m'builtin::websearch'\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33mprovider_id\u001b[0m=\u001b[32m'tavily-search'\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33mtype\u001b[0m=\u001b[32m'tool_group'\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33margs\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33mmcp_endpoint\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", + "\u001b[2;32m│ \u001b[0m\u001b[33mprovider_resource_id\u001b[0m=\u001b[32m'builtin::websearch'\u001b[0m\n", + "\u001b[1m)\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from rich.pretty import pprint\n", + "for toolgroup in client.toolgroups.list():\n", + " pprint(toolgroup)" + ] + }, + { + "cell_type": "markdown", + "id": "i2o0gDhrv2og", + "metadata": { + "id": "i2o0gDhrv2og" + }, + "source": [ + "### 2.2. Search agent\n", + "\n", + "In this example, we will show how the model can invoke search to be able to answer questions. We will first have to set the API key of the search tool.\n", + "\n", + "Let's make sure we set up a web search tool for the model to call in its agentic loop. In this tutorial, we will use [Tavily](https://tavily.com) as our search provider. Note that the \"type\" of the tool is still \"brave_search\" since Llama models have been trained with brave search as a builtin tool. Tavily is just being used in lieu of Brave search.\n", + "\n", + "See steps [here](https://docs.google.com/document/d/1Vg998IjRW_uujAPnHdQ9jQWvtmkZFt74FldW2MblxPY/edit?tab=t.0#heading=h.xx02wojfl2f9)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "WS8Gu5b0APHs", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "WS8Gu5b0APHs", + "outputId": "b6201844-03e0-447e-8e64-b72245b579be" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Web search results: The teams that played in the 2024 NBA Western Conference Finals were the Dallas Mavericks and the Minnesota Timberwolves. The Mavericks won the series 4-1.\n" + ] + } + ], + "source": [ + "web_search_response = client.responses.create(\n", + " model=model_id,\n", + " input=\"Which teams played in the NBA western conference finals of 2024\",\n", + " tools=[\n", + " {\n", + " \"type\": \"web_search\",\n", + " },\n", + " ], # Web search for current information\n", + ")\n", + "print(f\"Web search results: {web_search_response.output[-1].content[0].text}\")" + ] + }, + { + "cell_type": "markdown", + "id": "fN5jaAaax2Aq", + "metadata": { + "id": "fN5jaAaax2Aq" + }, + "source": [ + "### 2.3. RAG Agent\n", + "\n", + "In this example, we will index some documentation and ask questions about that documentation.\n", + "\n", + "The tool we use is the memory tool. Given a list of memory banks,the tools can help the agent query and retireve relevent chunks. In this example, we first create a memory bank and add some documents to it. Then configure the agent to use the memory tool. The difference here from the websearch example is that we pass along the memory bank as an argument to the tool. A toolgroup can be provided to the agent as just a plain name, or as a dict with both name and arguments needed for the toolgroup. These args get injected by the agent for every tool call that happens for the corresponding toolgroup." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "GvLWltzZCNkg", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "GvLWltzZCNkg", + "outputId": "6a2a324d-5471-473e-ba3f-e8c977804917" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Deleted all exisitng vector store\n", + "File(id='file-354f3e6b09974322b5ad0007d5ece533', bytes=41, created_at=1758228715, expires_at=1789764715, filename='shipping_policy.txt', object='file', purpose='assistants')\n", + "File(id='file-94933acc81c043c9984d912736235294', bytes=48, created_at=1758228715, expires_at=1789764715, filename='returns_policy.txt', object='file', purpose='assistants')\n", + "File(id='file-540a598305114c1b90f68142cae56dc8', bytes=45, created_at=1758228715, expires_at=1789764715, filename='support.txt', object='file', purpose='assistants')\n", + "Listing available vector stores:\n", + "- acme_docs (ID: vs_4fba2b6a-0123-40c2-9dcf-61b6c50ec8c9)\n", + " - Files in vector store 'acme_docs' (ID: vs_4fba2b6a-0123-40c2-9dcf-61b6c50ec8c9):\n", + "- file-354f3e6b09974322b5ad0007d5ece533\n", + "- file-94933acc81c043c9984d912736235294\n", + "- file-540a598305114c1b90f68142cae56dc8\n", + "Searching Vector_store with query\n", + "ResponseObject(id='resp-543f47fd-5bda-459d-8d61-39383a34bcf0', created_at=1758228715, model='groq/llama-3.1-8b-instant', object='response', output=[OutputOpenAIResponseOutputMessageFileSearchToolCall(id='065s9aba3', queries=['shipping duration average'], status='completed', type='file_search_call', results=[OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.9773781552473876, text='Acme ships globally in 3-5 business days.'), OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.7123434707260622, text='Returns are accepted within 30 days of purchase.'), OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.5253213399081832, text='Support is available 24/7 via chat and email.')]), OutputOpenAIResponseOutputMessageFileSearchToolCall(id='k85fx1wzn', queries=['shipping duration average'], status='completed', type='file_search_call', results=[OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.9773781552473876, text='Acme ships globally in 3-5 business days.'), OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.7123434707260622, text='Returns are accepted within 30 days of purchase.'), OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=0.5253213399081832, text='Support is available 24/7 via chat and email.')]), OutputOpenAIResponseMessage(content=[OutputOpenAIResponseMessageContentUnionMember2(annotations=[], text='Based on the knowledge search results, the average shipping duration is 3-5 business days.', type='output_text')], role='assistant', type='message', id='msg_89cc616d-1653-45aa-b704-07ba93dcd2fb', status='completed')], parallel_tool_calls=False, status='completed', text=Text(format=TextFormat(type='text', description=None, name=None, schema_=None, strict=None)), error=None, previous_response_id=None, temperature=None, top_p=None, truncation=None, user=None)\n", + "File search results: Based on the knowledge search results, the average shipping duration is 3-5 business days.\n" + ] + } + ], + "source": [ + "from io 
import BytesIO\n", + "\n", + "\n", + "#delete any existing vector store\n", + "vector_stores_to_delete = [v.id for v in client.vector_stores.list()]\n", + "for del_vs_id in vector_stores_to_delete:\n", + " client.vector_stores.delete(vector_store_id=del_vs_id)\n", + "print('Deleted all exisitng vector store')\n", + "\n", + "docs = [\n", + " (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n", + " (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n", + " (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n", + "]\n", + "query = \"How long does shipping take?\"\n", + "file_ids = []\n", + "for content, metadata in docs:\n", + " with BytesIO(content.encode()) as file_buffer:\n", + " file_buffer.name = f\"{metadata['title'].replace(' ', '_').lower()}.txt\"\n", + " create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n", + " print(create_file_response)\n", + " file_ids.append(create_file_response.id)\n", + "\n", + "# Create vector store with files\n", + "vector_store = client.vector_stores.create(\n", + " name=\"acme_docs\",\n", + " file_ids=file_ids,\n", + " embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n", + " embedding_dimension=384,\n", + " provider_id=\"faiss\"\n", + ")\n", + "print(\"Listing available vector stores:\")\n", + "vector_stores = client.vector_stores.list()\n", + "for vs in vector_stores:\n", + " print(f\"- {vs.name} (ID: {vs.id})\")\n", + " files_in_store = client.vector_stores.files.list(vector_store_id=vs.id)\n", + " if files_in_store:\n", + " print(f\" - Files in vector store '{vs.name}' (ID: {vs.id}):\")\n", + " for file in files_in_store:\n", + " print(f\"- {file.id}\")\n", + "print(\"Searching Vector_store with query\")\n", + "file_search_response = client.responses.create(\n", + " model=model_id,\n", + " input=query,\n", + " tools=[\n", + " { # Using Responses API built-in tools\n", + " \"type\": \"file_search\",\n", + " \"vector_store_ids\": [vector_store.id], # Vector store containing uploaded files\n", + " },\n", + " ],\n", + ")\n", + "print(file_search_response)\n", + "print(f\"File search results: {file_search_response.output[-1].content[0].text}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "jSfjNN9fMxtm", + "metadata": { + "id": "jSfjNN9fMxtm" + }, + "source": [ + "### 2.4. Using Model Context Protocol\n", + "\n", + "In this example, we will show how tools hosted in an MCP server can be configured to be used by the model.\n", + "\n", + "In the following steps, we will use the [filesystem tool](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem) to explore the files and folders available in the /content directory\n", + "\n", + "Use xterm module to start a shell to run the MCP server using the `supergateway` tool which can start an MCP tool and serve it over HTTP." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d96c273a" + }, + "source": [ + "### 2.4. 
Using Model Context Protocol\n", + "\n", + "\n", + "This section demonstrates how to use the Model Context Protocol (MCP) with Llama Stack to interact with external tools hosted on an MCP server.\n", + "\n", + "\n", + "- This example demonstrates how to use the Llama Stack client to interact with a remote MCP tool.\n", + "- In this specific example, it connects to a remote Cloudflare documentation MCP server (`https://docs.mcp.cloudflare.com/sse`).\n", + "- The `client.responses.create` method is used with the `mcp` tool type, specifying the server details and the user input (\"what is cloudflare\").\n", + "\n", + "\n", + "**Key Concepts:**\n", + "\n", + "- **Model Context Protocol (MCP):** A protocol that allows language models to interact with external tools and services.\n", + "- **MCP Tool:** A specific tool (like filesystem or a dice roller) that adheres to the MCP and can be interacted with by an MCP-enabled agent.\n", + "- **`client.responses.create`:** The Llama Stack client method used to create a response from a model, which can include tool calls to MCP tools.\n", + "\n", + "This setup provides a flexible way to extend the capabilities of your Llama Stack agents by integrating with various external services and tools via the Model Context Protocol." + ], + "id": "d96c273a" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "DwdKhQb1N295", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DwdKhQb1N295", + "outputId": "2496f9cc-350a-407c-aff9-ed91b018a36c" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Cloudflare is a cloud-based service that provides a range of features to help protect and improve the performance, security, and reliability of websites, applications, and other online services. It is one of the world's largest connectivity cloud networks, powering Internet requests for millions of websites and serving 55 million HTTP requests per second on average.\n", + "\n", + "Some of the key things Cloudflare does include:\n", + "\n", + "1. Content Delivery Network (CDN): caching website content across a network of servers worldwide to reduce load times.\n", + "2. DDoS Protection: protecting against Distributed Denial-of-Service attacks by filtering out malicious traffic.\n", + "3. Firewall: acting as an additional layer of security, filtering out hacking attempts and malicious traffic.\n", + "4. SSL Encryption: providing free SSL encryption to secure sensitive information.\n", + "5. Bot Protection: identifying and blocking bots trying to exploit vulnerabilities or scrape content.\n", + "6. Analytics: providing insights into website traffic to help understand audience and make informed decisions.\n", + "7. 
Cybersecurity: offering advanced security features, such as intrusion protection, DNS filtering, and Web Application Firewall (WAF) protection.\n", + "\n", + "Overall, Cloudflare helps protect against cyber threats, improves website performance, and enhances security for online businesses, bloggers, and individuals who need to establish a strong online presence.\n" + ] + } + ], + "source": [ + "# NBVAL_SKIP\n", + "resp = client.responses.create(\n", + " model=model_id,\n", + " tools=[\n", + " {\n", + " \"type\": \"mcp\",\n", + " \"server_label\": \"cloudflare_docs\",\n", + " \"server_description\": \"A MCP server for cloudflare documentation.\",\n", + " \"server_url\": \"https://docs.mcp.cloudflare.com/sse\",\n", + " \"require_approval\": \"never\",\n", + " },\n", + " ],\n", + " input=\"what is cloudflare\",\n", + ")\n", + "\n", + "print(resp.output_text)" + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "4L4Z8McdZaYj" + }, + "id": "4L4Z8McdZaYj", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1d29af6a" + }, + "source": [ + "### 2.5 Response API Branching\n", + "\n", + "The Llama Stack Response API supports branching, allowing you to explore different conversational paths or tool interactions based on a previous response. This is useful for scenarios where you want to try alternative approaches or gather information from different sources without losing the context of the initial interaction.\n", + "\n", + "To branch from a previous response, you use the `previous_response_id` parameter in the `client.responses.create` method. This parameter takes the `id` of the response you want to branch from.\n", + "\n", + "Here's how it works:\n", + "\n", + "1. **Initial Response:** You make an initial call to `client.responses.create` to get a response. This response will have a unique `id`.\n", + "\n", + "2. **Branching Response:** You make a subsequent call to `client.responses.create` for your branching query. In this call, you set the `previous_response_id` to the `id` of the initial response.\n", + "\n", + "The new response will be generated in the context of the previous response, but you can specify different tools, inputs, or other parameters to explore a different path.\n", + "\n", + "**Example:**\n", + "\n", + "Let's say you made an initial web search about a topic and got `response1`. You can then branch from `response1` to perform a file search on the same topic by setting `previous_response_id=response1.id` in the second `client.responses.create` call." + ], + "id": "1d29af6a" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f3352379", + "metadata": { + "id": "f3352379", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 492 + }, + "outputId": "a9d8d04e-d0bd-45f3-a429-2600e535ed24" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Deleted all existing vector stores\n", + "File(id='file-10ececb2f1234dce803436ba78a718fe', bytes=446, created_at=1758326763, expires_at=1789862763, filename='sorting_algorithms.txt', object='file', purpose='assistants')\n", + "Listing available vector stores:\n", + "- sorting_docs (ID: vs_69afb313-1c2d-4115-a9f4-8d31f4ff1ef3)\n", + "Web search results: The latest efficient sorting algorithms include Quicksort, Merge Sort, and Heap Sort, which have been compared in various studies for their performance. Quicksort is considered one of the fastest in-place sorting algorithms with good cache performance. 
Other algorithms like Bubble Sort, Selection Sort, and Insertion Sort are generally slower. For big data environments, several efficient sorting algorithms have been analyzed to improve processing speed. Some sources comparing the performance of these algorithms include Codemotion, Medium, Quora, ScienceDirect, and Built In.\n" + ] + }, + { + "output_type": "error", + "ename": "InternalServerError", + "evalue": "Error code: 500 - {'detail': 'Internal server error: An unexpected error occurred.'}", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mInternalServerError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m/tmp/ipython-input-3318131626.py\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[0;31m# Continue conversation: Switch to file search for local docs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 44\u001b[0;31m response2 = client.responses.create(\n\u001b[0m\u001b[1;32m 45\u001b[0m \u001b[0mmodel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmodel_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;31m# Changed model to one available in the notebook\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 46\u001b[0m \u001b[0minput\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"Now search my uploaded files for existing sorting implementations\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/llama_stack_client/_utils/_utils.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 281\u001b[0m \u001b[0mmsg\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34mf\"Missing required argument: {quote(missing[0])}\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 282\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 283\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 284\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 285\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m \u001b[0;31m# type: ignore\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/llama_stack_client/resources/responses/responses.py\u001b[0m in \u001b[0;36mcreate\u001b[0;34m(self, input, model, include, instructions, max_infer_iters, previous_response_id, store, stream, temperature, text, tools, extra_headers, extra_query, extra_body, timeout)\u001b[0m\n\u001b[1;32m 227\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0mhttpx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTimeout\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0mNotGiven\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mNOT_GIVEN\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 228\u001b[0m ) -> ResponseObject | Stream[ResponseObjectStream]:\n\u001b[0;32m--> 229\u001b[0;31m return 
self._post(\n\u001b[0m\u001b[1;32m 230\u001b[0m \u001b[0;34m\"/v1/openai/v1/responses\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 231\u001b[0m body=maybe_transform(\n", + "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/llama_stack_client/_base_client.py\u001b[0m in \u001b[0;36mpost\u001b[0;34m(self, path, cast_to, body, options, files, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1240\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"post\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0murl\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mjson_data\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mbody\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfiles\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mto_httpx_files\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfiles\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1241\u001b[0m )\n\u001b[0;32m-> 1242\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mcast\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mResponseT\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrequest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcast_to\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mopts\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstream\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstream\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstream_cls\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstream_cls\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1243\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1244\u001b[0m def patch(\n", + "\u001b[0;32m/usr/local/lib/python3.12/dist-packages/llama_stack_client/_base_client.py\u001b[0m in \u001b[0;36mrequest\u001b[0;34m(self, cast_to, options, stream, stream_cls)\u001b[0m\n\u001b[1;32m 1042\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1043\u001b[0m \u001b[0mlog\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdebug\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Re-raising status error\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1044\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_status_error_from_response\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresponse\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1045\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1046\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mInternalServerError\u001b[0m: Error code: 500 - {'detail': 'Internal server error: An unexpected error occurred.'}" + ] + } + ], + "source": [ + "from io import BytesIO\n", + "import uuid\n", + "\n", + "# delete any existing vector store\n", + "vector_stores_to_delete = [v.id for v in client.vector_stores.list()]\n", + "for del_vs_id in vector_stores_to_delete:\n", + " client.vector_stores.delete(vector_store_id=del_vs_id)\n", + "print('Deleted all existing vector stores')\n", + "\n", + "# Create a dummy file for the file search\n", + "dummy_file_content = \"Popular sorting implementations include quicksort, mergesort, heapsort, and insertion sort. Bubble sort and selection sort are used for small or simple datasets. 
Counting sort, radix sort, and bucket sort handle special numeric cases efficiently without comparisons. Timsort, a hybrid of merge and insertion sort, is widely used in Python and Java. Shell sort, comb sort, cocktail sort, and others are less common but exist for special scenarios.\"\n", + "with BytesIO(dummy_file_content.encode()) as file_buffer:\n", + " file_buffer.name = \"sorting_algorithms.txt\"\n", + " create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n", + " print(create_file_response)\n", + " file_id = create_file_response.id\n", + "\n", + "# Create a vector store with the dummy file\n", + "vector_store = client.vector_stores.create(\n", + " name=\"sorting_docs\",\n", + " file_ids=[file_id],\n", + " embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n", + " embedding_dimension=384, # This should match the embedding model\n", + " provider_id=\"faiss\"\n", + ")\n", + "print(\"Listing available vector stores:\")\n", + "vector_stores = client.vector_stores.list()\n", + "for vs in vector_stores:\n", + " print(f\"- {vs.name} (ID: {vs.id})\")\n", + "\n", + "# First response: Use web search for latest algorithms\n", + "response1 = client.responses.create(\n", + " model=model_id,\n", + " input=\"Search for the latest efficient sorting algorithms and their performance comparisons\",\n", + " tools=[\n", + " {\n", + " \"type\": \"web_search\",\n", + " },\n", + " ], # Web search for current information\n", + ")\n", + "print(f\"Web search results: {response1.output[-1].content[0].text}\")\n", + "\n", + "# Continue conversation: Switch to file search for local docs\n", + "response2 = client.responses.create(\n", + " model=model_id,\n", + " input=\"Now search my uploaded files for existing sorting implementations\",\n", + " tools=[\n", + " { # Using Responses API built-in tools\n", + " \"type\": \"file_search\",\n", + " \"vector_store_ids\": [vector_store.id], # Use the created vector store ID\n", + " },\n", + " ],\n", + " previous_response_id=response1.id,\n", + ")\n", + "\n", + "# # Branch from first response: Try different search approach\n", + "# response3 = client.responses.create(\n", + "# model=model_id,\n", + "# input=\"Instead, search the web for Python-specific sorting best practices\",\n", + "# tools=[{\"type\": \"web_search\"}], # Different web search query\n", + "# previous_response_id=response1.id, # Branch from response1\n", + "# )\n", + "\n", + "# # Responses API benefits:\n", + "# # ✅ Dynamic tool switching (web search ↔ file search per call)\n", + "# # ✅ OpenAI-compatible tool patterns (web_search, file_search)\n", + "# # ✅ Branch conversations to explore different information sources\n", + "# # ✅ Model flexibility per search type\n", + "# print(f\"Web search results: {response1.output_text}\")\n", + "# print(f\"File search results: {response2.output_text}\")\n", + "# print(f\"Alternative web search: {response3.output_text}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "799a5ee5" + }, + "source": [ + "### Cleaning up the server\n", + "\n", + "To stop the Llama Stack server and remove any created files and configurations, you can use the 
following code. This is useful for resetting your environment before running the notebook again.\n", + "\n", + "1. **Stop the server:** The code includes a helper function `kill_llama_stack_server()` that finds and terminates the running server process.\n", + "2. **Remove distribution files:** It then removes the distribution files located in `~/.llama/distributions/*`, which contain the server configuration and data." + ], + "id": "799a5ee5" + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "0R628gRh-cYv", + "metadata": { + "id": "0R628gRh-cYv" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Helper that finds and terminates the running Llama Stack server process\n", + "def kill_llama_stack_server():\n", + " # Kill any existing llama stack server processes\n", + " os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")\n", + "\n", + "# Stop the server first, then remove the distribution files\n", + "kill_llama_stack_server()\n", + "\n", + "# Remove distribution files\n", + "!rm -rf ~/.llama/distributions/*" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/static/img/rag_llama_stack.png b/docs/static/img/rag_llama_stack.png new file mode 100644 index 000000000..bc0e499e9 Binary files /dev/null and b/docs/static/img/rag_llama_stack.png differ