{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "c1e7571c"
},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)\n",
"\n",
"# Llama Stack - Building AI Applications\n",
"\n",
"<img src=\"https://llamastack.github.io/latest/_images/llama-stack.png\" alt=\"drawing\" width=\"500\"/>\n",
"\n",
"Get started with Llama Stack in minutes!\n",
"\n",
"[Llama Stack](https://github.com/meta-llama/llama-stack) is a stateful service with REST APIs to support the seamless transition of AI applications across different environments. You can build and test using a local server first and deploy to a hosted endpoint for production.\n",
"\n",
"In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/)\n",
"as the inference [provider](docs/source/providers/index.md#inference) for a Llama model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4CV1Q19BDMVw"
},
"source": [
"## Step 1: Install and set up"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "K4AvfUAJZOeS"
},
"source": [
"### 1.1. Install uv and test inference with Ollama\n",
"\n",
"We'll install [uv](https://docs.astral.sh/uv/) to set up the Python virtual environment, along with [colab-xterm](https://github.com/InfuseAI/colab-xterm) for running command-line tools, and [Ollama](https://ollama.com/download) as the inference provider."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install uv llama_stack llama-stack-client\n",
"\n",
"## If running on Colab:\n",
"# !pip install colab-xterm\n",
"# %load_ext colabxterm\n",
"\n",
"!curl https://ollama.ai/install.sh | sh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2. Test inference with Ollama"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We’ll now launch a terminal and run inference on a Llama model with Ollama to verify that the model is working correctly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## If running on Colab:\n",
"# %xterm\n",
"\n",
"## To be run in the terminal:\n",
"# ollama serve &\n",
"# ollama run llama3.2:3b --keepalive 60m"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If successful, you should see the model respond to a prompt.\n",
"\n",
"...\n",
"```\n",
">>> hi\n",
"Hello! How can I assist you today?\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oDUB7M_qe-Gs"
},
"source": [
"## Step 2: Run the Llama Stack server\n",
"\n",
"In this showcase, we will start a Llama Stack server locally."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1. Set up the Llama Stack Server"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "J2kGed0R5PSf",
"outputId": "2478ea60-8d35-48a1-b011-f233831740c5"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[2mUsing Python 3.12.12 environment at: /opt/homebrew/Caskroom/miniconda/base/envs/test\u001b[0m\n",
"\u001b[2mAudited \u001b[1m52 packages\u001b[0m \u001b[2min 1.56s\u001b[0m\u001b[0m\n",
"\u001b[2mUsing Python 3.12.12 environment at: /opt/homebrew/Caskroom/miniconda/base/envs/test\u001b[0m\n",
"\u001b[2mAudited \u001b[1m3 packages\u001b[0m \u001b[2min 122ms\u001b[0m\u001b[0m\n",
"\u001b[2mUsing Python 3.12.12 environment at: /opt/homebrew/Caskroom/miniconda/base/envs/test\u001b[0m\n",
"\u001b[2mAudited \u001b[1m3 packages\u001b[0m \u001b[2min 197ms\u001b[0m\u001b[0m\n",
"\u001b[2mUsing Python 3.12.12 environment at: /opt/homebrew/Caskroom/miniconda/base/envs/test\u001b[0m\n",
"\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 11ms\u001b[0m\u001b[0m\n"
]
}
],
"source": [
"import os\n",
"import subprocess\n",
"\n",
"if \"UV_SYSTEM_PYTHON\" in os.environ:\n",
"    del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n",
"# This command installs all the dependencies needed for the llama stack server with the ollama inference provider\n",
"!uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install\n",
"\n",
"def run_llama_stack_server_background():\n",
"    log_file = open(\"llama_stack_server.log\", \"w\")\n",
"    process = subprocess.Popen(\n",
"        \"OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter\",\n",
"        shell=True,\n",
"        stdout=log_file,\n",
"        stderr=log_file,\n",
"        text=True\n",
"    )\n",
"\n",
"    print(f\"Starting Llama Stack server with PID: {process.pid}\")\n",
"    return process\n",
"\n",
"def wait_for_server_to_start():\n",
"    import requests\n",
"    from requests.exceptions import ConnectionError\n",
"    import time\n",
"\n",
"    url = \"http://0.0.0.0:8321/v1/health\"\n",
"    max_retries = 30\n",
"    retry_interval = 1\n",
"\n",
"    print(\"Waiting for server to start\", end=\"\")\n",
"    for _ in range(max_retries):\n",
"        try:\n",
"            response = requests.get(url)\n",
"            if response.status_code == 200:\n",
"                print(\"\\nServer is ready!\")\n",
"                return True\n",
"        except ConnectionError:\n",
"            print(\".\", end=\"\", flush=True)\n",
"        time.sleep(retry_interval)\n",
"\n",
"    print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n",
"    return False\n",
"\n",
"\n",
"# use this helper if needed to kill the server\n",
"def kill_llama_stack_server():\n",
"    # Kill any existing llama stack server processes\n",
"    os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2. Start the Llama Stack Server"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Starting Llama Stack server with PID: 20778\n",
"Waiting for server to start........\n",
"Server is ready!\n"
]
}
],
"source": [
"server_process = run_llama_stack_server_background()\n",
"assert wait_for_server_to_start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: RAG Demos - Three Approaches\n",
"\n",
"We'll demonstrate three different approaches to building RAG applications with Llama Stack:\n",
"1. **Agent API** - High-level agent with session management\n",
"2. **Responses API** - Direct OpenAI-compatible responses\n",
"3. **Chat Completions API** - Manual retrieval with explicit control\n",
"\n",
"### Approach 1: Agent Class (High-level)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/models \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Available models: ['bedrock/meta.llama3-1-405b-instruct-v1:0', 'bedrock/meta.llama3-1-70b-instruct-v1:0', 'bedrock/meta.llama3-1-8b-instruct-v1:0', 'ollama/chevalblanc/gpt-4o-mini:latest', 'ollama/nomic-embed-text:latest', 'ollama/llama3.3:70b', 'ollama/llama3.2:3b', 'ollama/all-minilm:l6-v2', 'ollama/llama3.1:8b', 'ollama/llama-guard3:latest', 'ollama/llama-guard3:8b', 'ollama/shieldgemma:27b', 'ollama/shieldgemma:latest', 'ollama/llama3.1:8b-instruct-fp16', 'ollama/all-minilm:latest', 'ollama/llama3.2:3b-instruct-fp16', 'sentence-transformers/nomic-ai/nomic-embed-text-v1.5']\n",
"✓ Using model: ollama/llama3.3:70b\n",
"\n",
"✓ Downloading and indexing Paul Graham's essay...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/files \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"✓ File created with ID: file-e1290f8be28245e681bdfa5c40a7e7c4\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector_stores \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/conversations \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"✓ Vector store created with ID: vs_67efaaf4-ba0d-4037-b816-f73d588e9e4d\n",
"✓ Agent created\n",
"\n",
"prompt> How do you do great work?\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/responses \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"🤔 \n",
"\n",
"🔧 Executing file_search (server-side)...\n",
"🤔 To do great work it's essential to decide what to work on and choose something you have a natural aptitude for that you are deeply interested in and offers scope to do great work <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. Develop a habit of working on your own projects and don't let \"work\" mean something other people tell you to do <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. Seek out the best colleagues as they can encourage you and help bounce ideas off each other and it's better to have one or two great ones than a building full of pretty good ones <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. Husband your morale as it's crucial for doing great work and try to learn about other kinds of work by taking ideas from distant fields if you let them be metaphors <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. Negative examples can also be inspiring so try to learn from things done badly as sometimes it becomes clear what's needed when it's missing <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. If you're earnest you'll probably get a warmer welcome than expected when visiting places with the best people in your field which can increase your ambition and self-confidence <|file-e1290f8be28245e681bdfa5c40a7e7c4|>.\n"
]
}
],
"source": [
"# Make sure that your llama stack client version matches the llama stack server version you are using.\n",
"from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient\n",
"import requests\n",
"from io import BytesIO\n",
"\n",
"vector_store_id = \"my_demo_vector_db\"\n",
"client = LlamaStackClient(base_url=\"http://0.0.0.0:8321\")\n",
"\n",
"# Get model - find any Ollama Llama model\n",
"models = list(client.models.list())\n",
"print(f\"Available models: {[m.id for m in models]}\")\n",
"\n",
"# Find an Ollama Llama LLM model from the priority list\n",
"model_id = None\n",
"priority_models = [\"ollama/llama3.3:70b\", \"ollama/llama3.2:3b\", \"ollama/llama3.1:8b\"]\n",
"for m in models:\n",
"    if hasattr(m, \"custom_metadata\") and m.custom_metadata:\n",
"        provider_id = m.custom_metadata.get(\"provider_id\")\n",
"        model_type = m.custom_metadata.get(\"model_type\")\n",
"\n",
"        # Use the first Ollama LLM model that matches the priority list\n",
"        if provider_id == \"ollama\" and model_type == \"llm\" and m.id.lower() in priority_models:\n",
"            model_id = m.id\n",
"            print(f\"✓ Using model: {model_id}\")\n",
"            break\n",
"\n",
"if not model_id:\n",
"    raise ValueError(\"No Ollama Llama model found\")\n",
"\n",
"# Create vector store\n",
"print(\"\\n✓ Downloading and indexing Paul Graham's essay...\")\n",
"source = \"https://www.paulgraham.com/greatwork.html\"\n",
"response = requests.get(source)\n",
"\n",
"# Create a file-like object from the HTML content\n",
"file_buffer = BytesIO(response.content)\n",
"file_buffer.name = \"greatwork.html\"\n",
"\n",
"file = client.files.create(\n",
"    file=file_buffer,\n",
"    purpose=\"assistants\"\n",
")\n",
"print(f\"✓ File created with ID: {file.id}\")\n",
"\n",
"vector_store = client.vector_stores.create(\n",
"    name=vector_store_id,\n",
"    file_ids=[file.id],\n",
")\n",
"print(f\"✓ Vector store created with ID: {vector_store.id}\")\n",
"\n",
"# Create agent\n",
"agent = Agent(\n",
"    client,\n",
"    model=model_id,\n",
"    instructions=\"You are a helpful assistant\",\n",
"    tools=[\n",
"        {\n",
"            \"type\": \"file_search\",\n",
"            \"vector_store_ids\": [vector_store.id],  # Use the actual ID, not the name\n",
"        }\n",
"    ],\n",
")\n",
"print(\"✓ Agent created\")\n",
"\n",
"prompt = \"How do you do great work?\"\n",
"print(\"\\nprompt>\", prompt)\n",
"\n",
"response = agent.create_turn(\n",
"    messages=[{\"role\": \"user\", \"content\": prompt}],\n",
"    session_id=agent.create_session(\"rag_session\"),\n",
"    stream=True,\n",
")\n",
"\n",
"for log in AgentEventLogger().log(response):\n",
"    print(log, end=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Multi-turn RAG Conversation with Session Management"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/conversations \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/responses \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"================================================================================\n",
"Multi-turn RAG Conversation Demo\n",
"================================================================================\n",
"Demonstrating: Session maintains context while agent searches document\n",
"================================================================================\n",
"\n",
"[Turn 1] User: What does the document say about curiosity and great work?\n",
"(Agent will search the document...)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/responses \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Assistant: 🤔 \n",
"\n",
"🔧 Executing file_search (server-side)...\n",
"🤔 Curiosity is a key factor in doing great work <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. It drives people to learn and explore new ideas, which can lead to innovative solutions and discoveries <|file-e12...\n",
"\n",
"[Turn 2] User: Why is that important?\n",
"(Agent remembers 'that' refers to curiosity from Turn 1 - no need to search again)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/responses \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Assistant: 🤔 \n",
"\n",
"🔧 Executing file_search (server-side)...\n",
"🤔 Curiosity plays a crucial role in driving individuals to do great work and make meaningful discoveries <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. It is the key to all four steps in doing great work: choo...\n",
"\n",
"[Turn 3] User: What about the role of ambition?\n",
"(New topic - agent will search document again for 'ambition')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/responses \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Assistant: 🤔 \n",
"\n",
"🔧 Executing file_search (server-side)...\n",
"🤔 To do great work it's an advantage to be optimistic even though that means you'll risk looking like a fool sometimes <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. One way to avoid intellectual dishonesty is...\n",
"\n",
"[Turn 4] User: How do curiosity and ambition work together?\n",
"(Agent combines information from Turn 1 and Turn 3 using session context)\n",
"\n",
"Assistant: 🤔 \n",
"\n",
"🔧 Executing file_search (server-side)...\n",
"🤔 Curiosity and ambition are closely related as they both drive individuals to achieve great work <|file-e1290f8be28245e681bdfa5c40a7e7c4|>. Developing curiosity is essential for doing great work, and it c...\n",
"\n"
]
}
],
"source": [
"# Create a new session for multi-turn RAG conversation\n",
"session_id = agent.create_session(\"multi_turn_rag_session\")\n",
"\n",
"print(\"\\n\" + \"=\"*80)\n",
"print(\"Multi-turn RAG Conversation Demo\")\n",
"print(\"=\"*80)\n",
"print(\"Demonstrating: Session maintains context while agent searches document\")\n",
"print(\"=\"*80)\n",
"\n",
"# Turn 1: Initial question - Agent searches document for relevant information\n",
"print(\"\\n[Turn 1] User: What does the document say about curiosity and great work?\")\n",
"print(\"(Agent will search the document...)\")\n",
"response1 = agent.create_turn(\n",
"    messages=[{\"role\": \"user\", \"content\": \"What does the document say about curiosity and great work?\"}],\n",
"    session_id=session_id,\n",
"    stream=True,  # Use streaming for reliability\n",
")\n",
"# Collect the response (str() because the logger yields event objects, not strings)\n",
"response1_text = \"\"\n",
"for log in AgentEventLogger().log(response1):\n",
"    response1_text += str(log)\n",
"print(\"\\nAssistant:\", response1_text[:250] + \"...\\n\")\n",
"\n",
"# Turn 2: Follow-up question using pronouns - Agent remembers the context from Turn 1\n",
"print(\"[Turn 2] User: Why is that important?\")\n",
"print(\"(Agent remembers 'that' refers to curiosity from Turn 1 - no need to search again)\")\n",
"response2 = agent.create_turn(\n",
"    messages=[{\"role\": \"user\", \"content\": \"Why is that important?\"}],\n",
"    session_id=session_id,\n",
"    stream=True,  # Use streaming for reliability\n",
")\n",
"response2_text = \"\"\n",
"for log in AgentEventLogger().log(response2):\n",
"    response2_text += str(log)\n",
"print(\"\\nAssistant:\", response2_text[:250] + \"...\\n\")\n",
"\n",
"# Turn 3: New question on different topic - Agent performs new document search\n",
"print(\"[Turn 3] User: What about the role of ambition?\")\n",
"print(\"(New topic - agent will search document again for 'ambition')\")\n",
"response3 = agent.create_turn(\n",
"    messages=[{\"role\": \"user\", \"content\": \"What about the role of ambition?\"}],\n",
"    session_id=session_id,\n",
"    stream=True,  # Use streaming for reliability\n",
")\n",
"response3_text = \"\"\n",
"for log in AgentEventLogger().log(response3):\n",
"    response3_text += str(log)\n",
"print(\"\\nAssistant:\", response3_text[:250] + \"...\\n\")\n",
"\n",
"# Turn 4: Compare previous topics - Agent uses session memory\n",
"print(\"[Turn 4] User: How do curiosity and ambition work together?\")\n",
"print(\"(Agent combines information from Turn 1 and Turn 3 using session context)\")\n",
"response4 = agent.create_turn(\n",
"    messages=[{\"role\": \"user\", \"content\": \"How do curiosity and ambition work together?\"}],\n",
"    session_id=session_id,\n",
"    stream=True,  # Use streaming for reliability\n",
")\n",
"response4_text = \"\"\n",
"for log in AgentEventLogger().log(response4):\n",
"    response4_text += str(log)\n",
"print(\"\\nAssistant:\", response4_text[:250] + \"...\\n\")"
]
},
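{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Approach 2: Responses API (OpenAI-compatible)\n",
"\n",
"The same server-side `file_search` retrieval can be driven directly through the OpenAI-compatible `/v1/responses` endpoint, without the Agent wrapper. The cell below is a minimal sketch: it assumes your `llama-stack-client` version exposes `client.responses.create`, and it reuses the `client`, `model_id`, and `vector_store` objects created above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch of the Responses API approach. Assumes client.responses.create\n",
"# is available in this llama-stack-client version (the endpoint itself is the\n",
"# same /v1/responses the Agent used above).\n",
"response = client.responses.create(\n",
"    model=model_id,\n",
"    input=\"How do you do great work?\",\n",
"    tools=[\n",
"        {\n",
"            \"type\": \"file_search\",\n",
"            \"vector_store_ids\": [vector_store.id],  # retrieval happens server-side\n",
"        }\n",
"    ],\n",
")\n",
"\n",
"# output_text aggregates the assistant's text; fall back to the raw output\n",
"# list if this client version does not provide the convenience property.\n",
"print(getattr(response, \"output_text\", response.output))"
]
},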
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Approach 3: Chat Completions API"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector_stores/vs_67efaaf4-ba0d-4037-b816-f73d588e9e4d/search \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"User Query: What does Paul Graham say about curiosity and great work?\n",
"\n",
"Searching vector store...\n",
"Using vector store ID: vs_67efaaf4-ba0d-4037-b816-f73d588e9e4d\n",
"Extracting context from search results...\n",
"Found 3 relevant chunks\n",
"\n",
"Response (Chat Completions API):\n",
"================================================================================\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"According to Paul Graham, curiosity is a crucial factor in doing great work. He emphasizes that curiosity is the best guide for finding something worth working on, and it plays a significant role in all four steps of doing great work: choosing a field, getting to the frontier, noticing gaps, and exploring them.\n",
"\n",
"Graham notes that curiosity is not something that can be commanded, but it can be nurtured and allowed to drive one's efforts. He suggests that curious people are more likely to find the right thing to work on in the first place, as they cast a wide net and are more likely to stumble upon something important.\n",
"\n",
"Graham also highlights the importance of curiosity in overcoming obstacles and staying motivated. He argues that when working on something that sparks genuine curiosity, the work will feel less burdensome, even if it's challenging. This is because curious people are driven by a desire to learn and understand, rather than just seeking external validation or rewards.\n",
"\n",
"Furthermore, Graham emphasizes that curiosity is a key factor in distinguishing between great work and mediocre work. He notes that people who are truly curious about their work are more likely to produce something original and innovative, whereas those who lack curiosity may simply be going through the motions.\n",
"\n",
"In fact, Graham is so convinced of the importance of curiosity that he suggests it might be the single most important factor in doing great work. He even goes so far as to say that if an oracle were to give a single-word answer to the question of how to do great work, it would be \"curiosity.\"\n",
"\n",
"Overall, Paul Graham's writings suggest that curiosity is essential for doing great work, and that it plays a central role in driving innovation, creativity, and progress. By nurturing curiosity and allowing it to guide their efforts, individuals can increase their chances of producing something truly remarkable and making a meaningful contribution to their field.\n",
"================================================================================\n"
]
}
],
"source": [
"# Step 1: Search vector store explicitly\n",
"prompt = \"What does Paul Graham say about curiosity and great work?\"\n",
"print(f\"User Query: {prompt}\")\n",
"print(\"\\nSearching vector store...\")\n",
"print(f\"Using vector store ID: {vector_store.id}\")\n",
"search_results = client.vector_stores.search(\n",
"    vector_store_id=vector_store.id,  # Use the actual ID, not the name\n",
"    query=prompt,\n",
"    max_num_results=3,\n",
"    rewrite_query=False\n",
")\n",
"\n",
"# Step 2: Extract context from search results\n",
"print(\"Extracting context from search results...\")\n",
"context_chunks = []\n",
"for result in search_results.data:\n",
"    if hasattr(result, \"content\") and result.content:\n",
"        for content_item in result.content:\n",
"            if hasattr(content_item, \"text\") and content_item.text:\n",
"                context_chunks.append(content_item.text)\n",
"\n",
"context = \"\\n\\n\".join(context_chunks)\n",
"print(f\"Found {len(context_chunks)} relevant chunks\\n\")\n",
"\n",
"# Step 3: Use Chat Completions with retrieved context\n",
"print(\"Response (Chat Completions API):\")\n",
"print(\"=\"*80)\n",
"\n",
"completion = client.chat.completions.create(\n",
"    model=model_id,\n",
"    messages=[\n",
"        {\n",
"            \"role\": \"system\",\n",
"            \"content\": \"You are a helpful assistant. Use the provided context to answer the user's question.\",\n",
"        },\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": f\"Context:\\n{context}\\n\\nQuestion: {prompt}\\n\\nPlease provide a comprehensive answer based on the context above.\",\n",
"        },\n",
"    ],\n",
"    temperature=0.7,\n",
")\n",
"\n",
"print(completion.choices[0].message.content)\n",
"print(\"=\"*80)"
]
},
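{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you're done experimenting, you can stop the local server with the `kill_llama_stack_server` helper defined in Step 2.1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional cleanup: stop the background Llama Stack server started earlier.\n",
"# kill_llama_stack_server()"
]
},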
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next Steps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now you're ready to dive deeper into Llama Stack!\n",
"- Explore the [Detailed Tutorial](./detailed_tutorial.md).\n",
"- Try the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb).\n",
"- Browse more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks).\n",
"- Learn about Llama Stack [Concepts](../concepts/index.md).\n",
"- Discover how to [Build Llama Stacks](../distributions/index.md).\n",
"- Refer to our [References](../references/index.md) for details on the Llama CLI and Python SDK.\n",
"- Check out the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository for example applications and tutorials."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"fileHeader": "",
"fileUid": "92b9a30f-53a7-4b4f-8cfa-0fed1619256f",
"isAdHoc": false,
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}