Use Vector Store and fix other comments

This commit is contained in:
Swapna Lekkala 2025-09-09 11:15:08 -07:00
parent 04b0432a7b
commit 7ae7d6a7d0


@ -11,20 +11,20 @@
"\n",
"### Overview\n",
"\n",
"- **LlamaStack**: Provides the infrastructure for running LLMs and OpenAI-compatible Vector Stores\n",
"- **LangChain**: Provides the framework for chaining operations and prompt templates\n",
"- **Integration**: Uses LlamaStack's OpenAI-compatible API with LangChain\n",
"\n",
"### What You'll See\n",
"\n",
"1. Setting up LlamaStack server with Fireworks AI provider\n",
"2. Creating and querying Vector Stores\n",
"3. Building RAG chains with LangChain + LlamaStack\n",
"4. Querying the chain for relevant information\n",
"\n",
"### Prerequisites\n",
"\n",
"- Fireworks API key\n",
"\n",
"---\n",
"\n",
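The integration described above amounts to pointing any OpenAI-compatible client at LlamaStack's `/v1/openai/v1` base path. As a minimal sketch of that idea, the following builds the `/chat/completions` request body such a client would send (the helper name is ours; the model id is the Fireworks model used later in this notebook):

```python
import json


def build_chat_request(model: str, prompt: str) -> dict:
    # Shape of the OpenAI-compatible /chat/completions payload that
    # LlamaStack accepts at http://localhost:8321/v1/openai/v1
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request(
    "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
    "Write a two-line poem about a llama",
)
print(json.dumps(payload, indent=2))
```

Because LangChain's `ChatOpenAI` speaks this same wire format, overriding its base URL is all the "integration" that is required.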
@ -53,68 +53,15 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: uv in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.7.20)\n",
"\u001b[2mUsing Python 3.12.11 environment at: /Users/swapna942/miniconda3\u001b[0m\n",
"\u001b[2mAudited \u001b[1m7 packages\u001b[0m \u001b[2min 94ms\u001b[0m\u001b[0m\n"
]
}
],
"source": [
"!pip install uv\n",
"!uv pip install fastapi uvicorn \"langchain>=0.2\" langchain-openai \\\n",
" langchain-community langchain-text-splitters \\\n",
" faiss-cpu"
]
@ -130,7 +77,6 @@
"\n",
"This section sets up the LlamaStack server with:\n",
"- **Fireworks AI** as the inference provider\n",
"- **Sentence Transformers** for embeddings\n",
"\n",
"The server runs on `localhost:8321` and provides OpenAI-compatible endpoints."
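Since the server is started in the background, requests should wait until it actually answers. A stand-alone sketch of such a readiness check follows; the `/v1/openai/v1/models` path is an assumption based on the OpenAI-compatible endpoints the server exposes:

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, deadline_s: float = 30.0) -> bool:
    # Poll the endpoint until it returns HTTP 200 or the deadline passes.
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not listening yet; retry
        time.sleep(0.5)
    return False


# e.g. wait_for_server("http://localhost:8321/v1/openai/v1/models")
```

The notebook's own `wait_for_server` helper (partially shown below) follows the same poll-and-sleep pattern.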
@ -143,67 +89,30 @@
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import os\n",
"import subprocess\n",
"import time\n",
"\n",
"\n",
"# Remove UV_SYSTEM_PYTHON to ensure uv creates a proper virtual environment\n",
"# instead of trying to use system Python globally, which could cause permission issues\n",
"# and package conflicts with the system's Python installation\n",
"if \"UV_SYSTEM_PYTHON\" in os.environ:\n",
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n",
"\n",
"\n",
"def run_llama_stack_server_background():\n",
" \"\"\"Build and run LlamaStack server in one step using --run flag\"\"\"\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n",
" process = subprocess.Popen(\n",
" \"uv run --with llama-stack llama stack build --distro starter --image-type venv --run\",\n",
" shell=True,\n",
" stdout=log_file,\n",
" stderr=log_file,\n",
" text=True,\n",
" )\n",
"\n",
" print(f\"Building and starting Llama Stack server with PID: {process.pid}\")\n",
" return process\n",
"\n",
"\n",
@ -230,10 +139,9 @@
" return False\n",
"\n",
"\n",
"# use this helper if needed to kill the server\n",
"def kill_llama_stack_server():\n",
" # Kill any existing llama stack server processes using pkill command\n",
" os.system(\"pkill -f llama_stack.core.server.server\")"
]
},
{
@ -246,8 +154,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Building and starting Llama Stack server with PID: 81294\n",
"Waiting for server to start......\n",
"Server is ready!\n"
]
}
@ -279,59 +187,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[2mUsing Python 3.12.11 environment at: /Users/swapna942/miniconda3\u001b[0m\n",
"\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 26ms\u001b[0m\u001b[0m\n"
]
}
],
"source": [
"!uv pip install llama_stack_client"
]
},
{
@ -343,7 +205,7 @@
"\n",
"Create a client connection to the LlamaStack server with API keys for different providers:\n",
"\n",
"- **Fireworks API Key**: For Fireworks AI models\n",
"\n"
]
},
@ -358,7 +220,7 @@
"\n",
"client = LlamaStackClient(\n",
" base_url=\"http://0.0.0.0:8321\",\n",
" provider_data={\"fireworks_api_key\": \"***\"},\n",
")"
]
},
@ -508,14 +370,14 @@
"id": "gojp7at31ht",
"metadata": {},
"source": [
"### 4. Vector Store Setup\n",
"\n",
"#### Create a Vector Store with File Upload\n",
"\n",
"Create a vector store using the OpenAI-compatible vector stores API:\n",
"\n",
"- **Vector Store**: OpenAI-compatible vector store for document storage\n",
"- **File Upload**: Automatic chunking and embedding of uploaded files\n",
"- **Embedding Model**: Sentence Transformers model for text embeddings\n",
"- **Dimensions**: 384-dimensional embeddings"
]
@ -523,60 +385,37 @@
{
"cell_type": "code",
"execution_count": 7,
"id": "be2c2899-ea53-4e5f-b6b8-ed425f5d6572",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"File(id='file-33012cc7c18a4430ad1a11b3397c3e29', bytes=41, created_at=1757441277, expires_at=1788977277, filename='shipping_policy.txt', object='file', purpose='assistants')\n",
"File(id='file-e7692c2f044e44859ea02b0630195b4e', bytes=48, created_at=1757441277, expires_at=1788977277, filename='returns_policy.txt', object='file', purpose='assistants')\n",
"File(id='file-9ff178cbb72b4e79a7513ce5a95a6fe1', bytes=45, created_at=1757441277, expires_at=1788977277, filename='support.txt', object='file', purpose='assistants')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores \"HTTP/1.1 200 OK\"\n"
]
}
],
"source": [
"from io import BytesIO\n",
"\n",
"docs = [\n",
" (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
@ -584,49 +423,22 @@
" (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
"]\n",
"\n",
"file_ids = []\n",
"for content, metadata in docs:\n",
" with BytesIO(content.encode()) as file_buffer:\n",
" file_buffer.name = f\"{metadata['title'].replace(' ', '_').lower()}.txt\"\n",
" create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n",
" print(create_file_response)\n",
" file_ids.append(create_file_response.id)\n",
"\n",
"# Create vector store with files\n",
"vector_store = client.vector_stores.create(\n",
" name=\"acme_docs\",\n",
" file_ids=file_ids,\n",
" embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n",
" embedding_dimension=384,\n",
" provider_id=\"faiss\"\n",
")"
]
},
{
@ -634,47 +446,42 @@
"id": "9061tmi1zpq",
"metadata": {},
"source": [
"#### Test Vector Store Search\n",
"\n",
"Query the vector store. This performs semantic search to find relevant documents based on the query."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ba9d1901-bd5e-4216-b3e6-19dc74551cc6",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_d9fa50e8-3ccc-4db3-9102-1a71cd62ab64/search \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Acme ships globally in 3-5 business days.\n",
"Returns are accepted within 30 days of purchase.\n"
]
}
],
"source": [
"search_response = client.vector_stores.search(\n",
" vector_store_id=vector_store.id,\n",
" query=\"How long does shipping take?\",\n",
" max_num_results=2\n",
")\n",
"for result in search_response.data:\n",
" content = result.content[0].text\n",
" print(content)"
]
},
{
@ -695,7 +502,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c378bd10-09c2-417c-bdfc-1e0a2dd19084",
"metadata": {},
"outputs": [],
@ -705,13 +512,11 @@
"from langchain_openai import ChatOpenAI\n",
"\n",
"# Point LangChain to the LlamaStack server\n",
"\n",
"# LLM served by LlamaStack using the Fireworks model\n",
"llm = ChatOpenAI(\n",
" base_url=\"http://0.0.0.0:8321/v1/openai/v1\",\n",
" api_key=\"dummy\",\n",
" model=\"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\",\n",
" default_headers={\"X-LlamaStack-Provider-Data\": '{\"fireworks_api_key\": \"***\"}'},\n",
")"
]
},
@ -727,9 +532,11 @@
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f88ffb5a-657b-4916-9375-c6ddc156c25e",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
@ -741,10 +548,10 @@
{
"data": {
"text/plain": [
"AIMessage(content='Silent llama, soft and slow,\\n Majestic creature, with a gentle glow.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': None, 'model_name': 'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'system_fingerprint': None, 'id': 'chatcmpl-8957d071-9f1a-4a7f-9c0e-d516e02194d1', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--f4331df3-e787-4973-b2e8-7ca90d4e19df-0')"
]
},
"execution_count": 12,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
@ -780,7 +587,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 11,
"id": "9684427d-dcc7-4544-9af5-8b110d014c42",
"metadata": {},
"outputs": [],
@ -792,8 +599,7 @@
"\n",
"\n",
"def join_docs(docs):\n",
" return \"\\n\\n\".join([f\"[{d.metadata.get('document_id')}] {d.content}\" for d in docs.chunks])\n",
"\n",
" return \"\\n\\n\".join([f\"[{d.filename}] {d.content[0].text}\" for d in docs.data])\n",
"\n",
"PROMPT = ChatPromptTemplate.from_messages(\n",
" [\n",
@ -803,11 +609,12 @@
")\n",
"\n",
"vector_step = RunnableLambda(\n",
" lambda x: client.vector_io.query(\n",
" vector_db_id=\"acme_docs\",\n",
" lambda x: client.vector_stores.search(\n",
" vector_store_id=vector_store.id,\n",
" query=x,\n",
" max_num_results=2\n",
" )\n",
" )\n",
")\n",
"\n",
"chain = (\n",
" {\"context\": vector_step | RunnableLambda(join_docs), \"question\": RunnablePassthrough()}\n",
@ -829,15 +636,17 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 12,
"id": "03322188-9509-446a-a4a8-ce3bb83ec87c",
"metadata": {},
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_d9fa50e8-3ccc-4db3-9102-1a71cd62ab64/search \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
@ -846,7 +655,7 @@
"output_type": "stream",
"text": [
"❓ How long does shipping take?\n",
"💡 According to the shipping policy, Acme ships globally in 3-5 business days. This means that the shipping time is typically between 3 and 5 business days, but it may vary depending on the specific location and other factors.\n"
"💡 According to the provided context, Acme ships globally in 3-5 business days.\n"
]
}
],
@ -867,7 +676,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 13,
"id": "61995550-bb0b-46a8-a5d0-023207475d60",
"metadata": {},
"outputs": [
@ -875,7 +684,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_d9fa50e8-3ccc-4db3-9102-1a71cd62ab64/search \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
@ -884,7 +693,7 @@
"output_type": "stream",
"text": [
"❓ Can I return a product after 40 days?\n",
"💡 Based on the provided returns policy, it appears that returns are only accepted within 30 days of purchase. Since you're asking about returning a product after 40 days, it would not be within the specified 30-day return window. Unfortunately, it's unlikely that you would be able to return the product after 40 days.\n"
"💡 Based on the given context, returns are accepted within 30 days of purchase. Therefore, after 40 days, returning a product is not accepted according to the given policy.\n"
]
}
],
@ -903,13 +712,13 @@
"---\n",
"We have successfully built a RAG system that combines:\n",
"\n",
"- **LlamaStack** for infrastructure (LLM serving + vector database)\n",
"- **LlamaStack** for infrastructure (LLM serving + Vector Store)\n",
"- **LangChain** for orchestration (prompts + chains)\n",
"- **Together AI** for high-quality language models\n",
"\n",
"### Key Benefits\n",
"\n",
"1. **Unified Infrastructure**: Single server for LLMs and vector databases\n",
"1. **Unified Infrastructure**: Single server for LLMs and Vector Store\n",
"2. **OpenAI Compatibility**: Easy integration with existing LangChain code\n",
"3. **Multi-Provider Support**: Switch between different LLM providers\n",
"4. **Production Ready**: Built-in safety shields and monitoring\n",
@ -932,6 +741,16 @@
"kill_llama_stack_server()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "15647c46-22ce-4698-af3f-8161329d8e3a",
"metadata": {},
"outputs": [],
"source": [
"kill_llama_stack_server()"
]
}
],
"metadata": {