llama-stack-mirror/docs/notebooks/langchain/Llama_Stack_LangChain.ipynb
2025-09-09 11:15:18 -07:00

{
"cells": [
{
"cell_type": "markdown",
"id": "1ztegmwm4sp",
"metadata": {},
"source": [
"## LlamaStack + LangChain Integration Tutorial\n",
"\n",
"This notebook demonstrates how to integrate **LlamaStack** with **LangChain** to build a complete RAG (Retrieval-Augmented Generation) system.\n",
"\n",
"### Overview\n",
"\n",
"- **LlamaStack**: Provides the infrastructure for running LLMs and vector databases\n",
"- **LangChain**: Provides the framework for chaining operations and prompt templates\n",
"- **Integration**: Uses LlamaStack's OpenAI-compatible API with LangChain\n",
"\n",
"### What You'll See\n",
"\n",
"1. Setting up LlamaStack server with Together AI provider\n",
"2. Creating and managing vector databases\n",
"3. Building RAG chains with LangChain + LLAMAStack\n",
"4. Querying the chain for relevant information\n",
"\n",
"### Prerequisites\n",
"\n",
"- Together AI API key\n",
"\n",
"---\n",
"\n",
"### 1. Installation and Setup"
]
},
{
"cell_type": "markdown",
"id": "2ktr5ls2cas",
"metadata": {},
"source": [
"#### Install Required Dependencies\n",
"\n",
"First, we install all the necessary packages for LangChain and FastAPI integration."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5b6a6a17-b931-4bea-8273-0d6e5563637a",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: fastapi in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.115.14)\n",
"Requirement already satisfied: uvicorn in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.29.0)\n",
"Requirement already satisfied: langchain>=0.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.3.27)\n",
"Requirement already satisfied: langchain-openai in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.3.30)\n",
"Requirement already satisfied: langchain-community in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.3.27)\n",
"Requirement already satisfied: langchain-text-splitters in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.3.9)\n",
"Requirement already satisfied: faiss-cpu in /Users/swapna942/miniconda3/lib/python3.12/site-packages (1.11.0)\n",
"Requirement already satisfied: starlette<0.47.0,>=0.40.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from fastapi) (0.46.2)\n",
"Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from fastapi) (2.11.7)\n",
"Requirement already satisfied: typing-extensions>=4.8.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from fastapi) (4.14.1)\n",
"Requirement already satisfied: annotated-types>=0.6.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi) (0.7.0)\n",
"Requirement already satisfied: pydantic-core==2.33.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi) (2.33.2)\n",
"Requirement already satisfied: typing-inspection>=0.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi) (0.4.1)\n",
"Requirement already satisfied: anyio<5,>=3.6.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from starlette<0.47.0,>=0.40.0->fastapi) (4.10.0)\n",
"Requirement already satisfied: idna>=2.8 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from anyio<5,>=3.6.2->starlette<0.47.0,>=0.40.0->fastapi) (3.10)\n",
"Requirement already satisfied: sniffio>=1.1 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from anyio<5,>=3.6.2->starlette<0.47.0,>=0.40.0->fastapi) (1.3.1)\n",
"Requirement already satisfied: click>=7.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from uvicorn) (8.2.1)\n",
"Requirement already satisfied: h11>=0.8 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from uvicorn) (0.16.0)\n",
"Requirement already satisfied: langchain-core<1.0.0,>=0.3.72 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (0.3.74)\n",
"Requirement already satisfied: langsmith>=0.1.17 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (0.4.14)\n",
"Requirement already satisfied: SQLAlchemy<3,>=1.4 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (2.0.41)\n",
"Requirement already satisfied: requests<3,>=2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (2.32.4)\n",
"Requirement already satisfied: PyYAML>=5.3 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (6.0.2)\n",
"Requirement already satisfied: tenacity!=8.4.0,<10.0.0,>=8.1.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-core<1.0.0,>=0.3.72->langchain>=0.2) (9.1.2)\n",
"Requirement already satisfied: jsonpatch<2.0,>=1.33 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-core<1.0.0,>=0.3.72->langchain>=0.2) (1.33)\n",
"Requirement already satisfied: packaging>=23.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-core<1.0.0,>=0.3.72->langchain>=0.2) (24.2)\n",
"Requirement already satisfied: jsonpointer>=1.9 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<1.0.0,>=0.3.72->langchain>=0.2) (2.1)\n",
"Requirement already satisfied: charset_normalizer<4,>=2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from requests<3,>=2->langchain>=0.2) (3.3.2)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from requests<3,>=2->langchain>=0.2) (2.5.0)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from requests<3,>=2->langchain>=0.2) (2025.8.3)\n",
"Requirement already satisfied: openai<2.0.0,>=1.99.9 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-openai) (1.100.2)\n",
"Requirement already satisfied: tiktoken<1,>=0.7 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-openai) (0.9.0)\n",
"Requirement already satisfied: distro<2,>=1.7.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.99.9->langchain-openai) (1.9.0)\n",
"Requirement already satisfied: httpx<1,>=0.23.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.99.9->langchain-openai) (0.28.1)\n",
"Requirement already satisfied: jiter<1,>=0.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.99.9->langchain-openai) (0.10.0)\n",
"Requirement already satisfied: tqdm>4 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.99.9->langchain-openai) (4.67.1)\n",
"Requirement already satisfied: httpcore==1.* in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from httpx<1,>=0.23.0->openai<2.0.0,>=1.99.9->langchain-openai) (1.0.9)\n",
"Requirement already satisfied: regex>=2022.1.18 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from tiktoken<1,>=0.7->langchain-openai) (2024.11.6)\n",
"Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (3.12.13)\n",
"Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (0.6.7)\n",
"Requirement already satisfied: pydantic-settings<3.0.0,>=2.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (2.10.1)\n",
"Requirement already satisfied: httpx-sse<1.0.0,>=0.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (0.4.1)\n",
"Requirement already satisfied: numpy>=1.26.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (2.3.1)\n",
"Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (2.6.1)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.4.0)\n",
"Requirement already satisfied: attrs>=17.3.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (25.3.0)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.7.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (6.6.3)\n",
"Requirement already satisfied: propcache>=0.2.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (0.3.2)\n",
"Requirement already satisfied: yarl<2.0,>=1.17.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.20.1)\n",
"Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (3.26.1)\n",
"Requirement already satisfied: typing-inspect<1,>=0.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (0.9.0)\n",
"Requirement already satisfied: python-dotenv>=0.21.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from pydantic-settings<3.0.0,>=2.4.0->langchain-community) (1.1.1)\n",
"Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community) (1.1.0)\n",
"Requirement already satisfied: orjson>=3.9.14 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langsmith>=0.1.17->langchain>=0.2) (3.10.18)\n",
"Requirement already satisfied: requests-toolbelt>=1.0.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langsmith>=0.1.17->langchain>=0.2) (1.0.0)\n",
"Requirement already satisfied: zstandard>=0.23.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langsmith>=0.1.17->langchain>=0.2) (0.23.0)\n"
]
}
],
"source": [
"!pip install fastapi uvicorn \"langchain>=0.2\" langchain-openai \\\n",
" langchain-community langchain-text-splitters \\\n",
" faiss-cpu"
]
},
{
"cell_type": "markdown",
"id": "wmt9jvqzh7n",
"metadata": {},
"source": [
"### 2. LlamaStack Server Setup\n",
"\n",
"#### Build and Start LlamaStack Server\n",
"\n",
"This section sets up the LlamaStack server with:\n",
"- **Together AI** as the inference provider\n",
"- **FAISS** as the vector database\n",
"- **Sentence Transformers** for embeddings\n",
"\n",
"The server runs on `localhost:8321` and provides OpenAI-compatible endpoints."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dd2dacf3-ec8b-4cc7-8ff4-b5b6ea4a6e9e",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: uv in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.7.20)\n",
"Environment '/Users/swapna942/llama-stack/.venv' already exists, re-using it.\n",
"Virtual environment /Users/swapna942/llama-stack/.venv is already active\n",
"\u001b[2mUsing Python 3.13.7 environment at: /Users/swapna942/llama-stack/.venv\u001b[0m\n",
"\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 111ms\u001b[0m\u001b[0m\n",
"Installing pip dependencies\n",
"\u001b[2mUsing Python 3.13.7 environment at: /Users/swapna942/llama-stack/.venv\u001b[0m\n",
"\u001b[2K\u001b[2mResolved \u001b[1m184 packages\u001b[0m \u001b[2min 357ms\u001b[0m\u001b[0m \u001b[0m\n",
"\u001b[2mUninstalled \u001b[1m3 packages\u001b[0m \u001b[2min 70ms\u001b[0m\u001b[0m\n",
"\u001b[2K\u001b[2mInstalled \u001b[1m3 packages\u001b[0m \u001b[2min 35ms\u001b[0m\u001b[0m \u001b[0m\n",
" \u001b[31m-\u001b[39m \u001b[1mprotobuf\u001b[0m\u001b[2m==5.29.5\u001b[0m\n",
" \u001b[32m+\u001b[39m \u001b[1mprotobuf\u001b[0m\u001b[2m==5.29.4\u001b[0m\n",
" \u001b[31m-\u001b[39m \u001b[1mruamel-yaml\u001b[0m\u001b[2m==0.18.14\u001b[0m\n",
" \u001b[32m+\u001b[39m \u001b[1mruamel-yaml\u001b[0m\u001b[2m==0.17.40\u001b[0m\n",
" \u001b[31m-\u001b[39m \u001b[1mruff\u001b[0m\u001b[2m==0.12.5\u001b[0m\n",
" \u001b[32m+\u001b[39m \u001b[1mruff\u001b[0m\u001b[2m==0.9.10\u001b[0m\n",
"Installing special provider module: torch torchtune>=0.5.0 torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu\n",
"\u001b[2mUsing Python 3.13.7 environment at: /Users/swapna942/llama-stack/.venv\u001b[0m\n",
"\u001b[2mAudited \u001b[1m3 packages\u001b[0m \u001b[2min 69ms\u001b[0m\u001b[0m\n",
"Installing special provider module: torch torchvision torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu\n",
"\u001b[2mUsing Python 3.13.7 environment at: /Users/swapna942/llama-stack/.venv\u001b[0m\n",
"\u001b[2mAudited \u001b[1m3 packages\u001b[0m \u001b[2min 12ms\u001b[0m\u001b[0m\n",
"Installing special provider module: sentence-transformers --no-deps\n",
"\u001b[2mUsing Python 3.13.7 environment at: /Users/swapna942/llama-stack/.venv\u001b[0m\n",
"\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 15ms\u001b[0m\u001b[0m\n",
"\u001b[32mBuild Successful!\u001b[0m\n",
"\u001b[34mYou can find the newly-built distribution here: /Users/swapna942/.llama/distributions/starter/starter-run.yaml\u001b[0m\n",
"\u001b[32mYou can run the new Llama Stack distro via: \u001b[34mllama stack run /Users/swapna942/.llama/distributions/starter/starter-run.yaml --image-type venv\u001b[0m\u001b[0m\n"
]
}
],
"source": [
"import os\n",
"import subprocess\n",
"import time\n",
"\n",
"!pip install uv\n",
"\n",
"if \"UV_SYSTEM_PYTHON\" in os.environ:\n",
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n",
"# this command installs all the dependencies needed for the llama stack server with the together inference provider\n",
"!uv run --with llama-stack llama stack build --distro starter --image-type venv\n",
"\n",
"\n",
"def run_llama_stack_server_background():\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n",
" process = subprocess.Popen(\n",
" \"uv run --with llama-stack llama stack run starter --image-type venv\",\n",
" shell=True,\n",
" stdout=log_file,\n",
" stderr=log_file,\n",
" text=True,\n",
" )\n",
"\n",
" print(f\"Starting Llama Stack server with PID: {process.pid}\")\n",
" return process\n",
"\n",
"\n",
"def wait_for_server_to_start():\n",
" import requests\n",
" from requests.exceptions import ConnectionError\n",
"\n",
" url = \"http://0.0.0.0:8321/v1/health\"\n",
" max_retries = 30\n",
" retry_interval = 1\n",
"\n",
" print(\"Waiting for server to start\", end=\"\")\n",
" for _ in range(max_retries):\n",
" try:\n",
" response = requests.get(url)\n",
" if response.status_code == 200:\n",
" print(\"\\nServer is ready!\")\n",
" return True\n",
" except ConnectionError:\n",
" print(\".\", end=\"\", flush=True)\n",
" time.sleep(retry_interval)\n",
"\n",
" print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n",
" return False\n",
"\n",
"\n",
"# use this helper if needed to kill the server\n",
"def kill_llama_stack_server():\n",
" # Kill any existing llama stack server processes\n",
" os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "28bd8dbd-4576-4e76-813f-21ab94db44a2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Starting Llama Stack server with PID: 99381\n",
"Waiting for server to start....\n",
"Server is ready!\n"
]
}
],
"source": [
"server_process = run_llama_stack_server_background()\n",
"assert wait_for_server_to_start()"
]
},
{
"cell_type": "markdown",
"id": "gr9cdcg4r7n",
"metadata": {},
"source": [
"#### Install LlamaStack Client\n",
"\n",
"Install the client library to interact with the LlamaStack server."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "487d2dbc-d071-400e-b4f0-dcee58f8dc95",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: llama_stack_client in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (0.2.17)\n",
"Requirement already satisfied: anyio<5,>=3.5.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (4.9.0)\n",
"Requirement already satisfied: click in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (8.2.1)\n",
"Requirement already satisfied: distro<2,>=1.7.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (1.9.0)\n",
"Requirement already satisfied: fire in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (0.7.0)\n",
"Requirement already satisfied: httpx<1,>=0.23.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (0.28.1)\n",
"Requirement already satisfied: pandas in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (2.3.1)\n",
"Requirement already satisfied: prompt-toolkit in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (3.0.51)\n",
"Requirement already satisfied: pyaml in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (25.7.0)\n",
"Requirement already satisfied: pydantic<3,>=1.9.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (2.11.7)\n",
"Requirement already satisfied: requests in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (2.32.4)\n",
"Requirement already satisfied: rich in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (14.1.0)\n",
"Requirement already satisfied: sniffio in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (1.3.1)\n",
"Requirement already satisfied: termcolor in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (3.1.0)\n",
"Requirement already satisfied: tqdm in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (4.67.1)\n",
"Requirement already satisfied: typing-extensions<5,>=4.7 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (4.14.1)\n",
"Requirement already satisfied: idna>=2.8 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from anyio<5,>=3.5.0->llama_stack_client) (3.10)\n",
"Requirement already satisfied: certifi in /opt/homebrew/opt/certifi/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama_stack_client) (2025.8.3)\n",
"Requirement already satisfied: httpcore==1.* in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama_stack_client) (1.0.9)\n",
"Requirement already satisfied: h11>=0.16 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->llama_stack_client) (0.16.0)\n",
"Requirement already satisfied: annotated-types>=0.6.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama_stack_client) (0.7.0)\n",
"Requirement already satisfied: pydantic-core==2.33.2 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama_stack_client) (2.33.2)\n",
"Requirement already satisfied: typing-inspection>=0.4.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama_stack_client) (0.4.1)\n",
"Requirement already satisfied: numpy>=1.26.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pandas->llama_stack_client) (2.3.2)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pandas->llama_stack_client) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pandas->llama_stack_client) (2025.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pandas->llama_stack_client) (2025.2)\n",
"Requirement already satisfied: six>=1.5 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas->llama_stack_client) (1.17.0)\n",
"Requirement already satisfied: wcwidth in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from prompt-toolkit->llama_stack_client) (0.2.13)\n",
"Requirement already satisfied: PyYAML in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pyaml->llama_stack_client) (6.0.2)\n",
"Requirement already satisfied: charset_normalizer<4,>=2 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from requests->llama_stack_client) (3.4.2)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from requests->llama_stack_client) (2.5.0)\n",
"Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from rich->llama_stack_client) (4.0.0)\n",
"Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from rich->llama_stack_client) (2.19.2)\n",
"Requirement already satisfied: mdurl~=0.1 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from markdown-it-py>=2.2.0->rich->llama_stack_client) (0.1.2)\n"
]
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import sys\n",
"\n",
"# Install directly to the current Python environment\n",
"subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"llama_stack_client\"])"
]
},
{
"cell_type": "markdown",
"id": "0j5hag7l9x89",
"metadata": {},
"source": [
"### 3. Initialize LlamaStack Client\n",
"\n",
"Create a client connection to the LlamaStack server with API keys for different providers:\n",
"\n",
"- **Together API Key**: For Together AI models\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ab4eff97-4565-4c73-b1b3-0020a4c7e2a5",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client import LlamaStackClient\n",
"\n",
"client = LlamaStackClient(\n",
" base_url=\"http://0.0.0.0:8321\",\n",
" provider_data={\"together_api_key\": \"***\"},\n",
")"
]
},
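{
"cell_type": "markdown",
"id": "sketch-api-key-env",
"metadata": {},
"source": [
"A minimal sketch, not part of the original run: rather than pasting the key into the cell above, the same `provider_data` dictionary could be built from an environment variable. The variable name `TOGETHER_API_KEY` and the helper `build_provider_data` are assumptions made for illustration.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"\n",
"def build_provider_data(env=None):\n",
"    # Hypothetical helper: read the key from the environment instead of hardcoding it.\n",
"    env = os.environ if env is None else env\n",
"    key = env.get('TOGETHER_API_KEY')  # assumed variable name\n",
"    if not key:\n",
"        raise RuntimeError('Set TOGETHER_API_KEY before creating the client')\n",
"    return {'together_api_key': key}\n",
"```\n",
"\n",
"The result can then be passed as `provider_data=build_provider_data()` when constructing `LlamaStackClient`."
]
},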
{
"cell_type": "markdown",
"id": "vwhexjy1e8o",
"metadata": {},
"source": [
"#### Explore Available Models and Safety Features\n",
"\n",
"Check what models and safety shields are available through your LlamaStack instance."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "880443ef-ac3c-48b1-a80a-7dab5b25ac61",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/models \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/shields \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Available models:\n",
"- all-minilm\n",
"- nvidia/meta/llama-3.1-405b-instruct\n",
"- nvidia/meta/llama-3.1-70b-instruct\n",
"- nvidia/meta/llama-3.1-8b-instruct\n",
"- nvidia/meta/llama-3.2-11b-vision-instruct\n",
"- nvidia/meta/llama-3.2-1b-instruct\n",
"- nvidia/meta/llama-3.2-3b-instruct\n",
"- nvidia/meta/llama-3.2-90b-vision-instruct\n",
"- nvidia/meta/llama-3.3-70b-instruct\n",
"- nvidia/meta/llama3-70b-instruct\n",
"- nvidia/meta/llama3-8b-instruct\n",
"- nvidia/nvidia/llama-3.2-nv-embedqa-1b-v2\n",
"- nvidia/nvidia/nv-embedqa-e5-v5\n",
"- nvidia/nvidia/nv-embedqa-mistral-7b-v2\n",
"- nvidia/snowflake/arctic-embed-l\n",
"- ollama/all-minilm:l6-v2\n",
"- ollama/llama-guard3:1b\n",
"- ollama/llama-guard3:8b\n",
"- ollama/llama3.2:3b-instruct-fp16\n",
"- ollama/nomic-embed-text\n",
"- fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p2-3b-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p2-11b-vision-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p2-90b-vision-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct\n",
"- fireworks/accounts/fireworks/models/llama4-scout-instruct-basic\n",
"- fireworks/accounts/fireworks/models/llama4-maverick-instruct-basic\n",
"- fireworks/nomic-ai/nomic-embed-text-v1.5\n",
"- fireworks/accounts/fireworks/models/llama-guard-3-8b\n",
"- fireworks/accounts/fireworks/models/llama-guard-3-11b-vision\n",
"- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\n",
"- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo\n",
"- together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo\n",
"- together/meta-llama/Llama-3.2-3B-Instruct-Turbo\n",
"- together/meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo\n",
"- together/meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo\n",
"- together/meta-llama/Llama-3.3-70B-Instruct-Turbo\n",
"- together/togethercomputer/m2-bert-80M-8k-retrieval\n",
"- together/togethercomputer/m2-bert-80M-32k-retrieval\n",
"- together/meta-llama/Llama-4-Scout-17B-16E-Instruct\n",
"- together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\n",
"- together/meta-llama/Llama-Guard-3-8B\n",
"- together/meta-llama/Llama-Guard-3-11B-Vision-Turbo\n",
"- bedrock/meta.llama3-1-8b-instruct-v1:0\n",
"- bedrock/meta.llama3-1-70b-instruct-v1:0\n",
"- bedrock/meta.llama3-1-405b-instruct-v1:0\n",
"- openai/gpt-3.5-turbo-0125\n",
"- openai/gpt-3.5-turbo\n",
"- openai/gpt-3.5-turbo-instruct\n",
"- openai/gpt-4\n",
"- openai/gpt-4-turbo\n",
"- openai/gpt-4o\n",
"- openai/gpt-4o-2024-08-06\n",
"- openai/gpt-4o-mini\n",
"- openai/gpt-4o-audio-preview\n",
"- openai/chatgpt-4o-latest\n",
"- openai/o1\n",
"- openai/o1-mini\n",
"- openai/o3-mini\n",
"- openai/o4-mini\n",
"- openai/text-embedding-3-small\n",
"- openai/text-embedding-3-large\n",
"- anthropic/claude-3-5-sonnet-latest\n",
"- anthropic/claude-3-7-sonnet-latest\n",
"- anthropic/claude-3-5-haiku-latest\n",
"- anthropic/voyage-3\n",
"- anthropic/voyage-3-lite\n",
"- anthropic/voyage-code-3\n",
"- gemini/gemini-1.5-flash\n",
"- gemini/gemini-1.5-pro\n",
"- gemini/gemini-2.0-flash\n",
"- gemini/gemini-2.0-flash-lite\n",
"- gemini/gemini-2.5-flash\n",
"- gemini/gemini-2.5-flash-lite\n",
"- gemini/gemini-2.5-pro\n",
"- gemini/text-embedding-004\n",
"- groq/llama3-8b-8192\n",
"- groq/llama-3.1-8b-instant\n",
"- groq/llama3-70b-8192\n",
"- groq/llama-3.3-70b-versatile\n",
"- groq/llama-3.2-3b-preview\n",
"- groq/meta-llama/llama-4-scout-17b-16e-instruct\n",
"- groq/meta-llama/llama-4-maverick-17b-128e-instruct\n",
"- sambanova/Meta-Llama-3.1-8B-Instruct\n",
"- sambanova/Meta-Llama-3.3-70B-Instruct\n",
"- sambanova/Llama-4-Maverick-17B-128E-Instruct\n",
"- sentence-transformers/all-MiniLM-L6-v2\n",
"----\n",
"Available shields (safety models):\n",
"code-scanner\n",
"llama-guard\n",
"nemo-guardrail\n",
"----\n"
]
}
],
"source": [
"print(\"Available models:\")\n",
"for m in client.models.list():\n",
" print(f\"- {m.identifier}\")\n",
"\n",
"print(\"----\")\n",
"print(\"Available shields (safety models):\")\n",
"for s in client.shields.list():\n",
" print(s.identifier)\n",
"print(\"----\")"
]
},
{
"cell_type": "markdown",
"id": "gojp7at31ht",
"metadata": {},
"source": [
"### 4. Vector Database Setup\n",
"\n",
"#### Register a Vector Database\n",
"\n",
"Create a FAISS vector database for storing document embeddings:\n",
"\n",
"- **Vector DB ID**: Unique identifier for the database\n",
"- **Provider**: FAISS (Facebook AI Similarity Search)\n",
"- **Embedding Model**: Sentence Transformers model for text embeddings\n",
"- **Dimensions**: 384-dimensional embeddings"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a16e2885-ae70-4fa6-9778-2433fa4dbfff",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-dbs \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/vector-dbs \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Registered new vector DB: VectorDBRegisterResponse(embedding_dimension=384, embedding_model='sentence-transformers/all-MiniLM-L6-v2', identifier='acme_docs', provider_id='faiss', type='vector_db', provider_resource_id='acme_docs_v2', owner=None, source='via_register_api', vector_db_name=None)\n",
"Existing vector DBs: [VectorDBListResponseItem(embedding_dimension=384, embedding_model='sentence-transformers/all-MiniLM-L6-v2', identifier='acme_docs', provider_id='faiss', type='vector_db', provider_resource_id='acme_docs_v2', vector_db_name=None)]\n"
]
}
],
"source": [
"# Register a new clean vector database\n",
"vector_db = client.vector_dbs.register(\n",
" vector_db_id=\"acme_docs\", # Use a new unique name\n",
" provider_id=\"faiss\",\n",
" provider_vector_db_id=\"acme_docs_v2\",\n",
" embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n",
" embedding_dimension=384,\n",
")\n",
"print(\"Registered new vector DB:\", vector_db)\n",
"\n",
"# List all registered vector databases\n",
"dbs = client.vector_dbs.list()\n",
"print(\"Existing vector DBs:\", dbs)"
]
},
{
"cell_type": "markdown",
"id": "pcgjqzfr3eo",
"metadata": {},
"source": [
"#### Prepare Sample Documents\n",
"\n",
"Create LLAMA Stack Chunks for FAISS vector store"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5a0a6619-c9fb-4938-8ff3-f84304eed91e",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client.types.vector_io_insert_params import Chunk\n",
"\n",
"docs = [\n",
" (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
" (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
" (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
"]\n",
"\n",
"# Convert to Chunk objects\n",
"chunks = []\n",
"for _, (content, metadata) in enumerate(docs):\n",
" # Transform metadata to required format with document_id from title\n",
" metadata = {\"document_id\": metadata[\"title\"]}\n",
" chunk = Chunk(\n",
" content=content, # Required[InterleavedContent]\n",
" metadata=metadata, # Required[Dict]\n",
" )\n",
" chunks.append(chunk)"
]
},
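{
"cell_type": "markdown",
"id": "sketch-split-chunks",
"metadata": {},
"source": [
"A minimal sketch, assuming source texts longer than the one-sentence samples above: a document can be split into fixed-size word windows before each piece is wrapped in a `Chunk`. The helper `split_into_chunks` and its `max_words` parameter are hypothetical; it returns plain dictionaries with the same `content` and `metadata` keys used above.\n",
"\n",
"```python\n",
"def split_into_chunks(text, document_id, max_words=50):\n",
"    # Hypothetical helper: split text into pieces of at most max_words words.\n",
"    words = text.split()\n",
"    pieces = []\n",
"    for start in range(0, len(words), max_words):\n",
"        pieces.append(\n",
"            {\n",
"                'content': ' '.join(words[start:start + max_words]),\n",
"                'metadata': {'document_id': document_id},\n",
"            }\n",
"        )\n",
"    return pieces\n",
"```\n",
"\n",
"Each dictionary carries the same two fields that are passed to `Chunk` in the cell above."
]
},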
{
"cell_type": "markdown",
"id": "6bg3sm2ko5g",
"metadata": {},
"source": [
"#### Insert Documents into Vector Database\n",
"\n",
"Store the prepared documents in the FAISS vector database. This process:\n",
"1. Generates embeddings for each document\n",
"2. Stores embeddings with metadata\n",
"3. Enables semantic search capabilities"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0e8740d8-b809-44b9-915f-1e0200e3c3f1",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/insert \"HTTP/1.1 200 OK\"\n"
]
}
],
"source": [
"# Insert chunks into FAISS vector store\n",
"\n",
"response = client.vector_io.insert(vector_db_id=\"acme_docs\", chunks=chunks)"
]
},
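  {
   "cell_type": "markdown",
   "id": "sketch-cosine-search",
   "metadata": {},
   "source": [
    "The insert step embeds each chunk and stores the vector next to its metadata; a query is later embedded the same way and matched against the stored vectors. The toy sketch below illustrates that idea in plain Python. It is not how LlamaStack or FAISS is implemented: the three-dimensional vectors are made up (the `all-MiniLM-L6-v2` embeddings registered above have 384 dimensions), and `cosine` and `search` are hypothetical helpers.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "# Made-up 3-d \"embeddings\" stored next to their metadata (illustrative values only)\n",
    "store = [\n",
    "    ([0.9, 0.1, 0.0], {'document_id': 'Shipping Policy'}),\n",
    "    ([0.1, 0.9, 0.0], {'document_id': 'Returns Policy'}),\n",
    "    ([0.0, 0.2, 0.9], {'document_id': 'Support'}),\n",
    "]\n",
    "\n",
    "\n",
    "def cosine(a, b):\n",
    "    # Cosine similarity: dot product divided by the product of the vector norms\n",
    "    dot = sum(x * y for x, y in zip(a, b))\n",
    "    return dot / (math.hypot(*a) * math.hypot(*b))\n",
    "\n",
    "\n",
    "def search(query_vec, k=1):\n",
    "    # Rank stored vectors by similarity to the query and keep the metadata of the top k\n",
    "    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)\n",
    "    return [meta for _, meta in ranked[:k]]\n",
    "\n",
    "\n",
    "# A made-up query vector that points mostly along the first axis\n",
    "print(search([0.8, 0.2, 0.1]))\n",
    "```\n",
    "\n",
    "In the real pipeline the embedding model produces the vectors and the `faiss` provider performs the nearest-neighbour lookup; only the ranking idea carries over."
   ]
  },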
{
"cell_type": "markdown",
"id": "9061tmi1zpq",
"metadata": {},
"source": [
"#### Test Vector Search\n",
"\n",
"Query the vector database to verify it's working correctly. This performs semantic search to find relevant documents based on the query."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4a5e010c-eeeb-4020-a957-74d6d1cba342",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n"
]
},
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "metadata : {'document_id': 'Shipping Policy'}\n",
      "content : Acme ships globally in 3–5 business days.\n",
      "metadata : {'document_id': 'Shipping Policy'}\n",
      "content : Acme ships globally in 3–5 business days.\n",
"metadata : {'document_id': 'Shipping Policy'}\n",
"content : Acme ships globally in 3-5 business days.\n"
]
}
],
"source": [
"# Query chunks from FAISS vector store\n",
"\n",
"query_chunk_response = client.vector_io.query(\n",
" vector_db_id=\"acme_docs\",\n",
" query=\"How long does Acme take to ship orders?\",\n",
")\n",
"for chunk in query_chunk_response.chunks:\n",
" print(\"metadata\", \":\", chunk.metadata)\n",
" print(\"content\", \":\", chunk.content)"
]
},
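  {
   "cell_type": "markdown",
   "id": "sketch-dedupe-chunks",
   "metadata": {},
   "source": [
    "The output above lists the same document more than once, most likely because the sample chunks were inserted into `acme_docs` during earlier runs. A minimal sketch of de-duplicating results before they reach the prompt is shown below. `dedupe_chunks` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for the items in `query_chunk_response.chunks`; the sketch also assumes `content` is a plain string.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "\n",
    "def dedupe_chunks(chunks):\n",
    "    # Keep the first chunk for each (document_id, content) pair, preserving order\n",
    "    seen = set()\n",
    "    unique = []\n",
    "    for chunk in chunks:\n",
    "        key = (chunk.metadata.get('document_id'), chunk.content)\n",
    "        if key not in seen:\n",
    "            seen.add(key)\n",
    "            unique.append(chunk)\n",
    "    return unique\n",
    "\n",
    "\n",
    "# Stand-in objects exposing the same two attributes the query loop reads\n",
    "sample = [\n",
    "    SimpleNamespace(metadata={'document_id': 'Shipping Policy'}, content='Acme ships globally in 3-5 business days.'),\n",
    "    SimpleNamespace(metadata={'document_id': 'Shipping Policy'}, content='Acme ships globally in 3-5 business days.'),\n",
    "    SimpleNamespace(metadata={'document_id': 'Returns Policy'}, content='Returns are accepted within 30 days of purchase.'),\n",
    "]\n",
    "\n",
    "for chunk in dedupe_chunks(sample):\n",
    "    print(chunk.metadata['document_id'], ':', chunk.content)\n",
    "```\n",
    "\n",
    "Keying on both fields keeps distinct chunks of the same document while dropping exact repeats."
   ]
  },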
  {
   "cell_type": "markdown",
   "id": "usne6mbspms",
   "metadata": {},
   "source": [
    "### 5. LangChain Integration\n",
    "\n",
    "#### Configure LangChain with LlamaStack\n",
    "\n",
    "Set up LangChain to use LlamaStack's OpenAI-compatible API:\n",
    "\n",
    "- **Base URL**: Points to LlamaStack's OpenAI-compatible endpoint (`/v1/openai/v1`)\n",
    "- **Headers**: Pass the Together AI API key in the `X-LlamaStack-Provider-Data` header for model access\n",
    "- **Model**: Uses the Meta Llama 3.1 8B Instruct Turbo model via Together AI"
]
},
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "c378bd10-09c2-417c-bdfc-1e0a2dd19084",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "# Point LangChain at the LlamaStack server; the OpenAI key is only a placeholder\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"dummy\"\n",
    "os.environ[\"OPENAI_BASE_URL\"] = \"http://0.0.0.0:8321/v1/openai/v1\"\n",
    "\n",
    "# Chat model served by Together AI through LlamaStack\n",
"llm = ChatOpenAI(\n",
" model=\"together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\",\n",
" default_headers={\"X-LlamaStack-Provider-Data\": '{\"together_api_key\": \"***\"}'},\n",
")"
]
},
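  {
   "cell_type": "markdown",
   "id": "sketch-provider-header",
   "metadata": {},
   "source": [
    "Rather than pasting the key into the notebook, the provider-data header can be built from an environment variable. This is a sketch under assumptions: `TOGETHER_API_KEY` is an assumed variable name and `provider_data_header` is a hypothetical helper; only the header name and the `together_api_key` field come from the cell above.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import os\n",
    "\n",
    "\n",
    "def provider_data_header(api_key):\n",
    "    # json.dumps builds the JSON string that the cell above writes by hand\n",
    "    return {'X-LlamaStack-Provider-Data': json.dumps({'together_api_key': api_key})}\n",
    "\n",
    "\n",
    "# TOGETHER_API_KEY is an assumed environment variable name\n",
    "headers = provider_data_header(os.environ.get('TOGETHER_API_KEY', ''))\n",
    "print(list(headers))\n",
    "```\n",
    "\n",
    "The resulting dictionary can be passed as `default_headers=headers` when constructing `ChatOpenAI`."
   ]
  },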
{
"cell_type": "markdown",
"id": "5a4ddpcuk3l",
"metadata": {},
"source": [
"#### Test LLM Connection\n",
"\n",
"Verify that LangChain can successfully communicate with the LlamaStack server."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f88ffb5a-657b-4916-9375-c6ddc156c25e",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
{
"data": {
"text/plain": [
"AIMessage(content='With gentle eyes and soft, fuzzy hair,\\nThe llama roams, its beauty beyond compare.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 50, 'total_tokens': 70, 'completion_tokens_details': None, 'prompt_tokens_details': None, 'cached_tokens': 0}, 'model_name': 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo', 'system_fingerprint': None, 'id': 'o9gGhuc-4YNCb4-9790ba4bba2f1754', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--ff54cf64-6423-4997-b4da-5f4852da0c7e-0', usage_metadata={'input_tokens': 50, 'output_tokens': 20, 'total_tokens': 70, 'input_token_details': {}, 'output_token_details': {}})"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
   ],
   "source": [
    "# Test the LLM with a simple message\n",
"messages = [\n",
" {\"role\": \"system\", \"content\": \"You are a friendly assistant.\"},\n",
" {\"role\": \"user\", \"content\": \"Write a two-sentence poem about llama.\"},\n",
"]\n",
"llm.invoke(messages)"
]
},
{
"cell_type": "markdown",
"id": "0xh0jg6a0l4a",
"metadata": {},
"source": [
"### 6. Building the RAG Chain\n",
"\n",
"#### Create a Complete RAG Pipeline\n",
"\n",
"Build a LangChain pipeline that combines:\n",
"\n",
"1. **Vector Search**: Query LlamaStack's vector database\n",
"2. **Context Assembly**: Format retrieved documents\n",
"3. **Prompt Template**: Structure the input for the LLM\n",
"4. **LLM Generation**: Generate answers using context\n",
"5. **Output Parsing**: Extract the final response\n",
"\n",
"**Chain Flow**: `Query → Vector Search → Context + Question → LLM → Response`"
]
},
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "9684427d-dcc7-4544-9af5-8b110d014c42",
   "metadata": {},
   "outputs": [],
   "source": [
    "# LangChain provides the prompt template and chaining; LlamaStack provides vector search and chat completion\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
"\n",
"\n",
"def join_docs(docs):\n",
" return \"\\n\\n\".join([f\"[{d.metadata.get('document_id')}] {d.content}\" for d in docs.chunks])\n",
"\n",
"\n",
"PROMPT = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant. Use the following context to answer.\"),\n",
" (\"user\", \"Question: {question}\\n\\nContext:\\n{context}\"),\n",
" ]\n",
")\n",
"\n",
"vector_step = RunnableLambda(\n",
" lambda x: client.vector_io.query(\n",
" vector_db_id=\"acme_docs\",\n",
" query=x,\n",
" )\n",
")\n",
"\n",
"chain = (\n",
" {\"context\": vector_step | RunnableLambda(join_docs), \"question\": RunnablePassthrough()}\n",
" | PROMPT\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
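  {
   "cell_type": "markdown",
   "id": "sketch-chain-preview",
   "metadata": {},
   "source": [
    "To see what the chain sends to the model without calling the server, the retrieval step can be swapped for a stub. This is a sketch, not the notebook's pipeline: `fake_query` is a hypothetical stand-in for `client.vector_io.query`, and the chain stops at the prompt so no LLM call is made.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
    "\n",
    "\n",
    "def fake_query(question):\n",
    "    # Hypothetical stub: returns an object shaped like the vector_io.query response\n",
    "    chunk = SimpleNamespace(\n",
    "        metadata={'document_id': 'Shipping Policy'},\n",
    "        content='Acme ships globally in 3-5 business days.',\n",
    "    )\n",
    "    return SimpleNamespace(chunks=[chunk])\n",
    "\n",
    "\n",
    "def join_docs(docs):\n",
    "    # Same formatting as the chain above: one '[document_id] content' entry per chunk\n",
    "    return '\\n\\n'.join(f\"[{d.metadata.get('document_id')}] {d.content}\" for d in docs.chunks)\n",
    "\n",
    "\n",
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        ('system', 'You are a helpful assistant. Use the following context to answer.'),\n",
    "        ('user', 'Question: {question}\\n\\nContext:\\n{context}'),\n",
    "    ]\n",
    ")\n",
    "\n",
    "# The dict sends the input string to both branches, then the results fill the prompt\n",
    "preview = (\n",
    "    {'context': RunnableLambda(fake_query) | RunnableLambda(join_docs), 'question': RunnablePassthrough()}\n",
    "    | prompt\n",
    ")\n",
    "\n",
    "for message in preview.invoke('How long does shipping take?').to_messages():\n",
    "    print(message.type, ':', message.content)\n",
    "```\n",
    "\n",
    "Printing the rendered messages this way is a quick check that the retrieved context and the question land in the intended slots before any tokens are spent."
   ]
  },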
{
"cell_type": "markdown",
"id": "0onu6rhphlra",
"metadata": {},
"source": [
"### 7. Testing the RAG System\n",
"\n",
"#### Example 1: Shipping Query"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "03322188-9509-446a-a4a8-ce3bb83ec87c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"❓ How long does shipping take?\n",
"💡 According to the shipping policy, Acme ships globally in 3-5 business days. This means that the shipping time is typically between 3 and 5 business days, but it may vary depending on the specific location and other factors.\n"
]
}
],
"source": [
"query = \"How long does shipping take?\"\n",
"response = chain.invoke(query)\n",
"print(\"❓\", query)\n",
"print(\"💡\", response)"
]
},
{
"cell_type": "markdown",
"id": "b7krhqj88ku",
"metadata": {},
"source": [
"#### Example 2: Returns Policy Query"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "61995550-bb0b-46a8-a5d0-023207475d60",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"❓ Can I return a product after 40 days?\n",
"💡 Based on the provided returns policy, it appears that returns are only accepted within 30 days of purchase. Since you're asking about returning a product after 40 days, it would not be within the specified 30-day return window. Unfortunately, it's unlikely that you would be able to return the product after 40 days.\n"
]
}
],
"source": [
"query = \"Can I return a product after 40 days?\"\n",
"response = chain.invoke(query)\n",
"print(\"❓\", query)\n",
"print(\"💡\", response)"
]
},
{
"cell_type": "markdown",
"id": "h4w24fadvjs",
"metadata": {},
"source": [
"---\n",
"We have successfully built a RAG system that combines:\n",
"\n",
"- **LlamaStack** for infrastructure (LLM serving + vector database)\n",
"- **LangChain** for orchestration (prompts + chains)\n",
"- **Together AI** for high-quality language models\n",
"\n",
"### Key Benefits\n",
"\n",
"1. **Unified Infrastructure**: Single server for LLMs and vector databases\n",
"2. **OpenAI Compatibility**: Easy integration with existing LangChain code\n",
"3. **Multi-Provider Support**: Switch between different LLM providers\n",
"4. **Production Ready**: Built-in safety shields and monitoring\n",
"\n",
"### Next Steps\n",
"\n",
"- Add more sophisticated document processing\n",
"- Implement conversation memory\n",
"- Add safety filtering and monitoring\n",
"- Scale to larger document collections\n",
"- Integrate with web frameworks like FastAPI or Streamlit\n",
"\n",
"---\n",
"\n",
"##### 🔧 Cleanup\n",
"\n",
"Don't forget to stop the LlamaStack server when you're done:\n",
"\n",
"```python\n",
"kill_llama_stack_server()\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}