feat: Add example notebook for Langchain + LLAMAStack integration (#3228)

# What does this PR do?

Add a LlamaStack + LangChain integration example notebook.

## Test Plan

Ran in a Jupyter notebook; works end to end.

(Used Claude mainly for documentation and coding/debugging help.)
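
The notebook waits for the server by polling `/v1/health` on port 8321. The retry pattern behind that check can be sketched on its own, without the notebook. This is a minimal sketch under stated assumptions, not the notebook's code: `wait_until_ready` and its injectable `probe` and `sleep` parameters are hypothetical names used only for illustration, and it catches Python's built-in `ConnectionError` rather than the `requests` exception.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    max_retries: int = 30,
    retry_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or the retries run out.

    `probe` is a hypothetical stand-in for an HTTP health check that
    returns True once the server answers with status 200.
    """
    for _ in range(max_retries):
        try:
            if probe():
                return True
        except ConnectionError:
            # The server is not accepting connections yet.
            pass
        # Back off after every failed attempt, not only after connection errors.
        sleep(retry_interval)
    return False


# Usage: a fake probe that fails twice and then succeeds.
attempts = iter([False, False, True])
ready = wait_until_ready(lambda: next(attempts), sleep=lambda _: None)
```

Passing the probe in as a callable keeps the retry logic testable without a running server. A real probe would wrap the HTTP request.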
2025-12-04 02:03:44 +00:00 · 2025-08-26 11:34:08 -07:00 · 2025-08-26 11:34:08 -07:00 · 2666029427
commit 2666029427
parent 7ca8233889
1 changed files with 946 additions and 0 deletions
--- a/docs/notebooks/langchain/Llama_Stack_LangChain.ipynb
+++ b/docs/notebooks/langchain/Llama_Stack_LangChain.ipynb
@@ -0,0 +1,946 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "1ztegmwm4sp",
+   "metadata": {},
+   "source": [
+    "## LlamaStack + LangChain Integration Tutorial\n",
+    "\n",
+    "This notebook demonstrates how to integrate **LlamaStack** with **LangChain** to build a complete RAG (Retrieval-Augmented Generation) system.\n",
+    "\n",
+    "### Overview\n",
+    "\n",
+    "- **LlamaStack**: Provides the infrastructure for running LLMs and vector databases\n",
+    "- **LangChain**: Provides the framework for chaining operations and prompt templates\n",
+    "- **Integration**: Uses LlamaStack's OpenAI-compatible API with LangChain\n",
+    "\n",
+    "### What You'll See\n",
+    "\n",
+    "1. Setting up a LlamaStack server with the Together AI provider\n",
+    "2. Creating and managing vector databases\n",
+    "3. Building RAG chains with LangChain + LlamaStack\n",
+    "4. Querying the chain for relevant information\n",
+    "\n",
+    "### Prerequisites\n",
+    "\n",
+    "- Together AI API key\n",
+    "\n",
+    "---\n",
+    "\n",
+    "### 1. Installation and Setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2ktr5ls2cas",
+   "metadata": {},
+   "source": [
+    "#### Install Required Dependencies\n",
+    "\n",
+    "First, we install all the necessary packages for LangChain and FastAPI integration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "5b6a6a17-b931-4bea-8273-0d6e5563637a",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Requirement already satisfied: fastapi in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.115.14)\n",
+      "Requirement already satisfied: uvicorn in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.29.0)\n",
+      "Requirement already satisfied: langchain>=0.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.3.27)\n",
+      "Requirement already satisfied: langchain-openai in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.3.30)\n",
+      "Requirement already satisfied: langchain-community in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.3.27)\n",
+      "Requirement already satisfied: langchain-text-splitters in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.3.9)\n",
+      "Requirement already satisfied: faiss-cpu in /Users/swapna942/miniconda3/lib/python3.12/site-packages (1.11.0)\n",
+      "Requirement already satisfied: starlette<0.47.0,>=0.40.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from fastapi) (0.46.2)\n",
+      "Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from fastapi) (2.11.7)\n",
+      "Requirement already satisfied: typing-extensions>=4.8.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from fastapi) (4.14.1)\n",
+      "Requirement already satisfied: annotated-types>=0.6.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi) (0.7.0)\n",
+      "Requirement already satisfied: pydantic-core==2.33.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi) (2.33.2)\n",
+      "Requirement already satisfied: typing-inspection>=0.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi) (0.4.1)\n",
+      "Requirement already satisfied: anyio<5,>=3.6.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from starlette<0.47.0,>=0.40.0->fastapi) (4.10.0)\n",
+      "Requirement already satisfied: idna>=2.8 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from anyio<5,>=3.6.2->starlette<0.47.0,>=0.40.0->fastapi) (3.10)\n",
+      "Requirement already satisfied: sniffio>=1.1 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from anyio<5,>=3.6.2->starlette<0.47.0,>=0.40.0->fastapi) (1.3.1)\n",
+      "Requirement already satisfied: click>=7.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from uvicorn) (8.2.1)\n",
+      "Requirement already satisfied: h11>=0.8 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from uvicorn) (0.16.0)\n",
+      "Requirement already satisfied: langchain-core<1.0.0,>=0.3.72 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (0.3.74)\n",
+      "Requirement already satisfied: langsmith>=0.1.17 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (0.4.14)\n",
+      "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (2.0.41)\n",
+      "Requirement already satisfied: requests<3,>=2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (2.32.4)\n",
+      "Requirement already satisfied: PyYAML>=5.3 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain>=0.2) (6.0.2)\n",
+      "Requirement already satisfied: tenacity!=8.4.0,<10.0.0,>=8.1.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-core<1.0.0,>=0.3.72->langchain>=0.2) (9.1.2)\n",
+      "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-core<1.0.0,>=0.3.72->langchain>=0.2) (1.33)\n",
+      "Requirement already satisfied: packaging>=23.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-core<1.0.0,>=0.3.72->langchain>=0.2) (24.2)\n",
+      "Requirement already satisfied: jsonpointer>=1.9 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<1.0.0,>=0.3.72->langchain>=0.2) (2.1)\n",
+      "Requirement already satisfied: charset_normalizer<4,>=2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from requests<3,>=2->langchain>=0.2) (3.3.2)\n",
+      "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from requests<3,>=2->langchain>=0.2) (2.5.0)\n",
+      "Requirement already satisfied: certifi>=2017.4.17 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from requests<3,>=2->langchain>=0.2) (2025.8.3)\n",
+      "Requirement already satisfied: openai<2.0.0,>=1.99.9 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-openai) (1.100.2)\n",
+      "Requirement already satisfied: tiktoken<1,>=0.7 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-openai) (0.9.0)\n",
+      "Requirement already satisfied: distro<2,>=1.7.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.99.9->langchain-openai) (1.9.0)\n",
+      "Requirement already satisfied: httpx<1,>=0.23.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.99.9->langchain-openai) (0.28.1)\n",
+      "Requirement already satisfied: jiter<1,>=0.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.99.9->langchain-openai) (0.10.0)\n",
+      "Requirement already satisfied: tqdm>4 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from openai<2.0.0,>=1.99.9->langchain-openai) (4.67.1)\n",
+      "Requirement already satisfied: httpcore==1.* in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from httpx<1,>=0.23.0->openai<2.0.0,>=1.99.9->langchain-openai) (1.0.9)\n",
+      "Requirement already satisfied: regex>=2022.1.18 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from tiktoken<1,>=0.7->langchain-openai) (2024.11.6)\n",
+      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (3.12.13)\n",
+      "Requirement already satisfied: dataclasses-json<0.7,>=0.5.7 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (0.6.7)\n",
+      "Requirement already satisfied: pydantic-settings<3.0.0,>=2.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (2.10.1)\n",
+      "Requirement already satisfied: httpx-sse<1.0.0,>=0.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (0.4.1)\n",
+      "Requirement already satisfied: numpy>=1.26.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langchain-community) (2.3.1)\n",
+      "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (2.6.1)\n",
+      "Requirement already satisfied: aiosignal>=1.1.2 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.4.0)\n",
+      "Requirement already satisfied: attrs>=17.3.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (25.3.0)\n",
+      "Requirement already satisfied: frozenlist>=1.1.1 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.7.0)\n",
+      "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (6.6.3)\n",
+      "Requirement already satisfied: propcache>=0.2.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (0.3.2)\n",
+      "Requirement already satisfied: yarl<2.0,>=1.17.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.20.1)\n",
+      "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (3.26.1)\n",
+      "Requirement already satisfied: typing-inspect<1,>=0.4.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from dataclasses-json<0.7,>=0.5.7->langchain-community) (0.9.0)\n",
+      "Requirement already satisfied: python-dotenv>=0.21.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from pydantic-settings<3.0.0,>=2.4.0->langchain-community) (1.1.1)\n",
+      "Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community) (1.1.0)\n",
+      "Requirement already satisfied: orjson>=3.9.14 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langsmith>=0.1.17->langchain>=0.2) (3.10.18)\n",
+      "Requirement already satisfied: requests-toolbelt>=1.0.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langsmith>=0.1.17->langchain>=0.2) (1.0.0)\n",
+      "Requirement already satisfied: zstandard>=0.23.0 in /Users/swapna942/miniconda3/lib/python3.12/site-packages (from langsmith>=0.1.17->langchain>=0.2) (0.23.0)\n"
+     ]
+    }
+   ],
+   "source": [
+    "!pip install fastapi uvicorn \"langchain>=0.2\" langchain-openai \\\n",
+    "             langchain-community langchain-text-splitters \\\n",
+    "             faiss-cpu"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "wmt9jvqzh7n",
+   "metadata": {},
+   "source": [
+    "### 2. LlamaStack Server Setup\n",
+    "\n",
+    "#### Build and Start LlamaStack Server\n",
+    "\n",
+    "This section sets up the LlamaStack server with:\n",
+    "- **Together AI** as the inference provider\n",
+    "- **FAISS** as the vector database\n",
+    "- **Sentence Transformers** for embeddings\n",
+    "\n",
+    "The server runs on `localhost:8321` and provides OpenAI-compatible endpoints."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "dd2dacf3-ec8b-4cc7-8ff4-b5b6ea4a6e9e",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Requirement already satisfied: uv in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.7.20)\n",
+      "Environment '/Users/swapna942/llama-stack/.venv' already exists, re-using it.\n",
+      "Virtual environment /Users/swapna942/llama-stack/.venv is already active\n",
+      "\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 86ms\u001b[0m\u001b[0m\n",
+      "Installing pip dependencies\n",
+      "\u001b[2K\u001b[2mResolved \u001b[1m178 packages\u001b[0m \u001b[2min 462ms\u001b[0m\u001b[0m                                       \u001b[0m\n",
+      "\u001b[2mUninstalled \u001b[1m2 packages\u001b[0m \u001b[2min 28ms\u001b[0m\u001b[0m\n",
+      "\u001b[2K\u001b[2mInstalled \u001b[1m2 packages\u001b[0m \u001b[2min 5ms\u001b[0m\u001b[0m                                 \u001b[0m\n",
+      " \u001b[31m-\u001b[39m \u001b[1mprotobuf\u001b[0m\u001b[2m==5.29.5\u001b[0m\n",
+      " \u001b[32m+\u001b[39m \u001b[1mprotobuf\u001b[0m\u001b[2m==5.29.4\u001b[0m\n",
+      " \u001b[31m-\u001b[39m \u001b[1mruff\u001b[0m\u001b[2m==0.12.5\u001b[0m\n",
+      " \u001b[32m+\u001b[39m \u001b[1mruff\u001b[0m\u001b[2m==0.9.10\u001b[0m\n",
+      "Installing special provider module: torch torchvision --index-url https://download.pytorch.org/whl/cpu\n",
+      "\u001b[2mAudited \u001b[1m2 packages\u001b[0m \u001b[2min 5ms\u001b[0m\u001b[0m\n",
+      "Installing special provider module: sentence-transformers --no-deps\n",
+      "\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 9ms\u001b[0m\u001b[0m\n",
+      "\u001b[32mBuild Successful!\u001b[0m\n",
+      "\u001b[34mYou can find the newly-built distribution here: /Users/swapna942/.llama/distributions/starter/starter-run.yaml\u001b[0m\n",
+      "\u001b[32mYou can run the new Llama Stack distro via: \u001b[34mllama stack run /Users/swapna942/.llama/distributions/starter/starter-run.yaml --image-type venv\u001b[0m\u001b[0m\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "import subprocess\n",
+    "import time\n",
+    "\n",
+    "!pip install uv\n",
+    "\n",
+    "if \"UV_SYSTEM_PYTHON\" in os.environ:\n",
+    "    del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
+    "\n",
+    "# this command installs all the dependencies needed for the llama stack server with the together inference provider\n",
+    "!uv run --with llama-stack llama stack build --distro starter --image-type venv\n",
+    "\n",
+    "\n",
+    "def run_llama_stack_server_background():\n",
+    "    log_file = open(\"llama_stack_server.log\", \"w\")\n",
+    "    process = subprocess.Popen(\n",
+    "        \"uv run --with llama-stack llama stack run ~/.llama/distributions/starter/starter-run.yaml --image-type venv\",\n",
+    "        shell=True,\n",
+    "        stdout=log_file,\n",
+    "        stderr=log_file,\n",
+    "        text=True,\n",
+    "    )\n",
+    "\n",
+    "    print(f\"Starting Llama Stack server with PID: {process.pid}\")\n",
+    "    return process\n",
+    "\n",
+    "\n",
+    "def wait_for_server_to_start():\n",
+    "    import requests\n",
+    "    from requests.exceptions import ConnectionError\n",
+    "\n",
+    "    url = \"http://0.0.0.0:8321/v1/health\"\n",
+    "    max_retries = 30\n",
+    "    retry_interval = 1\n",
+    "\n",
+    "    print(\"Waiting for server to start\", end=\"\")\n",
+    "    for _ in range(max_retries):\n",
+    "        try:\n",
+    "            if requests.get(url).status_code == 200:\n",
+    "                print(\"\\nServer is ready!\")\n",
+    "                return True\n",
+    "        except ConnectionError:\n",
+    "            pass  # server is not accepting connections yet\n",
+    "        print(\".\", end=\"\", flush=True)\n",
+    "        time.sleep(retry_interval)\n",
+    "\n",
+    "    print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n",
+    "    return False\n",
+    "\n",
+    "\n",
+    "# use this helper if needed to kill the server\n",
+    "def kill_llama_stack_server():\n",
+    "    # Kill any existing llama stack server processes\n",
+    "    os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "28bd8dbd-4576-4e76-813f-21ab94db44a2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Starting Llama Stack server with PID: 99016\n",
+      "Waiting for server to start....\n",
+      "Server is ready!\n"
+     ]
+    }
+   ],
+   "source": [
+    "server_process = run_llama_stack_server_background()\n",
+    "assert wait_for_server_to_start()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "gr9cdcg4r7n",
+   "metadata": {},
+   "source": [
+    "#### Install LlamaStack Client\n",
+    "\n",
+    "Install the client library to interact with the LlamaStack server."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "487d2dbc-d071-400e-b4f0-dcee58f8dc95",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Requirement already satisfied: llama_stack_client in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (0.2.17)\n",
+      "Requirement already satisfied: anyio<5,>=3.5.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (4.9.0)\n",
+      "Requirement already satisfied: click in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (8.2.1)\n",
+      "Requirement already satisfied: distro<2,>=1.7.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (1.9.0)\n",
+      "Requirement already satisfied: fire in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (0.7.0)\n",
+      "Requirement already satisfied: httpx<1,>=0.23.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (0.28.1)\n",
+      "Requirement already satisfied: pandas in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (2.3.1)\n",
+      "Requirement already satisfied: prompt-toolkit in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (3.0.51)\n",
+      "Requirement already satisfied: pyaml in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (25.7.0)\n",
+      "Requirement already satisfied: pydantic<3,>=1.9.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (2.11.7)\n",
+      "Requirement already satisfied: requests in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (2.32.4)\n",
+      "Requirement already satisfied: rich in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (14.1.0)\n",
+      "Requirement already satisfied: sniffio in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (1.3.1)\n",
+      "Requirement already satisfied: termcolor in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (3.1.0)\n",
+      "Requirement already satisfied: tqdm in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (4.67.1)\n",
+      "Requirement already satisfied: typing-extensions<5,>=4.7 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from llama_stack_client) (4.14.1)\n",
+      "Requirement already satisfied: idna>=2.8 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from anyio<5,>=3.5.0->llama_stack_client) (3.10)\n",
+      "Requirement already satisfied: certifi in /opt/homebrew/opt/certifi/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama_stack_client) (2025.8.3)\n",
+      "Requirement already satisfied: httpcore==1.* in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from httpx<1,>=0.23.0->llama_stack_client) (1.0.9)\n",
+      "Requirement already satisfied: h11>=0.16 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->llama_stack_client) (0.16.0)\n",
+      "Requirement already satisfied: annotated-types>=0.6.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama_stack_client) (0.7.0)\n",
+      "Requirement already satisfied: pydantic-core==2.33.2 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama_stack_client) (2.33.2)\n",
+      "Requirement already satisfied: typing-inspection>=0.4.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pydantic<3,>=1.9.0->llama_stack_client) (0.4.1)\n",
+      "Requirement already satisfied: numpy>=1.26.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pandas->llama_stack_client) (2.3.2)\n",
+      "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pandas->llama_stack_client) (2.9.0.post0)\n",
+      "Requirement already satisfied: pytz>=2020.1 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pandas->llama_stack_client) (2025.2)\n",
+      "Requirement already satisfied: tzdata>=2022.7 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pandas->llama_stack_client) (2025.2)\n",
+      "Requirement already satisfied: six>=1.5 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas->llama_stack_client) (1.17.0)\n",
+      "Requirement already satisfied: wcwidth in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from prompt-toolkit->llama_stack_client) (0.2.13)\n",
+      "Requirement already satisfied: PyYAML in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from pyaml->llama_stack_client) (6.0.2)\n",
+      "Requirement already satisfied: charset_normalizer<4,>=2 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from requests->llama_stack_client) (3.4.2)\n",
+      "Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from requests->llama_stack_client) (2.5.0)\n",
+      "Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from rich->llama_stack_client) (4.0.0)\n",
+      "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from rich->llama_stack_client) (2.19.2)\n",
+      "Requirement already satisfied: mdurl~=0.1 in /opt/homebrew/Cellar/jupyterlab/4.4.5/libexec/lib/python3.13/site-packages (from markdown-it-py>=2.2.0->rich->llama_stack_client) (0.1.2)\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "0"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import sys\n",
+    "\n",
+    "# Install directly to the current Python environment\n",
+    "subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"llama_stack_client\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0j5hag7l9x89",
+   "metadata": {},
+   "source": [
+    "### 3. Initialize LlamaStack Client\n",
+    "\n",
+    "Create a client connection to the LlamaStack server with API keys for different providers:\n",
+    "\n",
+    "- **OpenAI API Key**: For OpenAI models\n",
+    "- **Gemini API Key**: For Google's Gemini models  \n",
+    "- **Together API Key**: For Together AI models\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "ab4eff97-4565-4c73-b1b3-0020a4c7e2a5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_stack_client import LlamaStackClient\n",
+    "\n",
+    "client = LlamaStackClient(\n",
+    "    base_url=\"http://0.0.0.0:8321\",\n",
+    "    provider_data={\"openai_api_key\": \"****\", \"gemini_api_key\": \"****\", \"together_api_key\": \"****\"},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "vwhexjy1e8o",
+   "metadata": {},
+   "source": [
+    "#### Explore Available Models and Safety Features\n",
+    "\n",
+    "Check what models and safety shields are available through your LlamaStack instance."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "880443ef-ac3c-48b1-a80a-7dab5b25ac61",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/models \"HTTP/1.1 200 OK\"\n",
+      "INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/shields \"HTTP/1.1 200 OK\"\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Available models:\n",
+      "- all-minilm\n",
+      "- ollama/all-minilm:l6-v2\n",
+      "- ollama/llama-guard3:1b\n",
+      "- ollama/llama-guard3:8b\n",
+      "- ollama/llama3.2:3b-instruct-fp16\n",
+      "- ollama/nomic-embed-text\n",
+      "- fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\n",
+      "- fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct\n",
+      "- fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct\n",
+      "- fireworks/accounts/fireworks/models/llama-v3p2-3b-instruct\n",
+      "- fireworks/accounts/fireworks/models/llama-v3p2-11b-vision-instruct\n",
+      "- fireworks/accounts/fireworks/models/llama-v3p2-90b-vision-instruct\n",
+      "- fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct\n",
+      "- fireworks/accounts/fireworks/models/llama4-scout-instruct-basic\n",
+      "- fireworks/accounts/fireworks/models/llama4-maverick-instruct-basic\n",
+      "- fireworks/nomic-ai/nomic-embed-text-v1.5\n",
+      "- fireworks/accounts/fireworks/models/llama-guard-3-8b\n",
+      "- fireworks/accounts/fireworks/models/llama-guard-3-11b-vision\n",
+      "- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\n",
+      "- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo\n",
+      "- together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo\n",
+      "- together/meta-llama/Llama-3.2-3B-Instruct-Turbo\n",
+      "- together/meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo\n",
+      "- together/meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo\n",
+      "- together/meta-llama/Llama-3.3-70B-Instruct-Turbo\n",
+      "- together/togethercomputer/m2-bert-80M-8k-retrieval\n",
+      "- together/togethercomputer/m2-bert-80M-32k-retrieval\n",
+      "- together/meta-llama/Llama-4-Scout-17B-16E-Instruct\n",
+      "- together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\n",
+      "- together/meta-llama/Llama-Guard-3-8B\n",
+      "- together/meta-llama/Llama-Guard-3-11B-Vision-Turbo\n",
+      "- bedrock/meta.llama3-1-8b-instruct-v1:0\n",
+      "- bedrock/meta.llama3-1-70b-instruct-v1:0\n",
+      "- bedrock/meta.llama3-1-405b-instruct-v1:0\n",
+      "- openai/gpt-3.5-turbo-0125\n",
+      "- openai/gpt-3.5-turbo\n",
+      "- openai/gpt-3.5-turbo-instruct\n",
+      "- openai/gpt-4\n",
+      "- openai/gpt-4-turbo\n",
+      "- openai/gpt-4o\n",
+      "- openai/gpt-4o-2024-08-06\n",
+      "- openai/gpt-4o-mini\n",
+      "- openai/gpt-4o-audio-preview\n",
+      "- openai/chatgpt-4o-latest\n",
+      "- openai/o1\n",
+      "- openai/o1-mini\n",
+      "- openai/o3-mini\n",
+      "- openai/o4-mini\n",
+      "- openai/text-embedding-3-small\n",
+      "- openai/text-embedding-3-large\n",
+      "- anthropic/claude-3-5-sonnet-latest\n",
+      "- anthropic/claude-3-7-sonnet-latest\n",
+      "- anthropic/claude-3-5-haiku-latest\n",
+      "- anthropic/voyage-3\n",
+      "- anthropic/voyage-3-lite\n",
+      "- anthropic/voyage-code-3\n",
+      "- gemini/gemini-1.5-flash\n",
+      "- gemini/gemini-1.5-pro\n",
+      "- gemini/gemini-2.0-flash\n",
+      "- gemini/gemini-2.0-flash-lite\n",
+      "- gemini/gemini-2.5-flash\n",
+      "- gemini/gemini-2.5-flash-lite\n",
+      "- gemini/gemini-2.5-pro\n",
+      "- gemini/text-embedding-004\n",
+      "- groq/llama3-8b-8192\n",
+      "- groq/llama-3.1-8b-instant\n",
+      "- groq/llama3-70b-8192\n",
+      "- groq/llama-3.3-70b-versatile\n",
+      "- groq/llama-3.2-3b-preview\n",
+      "- groq/meta-llama/llama-4-scout-17b-16e-instruct\n",
+      "- groq/meta-llama/llama-4-maverick-17b-128e-instruct\n",
+      "- sambanova/Meta-Llama-3.1-8B-Instruct\n",
+      "- sambanova/Meta-Llama-3.3-70B-Instruct\n",
+      "- sambanova/Llama-4-Maverick-17B-128E-Instruct\n",
+      "- sentence-transformers/all-MiniLM-L6-v2\n",
+      "----\n",
+      "Available shields (safety models):\n",
+      "code-scanner\n",
+      "llama-guard\n",
+      "----\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"Available models:\")\n",
+    "for m in client.models.list():\n",
+    "    print(f\"- {m.identifier}\")\n",
+    "\n",
+    "print(\"----\")\n",
+    "print(\"Available shields (safety models):\")\n",
+    "for s in client.shields.list():\n",
+    "    print(s.identifier)\n",
+    "print(\"----\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "gojp7at31ht",
+   "metadata": {},
+   "source": [
+    "### 4. Vector Database Setup\n",
+    "\n",
+    "#### Register a Vector Database\n",
+    "\n",
+    "Create a FAISS vector database for storing document embeddings:\n",
+    "\n",
+    "- **Vector DB ID**: Unique identifier for the database\n",
+    "- **Provider**: FAISS (Facebook AI Similarity Search)\n",
+    "- **Embedding Model**: Sentence Transformers model for text embeddings\n",
+    "- **Dimensions**: 384-dimensional embeddings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "a16e2885-ae70-4fa6-9778-2433fa4dbfff",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-dbs \"HTTP/1.1 200 OK\"\n",
+      "INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/vector-dbs \"HTTP/1.1 200 OK\"\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Registered new vector DB: VectorDBRegisterResponse(embedding_dimension=384, embedding_model='sentence-transformers/all-MiniLM-L6-v2', identifier='acme_docs', provider_id='faiss', type='vector_db', provider_resource_id='acme_docs_v2', owner=None, source='via_register_api', vector_db_name=None)\n",
+      "Existing vector DBs: [VectorDBListResponseItem(embedding_dimension=384, embedding_model='sentence-transformers/all-MiniLM-L6-v2', identifier='acme_docs', provider_id='faiss', type='vector_db', provider_resource_id='acme_docs_v2', vector_db_name=None)]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Register a new clean vector database\n",
+    "vector_db = client.vector_dbs.register(\n",
+    "    vector_db_id=\"acme_docs\",  # Use a new unique name\n",
+    "    provider_id=\"faiss\",\n",
+    "    provider_vector_db_id=\"acme_docs_v2\",\n",
+    "    embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n",
+    "    embedding_dimension=384,\n",
+    ")\n",
+    "print(\"Registered new vector DB:\", vector_db)\n",
+    "\n",
+    "# List all registered vector databases\n",
+    "dbs = client.vector_dbs.list()\n",
+    "print(\"Existing vector DBs:\", dbs)"
+   ]
+  },
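+  {
+   "cell_type": "markdown",
+   "id": "sketch-ensure-vector-db",
+   "metadata": {},
+   "source": [
+    "#### Sketch: Re-runnable Registration\n",
+    "\n",
+    "When re-running the notebook, you may prefer to skip registration if the ID is already listed. The sketch below is one possible approach, not part of the LlamaStack API: `ensure_vector_db` is a hypothetical helper built only on the `client.vector_dbs.list()` and `client.vector_dbs.register()` calls used above, and it assumes each listed item exposes `identifier`, as the printed output suggests.\n",
+    "\n",
+    "```python\n",
+    "def ensure_vector_db(client, vector_db_id, **register_kwargs):\n",
+    "    \"\"\"Register vector_db_id only if it is not already listed (hypothetical helper).\"\"\"\n",
+    "    existing = {db.identifier for db in client.vector_dbs.list()}\n",
+    "    if vector_db_id in existing:\n",
+    "        return False  # already registered, nothing to do\n",
+    "    client.vector_dbs.register(vector_db_id=vector_db_id, **register_kwargs)\n",
+    "    return True  # a new registration was made\n",
+    "```\n",
+    "\n",
+    "It could be called with the same keyword arguments as the registration cell, for example `ensure_vector_db(client, \"acme_docs\", provider_id=\"faiss\", embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\", embedding_dimension=384)`."
+   ]
+  },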
+  {
+   "cell_type": "markdown",
+   "id": "pcgjqzfr3eo",
+   "metadata": {},
+   "source": [
+    "#### Prepare Sample Documents\n",
+    "\n",
+    "Convert the sample documents into LlamaStack `Chunk` objects for the FAISS vector store."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5a0a6619-c9fb-4938-8ff3-f84304eed91e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_stack_client.types.vector_io_insert_params import Chunk\n",
+    "\n",
+    "docs = [\n",
+    "    (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
+    "    (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
+    "    (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
+    "]\n",
+    "\n",
+    "# Convert each (content, metadata) pair into a Chunk\n",
+    "chunks = []\n",
+    "for content, metadata in docs:\n",
+    "    # Transform metadata to the required format, with document_id taken from the title\n",
+    "    chunk = Chunk(\n",
+    "        content=content,  # Required[InterleavedContent]\n",
+    "        metadata={\"document_id\": metadata[\"title\"]},  # Required[Dict]\n",
+    "    )\n",
+    "    chunks.append(chunk)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6bg3sm2ko5g",
+   "metadata": {},
+   "source": [
+    "#### Insert Documents into Vector Database\n",
+    "\n",
+    "Store the prepared documents in the FAISS vector database. This process:\n",
+    "1. Generates embeddings for each document\n",
+    "2. Stores embeddings with metadata\n",
+    "3. Enables semantic search capabilities"
+   ]
+  },
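+  {
+   "cell_type": "markdown",
+   "id": "sketch-cosine-similarity",
+   "metadata": {},
+   "source": [
+    "#### Sketch: What Semantic Search Does\n",
+    "\n",
+    "To make the steps above concrete, here is a toy illustration of ranking documents by embedding similarity. It is a sketch under stated assumptions: the 3-dimensional vectors are made-up values rather than real model output, and cosine similarity is used for illustration only; the metric the FAISS provider actually applies may differ.\n",
+    "\n",
+    "```python\n",
+    "import math\n",
+    "\n",
+    "\n",
+    "def cosine_similarity(a, b):\n",
+    "    \"\"\"Cosine of the angle between two equal-length vectors.\"\"\"\n",
+    "    dot = sum(x * y for x, y in zip(a, b))\n",
+    "    norm_a = math.sqrt(sum(x * x for x in a))\n",
+    "    norm_b = math.sqrt(sum(y * y for y in b))\n",
+    "    return dot / (norm_a * norm_b)\n",
+    "\n",
+    "\n",
+    "# Toy embeddings keyed by document_id (illustrative values only)\n",
+    "store = {\n",
+    "    \"Shipping Policy\": [0.9, 0.1, 0.0],\n",
+    "    \"Returns Policy\": [0.1, 0.9, 0.0],\n",
+    "    \"Support\": [0.0, 0.2, 0.9],\n",
+    "}\n",
+    "query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a shipping question\n",
+    "\n",
+    "# Rank document ids from most to least similar to the query\n",
+    "ranked = sorted(store, key=lambda k: cosine_similarity(query_vec, store[k]), reverse=True)\n",
+    "print(ranked[0])  # Shipping Policy\n",
+    "```\n",
+    "\n",
+    "In the real pipeline the embedding model produces 384-dimensional vectors, and LlamaStack handles both embedding and lookup on the server side."
+   ]
+  },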
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "0e8740d8-b809-44b9-915f-1e0200e3c3f1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/insert \"HTTP/1.1 200 OK\"\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Documents inserted: None\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Insert chunks into FAISS vector store\n",
+    "\n",
+    "response = client.vector_io.insert(vector_db_id=\"acme_docs\", chunks=chunks)\n",
+    "print(\"Documents inserted:\", response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9061tmi1zpq",
+   "metadata": {},
+   "source": [
+    "#### Test Vector Search\n",
+    "\n",
+    "Query the vector database to verify it's working correctly. This performs semantic search to find relevant documents based on the query."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "4a5e010c-eeeb-4020-a957-74d6d1cba342",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "metadata : {'document_id': 'Shipping Policy'}\n",
+      "content : Acme ships globally in 3–5 business days.\n",
+      "metadata : {'document_id': 'Shipping Policy'}\n",
+      "content : Acme ships globally in 3–5 business days.\n",
+      "metadata : {'document_id': 'Returns Policy'}\n",
+      "content : Returns are accepted within 30 days of purchase.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Query chunks from FAISS vector store\n",
+    "\n",
+    "query_chunk_response = client.vector_io.query(\n",
+    "    vector_db_id=\"acme_docs\",\n",
+    "    query=\"How long does Acme take to ship orders?\",\n",
+    ")\n",
+    "for chunk in query_chunk_response.chunks:\n",
+    "    print(\"metadata\", \":\", chunk.metadata)\n",
+    "    print(\"content\", \":\", chunk.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "usne6mbspms",
+   "metadata": {},
+   "source": [
+    "### 5. LangChain Integration\n",
+    "\n",
+    "#### Configure LangChain with LlamaStack\n",
+    "\n",
+    "Set up LangChain to use LlamaStack's OpenAI-compatible API:\n",
+    "\n",
+    "- **Base URL**: Points to LlamaStack's OpenAI-compatible endpoint\n",
+    "- **Headers**: Include the Together AI API key for model access\n",
+    "- **Model**: Use the Meta Llama 3.1 8B Instruct Turbo model via Together AI"
+   ]
+  },
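+  {
+   "cell_type": "markdown",
+   "id": "sketch-provider-header",
+   "metadata": {},
+   "source": [
+    "#### Sketch: Building the Provider Header\n",
+    "\n",
+    "The `X-LlamaStack-Provider-Data` header value used in the next cell is a JSON string. Rather than writing it by hand, it can be built with `json.dumps`. This is a minimal sketch: `provider_data_header` is a hypothetical helper, and `TOGETHER_API_KEY` is an assumed environment variable name (the notebook shows the key redacted as `***`).\n",
+    "\n",
+    "```python\n",
+    "import json\n",
+    "import os\n",
+    "\n",
+    "\n",
+    "def provider_data_header(api_key):\n",
+    "    \"\"\"Return the provider-data header as a dict (hypothetical helper).\"\"\"\n",
+    "    return {\"X-LlamaStack-Provider-Data\": json.dumps({\"together_api_key\": api_key})}\n",
+    "\n",
+    "\n",
+    "# Assumption: the key was exported as TOGETHER_API_KEY before starting Jupyter\n",
+    "headers = provider_data_header(os.environ.get(\"TOGETHER_API_KEY\", \"\"))\n",
+    "```\n",
+    "\n",
+    "The resulting dict could then be passed to `ChatOpenAI` as `default_headers=headers`, which keeps the key out of the notebook source."
+   ]
+  },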
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "c378bd10-09c2-417c-bdfc-1e0a2dd19084",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "from langchain_openai import ChatOpenAI\n",
+    "\n",
+    "# Point LangChain at the LlamaStack server's OpenAI-compatible endpoint\n",
+    "os.environ[\"OPENAI_API_KEY\"] = \"dummy\"\n",
+    "os.environ[\"OPENAI_BASE_URL\"] = \"http://0.0.0.0:8321/v1/openai/v1\"\n",
+    "\n",
+    "# Chat model served through LlamaStack's Together AI provider\n",
+    "llm = ChatOpenAI(\n",
+    "    model=\"together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\",\n",
+    "    default_headers={\"X-LlamaStack-Provider-Data\": '{\"together_api_key\": \"***\"}'},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5a4ddpcuk3l",
+   "metadata": {},
+   "source": [
+    "#### Test LLM Connection\n",
+    "\n",
+    "Verify that LangChain can successfully communicate with the LlamaStack server."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "f88ffb5a-657b-4916-9375-c6ddc156c25e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "AIMessage(content=\"In the Andes, a gentle soul resides, \\nA llama's soft eyes, with kindness abide.\", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 50, 'total_tokens': 72, 'completion_tokens_details': None, 'prompt_tokens_details': None, 'cached_tokens': 0}, 'model_name': 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo', 'system_fingerprint': None, 'id': 'o86Jy3i-2j9zxn-972d7b27f8f22aaa', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--4797f8b9-a5f6-4730-aece-80c1fd88ac55-0', usage_metadata={'input_tokens': 50, 'output_tokens': 22, 'total_tokens': 72, 'input_token_details': {}, 'output_token_details': {}})"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Test the LLM with a simple message\n",
+    "messages = [\n",
+    "    {\"role\": \"system\", \"content\": \"You are a friendly assistant.\"},\n",
+    "    {\"role\": \"user\", \"content\": \"Write a two-sentence poem about llama.\"},\n",
+    "]\n",
+    "llm.invoke(messages)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0xh0jg6a0l4a",
+   "metadata": {},
+   "source": [
+    "### 6. Building the RAG Chain\n",
+    "\n",
+    "#### Create a Complete RAG Pipeline\n",
+    "\n",
+    "Build a LangChain pipeline that combines:\n",
+    "\n",
+    "1. **Vector Search**: Query LlamaStack's vector database\n",
+    "2. **Context Assembly**: Format retrieved documents\n",
+    "3. **Prompt Template**: Structure the input for the LLM\n",
+    "4. **LLM Generation**: Generate answers using context\n",
+    "5. **Output Parsing**: Extract the final response\n",
+    "\n",
+    "**Chain Flow**: `Query → Vector Search → Context + Question → LLM → Response`"
+   ]
+  },
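+  {
+   "cell_type": "markdown",
+   "id": "sketch-chain-flow",
+   "metadata": {},
+   "source": [
+    "#### Sketch: The Chain Flow in Plain Python\n",
+    "\n",
+    "Before wiring this up with LangChain runnables, it may help to see the same flow as ordinary function calls. The sketch below uses stand-ins only: `retrieve` and `fake_llm` are hypothetical stubs that replace the vector query and the chat model, so it runs without a server.\n",
+    "\n",
+    "```python\n",
+    "def retrieve(question):\n",
+    "    # Stand-in for the vector search step; returns (document_id, content) pairs\n",
+    "    return [(\"Shipping Policy\", \"Acme ships globally in 3-5 business days.\")]\n",
+    "\n",
+    "\n",
+    "def build_prompt(question, docs):\n",
+    "    # Context assembly plus prompt template\n",
+    "    context = \"\\n\\n\".join(f\"[{doc_id}] {text}\" for doc_id, text in docs)\n",
+    "    return f\"Question: {question}\\n\\nContext:\\n{context}\"\n",
+    "\n",
+    "\n",
+    "def fake_llm(prompt):\n",
+    "    # Stand-in for the chat model; simply echoes the last line of the prompt\n",
+    "    return prompt.splitlines()[-1]\n",
+    "\n",
+    "\n",
+    "def rag(question):\n",
+    "    # Query -> vector search -> context + question -> LLM -> response\n",
+    "    return fake_llm(build_prompt(question, retrieve(question)))\n",
+    "\n",
+    "\n",
+    "print(rag(\"How long does shipping take?\"))\n",
+    "# [Shipping Policy] Acme ships globally in 3-5 business days.\n",
+    "```\n",
+    "\n",
+    "The LangChain version expresses the same composition declaratively, which makes it easier to swap individual steps."
+   ]
+  },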
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9684427d-dcc7-4544-9af5-8b110d014c42",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# LangChain supplies the prompt template and chaining; LlamaStack supplies vector search and chat completion\n",
+    "from langchain_core.output_parsers import StrOutputParser\n",
+    "from langchain_core.prompts import ChatPromptTemplate\n",
+    "from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
+    "\n",
+    "\n",
+    "def join_docs(docs):\n",
+    "    # docs is the vector_io.query response; render each chunk as [document_id] content\n",
+    "    return \"\\n\\n\".join([f\"[{d.metadata.get('document_id')}] {d.content}\" for d in docs.chunks])\n",
+    "\n",
+    "\n",
+    "PROMPT = ChatPromptTemplate.from_messages(\n",
+    "    [\n",
+    "        (\"system\", \"You are a helpful assistant. Use the following context to answer.\"),\n",
+    "        (\"user\", \"Question: {question}\\n\\nContext:\\n{context}\"),\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "vector_step = RunnableLambda(\n",
+    "    lambda x: client.vector_io.query(\n",
+    "        vector_db_id=\"acme_docs\",\n",
+    "        query=x,\n",
+    "    )\n",
+    ")\n",
+    "\n",
+    "chain = (\n",
+    "    {\"context\": vector_step | RunnableLambda(join_docs), \"question\": RunnablePassthrough()}\n",
+    "    | PROMPT\n",
+    "    | llm\n",
+    "    | StrOutputParser()\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0onu6rhphlra",
+   "metadata": {},
+   "source": [
+    "### 7. Testing the RAG System\n",
+    "\n",
+    "#### Example 1: Shipping Query"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "03322188-9509-446a-a4a8-ce3bb83ec87c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n",
+      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "❓ How long does shipping take?\n",
+      "💡 According to the Shipping Policy, shipping from Acme takes 3-5 business days.\n"
+     ]
+    }
+   ],
+   "source": [
+    "query = \"How long does shipping take?\"\n",
+    "response = chain.invoke(query)\n",
+    "print(\"❓\", query)\n",
+    "print(\"💡\", response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b7krhqj88ku",
+   "metadata": {},
+   "source": [
+    "#### Example 2: Returns Policy Query"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "61995550-bb0b-46a8-a5d0-023207475d60",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n",
+      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "❓ Can I return a product after 40 days?\n",
+      "💡 Based on the provided returns policy, it appears that returns are only accepted within 30 days of purchase. Since you're asking about returning a product after 40 days, it would not be within the specified 30-day return window.\n",
+      "\n",
+      "Unfortunately, it seems that you would not be eligible for a return in this case. However, I would recommend reaching out to the support team via chat or email to confirm their policy and see if there are any exceptions or alternative solutions available.\n"
+     ]
+    }
+   ],
+   "source": [
+    "query = \"Can I return a product after 40 days?\"\n",
+    "response = chain.invoke(query)\n",
+    "print(\"❓\", query)\n",
+    "print(\"💡\", response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "h4w24fadvjs",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "We have successfully built a RAG system that combines:\n",
+    "\n",
+    "- **LlamaStack** for infrastructure (LLM serving + vector database)\n",
+    "- **LangChain** for orchestration (prompts + chains)\n",
+    "- **Together AI** for high-quality language models\n",
+    "\n",
+    "### Key Benefits\n",
+    "\n",
+    "1. **Unified Infrastructure**: Single server for LLMs and vector databases\n",
+    "2. **OpenAI Compatibility**: Easy integration with existing LangChain code\n",
+    "3. **Multi-Provider Support**: Switch between different LLM providers\n",
+    "4. **Production Ready**: Built-in safety shields and monitoring\n",
+    "\n",
+    "### Next Steps\n",
+    "\n",
+    "- Add more sophisticated document processing\n",
+    "- Implement conversation memory\n",
+    "- Add safety filtering and monitoring\n",
+    "- Scale to larger document collections\n",
+    "- Integrate with web frameworks like FastAPI or Streamlit\n",
+    "\n",
+    "---\n",
+    "\n",
+    "### 🔧 Cleanup\n",
+    "\n",
+    "Don't forget to stop the LlamaStack server when you're done:\n",
+    "\n",
+    "```python\n",
+    "kill_llama_stack_server()\n",
+    "```"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}