llama-stack-mirror/docs/notebooks/langchain/Llama_Stack_LangChain.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1ztegmwm4sp",
   "metadata": {},
   "source": [
    "## LlamaStack + LangChain Integration Tutorial\n",
    "\n",
    "This notebook demonstrates how to integrate **LlamaStack** with **LangChain** to build a complete RAG (Retrieval-Augmented Generation) system.\n",
    "\n",
    "### Overview\n",
    "\n",
    "- **LlamaStack**: Provides the infrastructure for running LLMs and Open AI Compatible Vector Stores\n",
    "- **LangChain**: Provides the framework for chaining operations and prompt templates\n",
    "- **Integration**: Uses LlamaStack's OpenAI-compatible API with LangChain\n",
    "\n",
    "### What You'll See\n",
    "\n",
    "1. Setting up LlamaStack server with Together AI provider\n",
    "2. Creating and Querying Vector Stores\n",
    "3. Building RAG chains with LangChain + LLAMAStack\n",
    "4. Querying the chain for relevant information\n",
    "\n",
    "### Prerequisites\n",
    "\n",
    "- Fireworks API key\n",
    "\n",
    "---\n",
    "\n",
    "### 1. Installation and Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ktr5ls2cas",
   "metadata": {},
   "source": [
    "#### Install Required Dependencies\n",
    "\n",
    "First, we install all the necessary packages for LangChain and FastAPI integration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5b6a6a17-b931-4bea-8273-0d6e5563637a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: uv in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.7.20)\n",
      "\u001b[2mUsing Python 3.12.11 environment at: /Users/swapna942/miniconda3\u001b[0m\n",
      "\u001b[2mAudited \u001b[1m7 packages\u001b[0m \u001b[2min 94ms\u001b[0m\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!pip install uv\n",
    "!uv pip install fastapi uvicorn \"langchain>=0.2\" langchain-openai \\\n",
    "             langchain-community langchain-text-splitters \\\n",
    "             faiss-cpu"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "wmt9jvqzh7n",
   "metadata": {},
   "source": [
    "### 2. LlamaStack Server Setup\n",
    "\n",
    "#### Build and Start LlamaStack Server\n",
    "\n",
    "This section sets up the LlamaStack server with:\n",
    "- **Together AI** as the inference provider\n",
    "- **Sentence Transformers** for embeddings\n",
    "\n",
    "The server runs on `localhost:8321` and provides OpenAI-compatible endpoints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "dd2dacf3-ec8b-4cc7-8ff4-b5b6ea4a6e9e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import subprocess\n",
    "import time\n",
    "\n",
    "# Remove UV_SYSTEM_PYTHON to ensure uv creates a proper virtual environment\n",
    "# instead of trying to use system Python globally, which could cause permission issues\n",
    "# and package conflicts with the system's Python installation\n",
    "if \"UV_SYSTEM_PYTHON\" in os.environ:\n",
    "    del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
    "\n",
    "def run_llama_stack_server_background():\n",
    "    \"\"\"Build and run LlamaStack server in one step using --run flag\"\"\"\n",
    "    log_file = open(\"llama_stack_server.log\", \"w\")\n",
    "    process = subprocess.Popen(\n",
    "        \"uv run --with llama-stack llama stack build --distro starter --image-type venv --run\",\n",
    "        shell=True,\n",
    "        stdout=log_file,\n",
    "        stderr=log_file,\n",
    "        text=True,\n",
    "    )\n",
    "\n",
    "    print(f\"Building and starting Llama Stack server with PID: {process.pid}\")\n",
    "    return process\n",
    "\n",
    "\n",
    "def wait_for_server_to_start():\n",
    "    import requests\n",
    "    from requests.exceptions import ConnectionError\n",
    "\n",
    "    url = \"http://0.0.0.0:8321/v1/health\"\n",
    "    max_retries = 30\n",
    "    retry_interval = 1\n",
    "\n",
    "    print(\"Waiting for server to start\", end=\"\")\n",
    "    for _ in range(max_retries):\n",
    "        try:\n",
    "            response = requests.get(url)\n",
    "            if response.status_code == 200:\n",
    "                print(\"\\nServer is ready!\")\n",
    "                return True\n",
    "        except ConnectionError:\n",
    "            print(\".\", end=\"\", flush=True)\n",
    "            time.sleep(retry_interval)\n",
    "\n",
    "    print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n",
    "    return False\n",
    "\n",
    "\n",
    "def kill_llama_stack_server():\n",
    "    # Kill any existing llama stack server processes using pkill command\n",
    "    os.system(\"pkill -f llama_stack.core.server.server\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "28bd8dbd-4576-4e76-813f-21ab94db44a2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Building and starting Llama Stack server with PID: 81294\n",
      "Waiting for server to start......\n",
      "Server is ready!\n"
     ]
    }
   ],
   "source": [
    "server_process = run_llama_stack_server_background()\n",
    "assert wait_for_server_to_start()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "gr9cdcg4r7n",
   "metadata": {},
   "source": [
    "#### Install LlamaStack Client\n",
    "\n",
    "Install the client library to interact with the LlamaStack server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "487d2dbc-d071-400e-b4f0-dcee58f8dc95",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[2mUsing Python 3.12.11 environment at: /Users/swapna942/miniconda3\u001b[0m\n",
      "\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 26ms\u001b[0m\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!uv pip install llama_stack_client"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0j5hag7l9x89",
   "metadata": {},
   "source": [
    "### 3. Initialize LlamaStack Client\n",
    "\n",
    "Create a client connection to the LlamaStack server with API keys for different providers:\n",
    "\n",
    "- **Fireworks API Key**: For Together AI models\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ab4eff97-4565-4c73-b1b3-0020a4c7e2a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_stack_client import LlamaStackClient\n",
    "\n",
    "client = LlamaStackClient(\n",
    "    base_url=\"http://0.0.0.0:8321\",\n",
    "    provider_data={\"fireworks_api_key\": \"fw_3ZPgfLt28zCzx9n4KYYPTT7w\"},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "vwhexjy1e8o",
   "metadata": {},
   "source": [
    "#### Explore Available Models and Safety Features\n",
    "\n",
    "Check what models and safety shields are available through your LlamaStack instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "880443ef-ac3c-48b1-a80a-7dab5b25ac61",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/models \"HTTP/1.1 200 OK\"\n",
      "INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/shields \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Available models:\n",
      "- all-minilm\n",
      "- nvidia/meta/llama-3.1-405b-instruct\n",
      "- nvidia/meta/llama-3.1-70b-instruct\n",
      "- nvidia/meta/llama-3.1-8b-instruct\n",
      "- nvidia/meta/llama-3.2-11b-vision-instruct\n",
      "- nvidia/meta/llama-3.2-1b-instruct\n",
      "- nvidia/meta/llama-3.2-3b-instruct\n",
      "- nvidia/meta/llama-3.2-90b-vision-instruct\n",
      "- nvidia/meta/llama-3.3-70b-instruct\n",
      "- nvidia/meta/llama3-70b-instruct\n",
      "- nvidia/meta/llama3-8b-instruct\n",
      "- nvidia/nvidia/llama-3.2-nv-embedqa-1b-v2\n",
      "- nvidia/nvidia/nv-embedqa-e5-v5\n",
      "- nvidia/nvidia/nv-embedqa-mistral-7b-v2\n",
      "- nvidia/snowflake/arctic-embed-l\n",
      "- ollama/all-minilm:l6-v2\n",
      "- ollama/llama-guard3:1b\n",
      "- ollama/llama-guard3:8b\n",
      "- ollama/llama3.2:3b-instruct-fp16\n",
      "- ollama/nomic-embed-text\n",
      "- fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\n",
      "- fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct\n",
      "- fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct\n",
      "- fireworks/accounts/fireworks/models/llama-v3p2-3b-instruct\n",
      "- fireworks/accounts/fireworks/models/llama-v3p2-11b-vision-instruct\n",
      "- fireworks/accounts/fireworks/models/llama-v3p2-90b-vision-instruct\n",
      "- fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct\n",
      "- fireworks/accounts/fireworks/models/llama4-scout-instruct-basic\n",
      "- fireworks/accounts/fireworks/models/llama4-maverick-instruct-basic\n",
      "- fireworks/nomic-ai/nomic-embed-text-v1.5\n",
      "- fireworks/accounts/fireworks/models/llama-guard-3-8b\n",
      "- fireworks/accounts/fireworks/models/llama-guard-3-11b-vision\n",
      "- together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\n",
      "- together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo\n",
      "- together/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo\n",
      "- together/meta-llama/Llama-3.2-3B-Instruct-Turbo\n",
      "- together/meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo\n",
      "- together/meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo\n",
      "- together/meta-llama/Llama-3.3-70B-Instruct-Turbo\n",
      "- together/togethercomputer/m2-bert-80M-8k-retrieval\n",
      "- together/togethercomputer/m2-bert-80M-32k-retrieval\n",
      "- together/meta-llama/Llama-4-Scout-17B-16E-Instruct\n",
      "- together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8\n",
      "- together/meta-llama/Llama-Guard-3-8B\n",
      "- together/meta-llama/Llama-Guard-3-11B-Vision-Turbo\n",
      "- bedrock/meta.llama3-1-8b-instruct-v1:0\n",
      "- bedrock/meta.llama3-1-70b-instruct-v1:0\n",
      "- bedrock/meta.llama3-1-405b-instruct-v1:0\n",
      "- openai/gpt-3.5-turbo-0125\n",
      "- openai/gpt-3.5-turbo\n",
      "- openai/gpt-3.5-turbo-instruct\n",
      "- openai/gpt-4\n",
      "- openai/gpt-4-turbo\n",
      "- openai/gpt-4o\n",
      "- openai/gpt-4o-2024-08-06\n",
      "- openai/gpt-4o-mini\n",
      "- openai/gpt-4o-audio-preview\n",
      "- openai/chatgpt-4o-latest\n",
      "- openai/o1\n",
      "- openai/o1-mini\n",
      "- openai/o3-mini\n",
      "- openai/o4-mini\n",
      "- openai/text-embedding-3-small\n",
      "- openai/text-embedding-3-large\n",
      "- anthropic/claude-3-5-sonnet-latest\n",
      "- anthropic/claude-3-7-sonnet-latest\n",
      "- anthropic/claude-3-5-haiku-latest\n",
      "- anthropic/voyage-3\n",
      "- anthropic/voyage-3-lite\n",
      "- anthropic/voyage-code-3\n",
      "- gemini/gemini-1.5-flash\n",
      "- gemini/gemini-1.5-pro\n",
      "- gemini/gemini-2.0-flash\n",
      "- gemini/gemini-2.0-flash-lite\n",
      "- gemini/gemini-2.5-flash\n",
      "- gemini/gemini-2.5-flash-lite\n",
      "- gemini/gemini-2.5-pro\n",
      "- gemini/text-embedding-004\n",
      "- groq/llama3-8b-8192\n",
      "- groq/llama-3.1-8b-instant\n",
      "- groq/llama3-70b-8192\n",
      "- groq/llama-3.3-70b-versatile\n",
      "- groq/llama-3.2-3b-preview\n",
      "- groq/meta-llama/llama-4-scout-17b-16e-instruct\n",
      "- groq/meta-llama/llama-4-maverick-17b-128e-instruct\n",
      "- sambanova/Meta-Llama-3.1-8B-Instruct\n",
      "- sambanova/Meta-Llama-3.3-70B-Instruct\n",
      "- sambanova/Llama-4-Maverick-17B-128E-Instruct\n",
      "- sentence-transformers/all-MiniLM-L6-v2\n",
      "----\n",
      "Available shields (safety models):\n",
      "code-scanner\n",
      "llama-guard\n",
      "nemo-guardrail\n",
      "----\n"
     ]
    }
   ],
   "source": [
    "print(\"Available models:\")\n",
    "for m in client.models.list():\n",
    "    print(f\"- {m.identifier}\")\n",
    "\n",
    "print(\"----\")\n",
    "print(\"Available shields (safety models):\")\n",
    "for s in client.shields.list():\n",
    "    print(s.identifier)\n",
    "print(\"----\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "gojp7at31ht",
   "metadata": {},
   "source": [
    "### 4. Vector Store Setup\n",
    "\n",
    "#### Create a Vector Store with File Upload\n",
    "\n",
    "Create a vector store using the OpenAI-compatible vector stores API:\n",
    "\n",
    "- **Vector Store**: OpenAI-compatible vector store for document storage\n",
    "- **File Upload**: Automatic chunking and embedding of uploaded files  \n",
    "- **Embedding Model**: Sentence Transformers model for text embeddings\n",
    "- **Dimensions**: 384-dimensional embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "be2c2899-ea53-4e5f-b6b8-ed425f5d6572",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n",
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n",
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "File(id='file-33012cc7c18a4430ad1a11b3397c3e29', bytes=41, created_at=1757441277, expires_at=1788977277, filename='shipping_policy.txt', object='file', purpose='assistants')\n",
      "File(id='file-e7692c2f044e44859ea02b0630195b4e', bytes=48, created_at=1757441277, expires_at=1788977277, filename='returns_policy.txt', object='file', purpose='assistants')\n",
      "File(id='file-9ff178cbb72b4e79a7513ce5a95a6fe1', bytes=45, created_at=1757441277, expires_at=1788977277, filename='support.txt', object='file', purpose='assistants')\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores \"HTTP/1.1 200 OK\"\n"
     ]
    }
   ],
   "source": [
    "from io import BytesIO\n",
    "\n",
    "docs = [\n",
    "    (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
    "    (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
    "    (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
    "]\n",
    "\n",
    "file_ids = []\n",
    "for content, metadata in docs:\n",
    "  with BytesIO(content.encode()) as file_buffer:\n",
    "      file_buffer.name = f\"{metadata['title'].replace(' ', '_').lower()}.txt\"\n",
    "      create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n",
    "      print(create_file_response)\n",
    "      file_ids.append(create_file_response.id)\n",
    "\n",
    "# Create vector store with files\n",
    "vector_store = client.vector_stores.create(\n",
    "  name=\"acme_docs\",\n",
    "  file_ids=file_ids,\n",
    "  embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n",
    "  embedding_dimension=384,\n",
    "  provider_id=\"faiss\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9061tmi1zpq",
   "metadata": {},
   "source": [
    "#### Test Vector Store Search\n",
    "\n",
    "Query the vector store. This performs semantic search to find relevant documents based on the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ba9d1901-bd5e-4216-b3e6-19dc74551cc6",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_d9fa50e8-3ccc-4db3-9102-1a71cd62ab64/search \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Acme ships globally in 3-5 business days.\n",
      "Returns are accepted within 30 days of purchase.\n"
     ]
    }
   ],
   "source": [
    "search_response = client.vector_stores.search(\n",
    "  vector_store_id=vector_store.id,\n",
    "  query=\"How long does shipping take?\",\n",
    "  max_num_results=2\n",
    ")\n",
    "for result in search_response.data:\n",
    "  content = result.content[0].text\n",
    "  print(content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "usne6mbspms",
   "metadata": {},
   "source": [
    "### 5. LangChain Integration\n",
    "\n",
    "#### Configure LangChain with LlamaStack\n",
    "\n",
    "Set up LangChain to use LlamaStack's OpenAI-compatible API:\n",
    "\n",
    "- **Base URL**: Points to LlamaStack's OpenAI endpoint\n",
    "- **Headers**: Include Together AI API key for model access\n",
    "- **Model**: Use Meta Llama 3.1 8B model via Together AI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c378bd10-09c2-417c-bdfc-1e0a2dd19084",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "# Point LangChain to Llamastack Server\n",
    "llm = ChatOpenAI(\n",
    "    base_url=\"http://0.0.0.0:8321/v1/openai/v1\",\n",
    "    api_key=\"dummy\",\n",
    "    model=\"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\",\n",
    "    default_headers={\"X-LlamaStack-Provider-Data\": '{\"fireworks_api_key\": \"fw_3ZPgfLt28zCzx9n4KYYPTT7w\"}'},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a4ddpcuk3l",
   "metadata": {},
   "source": [
    "#### Test LLM Connection\n",
    "\n",
    "Verify that LangChain can successfully communicate with the LlamaStack server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "f88ffb5a-657b-4916-9375-c6ddc156c25e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "AIMessage(content='Silent llama, soft and slow,\\n Majestic creature, with a gentle glow.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': None, 'model_name': 'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'system_fingerprint': None, 'id': 'chatcmpl-8957d071-9f1a-4a7f-9c0e-d516e02194d1', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--f4331df3-e787-4973-b2e8-7ca90d4e19df-0')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Test llm with simple message\n",
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": \"You are a friendly assistant.\"},\n",
    "    {\"role\": \"user\", \"content\": \"Write a two-sentence poem about llama.\"},\n",
    "]\n",
    "llm.invoke(messages)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0xh0jg6a0l4a",
   "metadata": {},
   "source": [
    "### 6. Building the RAG Chain\n",
    "\n",
    "#### Create a Complete RAG Pipeline\n",
    "\n",
    "Build a LangChain pipeline that combines:\n",
    "\n",
    "1. **Vector Search**: Query LlamaStack's vector database\n",
    "2. **Context Assembly**: Format retrieved documents\n",
    "3. **Prompt Template**: Structure the input for the LLM\n",
    "4. **LLM Generation**: Generate answers using context\n",
    "5. **Output Parsing**: Extract the final response\n",
    "\n",
    "**Chain Flow**: `Query → Vector Search → Context + Question → LLM → Response`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "9684427d-dcc7-4544-9af5-8b110d014c42",
   "metadata": {},
   "outputs": [],
   "source": [
    "# LangChain for prompt template and chaining + LLAMA Stack Client Vector DB and LLM chat completion\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
    "\n",
    "\n",
    "def join_docs(docs):\n",
    "    return \"\\n\\n\".join([f\"[{d.filename}] {d.content[0].text}\" for d in docs.data])\n",
    "\n",
    "PROMPT = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\"system\", \"You are a helpful assistant. Use the following context to answer.\"),\n",
    "        (\"user\", \"Question: {question}\\n\\nContext:\\n{context}\"),\n",
    "    ]\n",
    ")\n",
    "\n",
    "vector_step = RunnableLambda(\n",
    "      lambda x: client.vector_stores.search(\n",
    "          vector_store_id=vector_store.id,\n",
    "          query=x,\n",
    "          max_num_results=2\n",
    "      )\n",
    "  )\n",
    "\n",
    "chain = (\n",
    "    {\"context\": vector_step | RunnableLambda(join_docs), \"question\": RunnablePassthrough()}\n",
    "    | PROMPT\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0onu6rhphlra",
   "metadata": {},
   "source": [
    "### 7. Testing the RAG System\n",
    "\n",
    "#### Example 1: Shipping Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "03322188-9509-446a-a4a8-ce3bb83ec87c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_d9fa50e8-3ccc-4db3-9102-1a71cd62ab64/search \"HTTP/1.1 200 OK\"\n",
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "❓ How long does shipping take?\n",
      "💡 According to the provided context, Acme ships globally in 3-5 business days.\n"
     ]
    }
   ],
   "source": [
    "query = \"How long does shipping take?\"\n",
    "response = chain.invoke(query)\n",
    "print(\"❓\", query)\n",
    "print(\"💡\", response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7krhqj88ku",
   "metadata": {},
   "source": [
    "#### Example 2: Returns Policy Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "61995550-bb0b-46a8-a5d0-023207475d60",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_d9fa50e8-3ccc-4db3-9102-1a71cd62ab64/search \"HTTP/1.1 200 OK\"\n",
      "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "❓ Can I return a product after 40 days?\n",
      "💡 Based on the given context, returns are accepted within 30 days of purchase. Therefore, after 40 days, returning a product is not accepted according to the given policy.\n"
     ]
    }
   ],
   "source": [
    "query = \"Can I return a product after 40 days?\"\n",
    "response = chain.invoke(query)\n",
    "print(\"❓\", query)\n",
    "print(\"💡\", response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "h4w24fadvjs",
   "metadata": {},
   "source": [
    "---\n",
    "We have successfully built a RAG system that combines:\n",
    "\n",
    "- **LlamaStack** for infrastructure (LLM serving + Vector Store)\n",
    "- **LangChain** for orchestration (prompts + chains)\n",
    "- **Together AI** for high-quality language models\n",
    "\n",
    "### Key Benefits\n",
    "\n",
    "1. **Unified Infrastructure**: Single server for LLMs and Vector Store\n",
    "2. **OpenAI Compatibility**: Easy integration with existing LangChain code\n",
    "3. **Multi-Provider Support**: Switch between different LLM providers\n",
    "4. **Production Ready**: Built-in safety shields and monitoring\n",
    "\n",
    "### Next Steps\n",
    "\n",
    "- Add more sophisticated document processing\n",
    "- Implement conversation memory\n",
    "- Add safety filtering and monitoring\n",
    "- Scale to larger document collections\n",
    "- Integrate with web frameworks like FastAPI or Streamlit\n",
    "\n",
    "---\n",
    "\n",
    "##### 🔧 Cleanup\n",
    "\n",
    "Don't forget to stop the LlamaStack server when you're done:\n",
    "\n",
    "```python\n",
    "kill_llama_stack_server()\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "15647c46-22ce-4698-af3f-8161329d8e3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "kill_llama_stack_server()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}