Merge branch 'llamastack:main' into crewai

Kai Wu 2025-09-29 12:06:53 -07:00 committed by GitHub
commit ac1af2d96a
581 changed files with 148868 additions and 11347 deletions

@@ -14,7 +14,7 @@
"We will also showcase how to leverage existing Llama stack [inference APIs](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/inference/inference.py) (ollama as provider) to get the new model's output and the [eval APIs](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/eval/eval.py) to help you better measure the new model performance. We hope the flywheel of post-training -> eval -> inference can greatly empower agentic apps development.\n",
"\n",
"\n",
"- Read more about Llama Stack: https://llama-stack.readthedocs.io/en/latest/introduction/index.html\n",
"- Read more about Llama Stack: https://llamastack.github.io/\n",
"- Read more about post training APIs definition: https://github.com/meta-llama/llama-stack/blob/main/llama_stack/apis/post_training/post_training.py\n",
"\n",
"\n",
@@ -3632,7 +3632,7 @@
},
"source": [
"#### 1.2. Kick-off eval job\n",
"- More details on Llama-stack eval: https://llama-stack.readthedocs.io/en/latest/benchmark_evaluations/index.html\n",
"- More details on Llama-stack eval: https://llamastack.github.io/latest/references/evals_reference/index.html\n",
" - Define an EvalCandidate\n",
" - Run evaluate on datasets (we choose brainstrust's answer-similarity as scoring function with OpenAI's model as judge model)\n",
"\n",

@@ -12,7 +12,7 @@
"\n",
"This notebook will walk you through the main sets of APIs we offer with Llama Stack for supporting running benchmark evaluations of your with working examples to explore the possibilities that Llama Stack opens up for you.\n",
"\n",
"Read more about Llama Stack: https://llama-stack.readthedocs.io/en/latest/index.html"
"Read more about Llama Stack: https://llamastack.github.io/latest/index.html"
]
},
{

@@ -0,0 +1,701 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1ztegmwm4sp",
"metadata": {},
"source": [
"## LlamaStack + LangChain Integration Tutorial\n",
"\n",
"This notebook demonstrates how to integrate **LlamaStack** with **LangChain** to build a complete RAG (Retrieval-Augmented Generation) system.\n",
"\n",
"### Overview\n",
"\n",
"- **LlamaStack**: Provides the infrastructure for running LLMs and Open AI Compatible Vector Stores\n",
"- **LangChain**: Provides the framework for chaining operations and prompt templates\n",
"- **Integration**: Uses LlamaStack's OpenAI-compatible API with LangChain\n",
"\n",
"### What You'll See\n",
"\n",
"1. Setting up LlamaStack server with Fireworks AI provider\n",
"2. Creating and Querying Vector Stores\n",
"3. Building RAG chains with LangChain + LLAMAStack\n",
"4. Querying the chain for relevant information\n",
"\n",
"### Prerequisites\n",
"\n",
"- Fireworks API key\n",
"\n",
"---\n",
"\n",
"### 1. Installation and Setup"
]
},
{
"cell_type": "markdown",
"id": "2ktr5ls2cas",
"metadata": {},
"source": [
"#### Install Required Dependencies\n",
"\n",
"First, we install all the necessary packages for LangChain and FastAPI integration."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5b6a6a17-b931-4bea-8273-0d6e5563637a",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: uv in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.7.20)\n",
"\u001b[2mUsing Python 3.12.11 environment at: /Users/swapna942/miniconda3\u001b[0m\n",
"\u001b[2mAudited \u001b[1m7 packages\u001b[0m \u001b[2min 42ms\u001b[0m\u001b[0m\n"
]
}
],
"source": [
"!pip install uv\n",
"!uv pip install fastapi uvicorn \"langchain>=0.2\" langchain-openai \\\n",
" langchain-community langchain-text-splitters \\\n",
" faiss-cpu"
]
},
{
"cell_type": "markdown",
"id": "wmt9jvqzh7n",
"metadata": {},
"source": [
"### 2. LlamaStack Server Setup\n",
"\n",
"#### Build and Start LlamaStack Server\n",
"\n",
"This section sets up the LlamaStack server with:\n",
"- **Fireworks AI** as the inference provider\n",
"- **Sentence Transformers** for embeddings\n",
"\n",
"The server runs on `localhost:8321` and provides OpenAI-compatible endpoints."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dd2dacf3-ec8b-4cc7-8ff4-b5b6ea4a6e9e",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import os\n",
"import subprocess\n",
"import time\n",
"\n",
"# Remove UV_SYSTEM_PYTHON to ensure uv creates a proper virtual environment\n",
"# instead of trying to use system Python globally, which could cause permission issues\n",
"# and package conflicts with the system's Python installation\n",
"if \"UV_SYSTEM_PYTHON\" in os.environ:\n",
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n",
"def run_llama_stack_server_background():\n",
" \"\"\"Build and run LlamaStack server in one step using --run flag\"\"\"\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n",
" process = subprocess.Popen(\n",
" \"uv run --with llama-stack llama stack build --distro starter --image-type venv --run\",\n",
" shell=True,\n",
" stdout=log_file,\n",
" stderr=log_file,\n",
" text=True,\n",
" )\n",
"\n",
" print(f\"Building and starting Llama Stack server with PID: {process.pid}\")\n",
" return process\n",
"\n",
"\n",
"def wait_for_server_to_start():\n",
" import requests\n",
" from requests.exceptions import ConnectionError\n",
"\n",
" url = \"http://0.0.0.0:8321/v1/health\"\n",
" max_retries = 30\n",
" retry_interval = 1\n",
"\n",
" print(\"Waiting for server to start\", end=\"\")\n",
" for _ in range(max_retries):\n",
" try:\n",
" response = requests.get(url)\n",
" if response.status_code == 200:\n",
" print(\"\\nServer is ready!\")\n",
" return True\n",
" except ConnectionError:\n",
" print(\".\", end=\"\", flush=True)\n",
" time.sleep(retry_interval)\n",
"\n",
" print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n",
" return False\n",
"\n",
"\n",
"def kill_llama_stack_server():\n",
" # Kill any existing llama stack server processes using pkill command\n",
" os.system(\"pkill -f llama_stack.core.server.server\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "28bd8dbd-4576-4e76-813f-21ab94db44a2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Building and starting Llama Stack server with PID: 19747\n",
"Waiting for server to start....\n",
"Server is ready!\n"
]
}
],
"source": [
"server_process = run_llama_stack_server_background()\n",
"assert wait_for_server_to_start()"
]
},
{
"cell_type": "markdown",
"id": "gr9cdcg4r7n",
"metadata": {},
"source": [
"#### Install LlamaStack Client\n",
"\n",
"Install the client library to interact with the LlamaStack server."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "487d2dbc-d071-400e-b4f0-dcee58f8dc95",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[2mUsing Python 3.12.11 environment at: /Users/swapna942/miniconda3\u001b[0m\n",
"\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 27ms\u001b[0m\u001b[0m\n"
]
}
],
"source": [
"!uv pip install llama_stack_client"
]
},
{
"cell_type": "markdown",
"id": "0j5hag7l9x89",
"metadata": {},
"source": [
"### 3. Initialize LlamaStack Client\n",
"\n",
"Create a client connection to the LlamaStack server with API keys for different providers:\n",
"\n",
"- **Fireworks API Key**: For Fireworks models\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ab4eff97-4565-4c73-b1b3-0020a4c7e2a5",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client import LlamaStackClient\n",
"\n",
"client = LlamaStackClient(\n",
" base_url=\"http://0.0.0.0:8321\",\n",
" provider_data={\"fireworks_api_key\": \"***\"},\n",
")"
]
},
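{
"cell_type": "markdown",
"id": "envkeyhint1",
"metadata": {},
"source": [
"Rather than hardcoding the key, you can read it from an environment variable. A minimal sketch, assuming `FIREWORKS_API_KEY` is exported in your shell:\n",
"\n",
"```python\n",
"import os\n",
"\n",
"client = LlamaStackClient(\n",
"    base_url=\"http://0.0.0.0:8321\",\n",
"    # Assumption: FIREWORKS_API_KEY is set in the environment; keeps secrets out of the notebook\n",
"    provider_data={\"fireworks_api_key\": os.environ[\"FIREWORKS_API_KEY\"]},\n",
")\n",
"```"
]
},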
{
"cell_type": "markdown",
"id": "vwhexjy1e8o",
"metadata": {},
"source": [
"#### Explore Available Models and Safety Features\n",
"\n",
"Check what models and safety shields are available through your LlamaStack instance."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "880443ef-ac3c-48b1-a80a-7dab5b25ac61",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/models \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/shields \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Available Fireworks models:\n",
"- fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p2-3b-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p2-11b-vision-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p2-90b-vision-instruct\n",
"- fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct\n",
"- fireworks/accounts/fireworks/models/llama4-scout-instruct-basic\n",
"- fireworks/accounts/fireworks/models/llama4-maverick-instruct-basic\n",
"- fireworks/nomic-ai/nomic-embed-text-v1.5\n",
"- fireworks/accounts/fireworks/models/llama-guard-3-8b\n",
"- fireworks/accounts/fireworks/models/llama-guard-3-11b-vision\n",
"----\n",
"Available shields (safety models):\n",
"code-scanner\n",
"llama-guard\n",
"nemo-guardrail\n",
"----\n"
]
}
],
"source": [
"print(\"Available Fireworks models:\")\n",
"for m in client.models.list():\n",
" if m.identifier.startswith(\"fireworks/\"):\n",
" print(f\"- {m.identifier}\")\n",
"\n",
"print(\"----\")\n",
"print(\"Available shields (safety models):\")\n",
"for s in client.shields.list():\n",
" print(s.identifier)\n",
"print(\"----\")"
]
},
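{
"cell_type": "markdown",
"id": "shieldsketch1",
"metadata": {},
"source": [
"The shields listed above can screen messages before they reach a model. A sketch of invoking one, assuming the `client.safety.run_shield` method from the llama_stack_client SDK (treat the exact signature as an assumption and check it against your SDK version):\n",
"\n",
"```python\n",
"# Sketch: screen a user message with one of the shields listed above\n",
"result = client.safety.run_shield(\n",
"    shield_id=\"llama-guard\",\n",
"    messages=[{\"role\": \"user\", \"content\": \"How long does shipping take?\"}],\n",
"    params={},\n",
")\n",
"print(result.violation)  # assumption: None when the message is safe\n",
"```"
]
},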
{
"cell_type": "markdown",
"id": "gojp7at31ht",
"metadata": {},
"source": [
"### 4. Vector Store Setup\n",
"\n",
"#### Create a Vector Store with File Upload\n",
"\n",
"Create a vector store using the OpenAI-compatible vector stores API:\n",
"\n",
"- **Vector Store**: OpenAI-compatible vector store for document storage\n",
"- **File Upload**: Automatic chunking and embedding of uploaded files \n",
"- **Embedding Model**: Sentence Transformers model for text embeddings\n",
"- **Dimensions**: 384-dimensional embeddings"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "be2c2899-ea53-4e5f-b6b8-ed425f5d6572",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"File(id='file-54652c95c56c4c34918a97d7ff8a4320', bytes=41, created_at=1757442621, expires_at=1788978621, filename='shipping_policy.txt', object='file', purpose='assistants')\n",
"File(id='file-fb1227c1d1854da1bd774d21e5b7e41c', bytes=48, created_at=1757442621, expires_at=1788978621, filename='returns_policy.txt', object='file', purpose='assistants')\n",
"File(id='file-673f874852fe42798675a13d06a256e2', bytes=45, created_at=1757442621, expires_at=1788978621, filename='support.txt', object='file', purpose='assistants')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores \"HTTP/1.1 200 OK\"\n"
]
}
],
"source": [
"from io import BytesIO\n",
"\n",
"docs = [\n",
" (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
" (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
" (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
"]\n",
"\n",
"file_ids = []\n",
"for content, metadata in docs:\n",
" with BytesIO(content.encode()) as file_buffer:\n",
" file_buffer.name = f\"{metadata['title'].replace(' ', '_').lower()}.txt\"\n",
" create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n",
" print(create_file_response)\n",
" file_ids.append(create_file_response.id)\n",
"\n",
"# Create vector store with files\n",
"vector_store = client.vector_stores.create(\n",
" name=\"acme_docs\",\n",
" file_ids=file_ids,\n",
" embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n",
" embedding_dimension=384,\n",
" provider_id=\"faiss\"\n",
")"
]
},
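{
"cell_type": "markdown",
"id": "vsinspect1",
"metadata": {},
"source": [
"The create call returns an OpenAI-compatible vector store object. A quick sketch of inspecting it (field names assume the OpenAI vector stores schema):\n",
"\n",
"```python\n",
"# Inspect the newly created vector store\n",
"print(vector_store.id)           # e.g. vs_708c060b-...\n",
"print(vector_store.name)         # \"acme_docs\"\n",
"print(vector_store.file_counts)  # assumption: per-status counts of the attached files\n",
"```"
]
},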
{
"cell_type": "markdown",
"id": "9061tmi1zpq",
"metadata": {},
"source": [
"#### Test Vector Store Search\n",
"\n",
"Query the vector store. This performs semantic search to find relevant documents based on the query."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ba9d1901-bd5e-4216-b3e6-19dc74551cc6",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_708c060b-45da-423e-8354-68529b4fd1a6/search \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Acme ships globally in 3-5 business days.\n",
"Returns are accepted within 30 days of purchase.\n"
]
}
],
"source": [
"search_response = client.vector_stores.search(\n",
" vector_store_id=vector_store.id,\n",
" query=\"How long does shipping take?\",\n",
" max_num_results=2\n",
")\n",
"for result in search_response.data:\n",
" content = result.content[0].text\n",
" print(content)"
]
},
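{
"cell_type": "markdown",
"id": "resultmeta1",
"metadata": {},
"source": [
"Each search result also carries the source filename, which the RAG chain in section 6 uses to label context:\n",
"\n",
"```python\n",
"# Show which file each hit came from (join_docs in section 6 relies on this field)\n",
"for result in search_response.data:\n",
"    print(f\"[{result.filename}] {result.content[0].text}\")\n",
"```"
]
},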
{
"cell_type": "markdown",
"id": "usne6mbspms",
"metadata": {},
"source": [
"### 5. LangChain Integration\n",
"\n",
"#### Configure LangChain with LlamaStack\n",
"\n",
"Set up LangChain to use LlamaStack's OpenAI-compatible API:\n",
"\n",
"- **Base URL**: Points to LlamaStack's OpenAI endpoint\n",
"- **Headers**: Include Fireworks API key for model access\n",
"- **Model**: Use Meta Llama v3p1 8b instruct model for inference"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c378bd10-09c2-417c-bdfc-1e0a2dd19084",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"# Point LangChain to Llamastack Server\n",
"llm = ChatOpenAI(\n",
" base_url=\"http://0.0.0.0:8321/v1/openai/v1\",\n",
" api_key=\"dummy\",\n",
" model=\"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\",\n",
" default_headers={\"X-LlamaStack-Provider-Data\": '{\"fireworks_api_key\": \"***\"}'},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5a4ddpcuk3l",
"metadata": {},
"source": [
"#### Test LLM Connection\n",
"\n",
"Verify that LangChain can successfully communicate with the LlamaStack server."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f88ffb5a-657b-4916-9375-c6ddc156c25e",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
{
"data": {
"text/plain": [
"AIMessage(content=\"A llama's gentle eyes shine bright,\\nIn the Andes, it roams through morning light.\", additional_kwargs={'refusal': None}, response_metadata={'token_usage': None, 'model_name': 'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'system_fingerprint': None, 'id': 'chatcmpl-602b5967-82a3-476b-9cd2-7d3b29b76ee8', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--0933c465-ff4d-4a7b-b7fb-fd97dd8244f3-0')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test llm with simple message\n",
"messages = [\n",
" {\"role\": \"system\", \"content\": \"You are a friendly assistant.\"},\n",
" {\"role\": \"user\", \"content\": \"Write a two-sentence poem about llama.\"},\n",
"]\n",
"llm.invoke(messages)"
]
},
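{
"cell_type": "markdown",
"id": "streamhint1",
"metadata": {},
"source": [
"`ChatOpenAI` is a standard LangChain Runnable, so you can also stream tokens as they arrive (a small sketch, assuming the LlamaStack endpoint supports streamed chat completions):\n",
"\n",
"```python\n",
"# Stream the reply token by token instead of waiting for the full message\n",
"for chunk in llm.stream(\"Write a one-sentence poem about llamas.\"):\n",
"    print(chunk.content, end=\"\", flush=True)\n",
"print()\n",
"```"
]
},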
{
"cell_type": "markdown",
"id": "0xh0jg6a0l4a",
"metadata": {},
"source": [
"### 6. Building the RAG Chain\n",
"\n",
"#### Create a Complete RAG Pipeline\n",
"\n",
"Build a LangChain pipeline that combines:\n",
"\n",
"1. **Vector Search**: Query LlamaStack's Open AI compatible Vector Store\n",
"2. **Context Assembly**: Format retrieved documents\n",
"3. **Prompt Template**: Structure the input for the LLM\n",
"4. **LLM Generation**: Generate answers using context\n",
"5. **Output Parsing**: Extract the final response\n",
"\n",
"**Chain Flow**: `Query → Vector Search → Context + Question → LLM → Response`"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9684427d-dcc7-4544-9af5-8b110d014c42",
"metadata": {},
"outputs": [],
"source": [
"# LangChain for prompt template and chaining + LLAMA Stack Client Vector DB and LLM chat completion\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
"\n",
"\n",
"def join_docs(docs):\n",
" return \"\\n\\n\".join([f\"[{d.filename}] {d.content[0].text}\" for d in docs.data])\n",
"\n",
"PROMPT = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant. Use the following context to answer.\"),\n",
" (\"user\", \"Question: {question}\\n\\nContext:\\n{context}\"),\n",
" ]\n",
")\n",
"\n",
"vector_step = RunnableLambda(\n",
" lambda x: client.vector_stores.search(\n",
" vector_store_id=vector_store.id,\n",
" query=x,\n",
" max_num_results=2\n",
" )\n",
" )\n",
"\n",
"chain = (\n",
" {\"context\": vector_step | RunnableLambda(join_docs), \"question\": RunnablePassthrough()}\n",
" | PROMPT\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
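{
"cell_type": "markdown",
"id": "batchhint1",
"metadata": {},
"source": [
"Because the composed chain is itself a Runnable, LangChain's standard `batch` API can run several queries in one call (a quick sketch):\n",
"\n",
"```python\n",
"# Run multiple questions through the same RAG pipeline\n",
"questions = [\"How long does shipping take?\", \"How do I contact support?\"]\n",
"for q, a in zip(questions, chain.batch(questions)):\n",
"    print(q, \"->\", a)\n",
"```"
]
},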
{
"cell_type": "markdown",
"id": "0onu6rhphlra",
"metadata": {},
"source": [
"### 7. Testing the RAG System\n",
"\n",
"#### Example 1: Shipping Query"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "03322188-9509-446a-a4a8-ce3bb83ec87c",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_708c060b-45da-423e-8354-68529b4fd1a6/search \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"❓ How long does shipping take?\n",
"💡 Acme ships globally in 3-5 business days. This means that shipping typically takes between 3 to 5 working days from the date of dispatch or order fulfillment.\n"
]
}
],
"source": [
"query = \"How long does shipping take?\"\n",
"response = chain.invoke(query)\n",
"print(\"❓\", query)\n",
"print(\"💡\", response)"
]
},
{
"cell_type": "markdown",
"id": "b7krhqj88ku",
"metadata": {},
"source": [
"#### Example 2: Returns Policy Query"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "61995550-bb0b-46a8-a5d0-023207475d60",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_708c060b-45da-423e-8354-68529b4fd1a6/search \"HTTP/1.1 200 OK\"\n",
"INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"❓ Can I return a product after 40 days?\n",
"💡 Based on the provided context, you cannot return a product after 40 days. The return window is limited to 30 days from the date of purchase.\n"
]
}
],
"source": [
"query = \"Can I return a product after 40 days?\"\n",
"response = chain.invoke(query)\n",
"print(\"❓\", query)\n",
"print(\"💡\", response)"
]
},
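{
"cell_type": "markdown",
"id": "fastapihint1",
"metadata": {},
"source": [
"As a step toward the integrations mentioned under Next Steps below, the chain can be exposed over HTTP with FastAPI (installed in step 1). A minimal sketch; the module layout, route name, and port are illustrative assumptions:\n",
"\n",
"```python\n",
"# Sketch: serve the RAG chain as an HTTP endpoint.\n",
"# Hypothetical run command if saved as app.py: uvicorn app:app --port 8000\n",
"from fastapi import FastAPI\n",
"from pydantic import BaseModel\n",
"\n",
"app = FastAPI()\n",
"\n",
"class Question(BaseModel):\n",
"    question: str\n",
"\n",
"@app.post(\"/ask\")\n",
"def ask(q: Question):\n",
"    # Reuses the chain built above; assumes the LlamaStack server is still running\n",
"    return {\"answer\": chain.invoke(q.question)}\n",
"```"
]
},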
{
"cell_type": "markdown",
"id": "h4w24fadvjs",
"metadata": {},
"source": [
"---\n",
"We have successfully built a RAG system that combines:\n",
"\n",
"- **LlamaStack** for infrastructure (LLM serving + Vector Store)\n",
"- **LangChain** for orchestration (prompts + chains)\n",
"- **Fireworks** for high-quality language models\n",
"\n",
"### Key Benefits\n",
"\n",
"1. **Unified Infrastructure**: Single server for LLMs and Vector Store\n",
"2. **OpenAI Compatibility**: Easy integration with existing LangChain code\n",
"3. **Multi-Provider Support**: Switch between different LLM providers\n",
"4. **Production Ready**: Built-in safety shields and monitoring\n",
"\n",
"### Next Steps\n",
"\n",
"- Add more sophisticated document processing\n",
"- Implement conversation memory\n",
"- Add safety filtering and monitoring\n",
"- Scale to larger document collections\n",
"- Integrate with web frameworks like FastAPI or Streamlit\n",
"\n",
"---\n",
"\n",
"##### 🔧 Cleanup\n",
"\n",
"Don't forget to stop the LlamaStack server when you're done:\n",
"\n",
"```python\n",
"kill_llama_stack_server()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "15647c46-22ce-4698-af3f-8161329d8e3a",
"metadata": {},
"outputs": [],
"source": [
"kill_llama_stack_server()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@@ -373,7 +373,7 @@
" metadata={\n",
" \"format\": \"json\",\n",
" \"description\": \"Tool calling xLAM dataset in OpenAI ChatCompletions format\",\n",
" \"provider\": \"nvidia\"\n",
" \"provider_id\": \"nvidia\"\n",
" }\n",
")\n",
"print(response)"