diff --git a/docs/notebooks/langchain/Llama_Stack_LangChain.ipynb b/docs/notebooks/langchain/Llama_Stack_LangChain.ipynb new file mode 100644 index 000000000..d44ac6994 --- /dev/null +++ b/docs/notebooks/langchain/Llama_Stack_LangChain.ipynb @@ -0,0 +1,701 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1ztegmwm4sp", + "metadata": {}, + "source": [ + "## LlamaStack + LangChain Integration Tutorial\n", + "\n", + "This notebook demonstrates how to integrate **LlamaStack** with **LangChain** to build a complete RAG (Retrieval-Augmented Generation) system.\n", + "\n", + "### Overview\n", + "\n", + "- **LlamaStack**: Provides the infrastructure for running LLMs and Open AI Compatible Vector Stores\n", + "- **LangChain**: Provides the framework for chaining operations and prompt templates\n", + "- **Integration**: Uses LlamaStack's OpenAI-compatible API with LangChain\n", + "\n", + "### What You'll See\n", + "\n", + "1. Setting up LlamaStack server with Fireworks AI provider\n", + "2. Creating and Querying Vector Stores\n", + "3. Building RAG chains with LangChain + LLAMAStack\n", + "4. Querying the chain for relevant information\n", + "\n", + "### Prerequisites\n", + "\n", + "- Fireworks API key\n", + "\n", + "---\n", + "\n", + "### 1. Installation and Setup" + ] + }, + { + "cell_type": "markdown", + "id": "2ktr5ls2cas", + "metadata": {}, + "source": [ + "#### Install Required Dependencies\n", + "\n", + "First, we install all the necessary packages for LangChain and FastAPI integration." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "5b6a6a17-b931-4bea-8273-0d6e5563637a", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: uv in /Users/swapna942/miniconda3/lib/python3.12/site-packages (0.7.20)\n", + "\u001b[2mUsing Python 3.12.11 environment at: /Users/swapna942/miniconda3\u001b[0m\n", + "\u001b[2mAudited \u001b[1m7 packages\u001b[0m \u001b[2min 42ms\u001b[0m\u001b[0m\n" + ] + } + ], + "source": [ + "!pip install uv\n", + "!uv pip install fastapi uvicorn \"langchain>=0.2\" langchain-openai \\\n", + " langchain-community langchain-text-splitters \\\n", + " faiss-cpu" + ] + }, + { + "cell_type": "markdown", + "id": "wmt9jvqzh7n", + "metadata": {}, + "source": [ + "### 2. LlamaStack Server Setup\n", + "\n", + "#### Build and Start LlamaStack Server\n", + "\n", + "This section sets up the LlamaStack server with:\n", + "- **Fireworks AI** as the inference provider\n", + "- **Sentence Transformers** for embeddings\n", + "\n", + "The server runs on `localhost:8321` and provides OpenAI-compatible endpoints." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "dd2dacf3-ec8b-4cc7-8ff4-b5b6ea4a6e9e", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "import os\n", + "import subprocess\n", + "import time\n", + "\n", + "# Remove UV_SYSTEM_PYTHON to ensure uv creates a proper virtual environment\n", + "# instead of trying to use system Python globally, which could cause permission issues\n", + "# and package conflicts with the system's Python installation\n", + "if \"UV_SYSTEM_PYTHON\" in os.environ:\n", + " del os.environ[\"UV_SYSTEM_PYTHON\"]\n", + "\n", + "def run_llama_stack_server_background():\n", + " \"\"\"Build and run LlamaStack server in one step using --run flag\"\"\"\n", + " log_file = open(\"llama_stack_server.log\", \"w\")\n", + " process = subprocess.Popen(\n", + " \"uv run --with llama-stack llama stack build --distro starter --image-type venv --run\",\n", + " shell=True,\n", + " stdout=log_file,\n", + " stderr=log_file,\n", + " text=True,\n", + " )\n", + "\n", + " print(f\"Building and starting Llama Stack server with PID: {process.pid}\")\n", + " return process\n", + "\n", + "\n", + "def wait_for_server_to_start():\n", + " import requests\n", + " from requests.exceptions import ConnectionError\n", + "\n", + " url = \"http://0.0.0.0:8321/v1/health\"\n", + " max_retries = 30\n", + " retry_interval = 1\n", + "\n", + " print(\"Waiting for server to start\", end=\"\")\n", + " for _ in range(max_retries):\n", + " try:\n", + " response = requests.get(url)\n", + " if response.status_code == 200:\n", + " print(\"\\nServer is ready!\")\n", + " return True\n", + " except ConnectionError:\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(retry_interval)\n", + "\n", + " print(\"\\nServer failed to start after\", max_retries * retry_interval, \"seconds\")\n", + " return False\n", + "\n", + "\n", + "def kill_llama_stack_server():\n", + " # Kill any existing llama stack server processes using pkill command\n", + " os.system(\"pkill -f llama_stack.core.server.server\")" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "28bd8dbd-4576-4e76-813f-21ab94db44a2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Building and starting Llama Stack server with PID: 19747\n", + "Waiting for server to start....\n", + "Server is ready!\n" + ] + } + ], + "source": [ + "server_process = run_llama_stack_server_background()\n", + "assert wait_for_server_to_start()" + ] + }, + { + "cell_type": "markdown", + "id": "gr9cdcg4r7n", + "metadata": {}, + "source": [ + "#### Install LlamaStack Client\n", + "\n", + "Install the client library to interact with the LlamaStack server." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "487d2dbc-d071-400e-b4f0-dcee58f8dc95", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[2mUsing Python 3.12.11 environment at: /Users/swapna942/miniconda3\u001b[0m\n", + "\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 27ms\u001b[0m\u001b[0m\n" + ] + } + ], + "source": [ + "!uv pip install llama_stack_client" + ] + }, + { + "cell_type": "markdown", + "id": "0j5hag7l9x89", + "metadata": {}, + "source": [ + "### 3. Initialize LlamaStack Client\n", + "\n", + "Create a client connection to the LlamaStack server with API keys for different providers:\n", + "\n", + "- **Fireworks API Key**: For Fireworks models\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "ab4eff97-4565-4c73-b1b3-0020a4c7e2a5", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_stack_client import LlamaStackClient\n", + "\n", + "client = LlamaStackClient(\n", + " base_url=\"http://0.0.0.0:8321\",\n", + " provider_data={\"fireworks_api_key\": \"***\"},\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "vwhexjy1e8o", + "metadata": {}, + "source": [ + "#### Explore Available Models and Safety Features\n", + "\n", + "Check what models and safety shields are available through your LlamaStack instance." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "880443ef-ac3c-48b1-a80a-7dab5b25ac61", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/models \"HTTP/1.1 200 OK\"\n", + "INFO:httpx:HTTP Request: GET http://0.0.0.0:8321/v1/shields \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Available Fireworks models:\n", + "- fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\n", + "- fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct\n", + "- fireworks/accounts/fireworks/models/llama-v3p1-405b-instruct\n", + "- fireworks/accounts/fireworks/models/llama-v3p2-3b-instruct\n", + "- fireworks/accounts/fireworks/models/llama-v3p2-11b-vision-instruct\n", + "- fireworks/accounts/fireworks/models/llama-v3p2-90b-vision-instruct\n", + "- fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct\n", + "- fireworks/accounts/fireworks/models/llama4-scout-instruct-basic\n", + "- fireworks/accounts/fireworks/models/llama4-maverick-instruct-basic\n", + "- fireworks/nomic-ai/nomic-embed-text-v1.5\n", + "- fireworks/accounts/fireworks/models/llama-guard-3-8b\n", + "- fireworks/accounts/fireworks/models/llama-guard-3-11b-vision\n", + "----\n", + "Available shields (safety models):\n", + "code-scanner\n", + "llama-guard\n", + "nemo-guardrail\n", + "----\n" + ] + } + ], + "source": [ + "print(\"Available Fireworks models:\")\n", + "for m in client.models.list():\n", + " if m.identifier.startswith(\"fireworks/\"):\n", + " print(f\"- {m.identifier}\")\n", + "\n", + "print(\"----\")\n", + "print(\"Available shields (safety models):\")\n", + "for s in client.shields.list():\n", + " print(s.identifier)\n", + "print(\"----\")" + ] + }, + { + "cell_type": "markdown", + "id": "gojp7at31ht", + "metadata": {}, + "source": [ + "### 4. Vector Store Setup\n", + "\n", + "#### Create a Vector Store with File Upload\n", + "\n", + "Create a vector store using the OpenAI-compatible vector stores API:\n", + "\n", + "- **Vector Store**: OpenAI-compatible vector store for document storage\n", + "- **File Upload**: Automatic chunking and embedding of uploaded files \n", + "- **Embedding Model**: Sentence Transformers model for text embeddings\n", + "- **Dimensions**: 384-dimensional embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "be2c2899-ea53-4e5f-b6b8-ed425f5d6572", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n", + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n", + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/files \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "File(id='file-54652c95c56c4c34918a97d7ff8a4320', bytes=41, created_at=1757442621, expires_at=1788978621, filename='shipping_policy.txt', object='file', purpose='assistants')\n", + "File(id='file-fb1227c1d1854da1bd774d21e5b7e41c', bytes=48, created_at=1757442621, expires_at=1788978621, filename='returns_policy.txt', object='file', purpose='assistants')\n", + "File(id='file-673f874852fe42798675a13d06a256e2', bytes=45, created_at=1757442621, expires_at=1788978621, filename='support.txt', object='file', purpose='assistants')\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores \"HTTP/1.1 200 OK\"\n" + ] + } + ], + "source": [ + "from io import BytesIO\n", + "\n", + "docs = [\n", + " (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n", + " (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n", + " (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n", + "]\n", + "\n", + "file_ids = []\n", + "for content, metadata in docs:\n", + " with BytesIO(content.encode()) as file_buffer:\n", + " file_buffer.name = f\"{metadata['title'].replace(' ', '_').lower()}.txt\"\n", + " create_file_response = client.files.create(file=file_buffer, purpose=\"assistants\")\n", + " print(create_file_response)\n", + " file_ids.append(create_file_response.id)\n", + "\n", + "# Create vector store with files\n", + "vector_store = client.vector_stores.create(\n", + " name=\"acme_docs\",\n", + " file_ids=file_ids,\n", + " embedding_model=\"sentence-transformers/all-MiniLM-L6-v2\",\n", + " embedding_dimension=384,\n", + " provider_id=\"faiss\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9061tmi1zpq", + "metadata": {}, + "source": [ + "#### Test Vector Store Search\n", + "\n", + "Query the vector store. This performs semantic search to find relevant documents based on the query." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "ba9d1901-bd5e-4216-b3e6-19dc74551cc6", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_708c060b-45da-423e-8354-68529b4fd1a6/search \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Acme ships globally in 3-5 business days.\n", + "Returns are accepted within 30 days of purchase.\n" + ] + } + ], + "source": [ + "search_response = client.vector_stores.search(\n", + " vector_store_id=vector_store.id,\n", + " query=\"How long does shipping take?\",\n", + " max_num_results=2\n", + ")\n", + "for result in search_response.data:\n", + " content = result.content[0].text\n", + " print(content)" + ] + }, + { + "cell_type": "markdown", + "id": "usne6mbspms", + "metadata": {}, + "source": [ + "### 5. LangChain Integration\n", + "\n", + "#### Configure LangChain with LlamaStack\n", + "\n", + "Set up LangChain to use LlamaStack's OpenAI-compatible API:\n", + "\n", + "- **Base URL**: Points to LlamaStack's OpenAI endpoint\n", + "- **Headers**: Include Fireworks API key for model access\n", + "- **Model**: Use Meta Llama v3p1 8b instruct model for inference" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c378bd10-09c2-417c-bdfc-1e0a2dd19084", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from langchain_openai import ChatOpenAI\n", + "\n", + "# Point LangChain to Llamastack Server\n", + "llm = ChatOpenAI(\n", + " base_url=\"http://0.0.0.0:8321/v1/openai/v1\",\n", + " api_key=\"dummy\",\n", + " model=\"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct\",\n", + " default_headers={\"X-LlamaStack-Provider-Data\": '{\"fireworks_api_key\": \"***\"}'},\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5a4ddpcuk3l", + "metadata": {}, + "source": [ + "#### Test LLM Connection\n", + "\n", + "Verify that LangChain can successfully communicate with the LlamaStack server." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "f88ffb5a-657b-4916-9375-c6ddc156c25e", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "data": { + "text/plain": [ + "AIMessage(content=\"A llama's gentle eyes shine bright,\\nIn the Andes, it roams through morning light.\", additional_kwargs={'refusal': None}, response_metadata={'token_usage': None, 'model_name': 'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'system_fingerprint': None, 'id': 'chatcmpl-602b5967-82a3-476b-9cd2-7d3b29b76ee8', 'service_tier': None, 'finish_reason': 'stop', 'logprobs': None}, id='run--0933c465-ff4d-4a7b-b7fb-fd97dd8244f3-0')" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Test llm with simple message\n", + "messages = [\n", + " {\"role\": \"system\", \"content\": \"You are a friendly assistant.\"},\n", + " {\"role\": \"user\", \"content\": \"Write a two-sentence poem about llama.\"},\n", + "]\n", + "llm.invoke(messages)" + ] + }, + { + "cell_type": "markdown", + "id": "0xh0jg6a0l4a", + "metadata": {}, + "source": [ + "### 6. Building the RAG Chain\n", + "\n", + "#### Create a Complete RAG Pipeline\n", + "\n", + "Build a LangChain pipeline that combines:\n", + "\n", + "1. **Vector Search**: Query LlamaStack's Open AI compatible Vector Store\n", + "2. **Context Assembly**: Format retrieved documents\n", + "3. **Prompt Template**: Structure the input for the LLM\n", + "4. **LLM Generation**: Generate answers using context\n", + "5. **Output Parsing**: Extract the final response\n", + "\n", + "**Chain Flow**: `Query → Vector Search → Context + Question → LLM → Response`" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "9684427d-dcc7-4544-9af5-8b110d014c42", + "metadata": {}, + "outputs": [], + "source": [ + "# LangChain for prompt template and chaining + LLAMA Stack Client Vector DB and LLM chat completion\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.prompts import ChatPromptTemplate\n", + "from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n", + "\n", + "\n", + "def join_docs(docs):\n", + " return \"\\n\\n\".join([f\"[{d.filename}] {d.content[0].text}\" for d in docs.data])\n", + "\n", + "PROMPT = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\"system\", \"You are a helpful assistant. Use the following context to answer.\"),\n", + " (\"user\", \"Question: {question}\\n\\nContext:\\n{context}\"),\n", + " ]\n", + ")\n", + "\n", + "vector_step = RunnableLambda(\n", + " lambda x: client.vector_stores.search(\n", + " vector_store_id=vector_store.id,\n", + " query=x,\n", + " max_num_results=2\n", + " )\n", + " )\n", + "\n", + "chain = (\n", + " {\"context\": vector_step | RunnableLambda(join_docs), \"question\": RunnablePassthrough()}\n", + " | PROMPT\n", + " | llm\n", + " | StrOutputParser()\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "0onu6rhphlra", + "metadata": {}, + "source": [ + "### 7. Testing the RAG System\n", + "\n", + "#### Example 1: Shipping Query" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "03322188-9509-446a-a4a8-ce3bb83ec87c", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_708c060b-45da-423e-8354-68529b4fd1a6/search \"HTTP/1.1 200 OK\"\n", + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "❓ How long does shipping take?\n", + "💡 Acme ships globally in 3-5 business days. This means that shipping typically takes between 3 to 5 working days from the date of dispatch or order fulfillment.\n" + ] + } + ], + "source": [ + "query = \"How long does shipping take?\"\n", + "response = chain.invoke(query)\n", + "print(\"❓\", query)\n", + "print(\"💡\", response)" + ] + }, + { + "cell_type": "markdown", + "id": "b7krhqj88ku", + "metadata": {}, + "source": [ + "#### Example 2: Returns Policy Query" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "61995550-bb0b-46a8-a5d0-023207475d60", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/vector_stores/vs_708c060b-45da-423e-8354-68529b4fd1a6/search \"HTTP/1.1 200 OK\"\n", + "INFO:httpx:HTTP Request: POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "❓ Can I return a product after 40 days?\n", + "💡 Based on the provided context, you cannot return a product after 40 days. The return window is limited to 30 days from the date of purchase.\n" + ] + } + ], + "source": [ + "query = \"Can I return a product after 40 days?\"\n", + "response = chain.invoke(query)\n", + "print(\"❓\", query)\n", + "print(\"💡\", response)" + ] + }, + { + "cell_type": "markdown", + "id": "h4w24fadvjs", + "metadata": {}, + "source": [ + "---\n", + "We have successfully built a RAG system that combines:\n", + "\n", + "- **LlamaStack** for infrastructure (LLM serving + Vector Store)\n", + "- **LangChain** for orchestration (prompts + chains)\n", + "- **Fireworks** for high-quality language models\n", + "\n", + "### Key Benefits\n", + "\n", + "1. **Unified Infrastructure**: Single server for LLMs and Vector Store\n", + "2. **OpenAI Compatibility**: Easy integration with existing LangChain code\n", + "3. **Multi-Provider Support**: Switch between different LLM providers\n", + "4. **Production Ready**: Built-in safety shields and monitoring\n", + "\n", + "### Next Steps\n", + "\n", + "- Add more sophisticated document processing\n", + "- Implement conversation memory\n", + "- Add safety filtering and monitoring\n", + "- Scale to larger document collections\n", + "- Integrate with web frameworks like FastAPI or Streamlit\n", + "\n", + "---\n", + "\n", + "##### 🔧 Cleanup\n", + "\n", + "Don't forget to stop the LlamaStack server when you're done:\n", + "\n", + "```python\n", + "kill_llama_stack_server()\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "15647c46-22ce-4698-af3f-8161329d8e3a", + "metadata": {}, + "outputs": [], + "source": [ + "kill_llama_stack_server()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}