diff --git a/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb b/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb
new file mode 100644
index 000000000..d6841e488
--- /dev/null
+++ b/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb
@@ -0,0 +1,323 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "notebook-title",
+   "metadata": {},
+   "source": [
+    "# OpenAI Client with Llama Stack Extensions\n",
+    "\n",
+    "This notebook demonstrates how to use the **OpenAI Python client** with **Llama Stack server extensions**, allowing you to access Llama Stack-specific APIs through familiar OpenAI client patterns.\n",
+    "\n",
+    "## What You'll Learn\n",
+    "\n",
+    "1. 🔌 **Connect to Llama Stack** using the OpenAI client with a custom base URL\n",
+    "2. 🗄️ **Create and manage vector databases** using Llama Stack's vector-db API\n",
+    "3. 📄 **Insert and query vector data** for semantic search capabilities\n",
+    "4. 🌐 **Use low-level HTTP requests** to access Llama Stack-specific endpoints\n",
+    "\n",
+    "## Prerequisites\n",
+    "\n",
+    "- ✅ Llama Stack server running on `localhost:8321`\n",
+    "- ✅ Python packages: `pip install openai llama-stack-client`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "setup-section",
+   "metadata": {},
+   "source": [
+    "## 🔧 Setup: Connect the OpenAI Client to Llama Stack\n",
+    "\n",
+    "We'll use the OpenAI client but point it at our local Llama Stack server instead of OpenAI's servers. The client requires an API key argument, so we pass a placeholder; a default local Llama Stack server does not validate it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a42d6950-65c2-445e-96f9-6301d36d3a0f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from openai import OpenAI\n",
+    "\n",
+    "client = OpenAI(base_url=\"http://localhost:8321/v1\", api_key=\"dummy-key\")"
+   ]
+  },
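+  {
+   "cell_type": "markdown",
+   "id": "sanity-check-section",
+   "metadata": {},
+   "source": [
+    "## ✅ Sanity Check: Verify the Server Is Reachable\n",
+    "\n",
+    "Before creating any resources, it's worth confirming the server is up. The optional cell below is a minimal connectivity check that lists the models registered with the server via Llama Stack's native `/models` endpoint (the exact fields returned may vary by server version).\n",
+    "\n",
+    "A note on the pattern used throughout this notebook: `client._client` is the OpenAI client's underlying `httpx.Client`. Reusing it is a convenient way to reach Llama Stack-specific endpoints, but it is a private attribute and may change between `openai` releases."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "sanity-check-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Minimal connectivity check: list the models registered with the server.\n",
+    "# httpx resolves the relative path against the client's base_url (http://localhost:8321/v1).\n",
+    "resp = client._client.request(\"GET\", \"/models\")\n",
+    "resp.raise_for_status()  # fail fast if the server is unreachable or returned an error\n",
+    "\n",
+    "for model in resp.json().get(\"data\", []):\n",
+    "    print(model.get(\"identifier\"))"
+   ]
+  },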
+  {
+   "cell_type": "markdown",
+   "id": "vector-db-section",
+   "metadata": {},
+   "source": [
+    "## 🗄️ Create a Vector Database\n",
+    "\n",
+    "The code below registers a vector database using Llama Stack's vector-db API. We're using:\n",
+    "- **FAISS** as the backend provider\n",
+    "- **sentence-transformers/all-MiniLM-L6-v2** for embeddings (384 dimensions)\n",
+    "- A unique identifier `acme_docs_v2` for this database\n",
+    "\n",
+    "The `vector_db_id` registered here is the identifier that every later vector-io call must use, so we keep it consistent throughout the notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e45da4ef-03d9-48f7-a48f-8b5c09f2a46f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Use a low-level request for this Llama Stack-specific API\n",
+    "resp = client._client.request(\n",
+    "    \"POST\",\n",
+    "    \"/vector-dbs\",\n",
+    "    json={\n",
+    "        \"vector_db_id\": \"acme_docs_v2\",  # unique name, reused by every later call\n",
+    "        \"provider_id\": \"faiss\",\n",
+    "        \"provider_vector_db_id\": \"acme_docs_v2\",\n",
+    "        \"embedding_model\": \"sentence-transformers/all-MiniLM-L6-v2\",\n",
+    "        \"embedding_dimension\": 384,\n",
+    "    },\n",
+    ")\n",
+    "\n",
+    "print(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "list-dbs-section",
+   "metadata": {},
+   "source": [
+    "## 📋 List All Vector Databases\n",
+    "\n",
+    "This lists all vector databases registered with the Llama Stack server, letting us verify that our database was created successfully."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "94e3ca3c-3a8a-4e95-91a6-e87aa4221629",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp = client._client.request(\"GET\", \"/vector-dbs\")\n",
+    "\n",
+    "print(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "get-db-section",
+   "metadata": {},
+   "source": [
+    "## 🔍 Retrieve Specific Database Info\n",
+    "\n",
+    "Get detailed information about our specific vector database, including its configuration and metadata."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e67a8ba3-7785-4087-b0c8-442596fbdf92",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp = client._client.request(\"GET\", \"/vector-dbs/acme_docs_v2\")\n",
+    "\n",
+    "print(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "prepare-data-section",
+   "metadata": {},
+   "source": [
+    "## 📄 Prepare Documents for Vector Storage\n",
+    "\n",
+    "We create sample company policy documents and convert them into **Chunk** objects that Llama Stack can process. Each chunk contains:\n",
+    "- **content**: the actual text content\n",
+    "- **metadata**: searchable metadata with a `document_id`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3f08461a-7f9b-439e-bd38-72b68ee9a430",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_stack_client.types.vector_io_insert_params import Chunk\n",
+    "\n",
+    "docs = [\n",
+    "    (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
+    "    (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
+    "    (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
+    "]\n",
+    "\n",
+    "# Convert to Chunk objects\n",
+    "chunks = []\n",
+    "for content, metadata in docs:\n",
+    "    # Transform metadata to the required format, using the title as document_id\n",
+    "    chunk = Chunk(\n",
+    "        content=content,  # Required[InterleavedContent]\n",
+    "        metadata={\"document_id\": metadata[\"title\"]},  # Required[Dict]\n",
+    "    )\n",
+    "    chunks.append(chunk)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "insert-data-section",
+   "metadata": {},
+   "source": [
+    "## 📤 Insert Documents into Vector Database\n",
+    "\n",
+    "Insert our prepared chunks into the vector database. Llama Stack will automatically:\n",
+    "- Generate embeddings using the embedding model the database was registered with\n",
+    "- Store the vectors in the FAISS index\n",
+    "- Honor the optional TTL (time-to-live) of 1 hour that we pass for the chunks"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0fb7f7a7-6b8e-4af4-af93-ecc4fb3e1696",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp = client._client.request(\n",
+    "    \"POST\",\n",
+    "    \"/vector-io/insert\",\n",
+    "    json={\n",
+    "        \"vector_db_id\": \"acme_docs_v2\",  # must match the registered vector_db_id\n",
+    "        \"chunks\": chunks,\n",
+    "        \"ttl_seconds\": 3600,  # optional\n",
+    "    },\n",
+    ")\n",
+    "\n",
+    "print(resp.status_code)\n",
+    "print(resp.text or \"<empty response body>\")  # the insert API may return an empty body"
+   ]
+  },
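+  {
+   "cell_type": "markdown",
+   "id": "typed-client-section",
+   "metadata": {},
+   "source": [
+    "## 🧰 Alternative: The Typed llama-stack-client\n",
+    "\n",
+    "The raw-HTTP pattern above keeps everything inside the OpenAI client, but the same operation is also exposed by the official `llama-stack-client` package installed as a prerequisite. The sketch below shows the equivalent insert through its typed `vector_io` API; it assumes the same server and the `chunks` list prepared earlier, and running it is optional (it would insert the same chunks a second time)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "typed-client-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_stack_client import LlamaStackClient\n",
+    "\n",
+    "# The typed client targets the server root and adds the /v1 prefix itself.\n",
+    "ls_client = LlamaStackClient(base_url=\"http://localhost:8321\")\n",
+    "\n",
+    "# Equivalent to the raw POST /vector-io/insert above\n",
+    "ls_client.vector_io.insert(\n",
+    "    vector_db_id=\"acme_docs_v2\",\n",
+    "    chunks=chunks,\n",
+    "    ttl_seconds=3600,  # optional, mirrors the raw request\n",
+    ")"
+   ]
+  },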
The query \"How long does Acme take to ship orders?\" will be converted to an embedding and matched against stored document embeddings to find the most relevant content.\n", + "\n", + "The results show the most relevant chunks ranked by semantic similarity, with metadata and content for each match." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "faa7e0ab-aef9-496f-a2bc-90b4eb8bc860", + "metadata": {}, + "outputs": [], + "source": [ + "query = \"How long does Acme take to ship orders?\"\n", + "\n", + "resp = client._client.request(\n", + " \"POST\",\n", + " \"/vector-io/query\", # endpoint for vector queries\n", + " json={\n", + " \"vector_db_id\": \"acme_docs_v2\",\n", + " \"query\": query,\n", + " \"top_k\": 5 # optional, number of results to return\n", + " }\n", + ")\n", + "\n", + "# Convert response to Python dictionary\n", + "data = resp.json()\n", + "\n", + "# Loop through returned chunks\n", + "for i, chunk in enumerate(data.get(\"chunks\", []), start=1):\n", + " print(f\"Chunk {i}:\")\n", + " print(\" Metadata:\", chunk.get(\"metadata\"))\n", + " print(\" Content :\", chunk.get(\"content\"))\n", + " print(\"-\" * 50)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}