Add OpenAI client example for LLAMAExtensions

This commit is contained in:
Swapna Lekkala 2025-10-02 11:38:10 -07:00
parent 0e13512dd7
commit 26f8c86dd8


{
"cells": [
{
"cell_type": "markdown",
"id": "notebook-title",
"metadata": {},
"source": [
"# OpenAI Client with LLAMA Stack Extensions\n",
"\n",
"This notebook demonstrates how to use the **OpenAI Python client** with **LLAMA Stack server extensions**, allowing you to access LLAMA Stack-specific APIs through familiar OpenAI client patterns.\n",
"\n",
"## What You'll Learn\n",
"\n",
"1. 🔌 **Connect to LLAMA Stack** using the OpenAI client with a custom base URL\n",
"2. 🗄️ **Create and manage vector databases** using LLAMA Stack's vector-db API\n",
"3. 📄 **Insert and query vector data** for semantic search capabilities\n",
"4. 🌐 **Use low-level HTTP requests** to access LLAMA Stack-specific endpoints\n",
"\n",
"## Prerequisites\n",
"\n",
"- ✅ LLAMA Stack server running on `localhost:8321`\n",
"- ✅ Python packages: `pip install openai llama-stack-client`"
]
},
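{
"cell_type": "markdown",
"id": "health-check-section",
"metadata": {},
"source": [
"### 🩺 Optional: Verify the Server Is Reachable\n",
"\n",
"Before continuing, you can confirm the LLAMA Stack server is up with a quick HTTP request. This is a sketch that assumes the server exposes a `/v1/health` endpoint (present in recent LLAMA Stack releases); adjust the URL if your deployment differs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "health-check-code",
"metadata": {},
"outputs": [],
"source": [
"import httpx\n",
"\n",
"# Quick reachability check (assumes a /v1/health endpoint)\n",
"resp = httpx.get(\"http://localhost:8321/v1/health\")\n",
"print(resp.status_code, resp.json())"
]
},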
{
"cell_type": "markdown",
"id": "setup-section",
"metadata": {},
"source": [
"## 🔧 Setup: Connect OpenAI Client to LLAMA Stack\n",
"\n",
"We'll use the OpenAI client but point it to our local LLAMA Stack server instead of OpenAI's servers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a42d6950-65c2-445e-96f9-6301d36d3a0f",
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"\n",
"client = OpenAI(base_url=\"http://localhost:8321/v1\", api_key=\"dummy-key\")"
]
},
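{
"cell_type": "markdown",
"id": "sanity-check-section",
"metadata": {},
"source": [
"### ✅ Optional: Sanity-Check the Connection\n",
"\n",
"Because LLAMA Stack serves OpenAI-compatible endpoints under `/v1`, the standard `client.models.list()` call should return the models registered on the stack, assuming the server is running. This is a quick way to confirm the client is pointed at the right base URL before calling any stack-specific APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sanity-check-code",
"metadata": {},
"outputs": [],
"source": [
"# List models served by the LLAMA Stack server via the standard OpenAI models API\n",
"for model in client.models.list():\n",
"    print(model.id)"
]
},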
{
"cell_type": "markdown",
"id": "vector-db-section",
"metadata": {},
"source": [
"## 🗄️ Create a Vector Database\n",
"\n",
"The next cell registers a vector database using LLAMA Stack's vector-db API. We're using:\n",
"- **FAISS** as the backend provider\n",
"- **sentence-transformers/all-MiniLM-L6-v2** for embeddings (384 dimensions)\n",
"- A unique identifier `acme_docs_v2` for this database"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e45da4ef-03d9-48f7-a48f-8b5c09f2a46f",
"metadata": {},
"outputs": [],
"source": [
"# Use a low-level HTTP request for the llama-stack-specific API\n",
"resp = client._client.request(\n",
" \"POST\",\n",
" \"/vector-dbs\",\n",
" json={\n",
" \"vector_db_id\": \"acme_docs_v2\", # Unique ID referenced by later cells\n",
" \"provider_id\": \"faiss\",\n",
" \"provider_vector_db_id\": \"acme_docs_v2\",\n",
" \"embedding_model\": \"sentence-transformers/all-MiniLM-L6-v2\",\n",
" \"embedding_dimension\": 384,\n",
" },\n",
")\n",
"\n",
"print(resp.json())"
]
},
{
"cell_type": "markdown",
"id": "list-dbs-section",
"metadata": {},
"source": [
"## 📋 List All Vector Databases\n",
"\n",
"This lists all vector databases registered in the LLAMA Stack server, allowing us to verify our database was created successfully."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94e3ca3c-3a8a-4e95-91a6-e87aa4221629",
"metadata": {},
"outputs": [],
"source": [
"resp = client._client.request(\n",
" \"GET\",\n",
" \"/vector-dbs\"\n",
")\n",
"\n",
"print(resp.json())"
]
},
{
"cell_type": "markdown",
"id": "get-db-section",
"metadata": {},
"source": [
"## 🔍 Retrieve Specific Database Info\n",
"\n",
"Get detailed information about our specific vector database, including its configuration and metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e67a8ba3-7785-4087-b0c8-442596fbdf92",
"metadata": {},
"outputs": [],
"source": [
"resp = client._client.request(\n",
" \"GET\",\n",
" \"/vector-dbs/acme_docs_v2\"\n",
")\n",
"\n",
"print(resp.json())"
]
},
{
"cell_type": "markdown",
"id": "prepare-data-section",
"metadata": {},
"source": [
"## 📄 Prepare Documents for Vector Storage\n",
"\n",
"We create sample company policy documents and convert them into **Chunk** objects that LLAMA Stack can process. Each chunk contains:\n",
"- **content**: The actual text content\n",
"- **metadata**: Searchable metadata with a `document_id`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f08461a-7f9b-439e-bd38-72b68ee9a430",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client.types.vector_io_insert_params import Chunk\n",
"\n",
"docs = [\n",
" (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
" (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
" (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
"]\n",
"\n",
"# Convert to Chunk objects\n",
"chunks = []\n",
"for content, metadata in docs:\n",
" # Transform metadata to required format with document_id from title\n",
" metadata = {\"document_id\": metadata[\"title\"]}\n",
" chunk = Chunk(\n",
" content=content, # Required[InterleavedContent]\n",
" metadata=metadata, # Required[Dict]\n",
" )\n",
" chunks.append(chunk)"
]
},
{
"cell_type": "markdown",
"id": "insert-data-section",
"metadata": {},
"source": [
"## 📤 Insert Documents into Vector Database\n",
"\n",
"Insert our prepared chunks into the vector database. LLAMA Stack will automatically:\n",
"- Generate embeddings using the specified model\n",
"- Store the vectors in the FAISS index\n",
"- Set a TTL (time-to-live) of 1 hour for the chunks"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fb7f7a7-6b8e-4af4-af93-ecc4fb3e1696",
"metadata": {},
"outputs": [],
"source": [
"resp = client._client.request(\n",
" \"POST\",\n",
" \"/vector-io/insert\",\n",
" json={\n",
" \"vector_db_id\": \"acme_docs_v2\",\n",
" \"chunks\": chunks,\n",
" \"ttl_seconds\": 3600, # optional\n",
" }\n",
")\n",
"\n",
"print(resp.status_code)\n",
"print(resp.json()) # might be empty if API returns None"
]
},
{
"cell_type": "markdown",
"id": "search-section",
"metadata": {},
"source": [
"## 🔍 Semantic Search Query\n",
"\n",
"Perform a **semantic search** on our documents. The query \"How long does Acme take to ship orders?\" will be converted to an embedding and matched against stored document embeddings to find the most relevant content.\n",
"\n",
"The results show the most relevant chunks ranked by semantic similarity, with metadata and content for each match."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "faa7e0ab-aef9-496f-a2bc-90b4eb8bc860",
"metadata": {},
"outputs": [],
"source": [
"query = \"How long does Acme take to ship orders?\"\n",
"\n",
"resp = client._client.request(\n",
" \"POST\",\n",
" \"/vector-io/query\", # endpoint for vector queries\n",
" json={\n",
" \"vector_db_id\": \"acme_docs_v2\",\n",
" \"query\": query,\n",
" \"top_k\": 5 # optional, number of results to return\n",
" }\n",
")\n",
"\n",
"# Convert response to Python dictionary\n",
"data = resp.json()\n",
"\n",
"# Loop through returned chunks\n",
"for i, chunk in enumerate(data.get(\"chunks\", []), start=1):\n",
" print(f\"Chunk {i}:\")\n",
" print(\" Metadata:\", chunk.get(\"metadata\"))\n",
" print(\" Content :\", chunk.get(\"content\"))\n",
" print(\"-\" * 50)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}