From 26f8c86dd899f5f49cf0d9fef74780f191047fd2 Mon Sep 17 00:00:00 2001 From: Swapna Lekkala Date: Thu, 2 Oct 2025 11:38:10 -0700 Subject: [PATCH 1/2] Add OpenAI client example for LLAMA Stack extensions --- ...enAIClient_with_LLAMAStackExtensions.ipynb | 270 ++++++++++++++++++ 1 file changed, 270 insertions(+) create mode 100644 docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb diff --git a/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb b/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb new file mode 100644 index 000000000..d6841e488 --- /dev/null +++ b/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb @@ -0,0 +1,270 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "notebook-title", + "metadata": {}, + "source": [ + "# OpenAI Client with LLAMA Stack Extensions\n", + "\n", + "This notebook demonstrates how to use the **OpenAI Python client** with **LLAMA Stack server extensions**, allowing you to access LLAMA Stack-specific APIs through familiar OpenAI client patterns.\n", + "\n", + "## What You'll Learn\n", + "\n", + "1. 🔌 **Connect to LLAMA Stack** using the OpenAI client with a custom base URL\n", + "2. 🗄️ **Create and manage vector databases** using LLAMA Stack's vector-db API\n", + "3. 📄 **Insert and query vector data** for semantic search capabilities\n", + "4. 🌐 **Use low-level HTTP requests** to access LLAMA Stack-specific endpoints\n", + "\n", + "## Prerequisites\n", + "\n", + "- ✅ LLAMA Stack server running on `localhost:8321`\n", + "- ✅ Python packages: `pip install openai llama-stack-client`" + ] + }, + { + "cell_type": "markdown", + "id": "setup-section", + "metadata": {}, + "source": [ + "## 🔧 Setup: Connect OpenAI Client to LLAMA Stack\n", + "\n", + "We'll use the OpenAI client but point it to our local LLAMA Stack server instead of OpenAI's servers."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a42d6950-65c2-445e-96f9-6301d36d3a0f", + "metadata": {}, + "outputs": [], + "source": [ + "from openai import OpenAI\n", + "\n", + "client = OpenAI(base_url=\"http://localhost:8321/v1\", api_key=\"dummy-key\")" + ] + }, + { + "cell_type": "markdown", + "id": "vector-db-section", + "metadata": {}, + "source": [ + "## 🗄️ Create a Vector Database\n", + "\n", + "The code below creates a vector database using LLAMA Stack's vector-db API. We're using:\n", + "- **FAISS** as the backend provider\n", + "- **sentence-transformers/all-MiniLM-L6-v2** for embeddings (384 dimensions)\n", + "- A unique identifier `acme_docs_v2` (set via `provider_vector_db_id`) for this database" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e45da4ef-03d9-48f7-a48f-8b5c09f2a46f", + "metadata": {}, + "outputs": [], + "source": [ + "# Using a low-level request for llama-stack specific API\n", + "resp = client._client.request(\n", + " \"POST\",\n", + " \"/vector-dbs\",\n", + " json={\n", + " \"vector_db_id\": \"acme_docs\", # Use a new unique name\n", + " \"provider_id\": \"faiss\",\n", + " \"provider_vector_db_id\": \"acme_docs_v2\",\n", + " \"embedding_model\": \"sentence-transformers/all-MiniLM-L6-v2\",\n", + " \"embedding_dimension\": 384,\n", + " },\n", + ")\n", + "\n", + "print(resp.json())" + ] + }, + { + "cell_type": "markdown", + "id": "list-dbs-section", + "metadata": {}, + "source": [ + "## 📋 List All Vector Databases\n", + "\n", + "This lists all vector databases registered in the LLAMA Stack server, allowing us to verify our database was created successfully."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94e3ca3c-3a8a-4e95-91a6-e87aa4221629", + "metadata": {}, + "outputs": [], + "source": [ + "resp = client._client.request(\n", + " \"GET\",\n", + " \"/vector-dbs\"\n", + ")\n", + "\n", + "print(resp.json())" + ] + }, + { + "cell_type": "markdown", + "id": "get-db-section", + "metadata": {}, + "source": [ + "## 🔍 Retrieve Specific Database Info\n", + "\n", + "Get detailed information about our specific vector database, including its configuration and metadata." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e67a8ba3-7785-4087-b0c8-442596fbdf92", + "metadata": {}, + "outputs": [], + "source": [ + "resp = client._client.request(\n", + " \"GET\",\n", + " \"/vector-dbs/acme_docs_v2\"\n", + ")\n", + "\n", + "print(resp.json())" + ] + }, + { + "cell_type": "markdown", + "id": "prepare-data-section", + "metadata": {}, + "source": [ + "## 📄 Prepare Documents for Vector Storage\n", + "\n", + "We create sample company policy documents and convert them into **Chunk** objects that LLAMA Stack can process. 
Each chunk contains:\n", + "- **content**: The actual text content\n", + "- **metadata**: Searchable metadata with a `document_id`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3f08461a-7f9b-439e-bd38-72b68ee9a430", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_stack_client.types.vector_io_insert_params import Chunk\n", + "\n", + "docs = [\n", + "    (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n", + "    (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n", + "    (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n", + "]\n", + "\n", + "# Convert to Chunk objects\n", + "chunks = []\n", + "for content, metadata in docs:\n", + "    # Transform metadata to required format with document_id from title\n", + "    metadata = {\"document_id\": metadata[\"title\"]}\n", + "    chunk = Chunk(\n", + "        content=content,  # Required[InterleavedContent]\n", + "        metadata=metadata,  # Required[Dict]\n", + "    )\n", + "    chunks.append(chunk)" + ] + }, + { + "cell_type": "markdown", + "id": "insert-data-section", + "metadata": {}, + "source": [ + "## 📤 Insert Documents into Vector Database\n", + "\n", + "Insert our prepared chunks into the vector database. 
LLAMA Stack will automatically:\n", + "- Generate embeddings using the specified model\n", + "- Store the vectors in the FAISS index\n", + "- Honor the optional `ttl_seconds` in the request (3600 here, so chunks expire after 1 hour)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0fb7f7a7-6b8e-4af4-af93-ecc4fb3e1696", + "metadata": {}, + "outputs": [], + "source": [ + "resp = client._client.request(\n", + "    \"POST\",\n", + "    \"/vector-io/insert\",\n", + "    json={\n", + "        \"vector_db_id\": \"acme_docs_v2\",\n", + "        \"chunks\": chunks,\n", + "        \"ttl_seconds\": 3600,  # optional\n", + "    }\n", + ")\n", + "\n", + "print(resp.status_code)\n", + "print(resp.json())  # the insert API returns no payload, so this prints None" + ] + }, + { + "cell_type": "markdown", + "id": "search-section", + "metadata": {}, + "source": [ + "## 🔍 Semantic Search Query\n", + "\n", + "Perform a **semantic search** on our documents. The query \"How long does Acme take to ship orders?\" will be converted to an embedding and matched against stored document embeddings to find the most relevant content.\n", + "\n", + "The results show the most relevant chunks ranked by semantic similarity, with metadata and content for each match."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "faa7e0ab-aef9-496f-a2bc-90b4eb8bc860", + "metadata": {}, + "outputs": [], + "source": [ + "query = \"How long does Acme take to ship orders?\"\n", + "\n", + "resp = client._client.request(\n", + " \"POST\",\n", + " \"/vector-io/query\", # endpoint for vector queries\n", + " json={\n", + " \"vector_db_id\": \"acme_docs_v2\",\n", + " \"query\": query,\n", + " \"top_k\": 5 # optional, number of results to return\n", + " }\n", + ")\n", + "\n", + "# Convert response to Python dictionary\n", + "data = resp.json()\n", + "\n", + "# Loop through returned chunks\n", + "for i, chunk in enumerate(data.get(\"chunks\", []), start=1):\n", + " print(f\"Chunk {i}:\")\n", + " print(\" Metadata:\", chunk.get(\"metadata\"))\n", + " print(\" Content :\", chunk.get(\"content\"))\n", + " print(\"-\" * 50)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 0123079a91e2f5bbab9f1653eac87cfd2260ca00 Mon Sep 17 00:00:00 2001 From: Swapna Lekkala Date: Thu, 2 Oct 2025 11:39:05 -0700 Subject: [PATCH 2/2] fix --- ...enAIClient_with_LLAMAStackExtensions.ipynb | 119 ++++++++++++++++-- 1 file changed, 107 insertions(+), 12 deletions(-) diff --git a/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb b/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb index d6841e488..62568d5a8 100644 --- a/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb +++ b/docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb @@ -34,7 +34,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 25, "id": 
"a42d6950-65c2-445e-96f9-6301d36d3a0f", "metadata": {}, "outputs": [], @@ -59,10 +59,25 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 26, "id": "e45da4ef-03d9-48f7-a48f-8b5c09f2a46f", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://localhost:8321/v1/vector-dbs \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'identifier': 'acme_docs_v2', 'provider_resource_id': 'acme_docs_v2', 'provider_id': 'faiss', 'type': 'vector_db', 'owner': None, 'source': 'via_register_api', 'embedding_model': 'sentence-transformers/all-MiniLM-L6-v2', 'embedding_dimension': 384, 'vector_db_name': 'acme_docs'}\n" + ] + } + ], "source": [ "# Using a low-level request for llama-stack specific API\n", "resp = client._client.request(\n", @@ -92,10 +107,25 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 27, "id": "94e3ca3c-3a8a-4e95-91a6-e87aa4221629", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: GET http://localhost:8321/v1/vector-dbs \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'data': [{'identifier': 'acme_docs_v2', 'provider_resource_id': 'acme_docs_v2', 'provider_id': 'faiss', 'type': 'vector_db', 'embedding_model': 'sentence-transformers/all-MiniLM-L6-v2', 'embedding_dimension': 384, 'vector_db_name': 'acme_docs'}]}\n" + ] + } + ], "source": [ "resp = client._client.request(\n", " \"GET\",\n", @@ -117,10 +147,25 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 28, "id": "e67a8ba3-7785-4087-b0c8-442596fbdf92", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: GET http://localhost:8321/v1/vector-dbs/acme_docs_v2 \"HTTP/1.1 
200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'identifier': 'acme_docs_v2', 'provider_resource_id': 'acme_docs_v2', 'provider_id': 'faiss', 'type': 'vector_db', 'owner': None, 'source': 'via_register_api', 'embedding_model': 'sentence-transformers/all-MiniLM-L6-v2', 'embedding_dimension': 384, 'vector_db_name': 'acme_docs'}\n" + ] + } + ], "source": [ "resp = client._client.request(\n", " \"GET\",\n", @@ -144,7 +189,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 29, "id": "3f08461a-7f9b-439e-bd38-72b68ee9a430", "metadata": {}, "outputs": [], @@ -184,10 +229,26 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 30, "id": "0fb7f7a7-6b8e-4af4-af93-ecc4fb3e1696", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://localhost:8321/v1/vector-io/insert \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "200\n", + "None\n" + ] + } + ], "source": [ "resp = client._client.request(\n", " \"POST\",\n", @@ -217,10 +278,36 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 31, "id": "faa7e0ab-aef9-496f-a2bc-90b4eb8bc860", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:httpx:HTTP Request: POST http://localhost:8321/v1/vector-io/query \"HTTP/1.1 200 OK\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Chunk 1:\n", + " Metadata: {'document_id': 'Shipping Policy'}\n", + " Content : Acme ships globally in 3-5 business days.\n", + "--------------------------------------------------\n", + "Chunk 2:\n", + " Metadata: {'document_id': 'Shipping Policy'}\n", + " Content : Acme ships globally in 3-5 business days.\n", + "--------------------------------------------------\n", + "Chunk 3:\n", + " Metadata: {'document_id': 
'Returns Policy'}\n", + " Content : Returns are accepted within 30 days of purchase.\n", + "--------------------------------------------------\n" + ] + } + ], "source": [ "query = \"How long does Acme take to ship orders?\"\n", "\n", @@ -244,6 +331,14 @@ " print(\" Content :\", chunk.get(\"content\"))\n", " print(\"-\" * 50)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa803d10-8f3f-40e0-b621-51c847778e05", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": {
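The chunk-preparation step in the notebook above can also be sketched as a standalone function outside the diff. This is a minimal sketch: `to_chunks` is an illustrative name, and it builds plain dicts rather than the typed `Chunk` from `llama_stack_client` — the assumption that the insert endpoint accepts dicts of the same `content`/`metadata` shape should be verified against your client version.

```python
def to_chunks(docs):
    """Convert (content, metadata) tuples into chunk dicts with a document_id."""
    return [
        {"content": content, "metadata": {"document_id": metadata["title"]}}
        for content, metadata in docs
    ]

# Same sample documents as the notebook
docs = [
    ("Acme ships globally in 3-5 business days.", {"title": "Shipping Policy"}),
    ("Returns are accepted within 30 days of purchase.", {"title": "Returns Policy"}),
    ("Support is available 24/7 via chat and email.", {"title": "Support"}),
]

chunks = to_chunks(docs)
print(chunks[0])
# -> {'content': 'Acme ships globally in 3-5 business days.', 'metadata': {'document_id': 'Shipping Policy'}}
```

Keeping this transformation as a pure function makes it easy to unit-test the metadata mapping before anything touches the server.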