Add OpenAI client example for LLAMAExtensions

This commit is contained in:
Swapna Lekkala 2025-10-02 11:38:10 -07:00
parent 0e13512dd7
commit 26f8c86dd8


{
"cells": [
{
"cell_type": "markdown",
"id": "notebook-title",
"metadata": {},
"source": [
"# OpenAI Client with LLAMA Stack Extensions\n",
"\n",
"This notebook demonstrates how to use the **OpenAI Python client** with **LLAMA Stack server extensions**, allowing you to access LLAMA Stack-specific APIs through familiar OpenAI client patterns.\n",
"\n",
"## What You'll Learn\n",
"\n",
"1. 🔌 **Connect to LLAMA Stack** using the OpenAI client with a custom base URL\n",
"2. 🗄️ **Create and manage vector databases** using LLAMA Stack's vector-db API\n",
"3. 📄 **Insert and query vector data** for semantic search capabilities\n",
"4. 🌐 **Use low-level HTTP requests** to access LLAMA Stack-specific endpoints\n",
"\n",
"## Prerequisites\n",
"\n",
"- ✅ LLAMA Stack server running on `localhost:8321`\n",
"- ✅ Python packages: `pip install openai llama-stack-client`"
]
},
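{
"cell_type": "markdown",
"id": "health-check-section",
"metadata": {},
"source": [
"### 🩺 Optional: Verify the Server Is Reachable\n",
"\n",
"Before continuing, you can confirm the LLAMA Stack server is up with a quick HTTP request. This is a sketch that assumes the server exposes a `/v1/health` endpoint (present in recent LLAMA Stack releases); adjust the URL if your deployment differs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "health-check-code",
"metadata": {},
"outputs": [],
"source": [
"import httpx\n",
"\n",
"# Quick reachability check (assumes a /v1/health endpoint)\n",
"resp = httpx.get(\"http://localhost:8321/v1/health\")\n",
"print(resp.status_code, resp.json())"
]
},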
{
"cell_type": "markdown",
"id": "setup-section",
"metadata": {},
"source": [
"## 🔧 Setup: Connect OpenAI Client to LLAMA Stack\n",
"\n",
"We'll use the OpenAI client but point it to our local LLAMA Stack server instead of OpenAI's servers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a42d6950-65c2-445e-96f9-6301d36d3a0f",
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"\n",
"client = OpenAI(base_url=\"http://localhost:8321/v1\", api_key=\"dummy-key\")"
]
},
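{
"cell_type": "markdown",
"id": "sanity-check-section",
"metadata": {},
"source": [
"### ✅ Optional: Sanity-Check the Connection\n",
"\n",
"Because LLAMA Stack serves OpenAI-compatible endpoints under `/v1`, the standard `client.models.list()` call should return the models registered on the stack, assuming the server is running. This is a quick way to confirm the client is pointed at the right base URL before calling any stack-specific APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sanity-check-code",
"metadata": {},
"outputs": [],
"source": [
"# List models served by the LLAMA Stack server via the standard OpenAI models API\n",
"for model in client.models.list():\n",
"    print(model.id)"
]
},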
{
"cell_type": "markdown",
"id": "vector-db-section",
"metadata": {},
"source": [
"## 🗄️ Create a Vector Database\n",
"\n",
"The next cell registers a vector database using LLAMA Stack's vector-db API. We're using:\n",
"- **FAISS** as the backend provider\n",
"- **sentence-transformers/all-MiniLM-L6-v2** for embeddings (384 dimensions)\n",
"- A unique identifier `acme_docs_v2` for this database"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e45da4ef-03d9-48f7-a48f-8b5c09f2a46f",
"metadata": {},
"outputs": [],
"source": [
"# Use a low-level HTTP request for the llama-stack-specific API\n",
"resp = client._client.request(\n",
" \"POST\",\n",
" \"/vector-dbs\",\n",
" json={\n",
" \"vector_db_id\": \"acme_docs_v2\", # Unique ID referenced by later cells\n",
" \"provider_id\": \"faiss\",\n",
" \"provider_vector_db_id\": \"acme_docs_v2\",\n",
" \"embedding_model\": \"sentence-transformers/all-MiniLM-L6-v2\",\n",
" \"embedding_dimension\": 384,\n",
" },\n",
")\n",
"\n",
"print(resp.json())"
]
},
{
"cell_type": "markdown",
"id": "list-dbs-section",
"metadata": {},
"source": [
"## 📋 List All Vector Databases\n",
"\n",
"This lists all vector databases registered in the LLAMA Stack server, allowing us to verify our database was created successfully."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94e3ca3c-3a8a-4e95-91a6-e87aa4221629",
"metadata": {},
"outputs": [],
"source": [
"resp = client._client.request(\n",
" \"GET\",\n",
" \"/vector-dbs\"\n",
")\n",
"\n",
"print(resp.json())"
]
},
{
"cell_type": "markdown",
"id": "get-db-section",
"metadata": {},
"source": [
"## 🔍 Retrieve Specific Database Info\n",
"\n",
"Get detailed information about our specific vector database, including its configuration and metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e67a8ba3-7785-4087-b0c8-442596fbdf92",
"metadata": {},
"outputs": [],
"source": [
"resp = client._client.request(\n",
" \"GET\",\n",
" \"/vector-dbs/acme_docs_v2\"\n",
")\n",
"\n",
"print(resp.json())"
]
},
{
"cell_type": "markdown",
"id": "prepare-data-section",
"metadata": {},
"source": [
"## 📄 Prepare Documents for Vector Storage\n",
"\n",
"We create sample company policy documents and convert them into **Chunk** objects that LLAMA Stack can process. Each chunk contains:\n",
"- **content**: The actual text content\n",
"- **metadata**: Searchable metadata with a `document_id`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f08461a-7f9b-439e-bd38-72b68ee9a430",
"metadata": {},
"outputs": [],
"source": [
"from llama_stack_client.types.vector_io_insert_params import Chunk\n",
"\n",
"docs = [\n",
" (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
" (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
" (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
"]\n",
"\n",
"# Convert to Chunk objects\n",
"chunks = []\n",
"for content, metadata in docs:\n",
" # Transform metadata to required format with document_id from title\n",
" metadata = {\"document_id\": metadata[\"title\"]}\n",
" chunk = Chunk(\n",
" content=content, # Required[InterleavedContent]\n",
" metadata=metadata, # Required[Dict]\n",
" )\n",
" chunks.append(chunk)"
]
},
{
"cell_type": "markdown",
"id": "insert-data-section",
"metadata": {},
"source": [
"## 📤 Insert Documents into Vector Database\n",
"\n",
"Insert our prepared chunks into the vector database. LLAMA Stack will automatically:\n",
"- Generate embeddings using the specified model\n",
"- Store the vectors in the FAISS index\n",
"- Set a TTL (time-to-live) of 1 hour for the chunks"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fb7f7a7-6b8e-4af4-af93-ecc4fb3e1696",
"metadata": {},
"outputs": [],
"source": [
"resp = client._client.request(\n",
" \"POST\",\n",
" \"/vector-io/insert\",\n",
" json={\n",
" \"vector_db_id\": \"acme_docs_v2\",\n",
" \"chunks\": chunks,\n",
" \"ttl_seconds\": 3600, # optional\n",
" }\n",
")\n",
"\n",
"print(resp.status_code)\n",
"print(resp.json()) # might be empty if API returns None"
]
},
{
"cell_type": "markdown",
"id": "search-section",
"metadata": {},
"source": [
"## 🔍 Semantic Search Query\n",
"\n",
"Perform a **semantic search** on our documents. The query \"How long does Acme take to ship orders?\" will be converted to an embedding and matched against stored document embeddings to find the most relevant content.\n",
"\n",
"The results show the most relevant chunks ranked by semantic similarity, with metadata and content for each match."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "faa7e0ab-aef9-496f-a2bc-90b4eb8bc860",
"metadata": {},
"outputs": [],
"source": [
"query = \"How long does Acme take to ship orders?\"\n",
"\n",
"resp = client._client.request(\n",
" \"POST\",\n",
" \"/vector-io/query\", # endpoint for vector queries\n",
" json={\n",
" \"vector_db_id\": \"acme_docs_v2\",\n",
" \"query\": query,\n",
" \"top_k\": 5 # optional, number of results to return\n",
" }\n",
")\n",
"\n",
"# Convert response to Python dictionary\n",
"data = resp.json()\n",
"\n",
"# Loop through returned chunks\n",
"for i, chunk in enumerate(data.get(\"chunks\", []), start=1):\n",
" print(f\"Chunk {i}:\")\n",
" print(\" Metadata:\", chunk.get(\"metadata\"))\n",
" print(\" Content :\", chunk.get(\"content\"))\n",
" print(\"-\" * 50)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}