Add OpenAI client example for LLAMA Stack extensions
parent 0e13512dd7
commit 26f8c86dd8
1 changed file with 270 additions and 0 deletions
docs/notebooks/OpenAIClient_with_LLAMAStackExtensions.ipynb (new file)
@@ -0,0 +1,270 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "notebook-title",
   "metadata": {},
   "source": [
    "# OpenAI Client with LLAMA Stack Extensions\n",
    "\n",
    "This notebook demonstrates how to use the **OpenAI Python client** with **LLAMA Stack server extensions**, allowing you to access LLAMA Stack-specific APIs through familiar OpenAI client patterns.\n",
    "\n",
    "## What You'll Learn\n",
    "\n",
    "1. 🔌 **Connect to LLAMA Stack** using the OpenAI client with a custom base URL\n",
    "2. 🗄️ **Create and manage vector databases** using LLAMA Stack's vector-db API\n",
    "3. 📄 **Insert and query vector data** for semantic search capabilities\n",
    "4. 🌐 **Use low-level HTTP requests** to access LLAMA Stack-specific endpoints\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "- ✅ LLAMA Stack server running on `localhost:8321`\n",
    "- ✅ Python packages: `pip install openai llama-stack-client`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "setup-section",
   "metadata": {},
   "source": [
    "## 🔧 Setup: Connect OpenAI Client to LLAMA Stack\n",
    "\n",
    "We'll use the OpenAI client but point it to our local LLAMA Stack server instead of OpenAI's servers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a42d6950-65c2-445e-96f9-6301d36d3a0f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from openai import OpenAI\n",
    "\n",
    "# Point the OpenAI client at the local LLAMA Stack server.\n",
    "# The api_key is a placeholder; it is not validated unless the server has auth configured.\n",
    "client = OpenAI(base_url=\"http://localhost:8321/v1\", api_key=\"dummy-key\")"
   ]
  },
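  {
   "cell_type": "markdown",
   "id": "verify-connection-section",
   "metadata": {},
   "source": [
    "### ✅ Verify the Connection (Optional)\n",
    "\n",
    "Before creating any resources, a quick sanity check: assuming the server exposes an OpenAI-compatible `/models` endpoint under `/v1` (recent LLAMA Stack releases do), listing models confirms the client is pointed at a live server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "verify-connection-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sanity check: list the models the server exposes.\n",
    "# Assumes an OpenAI-compatible /models endpoint; older LLAMA Stack versions may differ.\n",
    "for model in client.models.list():\n",
    "    print(model.id)"
   ]
  },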
  {
   "cell_type": "markdown",
   "id": "vector-db-section",
   "metadata": {},
   "source": [
    "## 🗄️ Create a Vector Database\n",
    "\n",
    "The code below creates a vector database using LLAMA Stack's vector-db API. We're using:\n",
    "- **FAISS** as the backend provider\n",
    "- **sentence-transformers/all-MiniLM-L6-v2** for embeddings (384 dimensions)\n",
    "- A unique identifier `acme_docs_v2` for this database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e45da4ef-03d9-48f7-a48f-8b5c09f2a46f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use a low-level request for this LLAMA Stack-specific API;\n",
    "# client._client is the underlying httpx client, so the path resolves against base_url.\n",
    "resp = client._client.request(\n",
    "    \"POST\",\n",
    "    \"/vector-dbs\",\n",
    "    json={\n",
    "        \"vector_db_id\": \"acme_docs_v2\",  # unique name, matching the id used in later cells\n",
    "        \"provider_id\": \"faiss\",\n",
    "        \"provider_vector_db_id\": \"acme_docs_v2\",\n",
    "        \"embedding_model\": \"sentence-transformers/all-MiniLM-L6-v2\",\n",
    "        \"embedding_dimension\": 384,\n",
    "    },\n",
    ")\n",
    "\n",
    "print(resp.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "list-dbs-section",
   "metadata": {},
   "source": [
    "## 📋 List All Vector Databases\n",
    "\n",
    "This lists all vector databases registered with the LLAMA Stack server, allowing us to verify that our database was created successfully."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94e3ca3c-3a8a-4e95-91a6-e87aa4221629",
   "metadata": {},
   "outputs": [],
   "source": [
    "resp = client._client.request(\n",
    "    \"GET\",\n",
    "    \"/vector-dbs\"\n",
    ")\n",
    "\n",
    "print(resp.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "get-db-section",
   "metadata": {},
   "source": [
    "## 🔍 Retrieve Specific Database Info\n",
    "\n",
    "Get detailed information about our specific vector database, including its configuration and metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e67a8ba3-7785-4087-b0c8-442596fbdf92",
   "metadata": {},
   "outputs": [],
   "source": [
    "resp = client._client.request(\n",
    "    \"GET\",\n",
    "    \"/vector-dbs/acme_docs_v2\"\n",
    ")\n",
    "\n",
    "print(resp.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "prepare-data-section",
   "metadata": {},
   "source": [
    "## 📄 Prepare Documents for Vector Storage\n",
    "\n",
    "We create sample company policy documents and convert them into **Chunk** objects that LLAMA Stack can process. Each chunk contains:\n",
    "- **content**: The actual text content\n",
    "- **metadata**: Searchable metadata with a `document_id`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f08461a-7f9b-439e-bd38-72b68ee9a430",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_stack_client.types.vector_io_insert_params import Chunk\n",
    "\n",
    "docs = [\n",
    "    (\"Acme ships globally in 3-5 business days.\", {\"title\": \"Shipping Policy\"}),\n",
    "    (\"Returns are accepted within 30 days of purchase.\", {\"title\": \"Returns Policy\"}),\n",
    "    (\"Support is available 24/7 via chat and email.\", {\"title\": \"Support\"}),\n",
    "]\n",
    "\n",
    "# Convert to Chunk objects, using each document's title as its document_id\n",
    "chunks = []\n",
    "for content, metadata in docs:\n",
    "    chunk = Chunk(\n",
    "        content=content,  # Required[InterleavedContent]\n",
    "        metadata={\"document_id\": metadata[\"title\"]},  # Required[Dict]\n",
    "    )\n",
    "    chunks.append(chunk)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "insert-data-section",
   "metadata": {},
   "source": [
    "## 📤 Insert Documents into Vector Database\n",
    "\n",
    "Insert our prepared chunks into the vector database. LLAMA Stack will automatically:\n",
    "- Generate embeddings using the specified model\n",
    "- Store the vectors in the FAISS index\n",
    "\n",
    "We also pass an optional `ttl_seconds` of 3600, giving the chunks a time-to-live of one hour."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0fb7f7a7-6b8e-4af4-af93-ecc4fb3e1696",
   "metadata": {},
   "outputs": [],
   "source": [
    "resp = client._client.request(\n",
    "    \"POST\",\n",
    "    \"/vector-io/insert\",\n",
    "    json={\n",
    "        \"vector_db_id\": \"acme_docs_v2\",\n",
    "        \"chunks\": chunks,\n",
    "        \"ttl_seconds\": 3600,  # optional\n",
    "    }\n",
    ")\n",
    "\n",
    "print(resp.status_code)\n",
    "print(resp.text or \"<empty body>\")  # insert may return an empty body on success"
   ]
  },
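  {
   "cell_type": "markdown",
   "id": "native-client-note",
   "metadata": {},
   "source": [
    "### Alternative: the Native llama-stack-client\n",
    "\n",
    "The low-level HTTP call above maps directly onto the native client we already installed. Below is a minimal sketch, assuming a recent `llama-stack-client` in which `vector_io.insert` accepts the same fields as the JSON payload above; the exact method surface may vary across client versions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "native-client-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: the same insert via the native client.\n",
    "# Assumes vector_io.insert mirrors the /vector-io/insert payload; check your client version.\n",
    "from llama_stack_client import LlamaStackClient\n",
    "\n",
    "ls_client = LlamaStackClient(base_url=\"http://localhost:8321\")\n",
    "ls_client.vector_io.insert(\n",
    "    vector_db_id=\"acme_docs_v2\",\n",
    "    chunks=chunks,\n",
    "    ttl_seconds=3600,  # optional, same TTL as above\n",
    ")"
   ]
  },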
  {
   "cell_type": "markdown",
   "id": "search-section",
   "metadata": {},
   "source": [
    "## 🔍 Semantic Search Query\n",
    "\n",
    "Perform a **semantic search** on our documents. The query \"How long does Acme take to ship orders?\" will be converted to an embedding and matched against the stored document embeddings to find the most relevant content.\n",
    "\n",
    "The results show the most relevant chunks ranked by semantic similarity, with metadata and content for each match."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "faa7e0ab-aef9-496f-a2bc-90b4eb8bc860",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"How long does Acme take to ship orders?\"\n",
    "\n",
    "resp = client._client.request(\n",
    "    \"POST\",\n",
    "    \"/vector-io/query\",  # endpoint for vector queries\n",
    "    json={\n",
    "        \"vector_db_id\": \"acme_docs_v2\",\n",
    "        \"query\": query,\n",
    "        \"top_k\": 5  # optional, number of results to return\n",
    "    }\n",
    ")\n",
    "\n",
    "# Convert the response to a Python dictionary\n",
    "data = resp.json()\n",
    "\n",
    "# Loop through the returned chunks\n",
    "for i, chunk in enumerate(data.get(\"chunks\", []), start=1):\n",
    "    print(f\"Chunk {i}:\")\n",
    "    print(\"  Metadata:\", chunk.get(\"metadata\"))\n",
    "    print(\"  Content :\", chunk.get(\"content\"))\n",
    "    print(\"-\" * 50)"
   ]
  }
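,
  {
   "cell_type": "markdown",
   "id": "cleanup-section",
   "metadata": {},
   "source": [
    "## 🧹 Clean Up (Optional)\n",
    "\n",
    "When you're done experimenting, you can unregister the vector database so repeated runs of this notebook don't collide on the same id. A minimal sketch, assuming the server accepts `DELETE` on the same `/vector-dbs/{vector_db_id}` path used above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cleanup-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assumption: DELETE /vector-dbs/{id} unregisters the database; verify on your server version.\n",
    "resp = client._client.request(\n",
    "    \"DELETE\",\n",
    "    \"/vector-dbs/acme_docs_v2\"\n",
    ")\n",
    "\n",
    "print(resp.status_code)"
   ]
  }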
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}