# OpenAI Client with LLAMA Stack Extensions

This notebook demonstrates how to use the **OpenAI Python client** with **LLAMA Stack server extensions**, allowing you to access LLAMA Stack-specific APIs through familiar OpenAI client patterns.

## What You'll Learn

1. 🔌 **Connect to LLAMA Stack** using the OpenAI client with a custom base URL
2. 🗄️ **Create and manage vector databases** using LLAMA Stack's vector-db API
3. 📄 **Insert and query vector data** for semantic search capabilities
4. 🌐 **Use low-level HTTP requests** to access LLAMA Stack-specific endpoints

## Prerequisites

- ✅ LLAMA Stack server running on `localhost:8321`
- ✅ Python packages: `pip install openai llama-stack-client`

## 🔧 Setup: Connect OpenAI Client to LLAMA Stack

We'll use the OpenAI client but point it to our local LLAMA Stack server instead of OpenAI's servers.

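```python
from openai import OpenAI

# Point the OpenAI client at the local LLAMA Stack server instead of OpenAI's servers.
# The client requires an api_key; a placeholder is enough for a local server without auth.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="dummy-key")
```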

## 🗄️ Create a Vector Database

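The code below registers a vector database through LLAMA Stack's vector-db API. We're using:
- **FAISS** as the backend provider
- **sentence-transformers/all-MiniLM-L6-v2** for embeddings (384 dimensions)
- `acme_docs_v2` as the provider-side ID (`provider_vector_db_id`)

In the response, the server reports `acme_docs_v2` as the database's `identifier` and stores the `vector_db_id` we sent (`acme_docs`) as its `vector_db_name`. All later calls therefore address the database as `acme_docs_v2`.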

```python
# Low-level request for a LLAMA Stack-specific API. `client._client` is the OpenAI
# client's underlying (private) httpx client; relative paths resolve against the
# base URL, so "/vector-dbs" is sent to /v1/vector-dbs.
resp = client._client.request(
    "POST",
    "/vector-dbs",
    json={
        "vector_db_id": "acme_docs",  # reported back as vector_db_name
        "provider_id": "faiss",
        "provider_vector_db_id": "acme_docs_v2",  # reported back as the identifier
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
    },
)

print(resp.json())
```

```
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/vector-dbs "HTTP/1.1 200 OK"

{'identifier': 'acme_docs_v2', 'provider_resource_id': 'acme_docs_v2', 'provider_id': 'faiss', 'type': 'vector_db', 'owner': None, 'source': 'via_register_api', 'embedding_model': 'sentence-transformers/all-MiniLM-L6-v2', 'embedding_dimension': 384, 'vector_db_name': 'acme_docs'}
```


## 📋 List All Vector Databases

This cell lists every vector database registered with the LLAMA Stack server, so we can verify that ours was created.

```python
resp = client._client.request("GET", "/vector-dbs")

print(resp.json())
```

```
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/vector-dbs "HTTP/1.1 200 OK"

{'data': [{'identifier': 'acme_docs_v2', 'provider_resource_id': 'acme_docs_v2', 'provider_id': 'faiss', 'type': 'vector_db', 'embedding_model': 'sentence-transformers/all-MiniLM-L6-v2', 'embedding_dimension': 384, 'vector_db_name': 'acme_docs'}]}
```
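The listing reports `acme_docs_v2` as the `identifier` but keeps `acme_docs` as `vector_db_name`, so it can be handy to look a database up by either field. Below is a minimal sketch over a listing with the `{'data': [...]}` shape shown in the output. `find_vector_db` is a hypothetical helper and not part of LLAMA Stack.

```python
from typing import Any, Optional


def find_vector_db(listing: dict[str, Any], key: str) -> Optional[dict[str, Any]]:
    """Return the first vector DB whose identifier or vector_db_name equals key, else None."""
    for db in listing.get("data", []):
        if key in (db.get("identifier"), db.get("vector_db_name")):
            return db
    return None


# Listing shaped like the GET /vector-dbs output above (trimmed to the relevant fields)
listing = {
    "data": [
        {"identifier": "acme_docs_v2", "provider_id": "faiss", "vector_db_name": "acme_docs"}
    ]
}
db = find_vector_db(listing, "acme_docs")
print(db["identifier"] if db else "not found")  # acme_docs_v2
```

In the notebook you would pass `resp.json()` from the cell above and use the returned `identifier` in later requests.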


## 🔍 Retrieve Specific Database Info

Get detailed information about our specific vector database, including its configuration and metadata.

```python
resp = client._client.request("GET", "/vector-dbs/acme_docs_v2")

print(resp.json())
```

```
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/vector-dbs/acme_docs_v2 "HTTP/1.1 200 OK"

{'identifier': 'acme_docs_v2', 'provider_resource_id': 'acme_docs_v2', 'provider_id': 'faiss', 'type': 'vector_db', 'owner': None, 'source': 'via_register_api', 'embedding_model': 'sentence-transformers/all-MiniLM-L6-v2', 'embedding_dimension': 384, 'vector_db_name': 'acme_docs'}
```
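You can check the configuration programmatically instead of reading the printed dictionary. One option is to compare the returned fields with the values sent at registration. The sketch below assumes the field names shown in the response above. `config_mismatches` is a hypothetical helper.

```python
from typing import Any


def config_mismatches(info: dict[str, Any], expected: dict[str, Any]) -> dict[str, tuple]:
    """Map each expected field to (expected, actual) wherever the server's value differs."""
    return {
        field: (want, info.get(field))
        for field, want in expected.items()
        if info.get(field) != want
    }


# Values sent when the database was registered
expected = {
    "provider_id": "faiss",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "embedding_dimension": 384,
}
# Trimmed copy of the GET /vector-dbs/acme_docs_v2 response above
info = {
    "identifier": "acme_docs_v2",
    "provider_id": "faiss",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "embedding_dimension": 384,
}
print(config_mismatches(info, expected))  # {}
```

An empty result means every registered setting came back unchanged. Any entry shows the value we asked for next to the value the server holds.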


## 📄 Prepare Documents for Vector Storage

We create sample company policy documents and convert them into **Chunk** objects that LLAMA Stack can process. Each chunk contains:
- **content**: the text itself
- **metadata**: a dictionary with a `document_id`, taken here from each document's title

```python
from llama_stack_client.types.vector_io_insert_params import Chunk

docs = [
    ("Acme ships globally in 3-5 business days.", {"title": "Shipping Policy"}),
    ("Returns are accepted within 30 days of purchase.", {"title": "Returns Policy"}),
    ("Support is available 24/7 via chat and email.", {"title": "Support"}),
]

# Convert each (content, metadata) pair to a Chunk, using the title as document_id
chunks = []
for content, doc_metadata in docs:
    chunk = Chunk(
        content=content,  # Required[InterleavedContent]
        metadata={"document_id": doc_metadata["title"]},  # Required[Dict]
    )
    chunks.append(chunk)
```
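This notebook only needs `llama-stack-client` for the `Chunk` import. The next step sends the chunks as a plain JSON body. That suggests `Chunk` is a typed-dictionary type whose instances serialize as ordinary dicts. Under that assumption, you can build the same payload with plain Python. This is a sketch. `docs_to_chunks` is a hypothetical helper, and the required keys come from the comments in the cell above.

```python
from typing import Any


def docs_to_chunks(docs: list[tuple[str, dict[str, Any]]]) -> list[dict[str, Any]]:
    """Build chunk dicts with the same content/metadata layout as the Chunk objects above."""
    return [
        {"content": content, "metadata": {"document_id": meta["title"]}}
        for content, meta in docs
    ]


docs = [
    ("Acme ships globally in 3-5 business days.", {"title": "Shipping Policy"}),
    ("Returns are accepted within 30 days of purchase.", {"title": "Returns Policy"}),
    ("Support is available 24/7 via chat and email.", {"title": "Support"}),
]
plain_chunks = docs_to_chunks(docs)
print(plain_chunks[0])
# {'content': 'Acme ships globally in 3-5 business days.', 'metadata': {'document_id': 'Shipping Policy'}}
```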

## 📤 Insert Documents into Vector Database

Insert the prepared chunks into the vector database. LLAMA Stack will:
- Generate embeddings with the database's embedding model
- Store the vectors in the FAISS index

The request also sets the optional `ttl_seconds` parameter to 3600, asking for the chunks to expire after one hour.

```python
resp = client._client.request(
    "POST",
    "/vector-io/insert",
    json={
        "vector_db_id": "acme_docs_v2",
        "chunks": chunks,  # list of Chunk objects from the previous cell
        "ttl_seconds": 3600,  # optional
    },
)

print(resp.status_code)
print(resp.json())  # the body is JSON null here, so this prints None
```

```
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/vector-io/insert "HTTP/1.1 200 OK"

200
None
```
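The insert request is a JSON body with the database ID, the chunks, and an optional TTL. The sketch below builds that body. It checks that each chunk has the `content` and `metadata` keys marked required in the `Chunk` cell, and it leaves `ttl_seconds` out when no TTL is wanted. `build_insert_body` is a hypothetical helper. The field names are the ones used in the request above.

```python
from typing import Any, Optional


def build_insert_body(
    vector_db_id: str,
    chunks: list[dict[str, Any]],
    ttl_seconds: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a /vector-io/insert body, validating chunks and omitting ttl_seconds if unset."""
    for i, chunk in enumerate(chunks):
        missing = {"content", "metadata"} - set(chunk)
        if missing:
            raise ValueError(f"chunk {i} is missing required keys: {sorted(missing)}")
    body: dict[str, Any] = {"vector_db_id": vector_db_id, "chunks": chunks}
    if ttl_seconds is not None:
        body["ttl_seconds"] = ttl_seconds  # 3600 in the cell above, i.e. one hour
    return body


sample_chunk = {
    "content": "Support is available 24/7 via chat and email.",
    "metadata": {"document_id": "Support"},
}
print(build_insert_body("acme_docs_v2", [sample_chunk], ttl_seconds=3600)["ttl_seconds"])  # 3600
```

Validating on the client turns a malformed chunk into an immediate `ValueError` before anything is sent.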


## 🔍 Semantic Search Query

Perform a **semantic search** on our documents. The query "How long does Acme take to ship orders?" will be converted to an embedding and matched against stored document embeddings to find the most relevant content.

The results show the most relevant chunks ranked by semantic similarity, with metadata and content for each match.
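Conceptually, the ranking works like the sketch below: embed the query, score it against each stored embedding, and keep the best `k`. This is only an illustration. It uses cosine similarity and made-up 3-dimensional vectors rather than real 384-dimensional MiniLM embeddings, and the FAISS provider's actual distance metric and scoring may differ. `top_k_by_cosine` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_by_cosine(
    query_vec: list[float], stored: list[tuple[str, list[float]]], k: int
) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in stored]
    return sorted(scored, reverse=True)[:k]


# Toy vectors standing in for real embeddings (values invented for illustration)
stored = [
    ("Acme ships globally in 3-5 business days.", [0.9, 0.1, 0.0]),
    ("Returns are accepted within 30 days of purchase.", [0.2, 0.9, 0.1]),
    ("Support is available 24/7 via chat and email.", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the shipping question
for score, text in top_k_by_cosine(query_vec, stored, k=2):
    print(f"{score:.3f}  {text}")
```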

```python
query = "How long does Acme take to ship orders?"

resp = client._client.request(
    "POST",
    "/vector-io/query",  # endpoint for vector queries
    json={
        "vector_db_id": "acme_docs_v2",
        "query": query,
        "top_k": 5,  # intended number of results (see the note after the output)
    },
)

# Convert response to Python dictionary
data = resp.json()

# Loop through returned chunks
for i, chunk in enumerate(data.get("chunks", []), start=1):
    print(f"Chunk {i}:")
    print("  Metadata:", chunk.get("metadata"))
    print("  Content :", chunk.get("content"))
    print("-" * 50)
```

```
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/vector-io/query "HTTP/1.1 200 OK"

Chunk 1:
  Metadata: {'document_id': 'Shipping Policy'}
  Content : Acme ships globally in 3-5 business days.
--------------------------------------------------
Chunk 2:
  Metadata: {'document_id': 'Shipping Policy'}
  Content : Acme ships globally in 3-5 business days.
--------------------------------------------------
Chunk 3:
  Metadata: {'document_id': 'Returns Policy'}
  Content : Returns are accepted within 30 days of purchase.
--------------------------------------------------
```

Two details in this output stand out. First, only three chunks come back even though the request set `top_k` to 5, which suggests the server does not read a top-level `top_k` field. Second, the shipping chunk appears twice, which suggests it was stored more than once, for example by running the insert cell again.
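Only three chunks came back although the query asked for `top_k: 5`. The `vector_io.query` API in `llama-stack-client` takes result options in a `params` dictionary. In the server versions we are aware of, the result count is read from `params["max_chunks"]` and defaults to 3. Treat that key as an assumption to check against your server version. Separately, you can collapse duplicate hits like the repeated shipping chunk on the client. Both ideas are sketched below. `build_query_body` and `dedupe_hits` are hypothetical helpers. The response shape is assumed: a `chunks` list as in the output above, plus an optional parallel `scores` list.

```python
from typing import Any


def build_query_body(vector_db_id: str, query: str, max_chunks: int = 5) -> dict[str, Any]:
    """Query body that passes the result limit via params (assumed key: max_chunks)."""
    return {
        "vector_db_id": vector_db_id,
        "query": query,
        "params": {"max_chunks": max_chunks},  # assumption: verify against your server version
    }


def dedupe_hits(response: dict[str, Any]) -> list[tuple[dict[str, Any], Any]]:
    """Drop repeated (document_id, content) hits, keeping the first (highest-ranked) one.

    Each chunk is paired with its score when the response has a 'scores' list (assumed), else None.
    """
    chunks = response.get("chunks", [])
    scores = response.get("scores") or [None] * len(chunks)
    seen = set()
    unique = []
    for chunk, score in zip(chunks, scores):
        # str() keeps the key hashable even if content is structured rather than a plain string
        key = (chunk.get("metadata", {}).get("document_id"), str(chunk.get("content")))
        if key not in seen:
            seen.add(key)
            unique.append((chunk, score))
    return unique


# Response shaped like the output above, including the duplicated shipping chunk
sample_response = {
    "chunks": [
        {"content": "Acme ships globally in 3-5 business days.", "metadata": {"document_id": "Shipping Policy"}},
        {"content": "Acme ships globally in 3-5 business days.", "metadata": {"document_id": "Shipping Policy"}},
        {"content": "Returns are accepted within 30 days of purchase.", "metadata": {"document_id": "Returns Policy"}},
    ]
}
for chunk, score in dedupe_hits(sample_response):
    print(chunk["metadata"]["document_id"], score)
```

In the notebook, `client._client.request("POST", "/vector-io/query", json=build_query_body("acme_docs_v2", query))` would send the `params`-based body, and `dedupe_hits(resp.json())` would tidy the results. Deduplicating on the client hides repeats but does not remove them from the index.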
