llama-stack-mirror/docs/docs/building_applications/rag.mdx
Francisco Arceo ef4bc70bbe
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
feat: Enable setting a default embedding model in the stack (#3803)
# What does this PR do?

Enables automatic embedding model detection for vector stores and by
using a `default_configured` boolean that can be defined in the
`run.yaml`.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
- Unit tests
- Integration tests
- Simple example below:

Spin up the stack:
```bash
uv run llama stack build --distro starter --image-type venv --run
```
Then test with OpenAI's client:
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
```
Previously you needed:

```python
vs = client.vector_stores.create(
    extra_body={
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
    }
)
```

The `extra_body` is now unnecessary.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-14 18:25:13 -07:00

120 lines
4.8 KiB
Text

---
title: Retrieval Augmented Generation (RAG)
description: Build knowledge-enhanced AI applications with external document retrieval
sidebar_label: RAG (Retrieval Augmented Generation)
sidebar_position: 2
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Retrieval Augmented Generation (RAG)
RAG enables your applications to reference and recall information from external documents. Llama Stack makes Agentic RAG available through OpenAI's Responses API.
## Quick Start
### 1. Start the Server
In one terminal, start the Llama Stack server:
```bash
uv run llama stack build --distro starter --image-type venv --run
```
### 2. Connect with OpenAI Client
In another terminal, use the standard OpenAI client with the Responses API:
```python
import io, requests
from openai import OpenAI
url = "https://www.paulgraham.com/greatwork.html"
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
# Create vector store - auto-detects default embedding model
vs = client.vector_stores.create()
response = requests.get(url)
pseudo_file = io.BytesIO(str(response.content).encode('utf-8'))
file_id = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants").id
client.vector_stores.files.create(vector_store_id=vs.id, file_id=file_id)
resp = client.responses.create(
model="gpt-4o",
input="How do you do great work? Use the existing knowledge_search tool.",
tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
include=["file_search_call.results"],
)
print(resp.output[-1].content[-1].text)
```
Which should give output like:
```
Doing great work is about more than just hard work and ambition; it involves combining several elements:
1. **Pursue What Excites You**: Engage in projects that are both ambitious and exciting to you. It's important to work on something you have a natural aptitude for and a deep interest in.
2. **Explore and Discover**: Great work often feels like a blend of discovery and creation. Focus on seeing possibilities and let ideas take their natural shape, rather than just executing a plan.
3. **Be Bold Yet Flexible**: Take bold steps in your work without over-planning. An adaptable approach that evolves with new ideas can often lead to breakthroughs.
4. **Work on Your Own Projects**: Develop a habit of working on projects of your own choosing, as these often lead to great achievements. These should be projects you find exciting and that challenge you intellectually.
5. **Be Earnest and Authentic**: Approach your work with earnestness and authenticity. Trying to impress others with affectation can be counterproductive, as genuine effort and intellectual honesty lead to better work outcomes.
6. **Build a Supportive Environment**: Work alongside great colleagues who inspire you and enhance your work. Surrounding yourself with motivating individuals creates a fertile environment for great work.
7. **Maintain High Morale**: High morale significantly impacts your ability to do great work. Stay optimistic and protect your mental well-being to maintain progress and momentum.
8. **Balance**: While hard work is essential, overworking can lead to diminishing returns. Balance periods of intensive work with rest to sustain productivity over time.
This approach shows that great work is less about following a strict formula and more about aligning your interests, ambition, and environment to foster creativity and innovation.
```
## Architecture Overview
Llama Stack provides OpenAI-compatible RAG capabilities through:
- **Vector Stores API**: OpenAI-compatible vector storage with automatic embedding model detection
- **Files API**: Document upload and processing using OpenAI's file format
- **Responses API**: Enhanced chat completions with agentic tool calling via file search
## Configuring Default Embedding Models
To enable automatic vector store creation without specifying embedding models, configure a default embedding model in your run.yaml like so:
```yaml
models:
- model_id: nomic-ai/nomic-embed-text-v1.5
provider_id: inline::sentence-transformers
metadata:
embedding_dimension: 768
default_configured: true
```
With this configuration:
- `client.vector_stores.create()` works without requiring embedding model parameters
- The system automatically uses the default model and its embedding dimension for any newly created vector store
- Only one model can be marked as `default_configured: true`
## Vector Store Operations
### Creating Vector Stores
You can create vector stores with automatic or explicit embedding model selection:
```python
# Automatic - uses default configured embedding model
vs = client.vector_stores.create()
# Explicit - specify embedding model when you need a specific one
vs = client.vector_stores.create(
extra_body={
"embedding_model": "nomic-ai/nomic-embed-text-v1.5",
"embedding_dimension": 768
}
)
```