---
description: |
  [SQLite-Vec](https://github.com/asg017/sqlite-vec) is an inline vector database provider for Llama Stack. It
  allows you to store and query vectors directly within an SQLite database.
  That means you're not limited to storing vectors in memory or in a separate service.
sidebar_label: Sqlite-Vec
title: inline::sqlite-vec
---

# inline::sqlite-vec

## Description

[SQLite-Vec](https://github.com/asg017/sqlite-vec) is an inline vector database provider for Llama Stack. It
allows you to store and query vectors directly within an SQLite database.
That means you're not limited to storing vectors in memory or in a separate service.

## Features

- Lightweight and easy to use
- Fully integrated with Llama Stack
- Uses disk-based storage for persistence, allowing for larger vector storage

### Comparison to Faiss

The choice between Faiss and sqlite-vec should be made based on the needs of your application,
as they have different strengths.

#### Choosing the Right Provider

Scenario | Recommended Tool | Reason
--- | --- | ---
Online Analytical Processing (OLAP) | Faiss | Fast, in-memory searches
Online Transaction Processing (OLTP) | sqlite-vec | Frequent writes and reads
Frequent writes | sqlite-vec | Efficient disk-based storage and incremental indexing
Large datasets | sqlite-vec | Disk-based storage for larger vector storage
Datasets that can fit in memory, frequent reads | Faiss | Optimized for speed, indexing, and GPU acceleration

#### Empirical Example

Consider the histogram below in which 10,000 randomly generated strings were inserted
in batches of 100 into both Faiss and sqlite-vec using `client.tool_runtime.rag_tool.insert()`.
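
A minimal sketch of such a timed insertion loop is shown below; the client setup, random document contents, and the `my_db` vector database id are illustrative assumptions, not the exact benchmark code:

```python
import random
import string
import time

from llama_stack_client import LlamaStackClient, RAGDocument  # adjust imports to your client SDK version

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local stack endpoint


def random_document(i: int) -> RAGDocument:
    # Generate a random lowercase string as the document body.
    text = "".join(random.choices(string.ascii_lowercase + " ", k=200))
    return RAGDocument(document_id=f"doc-{i}", content=text, mime_type="text/plain", metadata={})


write_times = []
for start in range(0, 10_000, 100):  # 10,000 strings, inserted in batches of 100
    batch = [random_document(i) for i in range(start, start + 100)]
    t0 = time.perf_counter()
    client.tool_runtime.rag_tool.insert(
        documents=batch,
        vector_db_id="my_db",  # repeat once against a sqlite-vec-backed DB and once against Faiss
        chunk_size_in_tokens=512,
    )
    write_times.append(time.perf_counter() - t0)

print(f"average write time: {sum(write_times) / len(write_times):.3f}s")
```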

```{image} ../../../../_static/providers/vector_io/write_time_comparison_sqlite-vec-faiss.png
:alt: Comparison of SQLite-Vec and Faiss write times
:width: 400px
```

You will notice that the average write time for `sqlite-vec` was 788ms, compared to
47,640ms for Faiss. While the Faiss number is jarring, if you look at its distribution, you can see that it is rather
uniformly spread across the [1500, 100000] ms interval.

Looking at each individual write in the order that the documents are inserted, you'll see the increase in
write time as Faiss reindexes the vectors after each write.

```{image} ../../../../_static/providers/vector_io/write_time_sequence_sqlite-vec-faiss.png
:alt: Comparison of SQLite-Vec and Faiss write times
:width: 400px
```

In comparison, reads with Faiss were on average 10% faster than with sqlite-vec.
The modes of the two distributions highlight the difference further: Faiss
will likely yield faster read performance.

```{image} ../../../../_static/providers/vector_io/read_time_comparison_sqlite-vec-faiss.png
:alt: Comparison of SQLite-Vec and Faiss read times
:width: 400px
```

## Usage

To use sqlite-vec in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use SQLite-Vec.
3. Start storing and querying vectors (a minimal setup sketch follows this list).
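
The query examples below assume a vector database already exists and contains some documents. A minimal setup sketch (the `provider_id`, embedding model, and document contents here are assumptions that depend on your distribution):

```python
from llama_stack_client import LlamaStackClient, RAGDocument  # adjust imports to your client SDK version

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local stack endpoint

# Register a vector database served by the sqlite-vec provider.
client.vector_dbs.register(
    vector_db_id="my_db",
    provider_id="sqlite-vec",  # assumed provider_id; match the id used in your run config
    embedding_model="all-MiniLM-L6-v2",  # any embedding model registered with your stack
    embedding_dimension=384,
)

# Store a document; it is chunked, embedded, and written into the SQLite database.
client.tool_runtime.rag_tool.insert(
    documents=[
        RAGDocument(
            document_id="doc-1",
            content="SQLite-Vec stores vectors directly inside an SQLite database.",
            mime_type="text/plain",
            metadata={},
        )
    ],
    vector_db_id="my_db",
    chunk_size_in_tokens=512,
)
```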

The SQLite-vec provider supports three search modes:

1. **Vector Search** (`mode="vector"`): Performs pure vector similarity search using the embeddings.
2. **Keyword Search** (`mode="keyword"`): Performs full-text search using SQLite's FTS5.
3. **Hybrid Search** (`mode="hybrid"`): Combines both vector and keyword search for better results. First performs keyword search to get candidate matches, then applies vector similarity search on those candidates.

Example with hybrid search:
```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "hybrid", "max_chunks": 3, "score_threshold": 0.7},
)

# Using RRF ranker
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={
        "mode": "hybrid",
        "max_chunks": 3,
        "score_threshold": 0.7,
        "ranker": {"type": "rrf", "impact_factor": 60.0},
    },
)

# Using weighted ranker
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={
        "mode": "hybrid",
        "max_chunks": 3,
        "score_threshold": 0.7,
        "ranker": {"type": "weighted", "alpha": 0.7},  # 70% vector, 30% keyword
    },
)
```

Example with explicit vector search:
```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "vector", "max_chunks": 3, "score_threshold": 0.7},
)
```

Example with keyword search:
```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "keyword", "max_chunks": 3, "score_threshold": 0.7},
)
```
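
As noted above, keyword mode relies on SQLite's FTS5 extension. The standalone sketch below (plain `sqlite3` with a made-up schema, not the provider's actual tables) shows roughly the kind of full-text match such a search performs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table playing the role of a keyword index over chunks (illustrative schema).
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(chunk_id, content)")
conn.executemany(
    "INSERT INTO chunks_fts VALUES (?, ?)",
    [("c1", "sqlite-vec stores vectors on disk"), ("c2", "faiss keeps vectors in memory")],
)

# BM25-ranked full-text match; lower bm25() values indicate better matches.
rows = conn.execute(
    "SELECT chunk_id, bm25(chunks_fts) AS score FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY score",
    ("vectors",),
).fetchall()
print(rows)
```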

## Supported Search Modes

The SQLite vector store supports three search modes:

1. **Vector Search** (`mode="vector"`): Uses vector similarity to find relevant chunks
2. **Keyword Search** (`mode="keyword"`): Uses keyword matching to find relevant chunks
3. **Hybrid Search** (`mode="hybrid"`): Combines both vector and keyword scores using a ranker

### Hybrid Search

Hybrid search combines the strengths of both vector and keyword search by:
- Computing vector similarity scores
- Computing keyword match scores
- Using a ranker to combine these scores

Two ranker types are supported (a short scoring sketch follows this list):

1. **RRF (Reciprocal Rank Fusion)**:
   - Combines ranks from both vector and keyword results
   - Uses an impact factor (default: 60.0) to control the weight of higher-ranked results
   - Good for balancing between vector and keyword results
   - The default impact factor of 60.0 comes from the original RRF paper by Cormack et al. (2009) [^1], which found this value to provide optimal performance across various retrieval tasks

2. **Weighted**:
   - Linearly combines normalized vector and keyword scores
   - Uses an alpha parameter (0-1) to control the blend:
     - alpha=0: Only use keyword scores
     - alpha=1: Only use vector scores
     - alpha=0.5: Equal weight to both (default)
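
To make the two combination strategies concrete, here is a small illustrative sketch of the scoring formulas described above. It is not the provider's actual implementation; the function names are made up, and the weighted blend assumes scores already normalized to [0, 1]:

```python
def rrf_combine(vector_ranking: list[str], keyword_ranking: list[str], impact_factor: float = 60.0) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: each result list contributes 1 / (impact_factor + rank) per chunk."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (impact_factor + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def weighted_combine(vector_scores: dict[str, float], keyword_scores: dict[str, float], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Weighted blend: alpha weights the (normalized) vector score, (1 - alpha) the keyword score."""
    chunk_ids = vector_scores.keys() | keyword_scores.keys()
    blended = {
        chunk_id: alpha * vector_scores.get(chunk_id, 0.0) + (1 - alpha) * keyword_scores.get(chunk_id, 0.0)
        for chunk_id in chunk_ids
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# alpha=1.0 keeps only vector scores, alpha=0.0 keeps only keyword scores.
print(weighted_combine({"a": 0.9, "b": 0.4}, {"b": 0.8, "c": 0.6}, alpha=0.7))
```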

Example using RAGQueryConfig with different search modes:

```python
from llama_stack.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker

# Vector search
config = RAGQueryConfig(mode="vector", max_chunks=5)

# Keyword search
config = RAGQueryConfig(mode="keyword", max_chunks=5)

# Hybrid search with custom RRF ranker
config = RAGQueryConfig(
    mode="hybrid",
    max_chunks=5,
    ranker=RRFRanker(impact_factor=50.0),  # Custom impact factor
)

# Hybrid search with weighted ranker
config = RAGQueryConfig(
    mode="hybrid",
    max_chunks=5,
    ranker=WeightedRanker(alpha=0.7),  # 70% vector, 30% keyword
)

# Hybrid search with default RRF ranker
config = RAGQueryConfig(
    mode="hybrid", max_chunks=5
)  # Will use RRF with impact_factor=60.0
```

Note: The ranker configuration is only used in hybrid mode. For vector or keyword modes, the ranker parameter is ignored.
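
A `RAGQueryConfig` built this way is passed along with the query itself. A hedged sketch, assuming a `rag_tool` runtime handle analogous to the `vector_io` handle used in the earlier examples:

```python
from llama_stack.apis.tools import RAGQueryConfig, RRFRanker

# `rag_tool` is assumed to be a RAG tool runtime handle, analogous to the `vector_io`
# handle in the query_chunks examples above.
result = await rag_tool.query(
    content="your query here",
    vector_db_ids=["my_db"],
    query_config=RAGQueryConfig(
        mode="hybrid",
        max_chunks=5,
        ranker=RRFRanker(impact_factor=60.0),
    ),
)
```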

## Installation

You can install SQLite-Vec using pip:

```bash
pip install sqlite-vec
```

## Documentation

See [sqlite-vec's GitHub repo](https://github.com/asg017/sqlite-vec/tree/main) for more details about sqlite-vec in general.

[^1]: Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). [Reciprocal rank fusion outperforms condorcet and individual rank learning methods](https://dl.acm.org/doi/10.1145/1571941.1572114). In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (pp. 758-759).

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `db_path` | `str` | No | | Path to the SQLite database file |
| `persistence` | `llama_stack.core.storage.datatypes.KVStoreReference` | No | | Config for KV store backend (SQLite only for now) |

## Sample Configuration

```yaml
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/sqlite_vec.db
persistence:
  namespace: vector_io::sqlite_vec
  backend: kv_default
```

Here `kv_default` refers to a KV storage backend declared under `storage.backends` in the stack run configuration, so the provider reuses that shared backend instead of configuring its own store.