# inline::sqlite-vec

## Description

SQLite-Vec is an inline vector database provider for Llama Stack. It allows you to store and query vectors directly within an SQLite database. That means you're not limited to storing vectors in memory or in a separate service.

## Features

- Lightweight and easy to use
- Fully integrated with Llama Stack
- Uses disk-based storage for persistence, allowing for larger vector storage

## Comparison to Faiss

The choice between Faiss and sqlite-vec should be made based on the needs of your application, as they have different strengths.

### Choosing the Right Provider

| Scenario | Recommended Tool | Reason |
|----------|------------------|--------|
| Online Analytical Processing (OLAP) | Faiss | Fast, in-memory searches |
| Online Transaction Processing (OLTP) | sqlite-vec | Frequent writes and reads |
| Frequent writes | sqlite-vec | Efficient disk-based storage and incremental indexing |
| Large datasets | sqlite-vec | Disk-based storage for larger vector storage |
| Datasets that can fit in memory, frequent reads | Faiss | Optimized for speed, indexing, and GPU acceleration |

### Empirical Example

Consider the histogram below, in which 10,000 randomly generated strings were inserted in batches of 100 into both Faiss and sqlite-vec using `client.tool_runtime.rag_tool.insert()`.

*Figure: Comparison of SQLite-Vec and Faiss write times.*

You will notice that the average write time for sqlite-vec was 788ms, compared to 47,640ms for Faiss. While that number is jarring, a look at the distribution shows that Faiss's write times are spread rather uniformly across the [1500, 100000] ms interval.

Looking at each individual write in the order the documents were inserted, you'll see write times grow as Faiss re-indexes the vectors after each write.

*Figure: Comparison of SQLite-Vec and Faiss write times (per write, in insertion order).*

In comparison, read times for Faiss were on average 10% faster than for sqlite-vec. The modes of the two distributions make the difference even clearer: Faiss will likely yield faster read performance.

*Figure: Comparison of SQLite-Vec and Faiss read times.*

## Usage

To use sqlite-vec in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use sqlite-vec.
3. Start storing and querying vectors (a minimal sketch follows below).
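
As a concrete illustration of steps 2 and 3, here is a minimal sketch using the Python client. The method names follow the `llama-stack-client` SDK, but the database name, embedding model, and document content are placeholder assumptions; verify them against your distribution.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # default Llama Stack port

# Register a vector database backed by the sqlite-vec provider
client.vector_dbs.register(
    vector_db_id="my_db",  # placeholder name
    embedding_model="all-MiniLM-L6-v2",  # any embedding model your stack serves
    embedding_dimension=384,
    provider_id="sqlite-vec",
)

# Store a chunk, then query it back
client.vector_io.insert(
    vector_db_id="my_db",
    chunks=[
        {
            "content": "Llama Stack is a set of composable APIs for AI apps.",
            "metadata": {"document_id": "doc-1"},
        }
    ],
)
response = client.vector_io.query(vector_db_id="my_db", query="What is Llama Stack?")
```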

The sqlite-vec provider supports three search modes:

1. **Vector Search** (`mode="vector"`): Performs pure vector similarity search using the embeddings.
2. **Keyword Search** (`mode="keyword"`): Performs full-text search using SQLite's FTS5.
3. **Hybrid Search** (`mode="hybrid"`): Combines both vector and keyword search for better results. It first performs keyword search to get candidate matches, then applies vector similarity search on those candidates.

Example with hybrid search:

```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "hybrid", "max_chunks": 3, "score_threshold": 0.7},
)

# Using RRF ranker
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={
        "mode": "hybrid",
        "max_chunks": 3,
        "score_threshold": 0.7,
        "ranker": {"type": "rrf", "impact_factor": 60.0},
    },
)

# Using weighted ranker
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={
        "mode": "hybrid",
        "max_chunks": 3,
        "score_threshold": 0.7,
        "ranker": {"type": "weighted", "alpha": 0.7},  # 70% vector, 30% keyword
    },
)
```

Example with explicit vector search:

```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "vector", "max_chunks": 3, "score_threshold": 0.7},
)
```

Example with keyword search:

```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "keyword", "max_chunks": 3, "score_threshold": 0.7},
)
```

## Supported Search Modes

The SQLite vector store supports three search modes:

1. **Vector Search** (`mode="vector"`): Uses vector similarity to find relevant chunks
2. **Keyword Search** (`mode="keyword"`): Uses keyword matching to find relevant chunks
3. **Hybrid Search** (`mode="hybrid"`): Combines both vector and keyword scores using a ranker

Hybrid search combines the strengths of both vector and keyword search by:

- Computing vector similarity scores
- Computing keyword match scores
- Using a ranker to combine these scores

Two ranker types are supported (a score-combination sketch follows this list):

1. **RRF (Reciprocal Rank Fusion)**:
   - Combines ranks from both vector and keyword results
   - Uses an impact factor (default: 60.0) to control the weight of higher-ranked results
   - Good for balancing between vector and keyword results
   - The default impact factor of 60.0 comes from the original RRF paper by Cormack et al. (2009)[^1], which found this value to provide optimal performance across various retrieval tasks
2. **Weighted**:
   - Linearly combines normalized vector and keyword scores
   - Uses an alpha parameter (0-1) to control the blend:
     - `alpha=0`: Only use keyword scores
     - `alpha=1`: Only use vector scores
     - `alpha=0.5`: Equal weight to both (default)
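
To make the two formulas concrete, here is a minimal, self-contained sketch of how such rankers combine scores. This illustrates the math described above; it is not the provider's actual implementation, and the function names and inputs are hypothetical.

```python
def rrf_scores(
    vector_ranking: list[str],
    keyword_ranking: list[str],
    impact_factor: float = 60.0,
) -> dict[str, float]:
    """Reciprocal Rank Fusion: each result list contributes 1 / (impact_factor + rank)."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (impact_factor + rank)
    return scores


def weighted_scores(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.5,
) -> dict[str, float]:
    """Linear blend of normalized scores: alpha * vector + (1 - alpha) * keyword."""
    return {
        chunk_id: alpha * vector_scores.get(chunk_id, 0.0)
        + (1 - alpha) * keyword_scores.get(chunk_id, 0.0)
        for chunk_id in set(vector_scores) | set(keyword_scores)
    }


# "b" appears near the top of both lists, so RRF ranks it highest.
print(rrf_scores(["a", "b", "c"], ["b", "c", "d"]))
```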

Example using `RAGQueryConfig` with different search modes:

```python
from llama_stack.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker

# Vector search
config = RAGQueryConfig(mode="vector", max_chunks=5)

# Keyword search
config = RAGQueryConfig(mode="keyword", max_chunks=5)

# Hybrid search with custom RRF ranker
config = RAGQueryConfig(
    mode="hybrid",
    max_chunks=5,
    ranker=RRFRanker(impact_factor=50.0),  # Custom impact factor
)

# Hybrid search with weighted ranker
config = RAGQueryConfig(
    mode="hybrid",
    max_chunks=5,
    ranker=WeightedRanker(alpha=0.7),  # 70% vector, 30% keyword
)

# Hybrid search with default RRF ranker
config = RAGQueryConfig(
    mode="hybrid", max_chunks=5
)  # Will use RRF with impact_factor=60.0
```

**Note**: The ranker configuration is only used in hybrid mode. For vector or keyword modes, the `ranker` parameter is ignored.

## Installation

You can install sqlite-vec using pip:

```bash
pip install sqlite-vec
```
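
To verify that the loadable extension works, you can run a quick check like the one below (following the usage pattern from the sqlite-vec README; the exact version printed will differ):

```python
import sqlite3

import sqlite_vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)  # load the sqlite-vec extension into this connection
db.enable_load_extension(False)

(version,) = db.execute("SELECT vec_version()").fetchone()
print(f"sqlite-vec version: {version}")
```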

## Documentation

See [sqlite-vec's GitHub repo](https://github.com/asg017/sqlite-vec) for more details about sqlite-vec in general.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | | Path to the SQLite database file |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | `sqlite` | Config for KV store backend (SQLite only for now) |

## Sample Configuration

```yaml
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/sqlite_vec.db
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/sqlite_vec_registry.db
```
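
For context, this block lives under the `vector_io` provider entry in a distribution's `run.yaml`. A sketch of the surrounding structure (the `provider_id` value here is conventional and may differ in your distribution):

```yaml
providers:
  vector_io:
  - provider_id: sqlite-vec
    provider_type: inline::sqlite-vec
    config:
      db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/sqlite_vec.db
      kvstore:
        type: sqlite
        db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/sqlite_vec_registry.db
```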


[^1]: Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 758-759).