# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.


from llama_stack.providers.datatypes import (
    AdapterSpec,
    Api,
    InlineProviderSpec,
    ProviderSpec,
    remote_provider_spec,
)


def available_providers() -> list[ProviderSpec]:
    return [
        InlineProviderSpec(
            api=Api.vector_io,
            provider_type="inline::meta-reference",
            pip_packages=["faiss-cpu"],
            module="llama_stack.providers.inline.vector_io.faiss",
            config_class="llama_stack.providers.inline.vector_io.faiss.FaissVectorIOConfig",
            deprecation_warning="Please use the `inline::faiss` provider instead.",
            api_dependencies=[Api.inference],
            optional_api_dependencies=[Api.files],
            description="Meta's reference implementation of a vector database.",
        ),
        InlineProviderSpec(
            api=Api.vector_io,
            provider_type="inline::faiss",
            pip_packages=["faiss-cpu"],
            module="llama_stack.providers.inline.vector_io.faiss",
            config_class="llama_stack.providers.inline.vector_io.faiss.FaissVectorIOConfig",
            api_dependencies=[Api.inference],
            optional_api_dependencies=[Api.files],
            description="""
[Faiss](https://github.com/facebookresearch/faiss) is an inline vector database provider for Llama Stack. It
allows you to store and query vectors directly in memory.
That means you'll get fast and efficient vector retrieval.

## Features

- Lightweight and easy to use
- Fully integrated with Llama Stack
- GPU support

## Usage

To use Faiss in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use Faiss.
3. Start storing and querying vectors.
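
For example, a minimal query sketch (illustrative only: the `vector_db_id` and parameter values are placeholders, and `vector_io` is assumed to be a handle to the stack's Vector IO API, as in the sqlite-vec provider's examples):

```python
# Hypothetical example: query a Faiss-backed vector DB through the vector_io API.
# The vector_db_id and parameter values below are illustrative placeholders.
response = await vector_io.query_chunks(
    vector_db_id="my_faiss_db",
    query="your query here",
    params={"max_chunks": 3, "score_threshold": 0.7},
)
```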

## Installation

You can install Faiss using pip:

```bash
pip install faiss-cpu
```

## Documentation
See [Faiss' documentation](https://faiss.ai/) or the [Faiss Wiki](https://github.com/facebookresearch/faiss/wiki) for
more details about Faiss in general.
""",
        ),
        # NOTE: sqlite-vec cannot be bundled into the container image because it does not have a
        # source distribution and the wheels are not available for all platforms.
        InlineProviderSpec(
            api=Api.vector_io,
            provider_type="inline::sqlite-vec",
            pip_packages=["sqlite-vec"],
            module="llama_stack.providers.inline.vector_io.sqlite_vec",
            config_class="llama_stack.providers.inline.vector_io.sqlite_vec.SQLiteVectorIOConfig",
            api_dependencies=[Api.inference],
            optional_api_dependencies=[Api.files],
            description="""
[SQLite-Vec](https://github.com/asg017/sqlite-vec) is an inline vector database provider for Llama Stack. It
allows you to store and query vectors directly within an SQLite database.
That means you're not limited to storing vectors in memory or in a separate service.

## Features

- Lightweight and easy to use
- Fully integrated with Llama Stack
- Uses disk-based storage for persistence, allowing for larger vector storage

### Comparison to Faiss

The choice between Faiss and sqlite-vec should be made based on the needs of your application,
as they have different strengths.

#### Choosing the Right Provider

Scenario | Recommended Tool | Reason
-- | -- | --
Online Analytical Processing (OLAP) | Faiss | Fast, in-memory searches
Online Transaction Processing (OLTP) | sqlite-vec | Frequent writes and reads
Frequent writes | sqlite-vec | Efficient disk-based storage and incremental indexing
Large datasets | sqlite-vec | Disk-based storage for larger vector storage
Datasets that can fit in memory, frequent reads | Faiss | Optimized for speed, indexing, and GPU acceleration

#### Empirical Example

Consider the histogram below in which 10,000 randomly generated strings were inserted
in batches of 100 into both Faiss and sqlite-vec using `client.tool_runtime.rag_tool.insert()`.

```{image} ../../../../_static/providers/vector_io/write_time_comparison_sqlite-vec-faiss.png
:alt: Comparison of SQLite-Vec and Faiss write times
:width: 400px
```

You will notice that the average write time for `sqlite-vec` was 788ms, compared to
47,640ms for Faiss. While that number is jarring, if you look at the distribution, you can see that the
Faiss write times are spread rather uniformly across the [1500, 100000] ms interval.

Looking at each individual write in the order the documents were inserted, you'll see the write time
increase as Faiss re-indexes the vectors after each write.

```{image} ../../../../_static/providers/vector_io/write_time_sequence_sqlite-vec-faiss.png
:alt: Comparison of SQLite-Vec and Faiss write times
:width: 400px
```

In comparison, the read times for Faiss were on average 10% faster than those for sqlite-vec.
The modes of the two distributions highlight the difference further: Faiss
will likely yield faster read performance.

```{image} ../../../../_static/providers/vector_io/read_time_comparison_sqlite-vec-faiss.png
:alt: Comparison of SQLite-Vec and Faiss read times
:width: 400px
```

## Usage

To use sqlite-vec in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use SQLite-Vec.
3. Start storing and querying vectors.

The SQLite-vec provider supports three search modes:

1. **Vector Search** (`mode="vector"`): Performs pure vector similarity search using the embeddings.
2. **Keyword Search** (`mode="keyword"`): Performs full-text search using SQLite's FTS5.
3. **Hybrid Search** (`mode="hybrid"`): Combines both vector and keyword search for better results. First performs keyword search to get candidate matches, then applies vector similarity search on those candidates.

Example with hybrid search:
```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "hybrid", "max_chunks": 3, "score_threshold": 0.7},
)

# Using RRF ranker
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={
        "mode": "hybrid",
        "max_chunks": 3,
        "score_threshold": 0.7,
        "ranker": {"type": "rrf", "impact_factor": 60.0},
    },
)

# Using weighted ranker
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={
        "mode": "hybrid",
        "max_chunks": 3,
        "score_threshold": 0.7,
        "ranker": {"type": "weighted", "alpha": 0.7},  # 70% vector, 30% keyword
    },
)
```

Example with explicit vector search:
```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "vector", "max_chunks": 3, "score_threshold": 0.7},
)
```

Example with keyword search:
```python
response = await vector_io.query_chunks(
    vector_db_id="my_db",
    query="your query here",
    params={"mode": "keyword", "max_chunks": 3, "score_threshold": 0.7},
)
```

## Supported Search Modes

The SQLite vector store supports three search modes:

1. **Vector Search** (`mode="vector"`): Uses vector similarity to find relevant chunks
2. **Keyword Search** (`mode="keyword"`): Uses keyword matching to find relevant chunks
3. **Hybrid Search** (`mode="hybrid"`): Combines both vector and keyword scores using a ranker

### Hybrid Search

Hybrid search combines the strengths of both vector and keyword search by:
- Computing vector similarity scores
- Computing keyword match scores
- Using a ranker to combine these scores

Two ranker types are supported (see the sketch after this list):

1. **RRF (Reciprocal Rank Fusion)**:
   - Combines ranks from both vector and keyword results
   - Uses an impact factor (default: 60.0) to control the weight of higher-ranked results
   - Good for balancing between vector and keyword results
   - The default impact factor of 60.0 comes from the original RRF paper by Cormack et al. (2009) [^1], which found this value to provide optimal performance across various retrieval tasks

2. **Weighted**:
   - Linearly combines normalized vector and keyword scores
   - Uses an alpha parameter (0-1) to control the blend:
     - alpha=0: Only use keyword scores
     - alpha=1: Only use vector scores
     - alpha=0.5: Equal weight to both (default)
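
As an illustrative sketch of how these two rankers combine per-chunk scores (this is not the provider's internal implementation; the function names and signatures are hypothetical):

```python
# Hypothetical sketch of the two ranker types described above; not the provider's actual code.


def rrf_score(vector_rank: int, keyword_rank: int, impact_factor: float = 60.0) -> float:
    # Reciprocal Rank Fusion: sum the reciprocal ranks from the vector and keyword result lists.
    return 1.0 / (impact_factor + vector_rank) + 1.0 / (impact_factor + keyword_rank)


def weighted_score(vector_score: float, keyword_score: float, alpha: float = 0.5) -> float:
    # Linear blend of normalized scores: alpha weights the vector side, (1 - alpha) the keyword side.
    return alpha * vector_score + (1 - alpha) * keyword_score
```

With RRF, a chunk ranked near the top of either result list still receives a meaningful combined score, which is why it balances vector and keyword results well.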

Example using RAGQueryConfig with different search modes:

```python
from llama_stack.apis.tools import RAGQueryConfig, RRFRanker, WeightedRanker

# Vector search
config = RAGQueryConfig(mode="vector", max_chunks=5)

# Keyword search
config = RAGQueryConfig(mode="keyword", max_chunks=5)

# Hybrid search with custom RRF ranker
config = RAGQueryConfig(
    mode="hybrid",
    max_chunks=5,
    ranker=RRFRanker(impact_factor=50.0),  # Custom impact factor
)

# Hybrid search with weighted ranker
config = RAGQueryConfig(
    mode="hybrid",
    max_chunks=5,
    ranker=WeightedRanker(alpha=0.7),  # 70% vector, 30% keyword
)

# Hybrid search with default RRF ranker
config = RAGQueryConfig(
    mode="hybrid", max_chunks=5
)  # Will use RRF with impact_factor=60.0
```

Note: The ranker configuration is only used in hybrid mode. For vector or keyword modes, the ranker parameter is ignored.

## Installation

You can install SQLite-Vec using pip:

```bash
pip install sqlite-vec
```

## Documentation

See [sqlite-vec's GitHub repo](https://github.com/asg017/sqlite-vec/tree/main) for more details about sqlite-vec in general.

[^1]: Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). [Reciprocal rank fusion outperforms condorcet and individual rank learning methods](https://dl.acm.org/doi/10.1145/1571941.1572114). In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (pp. 758-759).
""",
        ),
        InlineProviderSpec(
            api=Api.vector_io,
            provider_type="inline::sqlite_vec",
            pip_packages=["sqlite-vec"],
            module="llama_stack.providers.inline.vector_io.sqlite_vec",
            config_class="llama_stack.providers.inline.vector_io.sqlite_vec.SQLiteVectorIOConfig",
            deprecation_warning="Please use the `inline::sqlite-vec` provider (notice the hyphen instead of underscore) instead.",
            api_dependencies=[Api.inference],
            optional_api_dependencies=[Api.files],
            description="""
Please refer to the sqlite-vec provider documentation.
""",
        ),
        remote_provider_spec(
            Api.vector_io,
            AdapterSpec(
                adapter_type="chromadb",
                pip_packages=["chromadb-client"],
                module="llama_stack.providers.remote.vector_io.chroma",
                config_class="llama_stack.providers.remote.vector_io.chroma.ChromaVectorIOConfig",
                description="""
[Chroma](https://www.trychroma.com/) is an inline and remote vector
database provider for Llama Stack. It allows you to store and query vectors directly within a Chroma database.
That means you're not limited to storing vectors in memory or in a separate service.

## Features
Chroma supports:
- Storing embeddings and their metadata
- Vector search
- Full-text search
- Document storage
- Metadata filtering
- Multi-modal retrieval

## Usage

To use Chroma in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use Chroma.
3. Start storing and querying vectors.
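
For instance, a minimal registration sketch (illustrative only: `client` is assumed to be an initialized `LlamaStackClient`, and the embedding model, dimension, and identifiers are placeholders):

```python
# Hypothetical example: register a vector DB backed by the Chroma provider.
# The embedding model, dimension, and IDs below are illustrative placeholders.
client.vector_dbs.register(
    vector_db_id="my_chroma_db",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="chromadb",
)
```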

## Installation

You can install Chroma using pip:

```bash
pip install chromadb
```

## Documentation
See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introduction) for more details about Chroma in general.
""",
            ),
            api_dependencies=[Api.inference],
        ),
        InlineProviderSpec(
            api=Api.vector_io,
            provider_type="inline::chromadb",
            pip_packages=["chromadb"],
            module="llama_stack.providers.inline.vector_io.chroma",
            config_class="llama_stack.providers.inline.vector_io.chroma.ChromaVectorIOConfig",
            api_dependencies=[Api.inference],
            description="""
[Chroma](https://www.trychroma.com/) is an inline and remote vector
database provider for Llama Stack. It allows you to store and query vectors directly within a Chroma database.
That means you're not limited to storing vectors in memory or in a separate service.

## Features
Chroma supports:
- Storing embeddings and their metadata
- Vector search
- Full-text search
- Document storage
- Metadata filtering
- Multi-modal retrieval

## Usage

To use Chroma in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use Chroma.
3. Start storing and querying vectors.

## Installation

You can install Chroma using pip:

```bash
pip install chromadb
```

## Documentation
See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introduction) for more details about Chroma in general.
""",
        ),
        remote_provider_spec(
            Api.vector_io,
            AdapterSpec(
                adapter_type="pgvector",
                pip_packages=["psycopg2-binary"],
                module="llama_stack.providers.remote.vector_io.pgvector",
                config_class="llama_stack.providers.remote.vector_io.pgvector.PGVectorVectorIOConfig",
                description="""
[PGVector](https://github.com/pgvector/pgvector) is a remote vector database provider for Llama Stack. It
allows you to store and query vectors directly within a PostgreSQL database.
That means you're not limited to storing vectors in memory or in a separate service.

## Features

- Easy to use
- Fully integrated with Llama Stack

## Usage

To use PGVector in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use PGVector.
3. Start storing and querying vectors.

## Installation

You can install PGVector using docker:

```bash
docker pull pgvector/pgvector:pg17
```

## Documentation
See [PGVector's documentation](https://github.com/pgvector/pgvector) for more details about PGVector in general.
""",
            ),
            api_dependencies=[Api.inference],
        ),
        remote_provider_spec(
            Api.vector_io,
            AdapterSpec(
                adapter_type="weaviate",
                pip_packages=["weaviate-client"],
                module="llama_stack.providers.remote.vector_io.weaviate",
                config_class="llama_stack.providers.remote.vector_io.weaviate.WeaviateVectorIOConfig",
                provider_data_validator="llama_stack.providers.remote.vector_io.weaviate.WeaviateRequestProviderData",
                description="""
[Weaviate](https://weaviate.io/) is a vector database provider for Llama Stack.
It allows you to store and query vectors directly within a Weaviate database.
That means you're not limited to storing vectors in memory or in a separate service.

## Features
Weaviate supports:
- Storing embeddings and their metadata
- Vector search
- Full-text search
- Hybrid search
- Document storage
- Metadata filtering
- Multi-modal retrieval

## Usage

To use Weaviate in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use Weaviate.
3. Start storing and querying vectors.

## Installation

To install Weaviate see the [Weaviate quickstart documentation](https://weaviate.io/developers/weaviate/quickstart).

## Documentation
See [Weaviate's documentation](https://weaviate.io/developers/weaviate) for more details about Weaviate in general.
""",
            ),
            api_dependencies=[Api.inference],
        ),
        InlineProviderSpec(
            api=Api.vector_io,
            provider_type="inline::qdrant",
            pip_packages=["qdrant-client"],
            module="llama_stack.providers.inline.vector_io.qdrant",
            config_class="llama_stack.providers.inline.vector_io.qdrant.QdrantVectorIOConfig",
            api_dependencies=[Api.inference],
            description=r"""
[Qdrant](https://qdrant.tech/documentation/) is an inline and remote vector database provider for Llama Stack. It
allows you to store and query vectors directly in memory.
That means you'll get fast and efficient vector retrieval.

> By default, Qdrant stores vectors in RAM, delivering incredibly fast access for datasets that fit comfortably in
> memory. But when your dataset exceeds RAM capacity, Qdrant offers Memmap as an alternative.
>
> \[[An Introduction to Vector Databases](https://qdrant.tech/articles/what-is-a-vector-database/)\]

## Features

- Lightweight and easy to use
- Fully integrated with Llama Stack
- Apache 2.0 license terms
- Store embeddings and their metadata
- Supports search by
  [Keyword](https://qdrant.tech/articles/qdrant-introduces-full-text-filters-and-indexes/)
  and [Hybrid](https://qdrant.tech/articles/hybrid-search/#building-a-hybrid-search-system-in-qdrant) search
- [Multilingual and Multimodal retrieval](https://qdrant.tech/documentation/multimodal-search/)
- [Metadata filtering](https://qdrant.tech/articles/vector-search-filtering/)
- [GPU support](https://qdrant.tech/documentation/guides/running-with-gpu/)

## Usage

To use Qdrant in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use Qdrant.
3. Start storing and querying vectors.

## Installation

You can install Qdrant using docker:

```bash
docker pull qdrant/qdrant
```

## Documentation
See the [Qdrant documentation](https://qdrant.tech/documentation/) for more details about Qdrant in general.
""",
        ),
        remote_provider_spec(
            Api.vector_io,
            AdapterSpec(
                adapter_type="qdrant",
                pip_packages=["qdrant-client"],
                module="llama_stack.providers.remote.vector_io.qdrant",
                config_class="llama_stack.providers.remote.vector_io.qdrant.QdrantVectorIOConfig",
                description="""
Please refer to the inline provider documentation.
""",
            ),
            api_dependencies=[Api.inference],
        ),
        remote_provider_spec(
            Api.vector_io,
            AdapterSpec(
                adapter_type="milvus",
                pip_packages=["pymilvus[marshmallow<3.13.0]"],
                module="llama_stack.providers.remote.vector_io.milvus",
                config_class="llama_stack.providers.remote.vector_io.milvus.MilvusVectorIOConfig",
                description="""
[Milvus](https://milvus.io/) is an inline and remote vector database provider for Llama Stack. It
allows you to store and query vectors directly within a Milvus database.
That means you're not limited to storing vectors in memory or in a separate service.

## Features

- Easy to use
- Fully integrated with Llama Stack

## Usage

To use Milvus in your Llama Stack project, follow these steps:

1. Install the necessary dependencies.
2. Configure your Llama Stack project to use Milvus.
3. Start storing and querying vectors.

## Installation

You can install Milvus using pymilvus:

```bash
pip install pymilvus
```

## Configuration

In Llama Stack, Milvus can be configured in two ways:
- **Inline (Local) Configuration** - Uses Milvus-Lite for local storage
- **Remote Configuration** - Connects to a remote Milvus server

### Inline (Local) Configuration

The simplest method is local configuration, which requires setting `db_path`, a path for locally storing Milvus-Lite files:

```yaml
vector_io:
  - provider_id: milvus
    provider_type: inline::milvus
    config:
      db_path: ~/.llama/distributions/together/milvus_store.db
```

### Remote Configuration

Remote configuration is suitable for larger data storage requirements:

#### Standard Remote Connection

```yaml
vector_io:
  - provider_id: milvus
    provider_type: remote::milvus
    config:
      uri: "http://<host>:<port>"
      token: "<user>:<password>"
```

#### TLS-Enabled Remote Connection (One-way TLS)

For connections to Milvus instances with one-way TLS enabled:

```yaml
vector_io:
  - provider_id: milvus
    provider_type: remote::milvus
    config:
      uri: "https://<host>:<port>"
      token: "<user>:<password>"
      secure: True
      server_pem_path: "/path/to/server.pem"
```

#### Mutual TLS (mTLS) Remote Connection

For connections to Milvus instances with mutual TLS (mTLS) enabled:

```yaml
vector_io:
  - provider_id: milvus
    provider_type: remote::milvus
    config:
      uri: "https://<host>:<port>"
      token: "<user>:<password>"
      secure: True
      ca_pem_path: "/path/to/ca.pem"
      client_pem_path: "/path/to/client.pem"
      client_key_path: "/path/to/client.key"
```

#### Key Parameters for TLS Configuration

- **`secure`**: Enables TLS encryption when set to `true`. Defaults to `false`.
- **`server_pem_path`**: Path to the **server certificate** for verifying the server's identity (used in one-way TLS).
- **`ca_pem_path`**: Path to the **Certificate Authority (CA) certificate** for validating the server certificate (required in mTLS).
- **`client_pem_path`**: Path to the **client certificate** file (required for mTLS).
- **`client_key_path`**: Path to the **client private key** file (required for mTLS).

## Documentation

See the [Milvus documentation](https://milvus.io/docs/install-overview.md) for more details about Milvus in general.

For more details on TLS configuration, refer to the [TLS setup guide](https://milvus.io/docs/tls.md).
""",
            ),
            api_dependencies=[Api.inference],
        ),
        InlineProviderSpec(
            api=Api.vector_io,
            provider_type="inline::milvus",
            pip_packages=["pymilvus"],
            module="llama_stack.providers.inline.vector_io.milvus",
            config_class="llama_stack.providers.inline.vector_io.milvus.MilvusVectorIOConfig",
            api_dependencies=[Api.inference],
            optional_api_dependencies=[Api.files],
            description="""
Please refer to the remote provider documentation.
""",
        ),
    ]