# Vector IO Embedding Model Configuration
This guide explains how to configure embedding models for vector IO providers in Llama Stack, enabling you to use different embedding models for different use cases and to balance performance against storage requirements.
## Overview
Vector IO providers now support configurable embedding models at the provider level. This allows you to:
- Use different embedding models for different vector databases based on your use case
- Optimize for performance with lightweight models for fast retrieval
- Optimize for quality with high-dimensional models for semantic search
- Save storage space with variable-dimension embeddings (Matryoshka embeddings)
- Ensure consistency with provider-level defaults
## Configuration Options
Each vector IO provider configuration can include:
- `embedding_model`: the default embedding model ID to use for this provider
- `embedding_dimension`: an optional dimension override for models with variable dimensions
## Priority Order
The system uses the following priority order for embedding model selection (sketched in code below):

1. Explicit API parameters (highest priority)
2. Provider configuration defaults (this new feature)
3. System default from the model registry (fallback)
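For intuition, here is a minimal sketch of that resolution order. The function name and config shapes are illustrative only, not Llama Stack internals; `provider_config` mirrors the YAML `config` blocks shown below.

```python
# Illustrative sketch of the selection order above; not Llama Stack's
# actual implementation.
def resolve_embedding_model(
    api_model: str | None,
    provider_config: dict,
    registry_default: str,
) -> str:
    if api_model:                                  # 1. explicit API parameter
        return api_model
    if provider_config.get("embedding_model"):     # 2. provider-level default
        return provider_config["embedding_model"]
    return registry_default                        # 3. model registry fallback
```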
## Example Configurations
### Fast Local Search with Lightweight Embeddings

```yaml
vector_io:
  - provider_id: fast_search
    provider_type: inline::faiss
    config:
      db_path: ~/.llama/faiss_fast.db
      embedding_model: "all-MiniLM-L6-v2"  # Fast, 384-dimensional
      embedding_dimension: 384
```
### High-Quality Semantic Search

```yaml
vector_io:
  - provider_id: quality_search
    provider_type: inline::sqlite_vec
    config:
      db_path: ~/.llama/sqlite_quality.db
      embedding_model: "sentence-transformers/all-mpnet-base-v2"  # High quality, 768-dimensional
      embedding_dimension: 768
```
### Storage-Optimized with Matryoshka Embeddings

```yaml
vector_io:
  - provider_id: compact_search
    provider_type: inline::faiss
    config:
      db_path: ~/.llama/faiss_compact.db
      embedding_model: "nomic-embed-text"  # Matryoshka model
      embedding_dimension: 256  # Reduced from default 768 for storage efficiency
```
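For background: Matryoshka-trained models are built so that a prefix of the full vector is itself a meaningful embedding, so reducing the dimension amounts to truncating and re-normalizing. A minimal numpy sketch, purely illustrative (the 768 and 256 dimensions match the config above):

```python
import numpy as np

def truncate_matryoshka(vec: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` components, then re-normalize so cosine
    similarity remains well-behaved."""
    head = vec[:dim]
    return head / np.linalg.norm(head)

full = np.random.rand(768).astype(np.float32)  # stand-in for a real embedding
compact = truncate_matryoshka(full, 256)       # matches embedding_dimension: 256
```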
### Cloud Deployment with OpenAI Embeddings

```yaml
vector_io:
  - provider_id: cloud_search
    provider_type: remote::qdrant
    config:
      api_key: "${env.QDRANT_API_KEY}"
      url: "${env.QDRANT_URL}"
      embedding_model: "text-embedding-3-small"
      embedding_dimension: 1536
```
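With provider-level defaults in place, client code can lean on them or override them per call. A minimal sketch assuming the llama-stack-client Python SDK; the vector DB IDs and local base URL are placeholders, and omitting `embedding_model` in the first call assumes this feature is enabled:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local stack

# Relies on the cloud_search provider's configured default
# (text-embedding-3-small, 1536 dimensions).
client.vector_dbs.register(
    vector_db_id="support-docs",
    provider_id="cloud_search",
)

# An explicit API parameter takes highest priority and overrides the default.
client.vector_dbs.register(
    vector_db_id="support-docs-hq",
    provider_id="cloud_search",
    embedding_model="text-embedding-3-large",
    embedding_dimension=3072,
)
```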
## Model Registry Setup
Ensure your embedding models are properly configured in the model registry:
```yaml
models:
  # Lightweight model
  - model_id: all-MiniLM-L6-v2
    provider_id: local_inference
    provider_model_id: sentence-transformers/all-MiniLM-L6-v2
    model_type: embedding
    metadata:
      embedding_dimension: 384
      description: "Fast, lightweight embeddings"

  # High-quality model
  - model_id: sentence-transformers/all-mpnet-base-v2
    provider_id: local_inference
    provider_model_id: sentence-transformers/all-mpnet-base-v2
    model_type: embedding
    metadata:
      embedding_dimension: 768
      description: "High-quality embeddings"

  # Matryoshka model
  - model_id: nomic-embed-text
    provider_id: local_inference
    provider_model_id: nomic-embed-text
    model_type: embedding
    metadata:
      embedding_dimension: 768  # Default dimension
      description: "Variable-dimension Matryoshka embeddings"
```
## Use Cases
### Multi-Environment Setup
Configure different providers for different environments:
```yaml
vector_io:
  # Development - fast, lightweight
  - provider_id: dev_search
    provider_type: inline::faiss
    config:
      db_path: ~/.llama/dev_faiss.db
      embedding_model: "all-MiniLM-L6-v2"
      embedding_dimension: 384

  # Production - high quality, scalable
  - provider_id: prod_search
    provider_type: remote::qdrant
    config:
      api_key: "${env.QDRANT_API_KEY}"
      embedding_model: "text-embedding-3-large"
      embedding_dimension: 3072
```
### Domain-Specific Models
Use different models for different content types:
```yaml
vector_io:
  # Code search - specialized model
  - provider_id: code_search
    provider_type: inline::sqlite_vec
    config:
      db_path: ~/.llama/code_vectors.db
      embedding_model: "microsoft/codebert-base"
      embedding_dimension: 768

  # General documents - general-purpose model
  - provider_id: doc_search
    provider_type: inline::sqlite_vec
    config:
      db_path: ~/.llama/doc_vectors.db
      embedding_model: "all-mpnet-base-v2"
      embedding_dimension: 768
```
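How content gets routed to each provider is up to your application. One hypothetical approach is a small dispatcher keyed on file type; the vector DB names and suffix list here are illustrative only:

```python
# Hypothetical router: code files go to a vector DB backed by code_search,
# everything else to one backed by doc_search.
CODE_SUFFIXES = (".py", ".ts", ".go", ".rs", ".java")

def pick_vector_db(filename: str) -> str:
    return "code-snippets" if filename.endswith(CODE_SUFFIXES) else "general-docs"

print(pick_vector_db("server.py"))    # -> code-snippets
print(pick_vector_db("handbook.md"))  # -> general-docs
```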
## Backward Compatibility
If no embedding model is specified in the provider configuration, the system will fall back to the existing behavior of using the first available embedding model from the model registry.
## Supported Providers
The configurable embedding models feature is supported by:
- **Inline providers**: Faiss, SQLite-vec, Milvus, ChromaDB, Qdrant
- **Remote providers**: Qdrant, Milvus, ChromaDB, PGVector, Weaviate
## Best Practices
- **Match dimensions**: Ensure `embedding_dimension` matches your model's output
- **Use variable dimensions wisely**: Only override dimensions for Matryoshka models that support it
- **Consider performance trade-offs**: Smaller dimensions mean faster search and less storage; larger dimensions generally mean better retrieval quality
- **Test configurations**: Validate your setup with sample queries before production use (a dimension check sketch follows this list)
- **Document your choices**: Comment your configurations to explain model selection rationale
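As a concrete check for the first and fourth points, you can embed a sample query locally and assert the dimension before wiring the model into a provider. A sketch using the sentence-transformers library:

```python
from sentence_transformers import SentenceTransformer

# Verify the model's actual output dimension matches the configured
# embedding_dimension (384 for all-MiniLM-L6-v2).
model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode("sample query")
assert len(vec) == 384, f"expected 384 dims, got {len(vec)}"
```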