# Vector IO Embedding Model Configuration

This guide explains how to configure embedding models for vector IO providers in Llama Stack, letting you match embedding models to different use cases and balance performance against storage requirements.

## Overview

Vector IO providers now support configurable embedding models at the provider level. This allows you to:

- Use different embedding models for different vector databases based on your use case
- Optimize for performance with lightweight models for fast retrieval
- Optimize for quality with high-dimensional models for semantic search
- Save storage space with variable-dimension embeddings (Matryoshka embeddings)
- Ensure consistency with provider-level defaults

## Configuration Options

Each vector IO provider configuration can include:

- `embedding_model`: The default embedding model ID to use for this provider
- `embedding_dimension`: Optional dimension override for models with variable dimensions

### Priority Order

The system uses the following priority order for embedding model selection:

1. Explicit API parameters (highest priority)
2. Provider configuration defaults (new feature)
3. System default from the model registry (fallback)
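
For example, an embedding model passed explicitly at registration time overrides the provider default. Below is a minimal sketch using the llama-stack-client Python SDK; the server URL and IDs are illustrative, the `fast_search` provider refers to the examples later in this guide, and whether `embedding_model` may be omitted entirely depends on your client version:

```python
from llama_stack_client import LlamaStackClient

# Assumes a Llama Stack server running locally; adjust the URL as needed.
client = LlamaStackClient(base_url="http://localhost:8321")

# 1. Explicit API parameters win: this vector DB uses all-mpnet-base-v2 even
#    though the fast_search provider's config defaults to all-MiniLM-L6-v2.
client.vector_dbs.register(
    vector_db_id="explicit_db",
    provider_id="fast_search",
    embedding_model="sentence-transformers/all-mpnet-base-v2",
    embedding_dimension=768,
)

# 2. No model given: the provider-level default applies, with the model
#    registry default as the final fallback.
client.vector_dbs.register(
    vector_db_id="default_db",
    provider_id="fast_search",
)
```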

## Example Configurations

### Fast Local Search with Lightweight Embeddings

```yaml
vector_io:
  - provider_id: fast_search
    provider_type: inline::faiss
    config:
      db_path: ~/.llama/faiss_fast.db
      embedding_model: "all-MiniLM-L6-v2"  # Fast, 384-dimensional
      embedding_dimension: 384
```

### High-Quality Semantic Search

```yaml
vector_io:
  - provider_id: quality_search
    provider_type: inline::sqlite_vec
    config:
      db_path: ~/.llama/sqlite_quality.db
      embedding_model: "sentence-transformers/all-mpnet-base-v2"  # High quality, 768-dimensional
      embedding_dimension: 768
```

### Storage-Optimized with Matryoshka Embeddings

Matryoshka models are trained so that truncated prefixes of the full embedding vector remain useful, so you can reduce `embedding_dimension` below the model's native size to save storage at a modest quality cost.

```yaml
vector_io:
  - provider_id: compact_search
    provider_type: inline::faiss
    config:
      db_path: ~/.llama/faiss_compact.db
      embedding_model: "nomic-embed-text"  # Matryoshka model
      embedding_dimension: 256  # Reduced from default 768 for storage efficiency
```

### Cloud Deployment with OpenAI Embeddings

```yaml
vector_io:
  - provider_id: cloud_search
    provider_type: remote::qdrant
    config:
      api_key: "${env.QDRANT_API_KEY}"
      url: "${env.QDRANT_URL}"
      embedding_model: "text-embedding-3-small"
      embedding_dimension: 1536
```

## Model Registry Setup

Ensure your embedding models are properly configured in the model registry:

```yaml
models:
  # Lightweight model
  - model_id: all-MiniLM-L6-v2
    provider_id: local_inference
    provider_model_id: sentence-transformers/all-MiniLM-L6-v2
    model_type: embedding
    metadata:
      embedding_dimension: 384
      description: "Fast, lightweight embeddings"

  # High-quality model
  - model_id: sentence-transformers/all-mpnet-base-v2
    provider_id: local_inference
    provider_model_id: sentence-transformers/all-mpnet-base-v2
    model_type: embedding
    metadata:
      embedding_dimension: 768
      description: "High-quality embeddings"

  # Matryoshka model
  - model_id: nomic-embed-text
    provider_id: local_inference
    provider_model_id: nomic-embed-text
    model_type: embedding
    metadata:
      embedding_dimension: 768  # Default dimension
      description: "Variable-dimension Matryoshka embeddings"
```

## Use Cases

### Multi-Environment Setup

Configure different providers for different environments:

```yaml
vector_io:
  # Development - fast, lightweight
  - provider_id: dev_search
    provider_type: inline::faiss
    config:
      db_path: ~/.llama/dev_faiss.db
      embedding_model: "all-MiniLM-L6-v2"
      embedding_dimension: 384

  # Production - high quality, scalable
  - provider_id: prod_search
    provider_type: remote::qdrant
    config:
      api_key: "${env.QDRANT_API_KEY}"
      url: "${env.QDRANT_URL}"
      embedding_model: "text-embedding-3-large"
      embedding_dimension: 3072
```

### Domain-Specific Models

Use different models for different content types:

```yaml
vector_io:
  # Code search - specialized model
  - provider_id: code_search
    provider_type: inline::sqlite_vec
    config:
      db_path: ~/.llama/code_vectors.db
      embedding_model: "microsoft/codebert-base"
      embedding_dimension: 768

  # General documents - general-purpose model
  - provider_id: doc_search
    provider_type: inline::sqlite_vec
    config:
      db_path: ~/.llama/doc_vectors.db
      embedding_model: "sentence-transformers/all-mpnet-base-v2"
      embedding_dimension: 768
```
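
With both providers configured, each corpus can get its own vector DB so content is embedded by the model suited to it. This is a hedged sketch: the vector DB IDs and sample chunk are made up for illustration, and the chunk schema follows the current client:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# One vector DB per domain, each routed to its specialized provider.
client.vector_dbs.register(
    vector_db_id="code_snippets",
    provider_id="code_search",
    embedding_model="microsoft/codebert-base",
    embedding_dimension=768,
)
client.vector_dbs.register(
    vector_db_id="handbook",
    provider_id="doc_search",
    embedding_model="sentence-transformers/all-mpnet-base-v2",
    embedding_dimension=768,
)

# Content is embedded by whichever model backs the target vector DB.
client.vector_io.insert(
    vector_db_id="code_snippets",
    chunks=[{
        "content": "def greet(name):\n    return f'hello {name}'",
        "metadata": {"document_id": "utils.py"},
    }],
)
```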

## Backward Compatibility

If no embedding model is specified in the provider configuration, the system falls back to the previous behavior and uses the first available embedding model from the model registry.

## Supported Providers

The configurable embedding models feature is supported by:

- **Inline providers**: Faiss, SQLite-vec, Milvus, ChromaDB, Qdrant
- **Remote providers**: Qdrant, Milvus, ChromaDB, PGVector, Weaviate

## Best Practices

1. **Match dimensions**: Ensure `embedding_dimension` matches your model's output dimension
2. **Use variable dimensions wisely**: Only override dimensions for Matryoshka models that support truncation
3. **Consider performance trade-offs**: Smaller dimensions mean faster search; larger dimensions mean higher retrieval quality
4. **Test configurations**: Validate your setup with sample queries before production use (see the sketch below)
5. **Document your choices**: Comment your configurations to explain the model selection rationale
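
As a starting point for item 4, here is a minimal smoke test using the llama-stack-client Python SDK. The server URL, vector DB ID, and sample text are illustrative; adapt them to your deployment:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Hypothetical vector DB backed by the fast_search provider defined above.
client.vector_dbs.register(
    vector_db_id="smoke_test",
    provider_id="fast_search",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

# Insert a sample chunk, then query it back to confirm that the embedding
# pipeline (model, dimension, storage) works end to end.
client.vector_io.insert(
    vector_db_id="smoke_test",
    chunks=[{
        "content": "Llama Stack configures embedding models per provider.",
        "metadata": {"document_id": "smoke-doc-1"},
    }],
)

response = client.vector_io.query(vector_db_id="smoke_test", query="embedding configuration")
for chunk, score in zip(response.chunks, response.scores):
    print(f"{score:.3f}  {chunk.content!r}")
```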