feat(vector-io): implement global default embedding model configuration (Issue #2729)

- Add VectorStoreConfig with global default_embedding_model and default_embedding_dimension
- Support environment variables LLAMA_STACK_DEFAULT_EMBEDDING_MODEL and LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION
- Implement precedence: explicit model > global default > clear error (no fallback)
- Update VectorIORouter with _resolve_embedding_model() precedence logic
- Remove non-deterministic 'first model in run.yaml' fallback behavior
- Add vector_store_config to StackRunConfig and all distribution templates
- Include comprehensive unit tests for config loading and router precedence
- Update documentation with configuration examples and usage patterns
- Fix error messages to include 'Failed to' prefix per coding standards

Makes vector store creation deterministic by eliminating unpredictable fallbacks
and providing clear configuration options at the stack level.
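Roughly, the precedence described above maps onto VectorIORouter._resolve_embedding_model() as in the sketch below. This is illustrative only; the standalone function signature and the exact error text are assumptions, not the code in this commit.

# Illustrative sketch of the precedence: explicit model > global default > clear error.
from llama_stack.errors import MissingEmbeddingModelError  # error class referenced by the new module


def _resolve_embedding_model(explicit_model: str | None, config: VectorStoreConfig) -> str:
    # 1. A model passed explicitly by the API caller always wins.
    if explicit_model is not None:
        return explicit_model
    # 2. Otherwise use the stack-level default, if one is configured.
    if config.default_embedding_model is not None:
        return config.default_embedding_model
    # 3. No silent fallback to the first model in run.yaml: fail with a clear error.
    raise MissingEmbeddingModelError(
        "Failed to resolve an embedding model: pass one explicitly or set "
        "vector_store_config.default_embedding_model (or LLAMA_STACK_DEFAULT_EMBEDDING_MODEL)."
    )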
skamenan7 2025-07-25 17:06:43 -04:00
parent 8422bd102a
commit 17fbd21c0d
7 changed files with 243 additions and 8 deletions


@@ -0,0 +1,45 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

"""Global vector-store configuration shared across the stack.

This module introduces `VectorStoreConfig`, a small Pydantic model that
lives under `StackRunConfig.vector_store_config`. It lets deployers set
an explicit default embedding model (and dimension) that the Vector-IO
router will inject whenever the caller does not specify one.
"""

from __future__ import annotations

import os

from pydantic import BaseModel, ConfigDict, Field

__all__ = ["VectorStoreConfig"]


class VectorStoreConfig(BaseModel):
    """Stack-level defaults for vector-store creation.

    Attributes
    ----------
    default_embedding_model
        The model *id* the stack should use when an embedding model is
        required but not supplied by the API caller. When *None* the
        router will raise a :class:`~llama_stack.errors.MissingEmbeddingModelError`.
    default_embedding_dimension
        Optional integer hint for vector dimension. Routers/providers
        may validate that the chosen model emits vectors of this size.
    """

    # Read the stack-wide default from the environment when run.yaml does not set one.
    default_embedding_model: str | None = Field(
        default_factory=lambda: os.getenv("LLAMA_STACK_DEFAULT_EMBEDDING_MODEL")
    )
    # An unset (or "0") env var resolves to None; any configured value must be >= 1.
    default_embedding_dimension: int | None = Field(
        default_factory=lambda: int(os.getenv("LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION", 0)) or None,
        ge=1,
    )

    # Instances are immutable so a shared config cannot be mutated at runtime.
    model_config = ConfigDict(frozen=True)
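For reference, a minimal usage sketch of the config above (assuming pydantic v2, which ConfigDict implies; the model id and dimension are example values, not defaults shipped with the stack):

import os

# Example environment, as a deployer might set it before starting the stack.
os.environ["LLAMA_STACK_DEFAULT_EMBEDDING_MODEL"] = "all-MiniLM-L6-v2"  # example value
os.environ["LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION"] = "384"           # example value

cfg = VectorStoreConfig()  # defaults are pulled from the environment at construction time
assert cfg.default_embedding_model == "all-MiniLM-L6-v2"
assert cfg.default_embedding_dimension == 384

# Values set explicitly (e.g. from run.yaml's vector_store_config) override the environment.
explicit = VectorStoreConfig(default_embedding_model="my-embed-model", default_embedding_dimension=768)
assert explicit.default_embedding_model == "my-embed-model"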