feat(vector-io): implement global default embedding model configuration (Issue #2729)

- Add VectorStoreConfig with global default_embedding_model and default_embedding_dimension
- Support environment variables LLAMA_STACK_DEFAULT_EMBEDDING_MODEL and LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION
- Implement precedence: explicit model > global default > clear error (no fallback)
- Update VectorIORouter with _resolve_embedding_model() precedence logic
- Remove non-deterministic 'first model in run.yaml' fallback behavior
- Add vector_store_config to StackRunConfig and all distribution templates
- Include comprehensive unit tests for config loading and router precedence
- Update documentation with configuration examples and usage patterns
- Fix error messages to include 'Failed to' prefix per coding standards

Makes vector store creation deterministic by eliminating unpredictable fallbacks
and providing clear configuration options at the stack level.
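Roughly, the precedence described above maps onto VectorIORouter._resolve_embedding_model() as in the sketch below. This is illustrative only; the standalone function signature and the exact error text are assumptions, not the code in this commit.

# Illustrative sketch of the precedence: explicit model > global default > clear error.
from llama_stack.errors import MissingEmbeddingModelError  # error class referenced by the new module


def _resolve_embedding_model(explicit_model: str | None, config: VectorStoreConfig) -> str:
    # 1. A model passed explicitly by the API caller always wins.
    if explicit_model is not None:
        return explicit_model
    # 2. Otherwise use the stack-level default, if one is configured.
    if config.default_embedding_model is not None:
        return config.default_embedding_model
    # 3. No silent fallback to the first model in run.yaml: fail with a clear error.
    raise MissingEmbeddingModelError(
        "Failed to resolve an embedding model: pass one explicitly or set "
        "vector_store_config.default_embedding_model (or LLAMA_STACK_DEFAULT_EMBEDDING_MODEL)."
    )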
skamenan7 2025-07-25 17:06:43 -04:00
parent 8422bd102a
commit 17fbd21c0d
7 changed files with 243 additions and 8 deletions


@@ -0,0 +1,45 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

"""Global vector-store configuration shared across the stack.

This module introduces `VectorStoreConfig`, a small Pydantic model that
lives under `StackRunConfig.vector_store_config`. It lets deployers set
an explicit default embedding model (and dimension) that the Vector-IO
router will inject whenever the caller does not specify one.
"""

from __future__ import annotations

import os

from pydantic import BaseModel, ConfigDict, Field

__all__ = ["VectorStoreConfig"]


class VectorStoreConfig(BaseModel):
    """Stack-level defaults for vector-store creation.

    Attributes
    ----------
    default_embedding_model
        The model *id* the stack should use when an embedding model is
        required but not supplied by the API caller. When *None* the
        router will raise a :class:`~llama_stack.errors.MissingEmbeddingModelError`.
    default_embedding_dimension
        Optional integer hint for vector dimension. Routers/providers
        may validate that the chosen model emits vectors of this size.
    """

    # Read the stack-wide default from the environment when run.yaml does not set one.
    default_embedding_model: str | None = Field(
        default_factory=lambda: os.getenv("LLAMA_STACK_DEFAULT_EMBEDDING_MODEL")
    )
    # An unset (or "0") env var resolves to None; any configured value must be >= 1.
    default_embedding_dimension: int | None = Field(
        default_factory=lambda: int(os.getenv("LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION", 0)) or None,
        ge=1,
    )

    # Instances are immutable so a shared config cannot be mutated at runtime.
    model_config = ConfigDict(frozen=True)
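For reference, a minimal usage sketch of the config above (assuming pydantic v2, which ConfigDict implies; the model id and dimension are example values, not defaults shipped with the stack):

import os

# Example environment, as a deployer might set it before starting the stack.
os.environ["LLAMA_STACK_DEFAULT_EMBEDDING_MODEL"] = "all-MiniLM-L6-v2"  # example value
os.environ["LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION"] = "384"           # example value

cfg = VectorStoreConfig()  # defaults are pulled from the environment at construction time
assert cfg.default_embedding_model == "all-MiniLM-L6-v2"
assert cfg.default_embedding_dimension == 384

# Values set explicitly (e.g. from run.yaml's vector_store_config) override the environment.
explicit = VectorStoreConfig(default_embedding_model="my-embed-model", default_embedding_dimension=768)
assert explicit.default_embedding_model == "my-embed-model"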