feat(vector-io): implement global default embedding model configuration (Issue #2729)

- Add VectorStoreConfig with global default_embedding_model and default_embedding_dimension
- Support environment variables LLAMA_STACK_DEFAULT_EMBEDDING_MODEL and LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION
- Implement precedence: explicit model > global default > clear error (no fallback)
- Update VectorIORouter with _resolve_embedding_model() precedence logic
- Remove non-deterministic 'first model in run.yaml' fallback behavior
- Add vector_store_config to StackRunConfig and all distribution templates
- Include comprehensive unit tests for config loading and router precedence
- Update documentation with configuration examples and usage patterns
- Fix error messages to include 'Failed to' prefix per coding standards

Makes vector store creation deterministic by eliminating unpredictable fallbacks
and providing clear configuration options at the stack level.
skamenan7 2025-07-25 17:06:43 -04:00
parent 8422bd102a
commit 17fbd21c0d
7 changed files with 243 additions and 8 deletions


@ -800,3 +800,38 @@ shields:
provider_shield_id: null
...
```
### Global Vector-Store Defaults
Starting with Llama-Stack v2, you can provide a *stack-level* default embedding model. It is used whenever a new vector store is created and the caller does **not** specify an `embedding_model` parameter.
Add a top-level block next to `models:` and `vector_io:` in your build/run YAML:
```yaml
vector_store_config:
  default_embedding_model: ${env.LLAMA_STACK_DEFAULT_EMBEDDING_MODEL:=all-MiniLM-L6-v2}
  # optional but recommended
  default_embedding_dimension: ${env.LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION:=384}
```
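
The `${env.NAME:=default}` placeholders in the block above take the value of the environment variable when it is set and fall back to the text after `:=` otherwise. The following is a minimal sketch of that substitution, for illustration only: `expand_env_placeholders` and its regular expression are hypothetical and are not Llama Stack's actual resolver, which may treat edge cases (for example empty values) differently.

```python
import os
import re

# Hypothetical pattern: matches ${env.NAME} and ${env.NAME:=default}.
_ENV_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::=([^}]*))?\}")


def expand_env_placeholders(text, environ=None):
    """Replace each placeholder with the variable's value, or with its default."""
    env = os.environ if environ is None else environ

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            # The environment always wins over the inline default.
            return env[name]
        if default is not None:
            return default
        # Assumption: an unset variable without a default is an error.
        raise KeyError(f"Failed to resolve environment variable {name!r}")

    return _ENV_PLACEHOLDER.sub(_substitute, text)


line = "default_embedding_model: ${env.LLAMA_STACK_DEFAULT_EMBEDDING_MODEL:=all-MiniLM-L6-v2}"
print(expand_env_placeholders(line, {}))
print(expand_env_placeholders(line, {"LLAMA_STACK_DEFAULT_EMBEDDING_MODEL": "my-model"}))
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.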
Precedence rules at runtime:
1. If `embedding_model` is explicitly passed in an API call, that value is used.
2. Otherwise the value in `vector_store_config.default_embedding_model` is used.
3. If neither is available, the server raises **MissingEmbeddingModelError** at store-creation time so that misconfiguration is caught early.
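
The three rules above can be sketched as a small resolver. This is an illustrative approximation, not the router's actual code: the `VectorStoreConfig` dataclass, the `resolve_embedding_model` function, and the error message are assumptions made for the example. Only the field names and the error name are taken from this section.

```python
from dataclasses import dataclass
from typing import Optional


class MissingEmbeddingModelError(RuntimeError):
    """Stand-in for the error named in rule 3; the real class may differ."""


@dataclass
class VectorStoreConfig:
    # Field names mirror the YAML keys shown earlier in this section.
    default_embedding_model: Optional[str] = None
    default_embedding_dimension: Optional[int] = None


def resolve_embedding_model(explicit_model: Optional[str], config: VectorStoreConfig) -> str:
    # Rule 1: an explicitly passed model always wins.
    if explicit_model is not None:
        return explicit_model
    # Rule 2: otherwise use the stack-level default, if one is configured.
    if config.default_embedding_model is not None:
        return config.default_embedding_model
    # Rule 3: no silent fallback; fail with a clear error.
    raise MissingEmbeddingModelError(
        "Failed to create vector store: no embedding model was provided "
        "and no default is configured"
    )


config = VectorStoreConfig(default_embedding_model="all-MiniLM-L6-v2",
                           default_embedding_dimension=384)
print(resolve_embedding_model("custom-model", config))
print(resolve_embedding_model(None, config))
```

Raising instead of picking an arbitrary registered model is what makes store creation deterministic: the same request and configuration always produce the same result.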
#### Environment variables
| Variable | Purpose | Example |
|----------|---------|---------|
| `LLAMA_STACK_DEFAULT_EMBEDDING_MODEL` | Global default embedding model id | `all-MiniLM-L6-v2` |
| `LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION` | Dimension for embeddings (optional) | `384` |
If you include the `${env.…}` placeholder in `vector_store_config`, deployments can override the default without editing YAML:
```bash
export LLAMA_STACK_DEFAULT_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
llama stack run --config run.yaml
```
> Tip: If you omit `vector_store_config` entirely, you **must** either pass `embedding_model=` on every `create_vector_store` call or set `LLAMA_STACK_DEFAULT_EMBEDDING_MODEL` in the environment. Otherwise, the server will refuse to create a vector store.
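
The tip implies that the defaults can come straight from the environment even when the YAML block is absent. A minimal sketch of such a lookup follows, assuming the two variable names from the table. `EnvBackedDefaults` and `load_defaults_from_env` are hypothetical names, and treating an empty string as unset is an assumption of this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvBackedDefaults:
    embedding_model: Optional[str]
    embedding_dimension: Optional[int]


def load_defaults_from_env(environ: Optional[Mapping[str, str]] = None) -> EnvBackedDefaults:
    env = os.environ if environ is None else environ
    # Assumption: an empty value counts as "not set".
    model = env.get("LLAMA_STACK_DEFAULT_EMBEDDING_MODEL") or None
    raw_dimension = env.get("LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION")
    # The dimension is optional; convert it to an int only when present.
    dimension = int(raw_dimension) if raw_dimension else None
    return EnvBackedDefaults(model, dimension)


defaults = load_defaults_from_env({
    "LLAMA_STACK_DEFAULT_EMBEDDING_MODEL": "all-MiniLM-L6-v2",
    "LLAMA_STACK_DEFAULT_EMBEDDING_DIMENSION": "384",
})
print(defaults)
```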