## Summary
Cherry-picks 5 critical fixes from main to the release-0.3.x branch for
the v0.3.1 release, plus CI workflow updates.
**Note**: This recreates the cherry-picks from the closed PR #3991, now
targeting the renamed `release-0.3.x` branch (previously
`release-0.3.x-maint`).
## Commits
1. **2c56a8560** - fix(context): prevent provider data leak between
streaming requests (#3924)
- **CRITICAL SECURITY FIX**: Prevents provider credentials from leaking
between requests
- Fixed import path for 0.3.0 compatibility
2. **ddd32b187** - fix(inference): enable routing of models with
provider_data alone (#3928)
- Enables routing for fully qualified model IDs with provider_data
- Resolved merge conflicts, adapted for 0.3.0 structure
3. **f7c2973aa** - fix: Avoid BadRequestError due to invalid max_tokens
(#3667)
- Fixes failures with Gemini and other providers that reject
max_tokens=0
- Non-breaking API change
4. **d7f9da616** - fix(responses): sync conversation before yielding
terminal events in streaming (#3888)
- Ensures conversation sync executes even when streaming consumers break
early
5. **0ffa8658b** - fix(logging): ensure logs go to stderr, loggers obey
levels (#3885)
- Fixes logging infrastructure
6. **75b49cb3c** - ci: support release branches and match client branch
(#3990)
- Updates CI workflows to support release-X.Y.x branches
- Matches client branch from llama-stack-client-python for release
testing
- Fixes artifact name collisions
## Adaptations for 0.3.0
- Fixed import paths: `llama_stack.core.telemetry.tracing` →
`llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` →
`llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute
name)
- Removed `rerank()` method that doesn't exist in 0.3.0
## Testing
All imports verified and tests should pass once CI is set up.
# What does this PR do?
Refactor setting default vector store provider and embedding model to
use an optional `vector_stores` config in the `StackRunConfig` and clean
up code to do so (had to add back in some pieces of VectorDB). Also
added remote Qdrant and Weaviate to starter distro (based on other PR
where inference providers were added for UX).
New config is simply (default for Starter distro):
```yaml
vector_stores:
default_provider_id: faiss
default_embedding_model:
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5
```
## Test Plan
CI and Unit tests.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
**This PR changes configurations in a backward incompatible way.**
Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.
## Key Changes
- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.
## Migration
Before:
```yaml
metadata_store:
type: sqlite
db_path: ~/.llama/distributions/foo/registry.db
inference_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
conversations_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
```
After:
```yaml
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ~/.llama/distributions/foo/kvstore.db
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
```
Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
kvstore:
type: sqlite
db_path: ~/.llama/distributions/foo/chroma.db
```
to:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
persistence:
backend: kv_default
namespace: vector_io::chroma_remote
```
Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
# What does this PR do?
Enables automatic embedding model detection for vector stores and by
using a `default_configured` boolean that can be defined in the
`run.yaml`.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
- Unit tests
- Integration tests
- Simple example below:
Spin up the stack:
```bash
uv run llama stack build --distro starter --image-type venv --run
```
Then test with OpenAI's client:
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
```
Previously you needed:
```python
vs = client.vector_stores.create(
extra_body={
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"embedding_dimension": 384,
}
)
```
The `extra_body` is now unnecessary.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
2 main changes:
1. Remove `provider_id` requirement in call to vector stores and
2. Removes "register first embedding model" logic
- Now forces embedding model id as required on Vector Store creation
Simplifies the UX for OpenAI to:
```python
vs = client.vector_stores.create(
name="my_citations_db",
extra_body={
"embedding_model": "ollama/nomic-embed-text:latest",
}
)
```
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>