Replace isolated UV configuration with unified wrapper script that:
- Detects release branches (release-X.Y.x pattern)
- Checks upstream target branch for feature branches
- Supports LLAMA_STACK_RELEASE_MODE env var for local testing
- Wraps all uv commands (lock, run) in local hooks
When on/targeting release branches, automatically sets:
- UV_EXTRA_INDEX_URL=https://test.pypi.org/simple/
- UV_INDEX_STRATEGY=unsafe-best-match
For local testing on feature branches targeting release branches:
LLAMA_STACK_RELEASE_MODE=true pre-commit run --all-files
This ensures RC versions can be resolved from test.pypi in both
CI (on actual release branches) and local development.
Make uv-lock pre-commit hook smart about release branches by wrapping it
in a script that detects release branches and sets UV_EXTRA_INDEX_URL.
This allows the same pre-commit config to work locally and in CI without
special environment variable setup in workflows.
Changes:
- Add scripts/pre-commit-uv-lock.sh wrapper that detects release branches
- Move uv-lock from astral-sh/uv-pre-commit to local hook using wrapper
- Remove UV env var setup from pre-commit workflow (hook handles it)
- Regenerate uv.lock with test.pypi as extra index (not primary)
UV was configured with test.pypi as primary index and PyPI as extra index.
This caused failures because packages like hf-transfer don't exist on test.pypi.
Changed to use PyPI as primary (default) and test.pypi as extra index.
UV will now find common packages on PyPI and only look for RC versions on test.pypi.
All client installation is now handled by the install-llama-stack-client
action through setup-runner. The duplicate logic in setup-test-environment
was causing failures and is no longer needed.
UV env vars need to persist across workflow steps for scripts that run
'uv' commands (like unit-tests.sh which uses 'uv run --with-editable').
Export them to GITHUB_ENV so they're available in subsequent steps.
When using multiple indexes (test.pypi + PyPI), uv uses first-index-wins
strategy by default to prevent dependency confusion. This causes it to
try fetching all packages from test.pypi first, which fails.
Setting UV_INDEX_STRATEGY=unsafe-best-match tells uv to check all
indexes for the best version match, allowing it to get common packages
from PyPI and RC versions from test.pypi.
Pre-commit hooks run in isolated environments and don't inherit env
vars from the workflow step. Export UV_INDEX_URL and UV_EXTRA_INDEX_URL
to GITHUB_ENV so they're available to all subsequent steps and their
subprocesses, including pre-commit hooks.
The uv-lock pre-commit hook runs 'uv lock' which needs UV_INDEX_URL
set to resolve RC dependencies on release branches. Configure the
client before running pre-commit so the env vars are available.
The previous approach tried to install before uv sync, but there's no
venv yet. The correct solution:
- Release branches: Point UV_INDEX_URL to test.pypi so uv sync can
resolve RC versions, then install exact git version after sync
- Non-release branches: Run uv sync normally, then install git version
if client-version=latest
This lets uv sync create the venv first, then we install/override the
client version as needed.
Renamed install-client-for-release to install-llama-stack-client and
made it handle both release branches and client-version inputs. Now
all client installation logic lives in one place:
- Release branches: always install from matching git branch
- Non-release branches: install based on client-version input
This eliminates all the conditional logic from setup-runner.
Moved the release branch detection and client pre-install logic into
a dedicated action to eliminate duplication between setup-runner and
pre-commit workflows.
## Summary
Cherry-picks 5 critical fixes from main to the release-0.3.x branch for
the v0.3.1 release, plus CI workflow updates.
**Note**: This recreates the cherry-picks from the closed PR #3991, now
targeting the renamed `release-0.3.x` branch (previously
`release-0.3.x-maint`).
## Commits
1. **2c56a8560** - fix(context): prevent provider data leak between
streaming requests (#3924)
- **CRITICAL SECURITY FIX**: Prevents provider credentials from leaking
between requests
- Fixed import path for 0.3.0 compatibility
2. **ddd32b187** - fix(inference): enable routing of models with
provider_data alone (#3928)
- Enables routing for fully qualified model IDs with provider_data
- Resolved merge conflicts, adapted for 0.3.0 structure
3. **f7c2973aa** - fix: Avoid BadRequestError due to invalid max_tokens
(#3667)
- Fixes failures with Gemini and other providers that reject
max_tokens=0
- Non-breaking API change
4. **d7f9da616** - fix(responses): sync conversation before yielding
terminal events in streaming (#3888)
- Ensures conversation sync executes even when streaming consumers break
early
5. **0ffa8658b** - fix(logging): ensure logs go to stderr, loggers obey
levels (#3885)
- Fixes logging infrastructure
6. **75b49cb3c** - ci: support release branches and match client branch
(#3990)
- Updates CI workflows to support release-X.Y.x branches
- Matches client branch from llama-stack-client-python for release
testing
- Fixes artifact name collisions
## Adaptations for 0.3.0
- Fixed import paths: `llama_stack.core.telemetry.tracing` →
`llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` →
`llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute
name)
- Removed `rerank()` method that doesn't exist in 0.3.0
## Testing
All imports verified and tests should pass once CI is set up.
# What does this PR do?
metadata is conflicting with the default embedding model set on server
side via extra body, removing the check and just letting metadata take
precedence over extra body
`ValueError: Embedding model inconsistent between metadata
('text-embedding-3-small') and extra_body
('sentence-transformers/nomic-ai/nomic-embed-text-v1.5')`
## Test Plan
CI
# What does this PR do?
Fix segfault with load model
The cc-vec integration failed with segfault when used with default
embedding model on macOS
`model_id: nomic-ai/nomic-embed-text-v1.5` and `provider_id:
sentence-transformers`
Checked crash report and see this is due to torch OPENMP settings.
Constrainting to 1 thread works without crashes.
## Test Plan
Tested with cc-vec integration
1. start server llama stack run starter
2. Do the setup in https://github.com/raghotham/cc-vec to set env
variables and try
`uv run cc-vec index --url-patterns "%.github.io" --vector-store-name
"ml-research" --limit 50 --chunk-size 800 --overlap 400`
- Moved environment variable parsing and `setup_logging()` call from
module level to proper initialization points
- Added explicit `setup_logging()` calls in `server.py::create_app()`
and `library_client.py::AsyncLlamaStackAsLibraryClient.__init__()`
Module-level side effects are bad practice and can cause issues with
import order, testing, and circular dependencies. The previous
implementation ran logging setup on every import of the log module,
which is unpredictable and difficult to control.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Kill the `builtin::rag` tool group completely since it is no longer
targeted. We use the Responses implementation for knowledge_search which
uses the `openai_vector_stores` pathway.
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
## Summary
- Link pre-commit bot comment to workflow run instead of PR for better
debugging
- Dump docker container logs before removal to ensure logs are actually
captured
## Changes
1. **Pre-commit bot**: Changed the initial bot comment to link
"pre-commit hooks" text to the actual workflow run URL instead of just
having the PR number auto-link
2. **Docker logs**: Moved docker container log dumping from GitHub
Actions to the integration-tests.sh script's stop_container() function,
ensuring logs are captured before container removal
## Test plan
- Pre-commit bot comment will now have a clickable link to the workflow
run
- Docker container logs will be successfully captured in CI runs
# What does this PR do?
similarly to `alpha:` move `v1beta` routes under a `beta` group so the
client will have `client.beta`
From what I can tell, the openapi.stainless.yml file is hand written
while the openapi.yml file is generated and copied using the shell
script so I did this by hand.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Bumps
[@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node)
from 24.3.0 to 24.8.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [fastapi](https://github.com/fastapi/fastapi) from 0.116.1 to
0.119.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/fastapi/fastapi/releases">fastapi's
releases</a>.</em></p>
<blockquote>
<h2>0.119.0</h2>
<p>FastAPI now (temporarily) supports both Pydantic v2 models and
<code>pydantic.v1</code> models at the same time in the same app, to
make it easier for any FastAPI apps still using Pydantic v1 to gradually
but quickly <strong>migrate to Pydantic v2</strong>.</p>
<pre lang="Python"><code>from fastapi import FastAPI
from pydantic import BaseModel as BaseModelV2
from pydantic.v1 import BaseModel
<p>class Item(BaseModel):<br />
name: str<br />
description: str | None = None</p>
<p>class ItemV2(BaseModelV2):<br />
title: str<br />
summary: str | None = None</p>
<p>app = FastAPI()</p>
<p><a
href="https://github.com/app"><code>@app</code></a>.post("/items/",
response_model=ItemV2)<br />
def create_item(item: Item):<br />
return {"title": item.name, "summary":
item.description}<br />
</code></pre></p>
<p>Adding this feature was a big effort with the main objective of
making it easier for the few applications still stuck in Pydantic v1 to
migrate to Pydantic v2.</p>
<p>And with this, support for <strong>Pydantic v1 is now
deprecated</strong> and will be <strong>removed</strong> from FastAPI in
a future version soon.</p>
<p><strong>Note</strong>: have in mind that the Pydantic team already
stopped supporting Pydantic v1 for recent versions of Python, starting
with Python 3.14.</p>
<p>You can read in the docs more about how to <a
href="https://fastapi.tiangolo.com/how-to/migrate-from-pydantic-v1-to-pydantic-v2/">Migrate
from Pydantic v1 to Pydantic v2</a>.</p>
<h3>Features</h3>
<ul>
<li>✨ Add support for <code>from pydantic.v1 import BaseModel</code>,
mixed Pydantic v1 and v2 models in the same app. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14168">#14168</a>
by <a
href="https://github.com/tiangolo"><code>@tiangolo</code></a>.</li>
</ul>
<h2>0.118.3</h2>
<h3>Upgrades</h3>
<ul>
<li>⬆️ Add support for Python 3.14. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14165">#14165</a>
by <a
href="https://github.com/svlandeg"><code>@svlandeg</code></a>.</li>
</ul>
<h2>0.118.2</h2>
<h3>Fixes</h3>
<ul>
<li>🐛 Fix tagged discriminated union not recognized as body field. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/12942">#12942</a>
by <a
href="https://github.com/frankie567"><code>@frankie567</code></a>.</li>
</ul>
<h3>Internal</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2e721e1b02"><code>2e721e1</code></a>
🔖 Release version 0.119.0</li>
<li><a
href="fc7a0686af"><code>fc7a068</code></a>
📝 Update release notes</li>
<li><a
href="3a3879b2c3"><code>3a3879b</code></a>
📝 Update release notes</li>
<li><a
href="d34918abf0"><code>d34918a</code></a>
✨ Add support for <code>from pydantic.v1 import BaseModel</code>, mixed
Pydantic v1 and ...</li>
<li><a
href="352dbefc63"><code>352dbef</code></a>
🔖 Release version 0.118.3</li>
<li><a
href="96e7d6eaa4"><code>96e7d6e</code></a>
📝 Update release notes</li>
<li><a
href="3611c3fc5b"><code>3611c3f</code></a>
⬆️ Add support for Python 3.14 (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14165">#14165</a>)</li>
<li><a
href="942fce394b"><code>942fce3</code></a>
🔖 Release version 0.118.2</li>
<li><a
href="13b067c9b6"><code>13b067c</code></a>
📝 Update release notes</li>
<li><a
href="185cecd891"><code>185cecd</code></a>
🐛 Fix tagged discriminated union not recognized as body field (<a
href="https://redirect.github.com/fastapi/fastapi/issues/12942">#12942</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/fastapi/fastapi/compare/0.116.1...0.119.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# What does this PR do?
mirrors build_container.sh
trying to resolve:
0.105 + [ editable = editable ]
0.105 + [ ! -d /workspace/llama-stack ]
0.105 + uv pip install --no-cache-dir -e /workspace/llama-stack
0.261 Using Python 3.12.12 environment at: /usr/local
0.479 × No solution found when resolving dependencies:
0.479 ╰─▶ Because only llama-stack-client<=0.2.23 is available and
0.479 llama-stack==0.3.0rc4 depends on llama-stack-client>=0.3.0rc4, we
can
0.479 conclude that llama-stack==0.3.0rc4 cannot be used.
0.479 And because only llama-stack==0.3.0rc4 is available and you
require
0.479 llama-stack, we can conclude that your requirements are
unsatisfiable.
------
## Test Plan
**NOTE: this is a backwards incompatible change to the run-configs.**
A small QOL update, but this will prove useful when I do a rename for
"vector_dbs" to "vector_stores" next.
Moves all the `models, shields, ...` keys in run-config under a
`registered_resources` sub-key.
# What does this PR do?
Refactor setting default vector store provider and embedding model to
use an optional `vector_stores` config in the `StackRunConfig` and clean
up code to do so (had to add back in some pieces of VectorDB). Also
added remote Qdrant and Weaviate to starter distro (based on other PR
where inference providers were added for UX).
New config is simply (default for Starter distro):
```yaml
vector_stores:
default_provider_id: faiss
default_embedding_model:
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5
```
## Test Plan
CI and Unit tests.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
**This PR changes configurations in a backward incompatible way.**
Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.
## Key Changes
- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.
## Migration
Before:
```yaml
metadata_store:
type: sqlite
db_path: ~/.llama/distributions/foo/registry.db
inference_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
conversations_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
```
After:
```yaml
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ~/.llama/distributions/foo/kvstore.db
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
```
Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
kvstore:
type: sqlite
db_path: ~/.llama/distributions/foo/chroma.db
```
to:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
persistence:
backend: kv_default
namespace: vector_io::chroma_remote
```
Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.