Cherry-pick of #3955 to release-0.3.x
Adds a getting started notebook with simple agent examples to help users
get started with llama-stack agents.
Co-authored-by: Omar Abdelwahab <omaryashraf10@gmail.com>
Co-authored-by: Omar Abdelwahab <omara@fb.com>
Cherry-pick of #3992 to release-0.3.x
Adds support for configuring the number of workers in run.yaml
configuration files.
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Cherry-pick of #4012 to release-0.3.x
Fixes container builds failing with UV index strategy errors when build
args are passed with empty values.
Docker ARGs declared with empty defaults (ARG UV_INDEX_STRATEGY="")
become environment variables with empty string values in RUN commands.
UV interprets these as if --index-strategy "" was passed on the command
line, causing build failures with "error: a value is required for
'--index-strategy <UV_INDEX_STRATEGY>'".
This is a footgun because empty string ≠ unset variable, and ARGs
silently propagate to all RUN commands, only failing when declared with
empty defaults.
The fix unsets UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY at the start of
RUN blocks, saves the values early, and only restores them for editable
installs with RC dependencies. All other install modes (PyPI, test-pypi,
client) now run with a clean environment.
Cherry-pick of #3974 to release-0.3.x branch.
## Summary
- Fixes handling of missing external_providers_dir in stack
configuration
## Original PR
Fixes from #3974
Signed-off-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Doug Edgar <dedgar@redhat.com>
Backport of #4001 to release-0.3.x branch.
Fixes CI failures on release branches where uv sync can't resolve RC
dependencies.
## The Problem
On release branches like `release-0.3.x`, pyproject.toml requires
`llama-stack-client>=0.3.1rc1`. RC versions only exist on test.pypi, not
PyPI. This causes multiple CI failures:
1. `uv sync` fails because it can't resolve RC versions from PyPI
2. pre-commit hooks (uv-lock, codegen) fail for the same reason
3. mypy workflow section needs uv installed
## The Solution
Configure UV to use test.pypi when on release branches:
- Set `UV_INDEX_URL=https://test.pypi.org/simple/` (primary)
- Set `UV_EXTRA_INDEX_URL=https://pypi.org/simple/` (fallback)
- Set `UV_INDEX_STRATEGY=unsafe-best-match` to check both indexes
This allows `uv sync` to resolve common packages from PyPI and RC
versions from test.pypi.
## Additional Fixes
- Export UV env vars to `GITHUB_ENV` so pre-commit hooks inherit them
- Install uv in pre-commit workflow for mypy section
- Handle missing `type_checking` dependency group on release-0.3.x
- Regenerate uv.lock with RC versions for the release branch
## Changes
- Created reusable `install-llama-stack-client` action for configuration
- Modified `setup-runner` to set UV environment variables before sync
- Modified `pre-commit` workflow to configure client and export env vars
- Updated uv.lock with RC versions from test.pypi
This is a cherry-pick of commits afa9f0882, c86e6e906, 626639bee, and
081566321 from main, plus additional fixes for release branch
compatibility.
## Summary
Cherry-picks 5 critical fixes from main to the release-0.3.x branch for
the v0.3.1 release, plus CI workflow updates.
**Note**: This recreates the cherry-picks from the closed PR #3991, now
targeting the renamed `release-0.3.x` branch (previously
`release-0.3.x-maint`).
## Commits
1. **2c56a8560** - fix(context): prevent provider data leak between
streaming requests (#3924)
- **CRITICAL SECURITY FIX**: Prevents provider credentials from leaking
between requests
- Fixed import path for 0.3.0 compatibility
2. **ddd32b187** - fix(inference): enable routing of models with
provider_data alone (#3928)
- Enables routing for fully qualified model IDs with provider_data
- Resolved merge conflicts, adapted for 0.3.0 structure
3. **f7c2973aa** - fix: Avoid BadRequestError due to invalid max_tokens
(#3667)
- Fixes failures with Gemini and other providers that reject
max_tokens=0
- Non-breaking API change
4. **d7f9da616** - fix(responses): sync conversation before yielding
terminal events in streaming (#3888)
- Ensures conversation sync executes even when streaming consumers break
early
5. **0ffa8658b** - fix(logging): ensure logs go to stderr, loggers obey
levels (#3885)
- Fixes logging infrastructure
6. **75b49cb3c** - ci: support release branches and match client branch
(#3990)
- Updates CI workflows to support release-X.Y.x branches
- Matches client branch from llama-stack-client-python for release
testing
- Fixes artifact name collisions
## Adaptations for 0.3.0
- Fixed import paths: `llama_stack.core.telemetry.tracing` →
`llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` →
`llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute
name)
- Removed `rerank()` method that doesn't exist in 0.3.0
## Testing
All imports verified and tests should pass once CI is set up.
# What does this PR do?
metadata is conflicting with the default embedding model set on server
side via extra body, removing the check and just letting metadata take
precedence over extra body
`ValueError: Embedding model inconsistent between metadata
('text-embedding-3-small') and extra_body
('sentence-transformers/nomic-ai/nomic-embed-text-v1.5')`
## Test Plan
CI
# What does this PR do?
Fix segfault with load model
The cc-vec integration failed with segfault when used with default
embedding model on macOS
`model_id: nomic-ai/nomic-embed-text-v1.5` and `provider_id:
sentence-transformers`
Checked crash report and see this is due to torch OPENMP settings.
Constrainting to 1 thread works without crashes.
## Test Plan
Tested with cc-vec integration
1. start server llama stack run starter
2. Do the setup in https://github.com/raghotham/cc-vec to set env
variables and try
`uv run cc-vec index --url-patterns "%.github.io" --vector-store-name
"ml-research" --limit 50 --chunk-size 800 --overlap 400`
- Moved environment variable parsing and `setup_logging()` call from
module level to proper initialization points
- Added explicit `setup_logging()` calls in `server.py::create_app()`
and `library_client.py::AsyncLlamaStackAsLibraryClient.__init__()`
Module-level side effects are bad practice and can cause issues with
import order, testing, and circular dependencies. The previous
implementation ran logging setup on every import of the log module,
which is unpredictable and difficult to control.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Kill the `builtin::rag` tool group completely since it is no longer
targeted. We use the Responses implementation for knowledge_search which
uses the `openai_vector_stores` pathway.
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
## Summary
- Link pre-commit bot comment to workflow run instead of PR for better
debugging
- Dump docker container logs before removal to ensure logs are actually
captured
## Changes
1. **Pre-commit bot**: Changed the initial bot comment to link
"pre-commit hooks" text to the actual workflow run URL instead of just
having the PR number auto-link
2. **Docker logs**: Moved docker container log dumping from GitHub
Actions to the integration-tests.sh script's stop_container() function,
ensuring logs are captured before container removal
## Test plan
- Pre-commit bot comment will now have a clickable link to the workflow
run
- Docker container logs will be successfully captured in CI runs
# What does this PR do?
similarly to `alpha:` move `v1beta` routes under a `beta` group so the
client will have `client.beta`
From what I can tell, the openapi.stainless.yml file is hand written
while the openapi.yml file is generated and copied using the shell
script so I did this by hand.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Bumps
[@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node)
from 24.3.0 to 24.8.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [fastapi](https://github.com/fastapi/fastapi) from 0.116.1 to
0.119.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/fastapi/fastapi/releases">fastapi's
releases</a>.</em></p>
<blockquote>
<h2>0.119.0</h2>
<p>FastAPI now (temporarily) supports both Pydantic v2 models and
<code>pydantic.v1</code> models at the same time in the same app, to
make it easier for any FastAPI apps still using Pydantic v1 to gradually
but quickly <strong>migrate to Pydantic v2</strong>.</p>
<pre lang="Python"><code>from fastapi import FastAPI
from pydantic import BaseModel as BaseModelV2
from pydantic.v1 import BaseModel
<p>class Item(BaseModel):<br />
name: str<br />
description: str | None = None</p>
<p>class ItemV2(BaseModelV2):<br />
title: str<br />
summary: str | None = None</p>
<p>app = FastAPI()</p>
<p><a
href="https://github.com/app"><code>@app</code></a>.post("/items/",
response_model=ItemV2)<br />
def create_item(item: Item):<br />
return {"title": item.name, "summary":
item.description}<br />
</code></pre></p>
<p>Adding this feature was a big effort with the main objective of
making it easier for the few applications still stuck in Pydantic v1 to
migrate to Pydantic v2.</p>
<p>And with this, support for <strong>Pydantic v1 is now
deprecated</strong> and will be <strong>removed</strong> from FastAPI in
a future version soon.</p>
<p><strong>Note</strong>: have in mind that the Pydantic team already
stopped supporting Pydantic v1 for recent versions of Python, starting
with Python 3.14.</p>
<p>You can read in the docs more about how to <a
href="https://fastapi.tiangolo.com/how-to/migrate-from-pydantic-v1-to-pydantic-v2/">Migrate
from Pydantic v1 to Pydantic v2</a>.</p>
<h3>Features</h3>
<ul>
<li>✨ Add support for <code>from pydantic.v1 import BaseModel</code>,
mixed Pydantic v1 and v2 models in the same app. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14168">#14168</a>
by <a
href="https://github.com/tiangolo"><code>@tiangolo</code></a>.</li>
</ul>
<h2>0.118.3</h2>
<h3>Upgrades</h3>
<ul>
<li>⬆️ Add support for Python 3.14. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14165">#14165</a>
by <a
href="https://github.com/svlandeg"><code>@svlandeg</code></a>.</li>
</ul>
<h2>0.118.2</h2>
<h3>Fixes</h3>
<ul>
<li>🐛 Fix tagged discriminated union not recognized as body field. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/12942">#12942</a>
by <a
href="https://github.com/frankie567"><code>@frankie567</code></a>.</li>
</ul>
<h3>Internal</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2e721e1b02"><code>2e721e1</code></a>
🔖 Release version 0.119.0</li>
<li><a
href="fc7a0686af"><code>fc7a068</code></a>
📝 Update release notes</li>
<li><a
href="3a3879b2c3"><code>3a3879b</code></a>
📝 Update release notes</li>
<li><a
href="d34918abf0"><code>d34918a</code></a>
✨ Add support for <code>from pydantic.v1 import BaseModel</code>, mixed
Pydantic v1 and ...</li>
<li><a
href="352dbefc63"><code>352dbef</code></a>
🔖 Release version 0.118.3</li>
<li><a
href="96e7d6eaa4"><code>96e7d6e</code></a>
📝 Update release notes</li>
<li><a
href="3611c3fc5b"><code>3611c3f</code></a>
⬆️ Add support for Python 3.14 (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14165">#14165</a>)</li>
<li><a
href="942fce394b"><code>942fce3</code></a>
🔖 Release version 0.118.2</li>
<li><a
href="13b067c9b6"><code>13b067c</code></a>
📝 Update release notes</li>
<li><a
href="185cecd891"><code>185cecd</code></a>
🐛 Fix tagged discriminated union not recognized as body field (<a
href="https://redirect.github.com/fastapi/fastapi/issues/12942">#12942</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/fastapi/fastapi/compare/0.116.1...0.119.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# What does this PR do?
mirrors build_container.sh
trying to resolve:
0.105 + [ editable = editable ]
0.105 + [ ! -d /workspace/llama-stack ]
0.105 + uv pip install --no-cache-dir -e /workspace/llama-stack
0.261 Using Python 3.12.12 environment at: /usr/local
0.479 × No solution found when resolving dependencies:
0.479 ╰─▶ Because only llama-stack-client<=0.2.23 is available and
0.479 llama-stack==0.3.0rc4 depends on llama-stack-client>=0.3.0rc4, we
can
0.479 conclude that llama-stack==0.3.0rc4 cannot be used.
0.479 And because only llama-stack==0.3.0rc4 is available and you
require
0.479 llama-stack, we can conclude that your requirements are
unsatisfiable.
------
## Test Plan
**NOTE: this is a backwards incompatible change to the run-configs.**
A small QOL update, but this will prove useful when I do a rename for
"vector_dbs" to "vector_stores" next.
Moves all the `models, shields, ...` keys in run-config under a
`registered_resources` sub-key.
# What does this PR do?
Refactor setting default vector store provider and embedding model to
use an optional `vector_stores` config in the `StackRunConfig` and clean
up code to do so (had to add back in some pieces of VectorDB). Also
added remote Qdrant and Weaviate to starter distro (based on other PR
where inference providers were added for UX).
New config is simply (default for Starter distro):
```yaml
vector_stores:
default_provider_id: faiss
default_embedding_model:
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5
```
## Test Plan
CI and Unit tests.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
**This PR changes configurations in a backward incompatible way.**
Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.
## Key Changes
- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.
## Migration
Before:
```yaml
metadata_store:
type: sqlite
db_path: ~/.llama/distributions/foo/registry.db
inference_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
conversations_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
```
After:
```yaml
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ~/.llama/distributions/foo/kvstore.db
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
```
Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
kvstore:
type: sqlite
db_path: ~/.llama/distributions/foo/chroma.db
```
to:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
persistence:
backend: kv_default
namespace: vector_io::chroma_remote
```
Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.