llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-07 02:47:21 +00:00

Author	SHA1	Message	Date
Sébastien Han	d4aa348b60	chore: remove HTML generation for openapi spec (#4039) # What does this PR do? This seems to be an ancient artifact from when we were using readthedocs? Now Docusaurus reads the specs directly. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-03 18:03:40 +01:00
dependabot[bot]	7e294d33d9	chore(github-deps): bump astral-sh/setup-uv from 6.0.1 to 7.1.2 (#4023) Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 6.0.1 to 7.1.2. Release notes (sourced from astral-sh/setup-uv's releases, truncated upstream): v7.1.2 "Speed up extraction on Windows": @lazka fixed a bug that caused extracting uv to take up to 30s by using tar for extracting the uv zip file on Windows too (#660); known checksums updated for 0.9.5 (#663); dependencies bumped (#664, #652). v7.1.1 "Fix empty workdir detection and lowest resolution strategy": the `working-directory` input is now used to detect an empty work dir (#645), and the `lowest` resolution strategy no longer resolves to latest when only a lower bound is specified (#649); known checksums updated for 0.9.4 (#651) and 0.9.3 (#644); docs changed to v7 (#647); dependencies bumped (#639, #641, #634, #636). v7.1.0 "Support all the use cases". Additional commits are viewable in the compare view: https://github.com/astral-sh/setup-uv/compare/v6.0.1...85856786d1ce8acfbcc2f13a5f3fbd6b938f9f41 Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-11-03 13:43:04 +01:00
Sébastien Han	3dbff6bf3f	fix: help mypy & fix precommit on main (#4037) # What does this PR do? Add type to help mypy figure out. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-03 05:39:50 -05:00
Ashwin Bharambe	d45137a399	fix(ci): export UV_INDEX_STRATEGY to current shell before running uv sync (#4020) Fixes latent bug where UV_INDEX_STRATEGY was only exported to GITHUB_ENV but not to the current shell. While this bug doesn't currently affect main (since UV_EXTRA_INDEX_URL is only set on release branches), it's a latent bug that could cause issues if the logic changes in the future or if someone tests with UV_EXTRA_INDEX_URL set. The setup-runner action only exported UV_INDEX_STRATEGY to GITHUB_ENV (for subsequent steps), not to the current shell environment. Since uv sync runs in the same step, it would never see the variable if it were set. This fix adds `export UV_INDEX_STRATEGY=unsafe-best-match` to make the variable available in the current shell before running uv commands. 
Related: #4019 (same fix for release-0.3.x where the bug is actively triggered)	2025-11-01 12:57:24 -07:00
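The distinction described above (a value written to the `GITHUB_ENV` file is only picked up by later steps, while a process started in the same step sees only the current environment) can be illustrated with a small Python sketch. The temporary file standing in for `GITHUB_ENV` and the helper name `run_child` are assumptions made for illustration; this is not the setup-runner action itself.

```python
import os
import subprocess
import sys
import tempfile


def run_child(env):
    """Start a child process and report what it sees for UV_INDEX_STRATEGY."""
    code = "import os; print(os.environ.get('UV_INDEX_STRATEGY', '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Start from the current environment without the variable.
base_env = {k: v for k, v in os.environ.items() if k != "UV_INDEX_STRATEGY"}

# Case 1: only append to a file (hypothetical stand-in for $GITHUB_ENV).
# Nothing reads this file within the same step, so the child sees nothing.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("UV_INDEX_STRATEGY=unsafe-best-match\n")
seen_without_export = run_child(base_env)
os.unlink(fh.name)

# Case 2: also set the variable in the environment used for the child,
# which is what `export` does for the current shell.
seen_with_export = run_child({**base_env, "UV_INDEX_STRATEGY": "unsafe-best-match"})

print(seen_without_export, seen_with_export)
```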
Charlie Doern	93401836b7	feat: llama stack run --providers (#3989) # What does this PR do? `llama stack run --providers` takes a list of providers in the format `api1=provider1,api2=provider2`. This allows users to run with a simple list of providers. Given the architecture of `create_app`, this run config needs to be written to disk; ~/.llama/distribution/providers-run/run.yaml is used each time for consistency. Resolves #3956 ## Test Plan New unit tests to exercise --providers. 
Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-31 16:21:32 -07:00
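As a rough illustration of the `api1=provider1,api2=provider2` format, the following sketch parses such a string into a mapping. The function name `parse_providers` and the example provider names are hypothetical; the actual CLI may validate and structure this differently.

```python
def parse_providers(spec: str) -> dict[str, list[str]]:
    """Parse 'api1=provider1,api2=provider2' into {api: [providers]}.

    Hypothetical helper for illustration, not the llama-stack implementation.
    """
    providers: dict[str, list[str]] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        api, sep, provider = item.partition("=")
        api, provider = api.strip(), provider.strip()
        if not sep or not api or not provider:
            raise ValueError(f"expected api=provider, got {item!r}")
        # The same API may be listed more than once with different providers.
        providers.setdefault(api, []).append(provider)
    return providers


print(parse_providers("inference=remote::ollama,vector_io=inline::faiss"))
```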
Ashwin Bharambe	b2a5428a14	fix(ci): unset empty UV index env vars to prevent uv errors (#4012 ) Fixes container builds failing with UV index strategy errors when build args are passed with empty values. Docker ARGs declared with empty defaults (ARG UV_INDEX_STRATEGY="") become environment variables with empty string values in RUN commands. UV interprets these as if --index-strategy "" was passed on the command line, causing build failures with "error: a value is required for '--index-strategy <UV_INDEX_STRATEGY>'". This is a footgun because empty string ≠ unset variable, and ARGs silently propagate to all RUN commands, only failing when declared with empty defaults. The fix unsets UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY at the start of RUN blocks, saves the values early, and only restores them for editable installs with RC dependencies. All other install modes (PyPI, test-pypi, client) now run with a clean environment.	2025-10-31 13:29:14 -07:00
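The "empty string ≠ unset variable" point can be sketched in Python: before launching a tool, drop the named variables when they are present but empty, so the tool behaves as if they were never set. The helper name `drop_empty` is an assumption for illustration; the actual fix lives in the Containerfile's RUN blocks.

```python
UV_VARS = ("UV_EXTRA_INDEX_URL", "UV_INDEX_STRATEGY")


def drop_empty(env: dict[str, str], names: tuple[str, ...] = UV_VARS) -> dict[str, str]:
    """Return a copy of env without the named variables that are set to ''.

    Hypothetical helper: an empty value is treated like an unset variable,
    while non-empty values and unrelated variables pass through unchanged.
    """
    return {k: v for k, v in env.items() if not (k in names and v == "")}


build_env = {"UV_INDEX_STRATEGY": "", "UV_EXTRA_INDEX_URL": "", "PATH": "/usr/bin"}
print(drop_empty(build_env))
```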
Ashwin Bharambe	f8fe3018af	fix(ci): use test.pypi as extra index for RC dependencies (#4009 ) Backports UV index configuration fixes from `release-0.3.x` (PR #4002). The main issue: when we created the release branch infrastructure, we configured UV to use `test.pypi` as the PRIMARY index to resolve RC dependencies. This caused UV to look for ALL packages there first, which led to problems - some packages don't have binary wheels on `test.pypi`, so UV tried building from source and failed (like the `psycopg2-binary` issue we hit). The fix is simple: use PyPI as primary (default) and `test.pypi` as an EXTRA index. UV will check PyPI first for everything, and only fall back to `test.pypi` for packages not found there (like our RC client versions). This PR includes: - Fixed `install-llama-stack-client` action to output `UV_EXTRA_INDEX_URL` instead of `UV_INDEX_URL` - New `uv-run-with-index.sh` wrapper that auto-detects release branches and sets UV env vars - Updated pre-commit hooks (`uv-lock`, codegen, etc.) to use the wrapper - Pass UV env vars as Docker build args in all locations - Scope UV env vars properly in Containerfile (inline for llama-stack install, explicitly unset before distribution deps) - Export UV env vars to `GITHUB_ENV` in setup-runner for cross-step persistence The wrapper detects release branches automatically in both CI and local environments, so this "just works" without manual configuration. On main (non-release branch), the wrapper becomes a no-op. Tested and validated on `release-0.3.x` where all CI checks pass.	2025-10-31 12:55:43 -07:00
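A minimal sketch of what such a wrapper decides, assuming release branches match `release-X.Y.x` with numeric X and Y and that the extra index is the public Test PyPI simple index. The function name `uv_env_for_branch` is hypothetical, and the real `uv-run-with-index.sh` is a shell script whose details may differ.

```python
import re

# Assumption: X and Y are numeric, e.g. release-0.3.x
RELEASE_BRANCH = re.compile(r"^release-\d+\.\d+\.x$")
TEST_PYPI = "https://test.pypi.org/simple/"


def uv_env_for_branch(branch: str) -> dict[str, str]:
    """Return the extra UV env vars for a branch; empty (a no-op) off release branches."""
    if not RELEASE_BRANCH.match(branch):
        return {}
    return {
        # PyPI stays the primary (default) index; test.pypi is only an extra one.
        "UV_EXTRA_INDEX_URL": TEST_PYPI,
        "UV_INDEX_STRATEGY": "unsafe-best-match",
    }


print(uv_env_for_branch("main"))
print(uv_env_for_branch("release-0.3.x"))
```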
raghotham	62603d25c2	chore(api)!: /v1/inspect only lists v1 apis by default (#3948 ) # What does this PR do? Allow filtering for v1alpha, v1beta, deprecated and v1. Backward incompatible change since by default it only returns v1 apis now. ## Test Plan added unit test	2025-10-31 11:55:46 -07:00
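The default-to-v1 behavior might look roughly like the sketch below. The `Route` type, the `filter_routes` name, the example paths, and the exact semantics of the `deprecated` filter are all assumptions for illustration; the commit message only states that filtering is possible and that v1 is the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    path: str
    level: str  # "v1", "v1alpha" or "v1beta"
    deprecated: bool = False


def filter_routes(routes: list[Route], api_filter: str = "v1") -> list[Route]:
    """Hypothetical filter: by default only non-deprecated v1 routes are returned."""
    if api_filter == "deprecated":
        return [r for r in routes if r.deprecated]
    return [r for r in routes if r.level == api_filter and not r.deprecated]


routes = [
    Route("/v1/models", "v1"),
    Route("/v1alpha/example", "v1alpha"),
    Route("/v1/old-endpoint", "v1", deprecated=True),
]
print([r.path for r in filter_routes(routes)])
```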
Ashwin Bharambe	61aab1889b	fix(ci): remove precommit trigger workflow (#4008 ) Not safe!	2025-10-31 11:41:26 -07:00
Francisco Arceo	7b79cd05d5	feat: Adding Prompts to admin UI (#3987 ) # What does this PR do? 1. Updates Llama Stack Typescript client to include `prompts`api in playground client. 2. Updates the UI to display prompts and execute basic CRUD operations for prompts. (2) adds an explicit "Preview" section when creating the prompt to show users how the Prompts API behaves as you dynamically edit the prompt content. See example here: <p align="center"><img width="468.5" height="333" alt="Screenshot 2025-10-31 at 12 22 34 PM" src="https://github.com/user-attachments/assets/3542ce7f-56fe-4fb4-b0a3-5cfba5917f6d" /></p> Some screen shots: <details><Summary>Click me to expand!</Summary> ### Prompts List with Prompts <img width="1906" height="1108" alt="Screenshot 2025-10-31 at 12 20 05 PM" src="https://github.com/user-attachments/assets/494a4748-ea6a-4527-8cfe-8959cb741c0f" /> ### Empty Prompts List <img width="1889" height="1123" alt="Screenshot 2025-10-31 at 12 08 44 PM" src="https://github.com/user-attachments/assets/ac95b807-d311-4725-86da-0258b3cce81a" /> ### Create Prompt <img width="1918" height="1167" alt="Screenshot 2025-10-31 at 11 03 29 AM" src="https://github.com/user-attachments/assets/b3100a78-f4f3-410f-af89-f7e7fe4a89e7" /> ### Submit Prompt with error <img width="1901" height="1213" alt="Screenshot 2025-10-31 at 12 09 28 PM" src="https://github.com/user-attachments/assets/dca71354-a602-449d-a0d8-0ed3d009a275" /> </details> ## Closes https://github.com/llamastack/llama-stack/issues/3322 ## Test Plan Added tests and manual testing. Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-10-31 11:37:25 -07:00
Ashwin Bharambe	c2fd17474e	fix: stop printing server log, it is confusing	2025-10-31 11:22:08 -07:00
Ashwin Bharambe	5f95c1f8cc	fix(ci): install client from release branch before uv sync (#4001 ) Fixes CI failures on release branches where uv sync can't resolve RC dependencies. The problem: on release branches like `release-0.3.x`, pyproject.toml requires `llama-stack-client>=0.3.1rc1`. But RC versions only exist on test.pypi, not PyPI. So uv sync fails before we even get a chance to install the client from git. The fix is simple - on release branches, pre-install the client from the matching git branch first, then run uv sync. This satisfies the RC requirement and lets dependency resolution succeed. Modified setup-runner and pre-commit workflows to do this. Also cleaned up some duplicate logic in setup-test-environment that's now handled centrally. Example failure: `5415478835`	2025-10-31 06:16:20 -07:00
Ashwin Bharambe	6d80ca4bf7	fix(ci): replace unused LLAMA_STACK_CLIENT_DIR with direct install (#4000) Replace unused `LLAMA_STACK_CLIENT_DIR` env var (from old `llama stack build`) with direct `uv pip install` for release branch client installation. cc @ehhuang	2025-10-30 22:09:25 -07:00
Jiayi Ni	fa7699d2c3	feat: Add rerank API for NVIDIA Inference Provider (#3329) # What does this PR do? Add rerank API for NVIDIA Inference Provider. Closes #3278 ## Test Plan Unit test: ``` pytest tests/unit/providers/nvidia/test_rerank_inference.py ``` Integration test: ``` pytest -s -v tests/integration/inference/test_rerank.py --stack-config="inference=nvidia" --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3 --env NVIDIA_API_KEY="" --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" ```	2025-10-30 21:42:09 -07:00
Ashwin Bharambe	c396de57a4	ci: standardize release branch pattern to release-X.Y.x (#3999 ) Standardize CI workflows to use `release-X.Y.x` branch pattern instead of multiple numeric variants. That's the pattern we are settling on. See https://github.com/llamastack/llama-stack-ops/pull/20 for reference.	2025-10-30 21:33:32 -07:00
Doug Edgar	e8cd8508b5	fix: handle missing external_providers_dir (#3974) # What does this PR do? This PR fixes the handling of the external_providers_dir configuration field to align with its ongoing deprecation, in favor of the provider `module` specification approach. It addresses the issue in #3950, where using the default provided run.yaml config resulted in the `external_providers_dir` parameter being set to the literal string `None`, and crashing the llama-stack server when starting. Closes #3950 ## Test Plan - Built a new container image from `podman build . -f containers/Containerfile --build-arg DISTRO_NAME=starter --tag llama-stack:starter` - Tested it locally with `podman run -it localhost/llama-stack:starter` - Tested it on an OpenShift 4.19 cluster, deployed via the llama-stack-k8s-operator. Signed-off-by: Doug Edgar <dedgar@redhat.com>	2025-10-30 17:01:31 -07:00
Derek Higgins	ff2b270e2f	fix: relax structured output test assertions to handle whitespace and… (#3997 ) … case variations The ollama/llama3.2:3b-instruct-fp16 model returns string values with trailing whitespace in structured JSON output. Updated test assertions to use case-insensitive substring matching instead of exact equality. Use .lower() for case-insensitive comparison Check if expected value is contained in actual value (handles whitespace) Closes: #3996 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-10-30 16:55:23 -07:00
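The relaxed comparison described above amounts to a case-insensitive substring check. A minimal sketch follows; the helper name `loosely_contains` and the sample values are illustrative and not taken from the test suite.

```python
def loosely_contains(expected: str, actual: str) -> bool:
    """True if `expected` appears in `actual`, ignoring case.

    Substring matching also tolerates leading or trailing whitespace
    in the model's output, which exact equality would reject.
    """
    return expected.lower() in actual.lower()


# Exact equality would fail here because of case and trailing spaces.
print(loosely_contains("Michael Jordan", "michael jordan  "))
```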
ehhuang	0e384a55a1	feat: support `workers` in run config (#3992 ) # What does this PR do? ## Test Plan Set workers: 4 in run.yaml. Start server and observe logs multiple times.	2025-10-30 16:34:12 -07:00
Ashwin Bharambe	6f90a7af4b	ci: target release-X.Y.x branches instead of release-X.Y.x-maint (#3995 ) We will be updating our release procedure to be more "normal" or "sane". We will - create release branches like normal people - land cherry-picks onto those branches - run releases off of those branches - no more "rc" branch pollution either Given that, this PR cleans things up a bit - Remove `-maint` suffix from release branch patterns in CI workflows - Update branch matching to `release-X.Y.x` format	2025-10-30 16:27:13 -07:00
Ashwin Bharambe	90234d6973	ci: support release branches and match client branch (#3990 ) - Update workflows to trigger on release-X.Y.x-maint branches - When PR targets release branch, fetch matching branch from llama-stack-client-python - Falls back to main if matching client branch doesn't exist - Updated workflows: - integration-tests.yml - integration-auth-tests.yml - integration-sql-store-tests.yml - integration-vector-io-tests.yml - unit-tests.yml - backward-compat.yml - pre-commit.yml	2025-10-30 15:20:34 -07:00
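The branch-matching rule with its fallback can be sketched as a small function. The name `pick_client_branch` and the idea of passing the client repository's branch list as a set are assumptions for illustration; the workflows themselves resolve this against the llama-stack-client-python repository.

```python
def pick_client_branch(target_branch: str, client_branches: set[str]) -> str:
    """Choose which client branch to fetch for a PR targeting `target_branch`.

    Hypothetical helper: use the matching release branch when the client
    repository has one, otherwise fall back to main.
    """
    if target_branch.startswith("release-") and target_branch in client_branches:
        return target_branch
    return "main"


print(pick_client_branch("release-0.3.x", {"main", "release-0.3.x"}))
print(pick_client_branch("release-0.4.x", {"main", "release-0.3.x"}))
```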
Ashwin Bharambe	c2ae42b343	fix(ci): show pre-commit output easily on failure (#3985 ) Right now, the failed Step which is opened by GH by default tells me to just go up and click and scroll through for no reason.	2025-10-30 11:48:20 -07:00
Ashwin Bharambe	77c8bc6fa7	fix(ci): add back server:ci-tests to replay tests (#3976) It is useful for local debugging. If both server and docker are failing, you can just run server locally to debug which is much easier to do.	2025-10-30 11:02:59 -07:00
ehhuang	5e20938832	fix: remove LLAMA_STACK_TEST_FORCE_SERVER_RESTART setting in fixture (#3982 ) # What does this PR do? this is meant to be a manual flag ## Test Plan CI	2025-10-30 09:13:04 -07:00
Sébastien Han	b4ea05ada9	chore: add batches to openapi schema (#3980 ) # What does this PR do? While working on https://github.com/llamastack/llama-stack/pull/3944 I realized that the batches API wasn't generated. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-30 07:08:35 -07:00
Derek Higgins	19d85003de	test: Updated test skips that were marked with "inline::vllm" (#3979 ) This should be "remote::vllm". This causes some log probs tests to be skipped with remote vllm. (They fail if run). Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-10-30 14:48:21 +01:00
Ashwin Bharambe	174ef162b3	fix(mypy): add fast and full mypy modes (#3975) `mypy` became very slow for the common path. This can make local pre-commit runs very slow. 
Let's restore that. - restore fast mirrors-mypy hook for local runs - add optional mypy-full hook and docs so devs can match CI - run full mypy in CI with a hint when failures occur ### Test Plan - uv run pre-commit run mypy --all-files - uv run pre-commit run mypy-full --hook-stage manual --all-files - uv run --group dev --group type_checking mypy	2025-10-29 19:02:32 -07:00
Charlie Doern	e8ecc99524	fix!: remove chunk_id property from Chunk class (#3954) # What does this PR do? The chunk_id property in the Chunk class executes actual logic to compute a chunk ID. This sort of logic should not live in the API spec. Instead, providers should be in charge of calling generate_chunk_id and passing the result to `Chunk`. This removes the incorrect dependency between Provider impl and API impl. Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-29 18:59:59 -07:00
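The separation described above can be sketched as follows: the data class only carries the ID, and the provider-side code computes it. Everything here is hypothetical; `make_chunk_id` is a stand-in whose hashing scheme is an assumption, not the signature or algorithm of the project's `generate_chunk_id`, and this `Chunk` is a simplified stand-in for the API type.

```python
import hashlib
from dataclasses import dataclass


def make_chunk_id(document_id: str, content: str) -> str:
    """Hypothetical stand-in for an ID helper owned by provider code."""
    digest = hashlib.sha256(f"{document_id}:{content}".encode("utf-8")).hexdigest()
    return digest[:32]


@dataclass(frozen=True)
class Chunk:
    """Plain data only: the API type stores the ID but never computes it."""
    content: str
    chunk_id: str


def provider_build_chunk(document_id: str, content: str) -> Chunk:
    # The provider computes the ID and passes it in explicitly.
    return Chunk(content=content, chunk_id=make_chunk_id(document_id, content))


chunk = provider_build_chunk("doc-1", "hello world")
print(chunk.chunk_id)
```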
Charlie Doern	0ef9166c7e	fix: make integration-tests.sh Mac friendly (#3971 ) # What does this PR do? Running `./scripts/integration-tests.sh --network host` on a Mac fails regularly because of how Docker runs on macOS. On a Mac, keep bridge network mode instead. before: === Starting Docker Container === Using image: localhost/distribution-ci-tests:dev WARNING: Published ports are discarded when using host network mode Waiting for Docker container to start... ❌ Docker container failed to start Container logs: INFO 2025-10-29 18:38:32,180 llama_stack.cli.stack.run:100 cli: Using run configuration: /workspace/src/llama_stack/distributions/ci-tests/run.yaml ... (stack starts but is not reachable on network) after: === Starting Docker Container === Using image: localhost/distribution-ci-tests:dev Using bridge networking with port mapping (non-Linux) Waiting for Docker container to start... ✅ Docker container started successfully === Running Integration Tests === ## Test Plan integration tests pass! Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-29 14:12:09 -07:00
Ashwin Bharambe	da8f014b96	feat(models): list models available via provider_data header (#3968 ) ## Summary When users provide API keys via `X-LlamaStack-Provider-Data` header, `models.list()` now returns models they can access from those providers, not just pre-registered models from the registry. This complements the routing fix from `f88416ef8` which enabled inference calls with `provider_id/model_id` format for unregistered models. Users can now discover which models are available to them before making inference requests. The implementation reuses `NeedsRequestProviderData.get_request_provider_data()` to validate credentials, then dynamically fetches models from providers without caching them since they're user-specific. Registry models take precedence to respect any pre-configured aliases. ## Test Script ```python #!/usr/bin/env python3 import json import os from openai import OpenAI # Test 1: Without provider_data header client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="dummy") models = client.models.list() anthropic_without = [m.id for m in models.data if m.id and "anthropic" in m.id] print(f"Without header: {len(models.data)} models, {len(anthropic_without)} anthropic") # Test 2: With provider_data header containing Anthropic API key anthropic_api_key = os.environ["ANTHROPIC_API_KEY"] client_with_key = OpenAI( base_url="http://localhost:8321/v1/openai/v1", api_key="dummy", default_headers={ "X-LlamaStack-Provider-Data": json.dumps({"anthropic_api_key": anthropic_api_key}) } ) models_with_key = client_with_key.models.list() anthropic_with = [m.id for m in models_with_key.data if m.id and "anthropic" in m.id] print(f"With header: {len(models_with_key.data)} models, {len(anthropic_with)} anthropic") print(f"Anthropic models: {anthropic_with}") assert len(anthropic_with) > len(anthropic_without), "Should have more anthropic models with API key" print("\n✓ Test passed!") ``` Run with a stack that has Anthropic provider configured (but 
without API key in config): ```bash ANTHROPIC_API_KEY=sk-ant-... python test_provider_data_models.py ```	2025-10-29 14:03:03 -07:00
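The precedence rule described in the commit above (registry models win over dynamically fetched ones) can be sketched as a small merge step. This is only an illustration: `merge_model_lists` and the dict-based model records are hypothetical and are not taken from the llama-stack code.

```python
from typing import Any


def merge_model_lists(
    registry_models: list[dict[str, Any]],
    provider_models: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Combine pre-registered and per-request models (hypothetical helper).

    Registry entries are inserted first, so a dynamically fetched model
    with the same ID never overrides a pre-configured alias.
    """
    merged: dict[str, dict[str, Any]] = {m["id"]: m for m in registry_models}
    for model in provider_models:
        # setdefault keeps the existing registry entry on an ID clash.
        merged.setdefault(model["id"], model)
    return list(merged.values())
```

Because the dynamically fetched entries depend on the caller's credentials, a real implementation would presumably compute this per request rather than cache the result, as the commit message notes.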
Ashwin Bharambe	c9d4b6c54f	chore(mypy): part-04 resolve mypy errors in meta_reference agents (#3969 ) ## Summary Fixes all mypy type errors in `providers/inline/agents/meta_reference/` and removes exclusions from pyproject.toml. ## Changes - Fix type annotations for Safety API message parameters (OpenAIMessageParam) - Add Action enum usage in access control checks - Correct method signatures to match API supertype (parameter ordering) - Handle optional return types with proper None checks - Remove 3 meta_reference exclusions from mypy config Files fixed: 25 errors across 3 files (safety.py, persistence.py, agents.py)	2025-10-29 13:37:28 -07:00
Omar Abdelwahab	e6b27db30a	docs: A getting started notebook featuring simple agent examples. (#3955 ) # What does this PR do? Getting started notebook featuring simple agent examples. --------- Co-authored-by: Omar Abdelwahab <omara@fb.com>	2025-10-29 14:13:34 -04:00
Ashwin Bharambe	7dc48a75e5	chore: delete openapi.stainless.yaml for now. not source of truth. (#3967 ) This is really not the source of truth yet and is causing more 
confusion right now.	2025-10-29 10:45:38 -07:00
Nathan Weinberg	b90c6a2c8b	fix(docs): remove leftover telemetry sidebar section (#3961 ) Leftover telemetry section was preventing `npm run build` from completing successfully Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 11:20:13 -04:00
Nathan Weinberg	10977caff3	fix: typo in .gitignore (#3960 ) typo in https://github.com/llamastack/llama-stack/pull/3959 (whoops) Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 11:08:47 -04:00
Ashwin Bharambe	a4f97559d1	fix(mypy): part-03 completely resolve meta reference responses impl typing issues (#3951 ) ## Summary Resolves all mypy errors in meta reference agent OpenAI responses implementation by adding proper type narrowing, None checks, and Sequence type support. ## Changes - Fixed streaming.py, openai_responses.py, utils.py, tool_executor.py, agent_instance.py - Added Sequence type support to schema generator (ensures correct JSON schema generation) - Applied union type narrowing and None checks throughout ## Test plan - All modified files pass mypy type checking (0 errors) - Schema generator produces correct `type: array` for Sequence types --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 08:07:15 -07:00
Ashwin Bharambe	e5c27dbcbf	fix(mypy): part-02 resolve OpenAI compatibility layer type issues (#3947 ) ## Summary Fixes 111 mypy type errors in OpenAI compatibility layer (PR3 in mypy remediation series). Changes: - `litellm_openai_mixin.py`: Added type annotations, None checks for tool_config/model_store access - `openai_compat.py`: Added None checks throughout, fixed TypedDict expansions, proper type conversions for messages/tool_calls Result: 23 → 1 errors in litellm file, 88 → 0 errors in openai_compat file --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 08:06:40 -07:00
Ashwin Bharambe	ce31aa1704	fix(mypy-cleanup): part-01 resolve meta reference agent type issues (126 errors) (#3945 ) Error fixes in Agents implementation (`meta-reference` provider) -- adding proper type annotations and using type narrowing for optional attributes. Essentially a bunch of `if x and (x_foo := getattr(x, "foo"))` instead of `x.foo` directly. Part of ongoing mypy remediation effort. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 07:54:30 -07:00
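A minimal sketch of the narrowing pattern mentioned in the commit above. The `ToolConfig` and `Request` classes are hypothetical stand-ins, not the provider's real types. Note that in Python the walrus expression must be parenthesized when it is combined with `and`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolConfig:  # hypothetical stand-in type
    tool_choice: Optional[str] = None


@dataclass
class Request:  # hypothetical stand-in type
    tool_config: Optional[ToolConfig] = None


def pick_tool_choice(request: Request) -> str:
    """Return the configured tool choice, or "auto" when none is set."""
    # Both checks narrow an Optional: first the container, then the attribute.
    if request.tool_config and (choice := request.tool_config.tool_choice):
        return choice
    return "auto"
```

Accessing `request.tool_config.tool_choice` directly would be flagged by mypy because `tool_config` may be `None`; the guarded form makes both `None` cases explicit.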
Nathan Weinberg	22bf0d0471	chore: ignore API docs generation (#3959 ) See `1432743473` Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 10:27:53 -04:00
Nathan Weinberg	b6bb8fbf64	ci: add pre-commit check ensuring FIPS compliance (#3899 ) # What does this PR do? this commit adds a new pre-commit hook to scan for non-FIPS compliant function usage within llama-stack Closes #3427 ## Test Plan Ran locally Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 10:21:35 -04:00
Ashwin Bharambe	e809d21357	feat: add backward compatibility tests for run.yaml (#3952 ) This adds automated backward compatibility testing for `run.yaml` files. As we evolve `StackRunConfig`, changes can inadvertently break existing user configurations. This workflow catches those breaks before merge. We test old run.yaml files (from main and the latest release) against the PR's new code. If configs that worked before now fail, the PR is blocked unless explicitly acknowledged as a breaking change. Two test layers: - Schema validation: Quick pytest checks that configs parse without errors - Integration tests: Full test suite execution to catch runtime semantic issues (cross-field validations, provider initialization, etc.) 
What we test against: - main branch: Breaking changes here block the PR (this is the gate) - Latest release: Informational only - shows if we've drifted from what users have If tests fail, the PR author must acknowledge the breaking change by adding `!:` to the PR title (e.g., `feat!: change xyz`) or including `BREAKING CHANGE:` in a commit message. Once acknowledged, the check passes with a warning. These jobs are run: 1. `check-main-compatibility` - Schema validation of all distribution run.yaml files from main 2. `test-integration-main` - Full integration test suite using main's ci-tests run.yaml 3. `test-integration-release` - Integration tests with latest release config (informational) 4. `check-schema-release-compatibility` - Schema checks against release (informational) The integration tests catch issues that schema validation alone would miss, like assertion failures in `StackRunConfig.validate_server_stores()` or provider-specific runtime logic. Resolves #3311 Related to #3237	2025-10-28 21:51:56 -07:00
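The acknowledgement rule in the commit above can be expressed as a short predicate. This is a sketch only: the function name is hypothetical, and the actual workflow may implement the check differently (for example in a CI shell step).

```python
def is_breaking_change_acknowledged(pr_title: str, commit_messages: list[str]) -> bool:
    """Hypothetical check mirroring the rule stated in the commit message.

    A break counts as acknowledged when the PR title contains "!:"
    (e.g. "feat!: change xyz") or any commit message contains
    "BREAKING CHANGE:".
    """
    if "!:" in pr_title:
        return True
    return any("BREAKING CHANGE:" in message for message in commit_messages)
```

Under the described behavior, a `True` result would turn a failing compatibility check into a pass with a warning, while `False` would keep the PR blocked.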
Derek Higgins	c678682cdd	chore: remove unused methods from InferenceRouter (#3953 ) Remove unused methods that became obsolete after `d266c59c`: o 
_compute_and_log_token_usage o _count_tokens o stream_tokens_and_compute_metrics o count_tokens_and_compute_metrics These methods are no longer referenced anywhere in the codebase following the removal of deprecated inference.chat_completion implementations. --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-10-28 17:12:41 -07:00
ehhuang	1aa8979050	test: enable telemetry tests in server mode (#3927 ) # What does this PR do? - added a server-based test OTLP collector ## Test Plan CI	2025-10-28 16:33:48 -07:00
ehhuang	1f9d48cd54	feat: openai files provider (#3946 ) # What does this PR do? - Adds OpenAI files provider - Note that file content retrieval is pretty limited by `purpose` https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com ## Test Plan Modify run yaml to use openai files provider: ``` files: - provider_id: openai provider_type: remote::openai config: api_key: ${env.OPENAI_API_KEY:=} metadata_store: backend: sql_default table_name: openai_files_metadata # Then run files tests ❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files ```	2025-10-28 16:25:03 -07:00
raghotham	feabcdd67b	docs: add documentation on how to use custom run yaml in docker (#3949 ) as title test plan: ```yaml # custom-ollama-run.yaml version: 2 image_name: starter external_providers_dir: /.llama/providers.d apis: - inference - vector_io - files - safety - tool_runtime - agents providers: inference: # Single Ollama provider for all models - provider_id: ollama provider_type: remote::ollama config: url: ${env.OLLAMA_URL:=http://localhost:11434} vector_io: - provider_id: faiss provider_type: inline::faiss config: persistence: namespace: vector_io::faiss backend: kv_default files: - provider_id: meta-reference-files provider_type: inline::localfs config: storage_dir: /.llama/files metadata_store: table_name: files_metadata backend: sql_default safety: - provider_id: llama-guard provider_type: inline::llama-guard config: excluded_categories: [] tool_runtime: - provider_id: rag-runtime provider_type: inline::rag-runtime agents: - provider_id: meta-reference provider_type: inline::meta-reference config: persistence: agent_state: namespace: agents backend: kv_default responses: table_name: responses backend: sql_default max_write_queue_size: 10000 num_writers: 4 storage: backends: kv_default: type: kv_sqlite db_path: /.llama/kvstore.db sql_default: type: sql_sqlite db_path: /.llama/sql_store.db stores: metadata: namespace: registry backend: kv_default inference: table_name: inference_store backend: sql_default max_write_queue_size: 10000 num_writers: 4 conversations: table_name: openai_conversations backend: sql_default registered_resources: models: # All models use the same 'ollama' provider - model_id: llama3.2-vision:latest provider_id: ollama provider_model_id: llama3.2-vision:latest model_type: llm - model_id: llama3.2:3b provider_id: ollama provider_model_id: llama3.2:3b model_type: llm # Embedding models - model_id: nomic-embed-text-v2-moe provider_id: ollama provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K model_type: embedding metadata: 
embedding_dimension: 768 shields: [] vector_dbs: [] datasets: [] scoring_fns: [] benchmarks: [] tool_groups: [] server: port: 8321 telemetry: enabled: true vector_stores: default_provider_id: faiss default_embedding_model: provider_id: ollama model_id: toshk0/nomic-embed-text-v2-moe:Q6_K ``` ```bash docker run -it --pull always -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT -v ~/.llama:/root/.llama -v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml -e RUN_CONFIG_PATH=/app/custom-run.yaml -e OLLAMA_URL=http://host.docker.internal:11434/ llamastack/distribution-starter:0.3.0 --port $LLAMA_STACK_PORT ```	2025-10-28 16:05:44 -07:00
Ashwin Bharambe	f88416ef87	fix(inference): enable routing of models with provider_data alone (#3928 ) This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack. Here's the situation: assume a remote inference provider which works only when users provide their own API keys via `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias. Just that we don't outright fail when we are not able to look it up. Also, updated inference router so that the responses have the _exact_ model that the request had. ## Test Plan Added an integration test Closes #3929 --------- Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>	2025-10-28 11:16:37 -07:00
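The routing behavior described in the commit above (try the registry first, then fall back to the `provider_id/model_id` prefix) might look roughly like the sketch below. `resolve_route`, the registry mapping, and the provider set are hypothetical simplifications, not the router's real data structures.

```python
def resolve_route(
    model: str,
    registry: dict[str, tuple[str, str]],
    known_providers: set[str],
) -> tuple[str, str]:
    """Return (provider_id, provider_model_id) for a requested model.

    Hypothetical sketch: a registry hit wins, since it may be a
    pre-registered alias. Otherwise the ID is split on the first "/"
    and routed to that provider, which decides whether the model exists.
    """
    if model in registry:
        return registry[model]
    provider_id, separator, provider_model_id = model.partition("/")
    if separator and provider_model_id and provider_id in known_providers:
        return provider_id, provider_model_id
    raise ValueError(f"cannot route model {model!r}")
```

Splitting only on the first slash keeps provider-side model names that themselves contain slashes intact.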
Ashwin Bharambe	94b0592240	fix(mypy): add type stubs and fix typing issues (#3938 ) Adds type stubs and fixes mypy errors for better type coverage. Changes: - Added type_checking dependency group with type stubs (torchtune, trl, etc.) - Added lm-format-enforcer to pre-commit hook - Created HFAutoModel Protocol for type-safe HuggingFace model handling - Added mypy.overrides for untyped libraries (torchtune, fairscale, etc.) - Fixed type issues in post-training providers, databricks, and api_recorder Note: ~1,200 errors remain in excluded files (see pyproject.toml exclude list). --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 11:00:09 -07:00
Ashwin Bharambe	1d385b5b75	fix(mypy): resolve OpenAI SDK and provider type issues (#3936 ) ## Summary - Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls - Fix incorrect OpenAIChatCompletionChunk import in vllm provider - Refactor to avoid type:ignore comments by using conditional kwargs ## Changes openai_mixin.py (9 errors fixed): - Build kwargs conditionally for embeddings.create() to avoid NotGiven/Omit mismatch - Only include parameters when they have actual values (not None) gemini.py (9 errors fixed): - Apply same conditional kwargs pattern - Add missing Any import vllm.py (2 errors fixed): - Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference - Remove incorrect alias from openai package ## Technical Notes The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type `NotGiven` but parameter signatures expect `Omit`. By only passing parameters with actual values, we avoid this mismatch entirely without needing `# type: ignore` comments. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:54:29 -07:00
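The "conditional kwargs" approach from the Technical Notes above can be illustrated without touching the SDK itself. The helper name below is hypothetical; the keys (`model`, `input`, `dimensions`, `user`) are parameters of the OpenAI embeddings endpoint.

```python
from typing import Any, Optional


def build_embedding_kwargs(
    model: str,
    input_text: str,
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments, omitting every optional that is None.

    Hypothetical helper: unset parameters are simply never passed, so no
    NOT_GIVEN sentinel (and no `# type: ignore`) is needed.
    """
    kwargs: dict[str, Any] = {"model": model, "input": input_text}
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    if user is not None:
        kwargs["user"] = user
    return kwargs
```

A caller would then invoke something like `client.embeddings.create(**kwargs)`, letting the SDK apply its own defaults for anything that was left out.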
Ashwin Bharambe	d009dc29f7	fix(mypy): resolve provider utility and testing type issues (#3935 ) Fixes mypy type errors in provider utilities and testing infrastructure: 
- `mcp.py`: Cast incompatible client types, wrap image data properly - `batches.py`: Rename walrus variable to avoid shadowing - `api_recorder.py`: Use cast for Pydantic field annotation No functional changes. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:37:27 -07:00
Ashwin Bharambe	fcf07790c8	fix(mypy): resolve model implementation typing issues (#3934 ) ## Summary Fixes mypy type errors across 4 model implementation files (Phase 2d of mypy suppression removal plan): - `src/llama_stack/models/llama/llama3/multimodal/image_transform.py` (10 errors fixed) - `src/llama_stack/models/llama/checkpoint.py` (2 errors fixed) - `src/llama_stack/models/llama/hadamard_utils.py` (1 error fixed) - `src/llama_stack/models/llama/llama3/multimodal/encoder_utils.py` (1 error fixed) ## Changes ### image_transform.py - Fixed return type annotation for `find_supported_resolutions` from `Tensor` to `list[tuple[int, int]]` - Fixed parameter and return type annotations for `resize_without_distortion` from `Tensor` to `Image.Image` - Resolved variable shadowing by using separate names: `possible_resolutions_list` for the list and `possible_resolutions_tensor` for the tensor ### checkpoint.py - Replaced deprecated `torch.BFloat16Tensor` and `torch.cuda.BFloat16Tensor` with `torch.set_default_dtype(torch.bfloat16)` - Fixed variable shadowing by renaming numpy array to `ckpt_paths_array` to distinguish from the parameter `ckpt_paths: list[Path]` ### hadamard_utils.py - Added `isinstance` assertion to narrow type from `nn.Module` to `nn.Linear` before accessing `in_features` attribute ### encoder_utils.py - Fixed variable shadowing by using `masks_list` for list accumulation and `masks` for the final Tensor result ## Test plan - Verified all files pass mypy type checking (only optional dependency import warnings remain) - No functional changes - only type annotations and variable naming improvements Stacks on PR #3933 Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:28:29 -07:00
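The `isinstance` narrowing described for `hadamard_utils.py` above follows a general pattern. The sketch below uses plain stand-in classes rather than `torch.nn`, so `Module` and `Linear` here are assumptions for illustration only.

```python
class Module:
    """Stand-in for a generic layer base class (not torch.nn.Module)."""


class Linear(Module):
    """Stand-in for a linear layer that exposes `in_features`."""

    def __init__(self, in_features: int, out_features: int) -> None:
        self.in_features = in_features
        self.out_features = out_features


def input_width(layer: Module) -> int:
    """Read `in_features` after narrowing the static type to Linear."""
    # Without this assertion a type checker only knows `layer` is a Module,
    # which has no `in_features` attribute.
    assert isinstance(layer, Linear), f"expected Linear, got {type(layer).__name__}"
    return layer.in_features
```

The assertion serves two purposes: it satisfies the type checker and it fails loudly at runtime if an unexpected layer type is passed.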
Ashwin Bharambe	6ce59b5df8	fix(mypy): resolve type issues in MongoDB, batches, and auth providers (#3933 ) Fixes mypy type errors in provider utilities: - MongoDB: Fix AsyncMongoClient parameters, use async iteration for cursor - Batches: Handle memoryview\|bytes union for file decoding - Auth: Add missing imports, validate JWKS URI, conditionally pass parameters Fixes 11 type errors. No functional changes.	2025-10-28 10:23:39 -07:00
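Handling a `memoryview | bytes` union before decoding, as mentioned for the batches provider above, can be sketched as follows. The function name and the UTF-8 default are assumptions, not the provider's actual code.

```python
from typing import Union


def decode_file_content(content: Union[bytes, memoryview], encoding: str = "utf-8") -> str:
    """Decode file content that may arrive as bytes or as a memoryview.

    Hypothetical helper: memoryview has no .decode(), so it is converted
    to bytes first; the isinstance check also narrows the union for mypy.
    """
    raw = content.tobytes() if isinstance(content, memoryview) else content
    return raw.decode(encoding)
```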


3213 commits