Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-16 14:57:20 +00:00)
1756 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
b7be18f4db |
chore!: BREAKING CHANGE: remove sqlite from telemetry config
# What does this PR do? ## Test Plan |
||
|
d875e427bf
|
refactor: use extra_body to pass in input_type params for asymmetric embedding models for NVIDIA Inference Provider (#3804)
# What does this PR do? Previously, the NVIDIA inference provider implemented a custom `openai_embeddings` method with a hardcoded `input_type="query"` parameter, which is required by NVIDIA asymmetric embedding models ([https://github.com/llamastack/llama-stack/pull/3205](https://github.com/llamastack/llama-stack/pull/3205)). Recently, an `extra_body` parameter was added to the embeddings API ([https://github.com/llamastack/llama-stack/pull/3794](https://github.com/llamastack/llama-stack/pull/3794)). This PR therefore updates the NVIDIA inference provider to use the base `OpenAIMixin.openai_embeddings` method and pass `input_type` through the `extra_body` parameter for asymmetric embedding models. ## Test Plan Run the following command for each `embedding_model`: `nvidia/llama-3.2-nv-embedqa-1b-v2`, `nvidia/nv-embedqa-e5-v5`, `nvidia/nv-embedqa-mistral-7b-v2`, and `snowflake/arctic-embed-l`. ``` pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model={embedding_model} --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" --inference-mode=record ``` |
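For illustration, a minimal client-side sketch of the new pass-through, assuming an OpenAI-compatible client pointed at a locally running stack; the base URL and model id below are placeholders, not values from the PR:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # hypothetical local endpoint

resp = client.embeddings.create(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    input=["What is the capital of France?"],
    # input_type is an NVIDIA-specific knob for asymmetric models; it now rides along in
    # extra_body and is forwarded by the provider instead of being hardcoded to "query".
    extra_body={"input_type": "query"},
)
print(len(resp.data[0].embedding))
```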
||
|
866c13cdc2
|
chore(api)!: BREAKING CHANGE: remove ALL telemetry APIs (#3740)
# What does this PR do? As discussed on Discord, we do not need to reinvent the wheel for telemetry. Instead we'll lean into the canonical OTEL stack. Logs/traces/metrics will still be sent via OTEL; they just won't be stored in, or queried through, the Stack. This is the first of many PRs to remove the telemetry API from the Stack. 1) removed the webmethod decorators so the endpoints drop out of the API spec 2) removed the tests, as @iamemilio is adding them directly on the OTEL side. ## Test Plan |
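Since telemetry now flows straight to an OTEL collector rather than through Stack-hosted APIs, here is a minimal sketch of the canonical OpenTelemetry setup this points users at; the collector endpoint and service name are assumptions, not Stack configuration:
```python
# Minimal OpenTelemetry tracing setup, assuming an OTLP/HTTP collector on localhost:4318.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "my-llama-app"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("chat-completion-request"):
    pass  # application work goes here; the span is exported to the collector
```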
||
|
007efa6eb5
|
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183)
# What does this PR do? The purpose of this PR is to replace Llama Stack's default embedding model with nomic-embed-text-v1.5. These are the key reasons why the Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5: 1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes a lot of data sets with various licensing terms, so it is tricky to know when/whether it is appropriate to use this model for commercial applications. 2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models, but also that there are many, many models of relatively modest size with much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way. More discussion can be found [here](https://github.com/llamastack/llama-stack/issues/2418) Closes #2418 ## Test Plan 1. Run `./scripts/unit-tests.sh` 2. Integration tests via CI workflow --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com> |
||
|
0dbf79c328
|
fix: Fixed WatsonX remote inference provider (#3801)
# What does this PR do? This PR fixes issues with the WatsonX provider so it works correctly with LiteLLM. The main problem was that WatsonX requests failed because the provider data validator didn’t properly handle the API key and project ID. This was fixed by updating the WatsonXProviderDataValidator and ensuring the provider data is loaded correctly. The openai_chat_completion method was also updated to match the behavior of other providers while adding WatsonX-specific fields like project_id. It still calls await super().openai_chat_completion.__func__(self, params) to keep the existing setup and tracing logic. After these changes, WatsonX requests now run correctly. ## Test Plan The changes were tested by running chat completion requests and confirming that credentials and project parameters are passed correctly. I have tested with my WatsonX credentials, by using the cli with `uv run llama-stack-client inference chat-completion --session` --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com> |
||
|
1136daf310
|
fix: replace python-jose with PyJWT for JWT handling (#3756)
# What does this PR do? This commit migrates the authentication system from python-jose to PyJWT to eliminate the dependency on the archived rsa package. The migration includes: - Refactored OAuth2TokenAuthProvider to use PyJWT's PyJWKClient for clean JWKS handling - Removed manual JWKS fetching, caching and key extraction logic in favor of PyJWT's built-in functionality The new implementation is cleaner, more maintainable, and follows PyJWT best practices while maintaining full backward compatibility. ## Test Plan Unit tests. Auth CI. --------- Signed-off-by: Sébastien Han <seb@redhat.com> |
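As a rough illustration of the PyJWT pattern the provider now relies on; the JWKS URL, audience, and issuer below are placeholders, not the actual provider configuration:
```python
# Sketch of JWKS-based verification with PyJWT's built-in PyJWKClient (placeholder URL/claims).
import jwt
from jwt import PyJWKClient

jwks_client = PyJWKClient("https://issuer.example.com/.well-known/jwks.json")  # fetches and caches keys

def verify(token: str) -> dict:
    signing_key = jwks_client.get_signing_key_from_jwt(token)  # selects the key by the token's kid
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="llama-stack",
        issuer="https://issuer.example.com",
    )
```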
||
|
968c364a3e
|
chore: Auto-detect Provider ID when only 1 Vector Store Provider avai… (#3802)
# What does this PR do? Two main changes: 1. Removes the `provider_id` requirement in calls to vector stores, and 2. Removes the "register first embedding model" logic, making the embedding model id required on Vector Store creation. This simplifies the UX for OpenAI clients to: ```python vs = client.vector_stores.create( name="my_citations_db", extra_body={ "embedding_model": "ollama/nomic-embed-text:latest", } ) ``` ## Test Plan --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
b95f095a54
|
feat: Allow :memory: for kvstore (#3696)
## Test Plan added unit tests |
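The `:memory:` path is standard SQLite, so nothing is written to disk and state lives only as long as the connection; a quick sketch of the underlying semantics using plain sqlite3 (this is not the kvstore's actual config or schema):
```python
import sqlite3

# ":memory:" is a standard SQLite convention: the database exists only in RAM for the
# lifetime of this connection, which is handy for tests and throwaway runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO kv VALUES (?, ?)", ("registry::models", "[]"))
print(conn.execute("SELECT value FROM kv WHERE key = ?", ("registry::models",)).fetchone())
conn.close()  # everything vanishes with the connection
```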
||
|
ecc8a554d2
|
feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)
Applies the same pattern from https://github.com/llamastack/llama-stack/pull/3777 to the embeddings and vector_stores.create() endpoints. This should _not_ be a breaking change since (a) our tests were already passing the `extra_body` parameter through to the backend, and (b) the backend simply wasn't extracting those parameters correctly, which this PR fixes. Updated APIs: `openai_embeddings()`, `openai_create_vector_store()`, `openai_create_vector_store_file_batch()` |
||
|
3bb6ef351b
|
chore!: Safety api refactoring to use OpenAIMessageParam (#3796)
# What does this PR do? Removes usage of the deprecated `Message` type from the Safety APIs. ## Test Plan CI |
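In practice this means shields are invoked with plain OpenAI-style message dicts; a hedged sketch assuming the llama-stack-client `safety.run_shield` call shape, with a placeholder shield id and endpoint:
```python
# Hedged sketch: the shield id, base URL, and exact run_shield signature are assumptions,
# not taken from this PR; the point is that messages are now OpenAI-style dicts.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

result = client.safety.run_shield(
    shield_id="llama-guard",
    messages=[{"role": "user", "content": "How do I make a harmless paper airplane?"}],
    params={},
)
print(result)
```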
||
|
82cbcada39
|
chore(ui-deps): bump lucide-react from 0.542.0 to 0.545.0 in /llama_stack/ui (#3788)
Bumps [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) from 0.542.0 to 0.545.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/lucide-icons/lucide/releases">lucide-react's releases</a>.</em></p> <blockquote> <h2>Version 0.545.0</h2> <h2>What's Changed</h2> <ul> <li>fix(icons): changed <code>flame</code> icon by <a href="https://github.com/jamiemlaw"><code>@jamiemlaw</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3600">lucide-icons/lucide#3600</a></li> <li>fix(icons): arcified <code>square-m</code> icon by <a href="https://github.com/jguddas"><code>@jguddas</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3549">lucide-icons/lucide#3549</a></li> <li>chore(deps-dev): bump vite from 6.3.5 to 6.3.6 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3611">lucide-icons/lucide#3611</a></li> <li>fix(icons): changed <code>combine</code> icon by <a href="https://github.com/jguddas"><code>@jguddas</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3200">lucide-icons/lucide#3200</a></li> <li>fix(icons): changed <code>building-2</code> icon by <a href="https://github.com/karsa-mistmere"><code>@karsa-mistmere</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3509">lucide-icons/lucide#3509</a></li> <li>chore(deps): bump devalue from 5.1.1 to 5.3.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3638">lucide-icons/lucide#3638</a></li> <li>feat(icons): Add <code>motorbike</code> icon by <a href="https://github.com/jamiemlaw"><code>@jamiemlaw</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3371">lucide-icons/lucide#3371</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/lucide-icons/lucide/compare/0.544.0...0.545.0">https://github.com/lucide-icons/lucide/compare/0.544.0...0.545.0</a></p> <h2>Version 0.544.0</h2> <h2>What's Changed</h2> <ul> <li>docs: update lucide-static documentation about raw string imports by <a href="https://github.com/pascalduez"><code>@pascalduez</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3524">lucide-icons/lucide#3524</a></li> <li>feat(icons): added <code>ev-charger</code> icon by <a href="https://github.com/UsamaKhan"><code>@UsamaKhan</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/2781">lucide-icons/lucide#2781</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/pascalduez"><code>@pascalduez</code></a> made their first contribution in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3524">lucide-icons/lucide#3524</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/lucide-icons/lucide/compare/0.543.0...0.544.0">https://github.com/lucide-icons/lucide/compare/0.543.0...0.544.0</a></p> <h2>Version 0.543.0</h2> <h2>What's Changed</h2> <ul> <li>feat(preview-comment): put x-ray at top if there are more than 7 changed icons to prevent them from being cut of by <a href="https://github.com/jguddas"><code>@jguddas</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3589">lucide-icons/lucide#3589</a></li> <li>fix(icons): changed <code>church</code> icon by <a href="https://github.com/karsa-mistmere"><code>@karsa-mistmere</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/2971">lucide-icons/lucide#2971</a></li> <li>chore(metadata): Added tags to <code>messages-square</code> by <a href="https://github.com/jamiemlaw"><code>@jamiemlaw</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3529">lucide-icons/lucide#3529</a></li> <li>fix(icons): Optimise <code>bug</code> icons by <a href="https://github.com/jamiemlaw"><code>@jamiemlaw</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3574">lucide-icons/lucide#3574</a></li> <li>fix(icons): changed list/text & derived icons by <a href="https://github.com/karsa-mistmere"><code>@karsa-mistmere</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3568">lucide-icons/lucide#3568</a></li> <li>fix(icons): changed <code>panel-top-bottom-dashed</code> icon by <a href="https://github.com/jguddas"><code>@jguddas</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3584">lucide-icons/lucide#3584</a></li> <li>fix(icons): changed <code>message-square-quote</code> icon by <a href="https://github.com/jguddas"><code>@jguddas</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3550">lucide-icons/lucide#3550</a></li> <li>fix(meta): added tag to <code>ship</code> metadata by <a href="https://github.com/jguddas"><code>@jguddas</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3559">lucide-icons/lucide#3559</a></li> <li>fix(meta): add tags to <code>id-card-lanyard</code> metadata by <a href="https://github.com/jguddas"><code>@jguddas</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3534">lucide-icons/lucide#3534</a></li> <li>fix(icons): changed <code>calendar-cog</code> icon by <a href="https://github.com/jguddas"><code>@jguddas</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3583">lucide-icons/lucide#3583</a></li> <li>chore(deps): bump astro from 5.5.2 to 5.13.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3564">lucide-icons/lucide#3564</a></li> <li>feat(packages): add new package for flutter by <a href="https://github.com/vqh2602"><code>@vqh2602</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3536">lucide-icons/lucide#3536</a></li> <li>feat(icons): added <code>house-heart</code> icon by <a href="https://github.com/danielbayley"><code>@danielbayley</code></a> in <a href="https://redirect.github.com/lucide-icons/lucide/pull/3239">lucide-icons/lucide#3239</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/lucide-icons/lucide/compare/0.542.0...0.543.0">https://github.com/lucide-icons/lucide/compare/0.542.0...0.543.0</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
e94840d298
|
chore(ui-deps): bump framer-motion from 12.23.12 to 12.23.24 in /llama_stack/ui (#3792)
Bumps [framer-motion](https://github.com/motiondivision/motion) from 12.23.12 to 12.23.24. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/motiondivision/motion/blob/main/CHANGELOG.md">framer-motion's changelog</a>.</em></p> <blockquote> <h2>[12.23.24] 2025-10-10</h2> <h3>Fixed</h3> <ul> <li>Ensure that when a component remounts, it continues to fire animations even when <code>initial={false}</code>.</li> </ul> <h2>[12.23.23] 2025-10-10</h2> <h3>Added</h3> <ul> <li>Exporting <code>PresenceChild</code> and <code>PopChild</code> type for internal use.</li> </ul> <h2>[12.23.22] 2025-09-25</h2> <h3>Added</h3> <ul> <li>Exporting <code>HTMLElements</code> and <code>useComposedRefs</code> type for internal use.</li> </ul> <h2>[12.23.21] 2025-09-24</h2> <h3>Fixed</h3> <ul> <li>Fixing main-thread <code>scroll</code> with animations that contain <code>delay</code>.</li> </ul> <h2>[12.23.20] 2025-09-24</h2> <h3>Fixed</h3> <ul> <li>Suppress non-animatable value warning for instant animations.</li> </ul> <h2>[12.23.19] 2025-09-23</h2> <h3>Fixed</h3> <ul> <li>Remove support for changing <code>ref</code> prop.</li> </ul> <h2>[12.23.18] 2025-09-19</h2> <h3>Fixed</h3> <ul> <li><code><motion /></code> components now support changing <code>ref</code> prop.</li> </ul> <h2>[12.23.17] 2025-09-19</h2> <h3>Fixed</h3> <ul> <li>Ensure <code>animate()</code> <code>onComplete</code> only fires once, when all values are complete.</li> </ul> <h2>[12.23.16] 2025-09-19</h2> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
25ea94fcf7
|
chore(ui-deps): bump eslint from 9.26.0 to 9.37.0 in /llama_stack/ui (#3791)
Bumps [eslint](https://github.com/eslint/eslint) from 9.26.0 to 9.37.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/eslint/eslint/releases">eslint's releases</a>.</em></p> <blockquote> <h2>v9.37.0</h2> <h2>Features</h2> <ul> <li><a href=" |
||
|
190b96ea62
|
chore(ui-deps): bump @types/react-dom from 19.2.0 to 19.2.1 in /llama_stack/ui (#3789)
Bumps [@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom) from 19.2.0 to 19.2.1. <details> <summary>Commits</summary> <ul> <li>See full diff in <a href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
4fb39f0a6a
|
chore(ui-deps): bump @types/react from 19.2.0 to 19.2.2 in /llama_stack/ui (#3790)
Bumps [@types/react](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react) from 19.2.0 to 19.2.2. <details> <summary>Commits</summary> <ul> <li>See full diff in <a href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
e6378872c7 | fix(misc): pre-commit fix for server.py | ||
|
7c63aebd64
|
feat(responses)!: add reasoning and annotation added events (#3793)
Implements missing streaming events from the OpenAI Responses API spec: reasoning text/summary events for o1/o3 models, refusal events for safety moderation, annotation events for citations, and file search streaming events. Added an optional reasoning_content field to chat completion chunks to support non-standard provider extensions. **NOTE:** OpenAI does _not_ fill reasoning_content when users use the chat_completion APIs. This means there is no way for us to implement Responses (with reasoning) on top of OpenAI chat completions; we'd need to transparently punt to OpenAI's Responses endpoints if we wish to do that. For other backends (vLLM, etc.) we can use it. ## Test Plan File search streaming test passes: ``` ./scripts/integration-tests.sh --stack-config server:ci-tests \ --suite responses --setup gpt --inference-mode replay --pattern test_response_file_search_streaming_events ``` Reasoning tests need a more complex setup and validation (a vLLM-powered OSS model, maybe gpt-oss, which can return reasoning_content); I will do that in a follow-up PR. |
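For context, a hedged sketch of how a client observes these event types on a streamed response, using an OpenAI-compatible client against a local stack; the base URL and model are placeholders:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # hypothetical local endpoint

stream = client.responses.create(
    model="gpt-4.1",
    input="Summarize the 5 Ds of dodgeball.",
    stream=True,
)
for event in stream:
    # Each streamed item carries a type such as "response.created" or
    # "response.output_text.delta"; the newly added reasoning/annotation events show up here too.
    print(event.type)
```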
||
|
f365961731 | fix(tests): handle TEST_CONTEXT not being set | ||
|
a165b8b5bb
|
chore!: BREAKING CHANGE removing VectorDB APIs (#3774)
# What does this PR do? Removes VectorDBs from API surface and our tests. Moves tests to Vector Stores. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
06e4cd8e02
|
feat(api)!: BREAKING CHANGE: support passing extra_body through to providers (#3777)
# What does this PR do? Allows passing through extra_body parameters to inference providers. With this, we removed the 2 vllm-specific parameters from completions API into `extra_body`. Before/After <img width="1883" height="324" alt="image" src="https://github.com/user-attachments/assets/acb27c08-c748-46c9-b1da-0de64e9908a1" /> closes #2720 ## Test Plan CI and added new test ``` ❯ uv run pytest -s -v tests/integration/ --stack-config=server:starter --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag ) and test_openai_completion_guided_choice' --setup=vllm --suite=base --color=yes Uninstalled 3 packages in 125ms Installed 3 packages in 19ms INFO 2025-10-10 14:29:54,317 tests.integration.conftest:118 tests: Applying setup 'vllm' for suite base INFO 2025-10-10 14:29:54,331 tests.integration.conftest:47 tests: Test stack config type: server (stack_config=server:starter) ============================================================================================================== test session starts ============================================================================================================== platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/erichuang/projects/llama-stack-1/.venv/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}} rootdir: /Users/erichuang/projects/llama-stack-1 configfile: pyproject.toml plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 285 items / 284 deselected / 1 selected tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] instantiating llama_stack_client Starting llama stack server with config 'starter' on port 8321... Waiting for server at http://localhost:8321... (0.0s elapsed) Waiting for server at http://localhost:8321... (0.5s elapsed) Waiting for server at http://localhost:8321... (5.1s elapsed) Waiting for server at http://localhost:8321... (5.6s elapsed) Waiting for server at http://localhost:8321... (10.1s elapsed) Waiting for server at http://localhost:8321... (10.6s elapsed) Server is ready at http://localhost:8321 llama_stack_client instantiated in 11.773s PASSEDTerminating llama stack server process... Terminating process 98444 and its group... 
Server process and children terminated gracefully ============================================================================================================= slowest 10 durations ============================================================================================================== 11.88s setup tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] 3.02s call tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] 0.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] ================================================================================================ 1 passed, 284 deselected, 3 warnings in 16.21s ================================================================================================= ``` |
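A small sketch of what the before/after means for callers: the vLLM-specific knob now travels in `extra_body` instead of being a first-class Stack API parameter (base URL is a placeholder; the model id is taken from the test above):
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # hypothetical local endpoint

completion = client.completions.create(
    model="vllm/Qwen/Qwen3-0.6B",
    prompt="Is Pluto a planet? Answer yes or no.",
    # guided_choice is a vLLM extension, so it is passed opaquely via extra_body
    # and forwarded to the provider rather than living in the Stack API surface.
    extra_body={"guided_choice": ["yes", "no"]},
)
print(completion.choices[0].text)
```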
||
|
80d58ab519
|
chore: refactor (chat)completions endpoints to use shared params struct (#3761)
# What does this PR do? Converts openai(_chat)_completions params to pydantic BaseModel to reduce code duplication across all providers. ## Test Plan CI --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3761). * #3777 * __->__ #3761 |
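The shape of the change, sketched with a toy Pydantic model (class name and field list are illustrative only, not the actual struct): providers receive one validated params object instead of a long, duplicated keyword list.
```python
from pydantic import BaseModel


class OpenAIChatCompletionRequest(BaseModel):  # illustrative shape, not the real class
    model: str
    messages: list[dict]
    temperature: float | None = None
    max_tokens: int | None = None


def openai_chat_completion(params: OpenAIChatCompletionRequest) -> None:
    # Every provider consumes the same validated object, so defaults and validation
    # live in one place instead of being repeated per provider.
    print(params.model, len(params.messages))


openai_chat_completion(
    OpenAIChatCompletionRequest(model="demo", messages=[{"role": "user", "content": "hi"}])
)
```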
||
|
6954fe2274
|
fix(auth): allow unauthenticated access to health and version endpoints (#3736)
The AuthenticationMiddleware was blocking all requests without an Authorization header, including health and version endpoints that are needed by monitoring tools, load balancers, and Kubernetes probes. This commit allows endpoints ending in /health or /version to bypass authentication, enabling operational tooling to function properly without requiring credentials. Closes: #3735 Signed-off-by: Derek Higgins <derekh@redhat.com> |
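A quick way to see the effect, assuming the standard `/v1/health` and `/v1/version` routes on a local server; no Authorization header is sent:
```python
import requests

# No Authorization header: with this change, probes like these should return 200 instead of 401.
for path in ("/v1/health", "/v1/version"):
    r = requests.get(f"http://localhost:8321{path}")
    print(path, r.status_code, r.text)
```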
||
|
32fde8d9a8
|
feat: Add /v1/embeddings endpoint to batches API (#3384)
# What does this PR do? This PR extends the Llama Stack Batches API to support the /v1/embeddings endpoint, enabling efficient batch processing of embedding requests alongside the existing /v1/chat/completions and /v1/completions support. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes: https://github.com/llamastack/llama-stack/issues/3145 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> ``` (stack-client) ➜ llama-stack git:(support/embeddings-api) conda activate stack-client && python -m pytest tests/unit/providers/batches/test_reference.py -v ============================================================================================================================================ test session starts ============================================================================================================================================= platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'xdist': '3.8.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, xdist-3.8.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0 asyncio: mode=Mode.AUTO collected 46 items tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_and_retrieve_batch_success PASSED [ 2%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_without_metadata PASSED [ 4%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_completion_window PASSED [ 6%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[/v1/invalid/endpoint] PASSED [ 8%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[] PASSED [ 10%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_metadata PASSED [ 13%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_retrieve_batch_not_found PASSED [ 15%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_success PASSED [ 17%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[failed] PASSED [ 19%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[expired] PASSED [ 21%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[completed] PASSED [ 23%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_not_found PASSED [ 26%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_empty PASSED [ 28%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_single_batch PASSED [ 30%] 
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_multiple_batches PASSED [ 32%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_limit PASSED [ 34%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_pagination PASSED [ 36%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_invalid_after PASSED [ 39%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_kvstore_persistence PASSED [ 41%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_not_found PASSED [ 43%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_exists_empty_content PASSED [ 45%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_mixed_valid_invalid_json PASSED [ 47%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_model PASSED [ 50%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED [ 52%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED [ 54%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED [ 56%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED [ 58%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[model-body.model-invalid_request-Model parameter is required] PASSED [ 60%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[messages-body.messages-invalid_request-Messages parameter is required] PASSED [ 63%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED [ 65%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED [ 67%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED [ 69%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED [ 71%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[model-body.model-invalid_request-Model parameter is required] PASSED [ 73%] 
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[prompt-body.prompt-invalid_request-Prompt parameter is required] PASSED [ 76%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_url_mismatch PASSED [ 78%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_multiple_errors_per_request PASSED [ 80%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_request_format PASSED [ 82%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[custom_id-custom_id-12345-Custom_id must be a string] PASSED [ 84%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[url-url-123-URL must be a string] PASSED [ 86%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[method-method-invalid_value2-Method must be a string] PASSED [ 89%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[body-body-invalid_value3-Body must be a JSON dictionary object] PASSED [ 91%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[model-body.model-123-Model must be a string] PASSED [ 93%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[messages-body.messages-invalid messages format-Messages must be an array] PASSED [ 95%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_max_concurrent_batches PASSED [ 97%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_embeddings_endpoint PASSED [100%] ``` --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
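A hedged sketch of submitting an embeddings batch through the OpenAI-compatible batches API; the base URL and input file are placeholders, and the `/v1/embeddings` endpoint value is what this PR enables:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # hypothetical local endpoint

# Each JSONL line is one embeddings request, mirroring the chat/completions batch format.
batch_file = client.files.create(
    file=open("embedding_requests.jsonl", "rb"),
    purpose="batch",
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/embeddings",     # newly supported endpoint
    completion_window="24h",
)
print(batch.id, batch.status)
```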
||
|
1394403360
|
feat(responses): implement usage tracking in streaming responses (#3771)
Implements usage accumulation in StreamingResponseOrchestrator. The most important part was to pass `stream_options = { "include_usage": true }` to the chat_completion call. This means all responses tests will have to be re-recorded because the request hash will change :) Test changes: - Add usage assertions to streaming and non-streaming tests - Update test recordings with actual usage data from OpenAI |
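The client-side counterpart, sketched with an OpenAI-compatible client: request usage on a streamed chat completion and read it off the final chunk (base URL and model are placeholders):
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # hypothetical local endpoint

stream = client.chat.completions.create(
    model="demo-model",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
    stream_options={"include_usage": True},  # the key bit: asks for a final usage chunk
)
usage = None
for chunk in stream:
    if chunk.usage is not None:  # only the last chunk carries usage
        usage = chunk.usage
print(usage)
```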
||
|
e7d21e1ee3
|
feat: Add support for Conversations in Responses API (#3743)
# What does this PR do? This PR adds support for Conversations in Responses. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Unit tests Integration tests <Details> <Summary>Manual testing with this script: (click to expand)</Summary> ```python from openai import OpenAI client = OpenAI() client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none") def test_conversation_create(): print("Testing conversation create...") conversation = client.conversations.create( metadata={"topic": "demo"}, items=[ {"type": "message", "role": "user", "content": "Hello!"} ] ) print(f"Created: {conversation}") return conversation def test_conversation_retrieve(conv_id): print(f"Testing conversation retrieve for {conv_id}...") retrieved = client.conversations.retrieve(conv_id) print(f"Retrieved: {retrieved}") return retrieved def test_conversation_update(conv_id): print(f"Testing conversation update for {conv_id}...") updated = client.conversations.update( conv_id, metadata={"topic": "project-x"} ) print(f"Updated: {updated}") return updated def test_conversation_delete(conv_id): print(f"Testing conversation delete for {conv_id}...") deleted = client.conversations.delete(conv_id) print(f"Deleted: {deleted}") return deleted def test_conversation_items_create(conv_id): print(f"Testing conversation items create for {conv_id}...") items = client.conversations.items.create( conv_id, items=[ { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "Hello!"}] }, { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "How are you?"}] } ] ) print(f"Items created: {items}") return items def test_conversation_items_list(conv_id): print(f"Testing conversation items list for {conv_id}...") items = client.conversations.items.list(conv_id, limit=10) print(f"Items list: {items}") return items def test_conversation_item_retrieve(conv_id, item_id): print(f"Testing conversation item retrieve for {conv_id}/{item_id}...") item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id) print(f"Item retrieved: {item}") return item def test_conversation_item_delete(conv_id, item_id): print(f"Testing conversation item delete for {conv_id}/{item_id}...") deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id) print(f"Item deleted: {deleted}") return deleted def test_conversation_responses_create(): print("\nTesting conversation create for a responses example...") conversation = client.conversations.create() print(f"Created: {conversation}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") return response, conversation def test_conversations_responses_create_followup( conversation, content="Repeat what you just said but add 'this is my second time saying this'", ): print(f"Using: {conversation.id}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": content}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") conv_items = client.conversations.items.list(conversation.id) print(f"\nRetrieving list of items for conversation {conversation.id}:") print(conv_items.model_dump_json(indent=2)) def test_response_with_fake_conv_id(): fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44" 
print(f"Using {fake_conv_id}") try: response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "say hello"}], conversation=fake_conv_id, ) print(f"Created response: {response} for conversation {fake_conv_id}") except Exception as e: print(f"failed to create response for conversation {fake_conv_id} with error {e}") def main(): print("Testing OpenAI Conversations API...") # Create conversation conversation = test_conversation_create() conv_id = conversation.id # Retrieve conversation test_conversation_retrieve(conv_id) # Update conversation test_conversation_update(conv_id) # Create items items = test_conversation_items_create(conv_id) # List items items_list = test_conversation_items_list(conv_id) # Retrieve specific item if items_list.data: item_id = items_list.data[0].id test_conversation_item_retrieve(conv_id, item_id) # Delete item test_conversation_item_delete(conv_id, item_id) # Delete conversation test_conversation_delete(conv_id) response, conversation2 = test_conversation_responses_create() print('\ntesting reseponse retrieval') test_conversation_retrieve(conversation2.id) print('\ntesting responses follow up') test_conversations_responses_create_followup(conversation2) print('\ntesting responses follow up x2!') test_conversations_responses_create_followup( conversation2, content="Repeat what you just said but add 'this is my third time saying this'", ) test_response_with_fake_conv_id() print("All tests completed!") if __name__ == "__main__": main() ``` </Details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
548ccff368
|
fix(mypy): fix wrong attribute access (#3770) | ||
|
8bf07f91cb
|
feat: reuse previous mcp tool listings where possible (#3710)
# What does this PR do? When a previous response is linked, this PR checks whether its mcp_list_tools objects can be reused instead of listing the tools explicitly every time. Closes #3106 ## Test Plan Tested manually. Added unit tests to cover the new behaviour. --------- Signed-off-by: Gordon Sim <gsim@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
0066d986c5
|
feat: use SecretStr for inference provider auth credentials (#3724)
# What does this PR do? Use SecretStr for OpenAIMixin providers: - RemoteInferenceProviderConfig now has `auth_credential: SecretStr` - the default alias is `api_key` (the most common name) - some providers override it to `api_token` (RunPod, vLLM, Databricks) - some providers exclude it (Ollama, TGI, Vertex AI) Addresses #3517 ## Test Plan CI with new tests |
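Roughly what the change buys, sketched with Pydantic directly (the class name and alias mirror the description above but are illustrative, not the actual definition): the credential is masked in logs and reprs and only revealed on explicit request.
```python
from pydantic import BaseModel, Field, SecretStr


class RemoteInferenceProviderConfig(BaseModel):  # illustrative shape, not the actual class
    auth_credential: SecretStr | None = Field(default=None, alias="api_key")


cfg = RemoteInferenceProviderConfig(api_key="sk-super-secret")
print(cfg)                                      # auth_credential=SecretStr('**********')
print(cfg.auth_credential.get_secret_value())   # the real key, only when asked for explicitly
```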
||
|
e039b61d26
|
feat(responses)!: add in_progress, failed, content part events (#3765)
## Summary - add schema + runtime support for response.in_progress / response.failed / response.incomplete - stream content parts with proper indexes and reasoning slots - align tests + docs with the richer event payloads ## Testing - uv run pytest tests/unit/providers/agents/meta_reference/test_openai_responses.py::test_create_openai_response_with_string_input - uv run pytest tests/unit/providers/agents/meta_reference/test_response_conversion_utils.py |
||
|
a548169b99
|
fix: allow skipping model availability check for vLLM (#3739)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Allows model check to fail gracefully instead of crashing on startup. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> set VLLM_URL to your VLLM server ``` (base) akram@Mac llama-stack % LAMA_STACK_LOGGING="all=debug" VLLM_ENABLE_MODEL_DISCOVERY=false MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run ``` ``` INFO 2025-10-08 20:11:24,637 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues INFO 2025-10-08 20:11:24,866 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues ERROR 2025-10-08 20:11:26,160 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&redirect_uri=ht tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&response_type=code&scope=user%3Ainfo+user%3Acheck-access&state=9fba207425 5851c718aca717a5887d76%3A%2Fmodels">Found</a>. [...] INFO 2025-10-08 20:11:26,295 uvicorn.error:84 uncategorized: Started server process [83144] INFO 2025-10-08 20:11:26,296 uvicorn.error:48 uncategorized: Waiting for application startup. INFO 2025-10-08 20:11:26,297 llama_stack.core.server.server:170 core::server: Starting up INFO 2025-10-08 20:11:26,297 llama_stack.core.stack:399 core: starting registry refresh task INFO 2025-10-08 20:11:26,311 uvicorn.error:62 uncategorized: Application startup complete. INFO 2025-10-08 20:11:26,312 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ERROR 2025-10-08 20:11:26,791 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&redirect_uri=ht tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&response_type=code&scope=user%3Ainfo+user%3Acheck-access&state=8ef0cba3e1 71a4f8b04cb445cfb91a4c%3A%2Fmodels">Found</a>. ``` |
||
|
aaf5036235
|
feat(responses): add usage types to inference and responses APIs (#3764)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s
API Conformance Tests / check-schema-compatibility (push) Successful in 36s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 2m7s
## Summary
Adds OpenAI-compatible usage tracking types to enable reporting token consumption for both streaming and non-streaming responses.

## Type Definitions
**Chat Completion Usage** (inference API):
```python
class OpenAIChatCompletionUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None
    completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None
```

**Response Usage** (responses API):
```python
class OpenAIResponseUsage(BaseModel):
    input_tokens: int
    output_tokens: int
    total_tokens: int
    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None
    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None
```

This matches OpenAI's usage reporting format and enables PR #3766 to implement usage tracking in streaming responses.

Co-authored-by: Claude <noreply@anthropic.com> |
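For context, this is roughly how a client reads the usage block once it is populated; a sketch assuming an OpenAI-compatible client pointed at a local Stack server (the URL and model name are placeholders):

```python
from openai import OpenAI

# Stack exposes an OpenAI-compatible endpoint; api_key is unused here.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

resp = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Say hello"}],
)

# usage mirrors the OpenAI format: prompt/completion/total token counts
print(resp.usage.prompt_tokens, resp.usage.completion_tokens, resp.usage.total_tokens)
```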
||
|
ebae0385bb
|
fix: update dangling references to llama download command (#3763)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 2m14s
## Summary
After removing model management CLI in #3700, this PR updates remaining references to the old `llama download` command to use `huggingface-cli download` instead.

## Changes
- Updated error messages in `meta_reference/common.py` to recommend `huggingface-cli download`
- Updated error messages in `torchtune/recipes/lora_finetuning_single_device.py` to use `huggingface-cli download`
- Updated post-training notebook to use `huggingface-cli download` instead of `llama download`
- Fixed typo: "you model" -> "your model"

## Test Plan
- Verified error messages provide correct guidance for users
- Checked that notebook instructions are up-to-date with current tooling |
||
|
8fe4a216b5
|
fix(inference): propagate 401/403 errors from remote providers (#3762)
## Summary
Fixes #2990

Remote provider authentication errors (401/403) were being converted to 500 Internal Server Error, preventing users from understanding why their requests failed.

## The Problem
When a request with an invalid API key was sent to a remote provider:
- Provider correctly returns 401 with error details
- Llama Stack's `translate_exception()` didn't recognize provider SDK exceptions
- Fell through to generic 500 error handler
- User received: "Internal server error: An unexpected error occurred."

## The Fix
Added handler in `translate_exception()` that checks for exceptions with a `status_code` attribute and preserves the original HTTP status code and error message.

**Before:**
```json
HTTP 500
{"detail": "Internal server error: An unexpected error occurred."}
```

**After:**
```json
HTTP 401
{"detail": "Error code: 401 - {'error': {'message': 'Invalid API Key', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}"}
```

## Tested With
- ✅ groq: 401 "Invalid API Key"
- ✅ openai: 401 "Incorrect API key provided"
- ✅ together: 401 "Invalid API key provided"
- ✅ fireworks: 403 "unauthorized"

## Test Plan
**Automated test script:** https://gist.github.com/ashwinb/1199dd7585ffa3f4be67b111cc65f2f3

The test script:
1. Builds separate stacks for each provider
2. Registers models (with validation temporarily disabled for testing)
3. Sends requests with invalid API keys via `x-llamastack-provider-data` header
4. Verifies HTTP status codes are 401/403 (not 500)

**Results before fix:** All providers returned 500
**Results after fix:** All providers correctly return 401/403

**Manual verification:**
```bash
# 1. Build stack
llama stack build --image-type venv --providers inference=remote::groq

# 2. Start stack
llama stack run

# 3. Send request with invalid API key
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H 'x-llamastack-provider-data: {"groq_api_key": "invalid-key"}' \
  -d '{"model": "groq/llama3-70b-8192", "messages": [{"role": "user", "content": "test"}]}'

# Expected: HTTP 401 with provider error message (not 500)
```

## Impact
- Works with all remote providers using OpenAI SDK (groq, openai, together, fireworks, etc.)
- Works with any provider SDK that follows the pattern of exceptions with `status_code` attribute
- No breaking changes - only affects error responses |
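The core of the fix can be sketched as follows; this is a simplified illustration rather than the exact Stack implementation. Any exception carrying a `status_code` attribute in the 4xx/5xx range keeps its status and message instead of collapsing into a generic 500:

```python
from fastapi import HTTPException


def translate_exception(exc: Exception) -> HTTPException:
    status_code = getattr(exc, "status_code", None)
    if isinstance(status_code, int) and 400 <= status_code < 600:
        # Provider SDK errors (e.g. the OpenAI SDK's APIStatusError subclasses)
        # carry the upstream status; preserve it so clients see 401/403, not 500.
        return HTTPException(status_code=status_code, detail=str(exc))
    return HTTPException(
        status_code=500,
        detail="Internal server error: An unexpected error occurred.",
    )
```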
||
|
145b2bcf25
|
feat: make object registration idempotent (#3752)
# What does this PR do?
Objects (vector DBs, models, scoring functions, etc.) have an identifier and associated object values. We allow exact duplicate registrations; we reject registrations when the identifier already exists and the associated object values differ.

Note: models are namespaced, i.e. {provider_id}/{identifier}, while other object types are not.

## Test Plan
CI with new tests |
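A minimal sketch of idempotent registration under assumed names: a registry keyed by (type, identifier) where exact duplicates are a no-op and conflicting re-registrations raise. This illustrates the rule; the real Stack code lives in its routing tables.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    _objects: dict[tuple[str, str], dict] = field(default_factory=dict)

    def register(self, obj_type: str, identifier: str, values: dict) -> None:
        key = (obj_type, identifier)
        existing = self._objects.get(key)
        if existing is None:
            self._objects[key] = values          # first registration
        elif existing == values:
            return                               # exact duplicate: idempotent no-op
        else:
            raise ValueError(
                f"{obj_type} '{identifier}' already registered with different values"
            )


registry = Registry()
registry.register("model", "vllm/llama-3.3-70b", {"provider_id": "vllm"})
registry.register("model", "vllm/llama-3.3-70b", {"provider_id": "vllm"})  # ok, no-op
```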
||
|
7ee0ee7843
|
chore!: remove model mgmt from CLI for Hugging Face CLI (#3700)
This change removes the `llama model` and `llama download` subcommands from the CLI, replacing them with recommendations to use the Hugging Face CLI instead.

Rationale for this change:
- The model management functionality was largely duplicating what the Hugging Face CLI already provides, leading to unnecessary maintenance overhead (the main exception being downloads sourced directly from Meta)
- Maintaining our own implementation required fixing bugs and keeping up with changes in model repositories and download mechanisms
- The Hugging Face CLI is more mature, widely adopted, and better maintained
- This allows us to focus on the core Llama Stack functionality rather than reimplementing model management tools

Changes made:
- Removed all model-related CLI commands and their implementations
- Updated documentation to recommend using `huggingface-cli` for model downloads
- Removed Meta-specific download logic and statements
- Simplified the CLI to focus solely on stack management operations

Users should now use:
- `huggingface-cli download` for downloading models
- `huggingface-cli scan-cache` for listing downloaded models

This is a breaking change as it removes previously available CLI commands.

Signed-off-by: Sébastien Han <seb@redhat.com> |
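For scripted downloads, the Python equivalent of `huggingface-cli download` is `huggingface_hub.snapshot_download`; a small sketch (the model id is only an example, and gated Llama repos still require a prior `huggingface-cli login`):

```python
from huggingface_hub import snapshot_download

# Downloads the whole model repository into the local Hugging Face cache
# and returns the path to the snapshot on disk.
local_path = snapshot_download(repo_id="meta-llama/Llama-3.2-1B-Instruct")
print(local_path)
```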
||
|
841d0c3583
|
fix(testing): improve api_recorder error messages for missing recordings (#3760)
Replaces opaque error messages when recordings are not found with somewhat better guidance.

Before:
```
No recorded response found for request hash: abc123...
To record this response, run with LLAMA_STACK_TEST_INFERENCE_MODE=record
```

After:
```
Recording not found for request hash: abc123
Model: gpt-4 | Request: POST https://api.openai.com/v1/chat/completions
Run './scripts/integration-tests.sh --inference-mode record-if-missing' with required API keys to generate.
``` |
||
|
f50ce11a3b
|
feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403)
Renames `inference_recorder.py` to `api_recorder.py` and extends it to support recording/replaying tool invocations in addition to inference calls. This allows us to record web-search and other tool calls, and thereafter apply recordings for `tests/integration/responses`.

## Test Plan
```
export OPENAI_API_KEY=...
export TAVILY_SEARCH_API_KEY=...

./scripts/integration-tests.sh --stack-config ci-tests \
   --suite responses --inference-mode record-if-missing
``` |
||
|
26fd5dbd34
|
fix: add traces for tool calls and mcp tool listing (#3722)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
Adds traces around tool execution and MCP tool listing for better observability.

Closes #3108

## Test Plan
Manually examined traces in Jaeger to verify the added information was available.

Signed-off-by: Gordon Sim <gsim@redhat.com> |
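A minimal sketch of the idea using the OpenTelemetry API; the Stack has its own tracing helpers, so the span name and attributes here are illustrative assumptions, not the exact code added by this PR:

```python
from opentelemetry import trace

tracer = trace.get_tracer("tool_runtime")


async def invoke_tool_with_trace(tool_name: str, kwargs: dict, invoke):
    """Wrap a tool invocation in a span so it shows up in Jaeger/OTLP backends."""
    with tracer.start_as_current_span("tool_execution") as span:
        span.set_attribute("tool.name", tool_name)
        span.set_attribute("tool.arguments", str(kwargs))
        result = await invoke(tool_name, kwargs)
        span.set_attribute("tool.error", bool(getattr(result, "error_message", None)))
        return result
```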
||
|
4b9ebbf6a2
|
chore: revert "fix: Raising an error message to the user when registering an existing provider." (#3750)
Reverts llamastack/llama-stack#3624, which was causing https://github.com/llamastack/llama-stack/issues/3749 |
||
|
79bed44b04
|
fix(tests): ensure test isolation in server mode (#3737)
Propagate test IDs from client to server via HTTP headers to maintain proper test isolation when running with server-based stack configs. Without this, recorded/replayed inference requests in server mode would leak across tests.

Changes:
- Patch client _prepare_request to inject test ID into provider data header
- Sync test context from provider data on server side before storage operations
- Set LLAMA_STACK_TEST_STACK_CONFIG_TYPE env var based on stack config
- Configure console width for cleaner log output in CI
- Add SQLITE_STORE_DIR temp directory for test data isolation |
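A rough sketch of the client-side half of this change; the `_prepare_request` hook and the provider-data header come from the description above, while the wrapper shape and the `__test_id` key are assumptions for illustration:

```python
import json


def patch_client_for_test_isolation(client, test_id: str) -> None:
    """Wrap the client's request preparation so every request carries the
    current test id inside the provider-data header the server already parses."""
    original_prepare = client._prepare_request

    def _prepare_request(request):
        provider_data = json.loads(request.headers.get("X-LlamaStack-Provider-Data", "{}"))
        provider_data["__test_id"] = test_id  # assumed key; the real implementation may differ
        request.headers["X-LlamaStack-Provider-Data"] = json.dumps(provider_data)
        return original_prepare(request)

    client._prepare_request = _prepare_request
```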
||
|
96886afaca
|
fix(responses): fix regression in support for mcp tool require_approval argument (#3731)
# What does this PR do?
It prevents a tool call message from being added to the chat completion messages without a corresponding tool call result, which is needed when an approval is required first or when the approval request is denied. In both of these cases the tool call message is popped off the next turn's messages.

Closes #3728

## Test Plan
Ran the integration tests. Manual check of both approval and denial against gpt-4o.

Signed-off-by: Gordon Sim <gsim@redhat.com> |
||
|
5d711d4bcb
|
fix: Update watsonx.ai provider to use LiteLLM mixin and list all models (#3674)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 32s
Pre-commit / pre-commit (push) Successful in 1m29s
# What does this PR do?
- The watsonx.ai provider now uses the LiteLLM mixin instead of using IBM's library, which does not seem to be working (see #3165 for context).
- The watsonx.ai provider now lists all the models available by calling the watsonx.ai server instead of having a hard-coded list of known models. (That list gets out of date quickly.)
- An edge case in [llama_stack/core/routers/inference.py](https://github.com/llamastack/llama-stack/pull/3674/files#diff-a34bc966ed9befd9f13d4883c23705dff49be0ad6211c850438cdda6113f3455) is addressed that was causing my manual tests to fail.
- Fixes `b64_encode_openai_embeddings_response`, which was trying to enumerate over a dictionary and then reference elements of the dictionary using .field instead of ["field"]. That method is called by the LiteLLM mixin for embedding models, so it is needed to get the watsonx.ai embedding models to work.
- A unit test along the lines of the one in #3348 is added. A more comprehensive plan for automatically testing the end-to-end functionality for inference providers would be a good idea, but is out of scope for this PR.
- Updates to the watsonx distribution. Some were in response to the switch to LiteLLM (e.g., updating the Python packages needed). Others seem to be things that were already broken that I found along the way (e.g., a reference to a watsonx-specific doc template that doesn't seem to exist).

Closes #3165

Also it is related to a line item in #3387 but doesn't really address that goal (because it uses the LiteLLM mixin, not the OpenAI one). I tried the OpenAI one and it doesn't work with watsonx.ai, presumably because the watsonx.ai service is not OpenAI compatible. It works with LiteLLM because LiteLLM has a provider implementation for watsonx.ai.

## Test Plan
The test script below goes back and forth between the OpenAI and watsonx providers. The idea is that the OpenAI provider shows how it should work, and then the watsonx provider output shows that it is also working with watsonx. Note that the result from the MCP test is not as good (the Llama 3.3 70B model does not choose tools as wisely as gpt-4o), but it is still working and providing a valid response. For more details on setup and the MCP server being used for testing, see [the AI Alliance sample notebook](https://github.com/The-AI-Alliance/llama-stack-examples/blob/main/notebooks/01-responses/) that these examples are drawn from.
```python #!/usr/bin/env python3 import json from llama_stack_client import LlamaStackClient from litellm import completion import http.client def print_response(response): """Print response in a nicely formatted way""" print(f"ID: {response.id}") print(f"Status: {response.status}") print(f"Model: {response.model}") print(f"Created at: {response.created_at}") print(f"Output items: {len(response.output)}") for i, output_item in enumerate(response.output): if len(response.output) > 1: print(f"\n--- Output Item {i+1} ---") print(f"Output type: {output_item.type}") if output_item.type in ("text", "message"): print(f"Response content: {output_item.content[0].text}") elif output_item.type == "file_search_call": print(f" Tool Call ID: {output_item.id}") print(f" Tool Status: {output_item.status}") # 'queries' is a list, so we join it for clean printing print(f" Queries: {', '.join(output_item.queries)}") # Display results if they exist, otherwise note they are empty print(f" Results: {output_item.results if output_item.results else 'None'}") elif output_item.type == "mcp_list_tools": print_mcp_list_tools(output_item) elif output_item.type == "mcp_call": print_mcp_call(output_item) else: print(f"Response content: {output_item.content}") def print_mcp_call(mcp_call): """Print MCP call in a nicely formatted way""" print(f"\n🛠️ MCP Tool Call: {mcp_call.name}") print(f" Server: {mcp_call.server_label}") print(f" ID: {mcp_call.id}") print(f" Arguments: {mcp_call.arguments}") if mcp_call.error: print("Error: {mcp_call.error}") elif mcp_call.output: print("Output:") # Try to format JSON output nicely try: parsed_output = json.loads(mcp_call.output) print(json.dumps(parsed_output, indent=4)) except: # If not valid JSON, print as-is print(f" {mcp_call.output}") else: print(" ⏳ No output yet") def print_mcp_list_tools(mcp_list_tools): """Print MCP list tools in a nicely formatted way""" print(f"\n🔧 MCP Server: {mcp_list_tools.server_label}") print(f" ID: {mcp_list_tools.id}") print(f" Available Tools: {len(mcp_list_tools.tools)}") print("=" * 80) for i, tool in enumerate(mcp_list_tools.tools, 1): print(f"\n{i}. {tool.name}") print(f" Description: {tool.description}") # Parse and display input schema schema = tool.input_schema if schema and 'properties' in schema: properties = schema['properties'] required = schema.get('required', []) print(" Parameters:") for param_name, param_info in properties.items(): param_type = param_info.get('type', 'unknown') param_desc = param_info.get('description', 'No description') required_marker = " (required)" if param_name in required else " (optional)" print(f" • {param_name} ({param_type}){required_marker}") if param_desc: print(f" {param_desc}") if i < len(mcp_list_tools.tools): print("-" * 40) def main(): """Main function to run all the tests""" # Configuration LLAMA_STACK_URL = "http://localhost:8321/" LLAMA_STACK_MODEL_IDS = [ "openai/gpt-3.5-turbo", "openai/gpt-4o", "llama-openai-compat/Llama-3.3-70B-Instruct", "watsonx/meta-llama/llama-3-3-70b-instruct" ] # Using gpt-4o for this demo, but feel free to try one of the others or add more to run.yaml. 
OPENAI_MODEL_ID = LLAMA_STACK_MODEL_IDS[1] WATSONX_MODEL_ID = LLAMA_STACK_MODEL_IDS[-1] NPS_MCP_URL = "http://localhost:3005/sse/" print("=== Llama Stack Testing Script ===") print(f"Using OpenAI model: {OPENAI_MODEL_ID}") print(f"Using WatsonX model: {WATSONX_MODEL_ID}") print(f"MCP URL: {NPS_MCP_URL}") print() # Initialize client print("Initializing LlamaStackClient...") client = LlamaStackClient(base_url="http://localhost:8321") # Test 1: List models print("\n=== Test 1: List Models ===") try: models = client.models.list() print(f"Found {len(models)} models") except Exception as e: print(f"Error listing models: {e}") raise e # Test 2: Basic chat completion with OpenAI print("\n=== Test 2: Basic Chat Completion (OpenAI) ===") try: chat_completion_response = client.chat.completions.create( model=OPENAI_MODEL_ID, messages=[{"role": "user", "content": "What is the capital of France?"}] ) print("OpenAI Response:") for chunk in chat_completion_response.choices[0].message.content: print(chunk, end="", flush=True) print() except Exception as e: print(f"Error with OpenAI chat completion: {e}") raise e # Test 3: Basic chat completion with WatsonX print("\n=== Test 3: Basic Chat Completion (WatsonX) ===") try: chat_completion_response_wxai = client.chat.completions.create( model=WATSONX_MODEL_ID, messages=[{"role": "user", "content": "What is the capital of France?"}], ) print("WatsonX Response:") for chunk in chat_completion_response_wxai.choices[0].message.content: print(chunk, end="", flush=True) print() except Exception as e: print(f"Error with WatsonX chat completion: {e}") raise e # Test 4: Tool calling with OpenAI print("\n=== Test 4: Tool Calling (OpenAI) ===") tools = [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather for a specific location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g., San Francisco, CA", }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }, }, "required": ["location"], }, }, } ] messages = [ {"role": "user", "content": "What's the weather like in Boston, MA?"} ] try: print("--- Initial API Call ---") response = client.chat.completions.create( model=OPENAI_MODEL_ID, messages=messages, tools=tools, tool_choice="auto", # "auto" is the default ) print("OpenAI tool calling response received") except Exception as e: print(f"Error with OpenAI tool calling: {e}") raise e # Test 5: Tool calling with WatsonX print("\n=== Test 5: Tool Calling (WatsonX) ===") try: wxai_response = client.chat.completions.create( model=WATSONX_MODEL_ID, messages=messages, tools=tools, tool_choice="auto", # "auto" is the default ) print("WatsonX tool calling response received") except Exception as e: print(f"Error with WatsonX tool calling: {e}") raise e # Test 6: Streaming with WatsonX print("\n=== Test 6: Streaming Response (WatsonX) ===") try: chat_completion_response_wxai_stream = client.chat.completions.create( model=WATSONX_MODEL_ID, messages=[{"role": "user", "content": "What is the capital of France?"}], stream=True ) print("Model response: ", end="") for chunk in chat_completion_response_wxai_stream: # Each 'chunk' is a ChatCompletionChunk object. # We want the content from the 'delta' attribute. if hasattr(chunk, 'choices') and chunk.choices is not None: content = chunk.choices[0].delta.content # The first few chunks might have None content, so we check for it. 
if content is not None: print(content, end="", flush=True) print() except Exception as e: print(f"Error with streaming: {e}") raise e # Test 7: MCP with OpenAI print("\n=== Test 7: MCP Integration (OpenAI) ===") try: mcp_llama_stack_client_response = client.responses.create( model=OPENAI_MODEL_ID, input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.", tools=[ { "type": "mcp", "server_url": NPS_MCP_URL, "server_label": "National Parks Service tools", "allowed_tools": ["search_parks", "get_park_events"], } ] ) print_response(mcp_llama_stack_client_response) except Exception as e: print(f"Error with MCP (OpenAI): {e}") raise e # Test 8: MCP with WatsonX print("\n=== Test 8: MCP Integration (WatsonX) ===") try: mcp_llama_stack_client_response = client.responses.create( model=WATSONX_MODEL_ID, input="What is the capital of France?" ) print_response(mcp_llama_stack_client_response) except Exception as e: print(f"Error with MCP (WatsonX): {e}") raise e # Test 9: MCP with Llama 3.3 print("\n=== Test 9: MCP Integration (Llama 3.3) ===") try: mcp_llama_stack_client_response = client.responses.create( model=WATSONX_MODEL_ID, input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.", tools=[ { "type": "mcp", "server_url": NPS_MCP_URL, "server_label": "National Parks Service tools", "allowed_tools": ["search_parks", "get_park_events"], } ] ) print_response(mcp_llama_stack_client_response) except Exception as e: print(f"Error with MCP (Llama 3.3): {e}") raise e # Test 10: Embeddings print("\n=== Test 10: Embeddings ===") try: conn = http.client.HTTPConnection("localhost:8321") payload = json.dumps({ "model": "watsonx/ibm/granite-embedding-278m-multilingual", "input": "Hello, world!", }) headers = { 'Content-Type': 'application/json', 'Accept': 'application/json' } conn.request("POST", "/v1/openai/v1/embeddings", payload, headers) res = conn.getresponse() data = res.read() print(data.decode("utf-8")) except Exception as e: print(f"Error with Embeddings: {e}") raise e print("\n=== Testing Complete ===") if __name__ == "__main__": main() ``` --------- Signed-off-by: Bill Murdock <bmurdock@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
702fcd1abf
|
fix: Raising an error message to the user when registering an existing provider. (#3624)
When the user wants to change the attributes (model name, dimensions, etc.) of an already registered provider, they will now get an error message asking them to unregister the provider before registering a new one.

# What does this PR do?
This PR updates the register function to raise an error when the user attempts to register a provider that was already registered, asking them to unregister the existing provider first.

#2313

## Test Plan
Tested the change with /tests/unit/registry/test_registry.py

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com> |
||
|
0cde3d956d
|
chore: require valid logging category (#3712)
# What does this PR do?
grep'd and audited all usage of 'get_logger' with help of Claude.

## Test Plan
CI |
||
|
a3f5072776
|
chore!: remove --env from llama stack run (#3711)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Installer CI / lint (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Installer CI / smoke-test-on-dev (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m18s
# What does this PR do?
Users can simply set env vars at the beginning of the command: `FOO=BAR llama stack run ...`

## Test Plan
Run `TELEMETRY_SINKS=console uv run --with llama-stack llama stack build --distro=starter --image-type=venv --run`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3711).
* #3714
* __->__ #3711 |
||
|
1ac320b7e6
|
chore: remove dead code (#3729)
# What does this PR do?
Removes some dead code found by vulture; Claude was used to check that there are no remaining references or imports for it.

## Test Plan
CI |
||
|
b6e9f41041
|
chore: Revert "fix: fix nvidia provider (#3716)" (#3730)
This reverts commit
|
||
|
c940fe7938
|
fix: fix nvidia provider (#3716)
# What does this PR do? (Used claude to solve #3715, coded with claude but tested by me) ## From claude summary: <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> **Problem**: The `NVIDIAInferenceAdapter` class was missing the `alias_to_provider_id_map` attribute, which caused the error: `ERROR 'NVIDIAInferenceAdapter' object has no attribute 'alias_to_provider_id_map'` **Root Cause**: The `NVIDIAInferenceAdapter` only inherited from `OpenAIMixin`, but some parts of the system expected it to have the `alias_to_provider_id_map` attribute, which is provided by the `ModelRegistryHelper` class. **Solution**: 1. **Added ModelRegistryHelper import**: Imported the `ModelRegistryHelper` class from `llama_stack.providers.utils.inference.model_registry` 2. **Updated inheritance**: Changed the class declaration to inherit from both `OpenAIMixin` and `ModelRegistryHelper` 3. **Added proper initialization**: Added an `__init__` method that properly initializes the `ModelRegistryHelper` with empty model entries (since NVIDIA uses dynamic model discovery) and the allowed models from the configuration **Key Changes**: * Added `from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper` * Changed class declaration from `class NVIDIAInferenceAdapter(OpenAIMixin):` to `class NVIDIAInferenceAdapter(OpenAIMixin, ModelRegistryHelper):` * Added `__init__` method that calls `ModelRegistryHelper.__init__(self, model_entries=[], allowed_models=config.allowed_models)` The inheritance order is important - `OpenAIMixin` comes first to ensure its `check_model_availability()` method takes precedence over the `ModelRegistryHelper` version, as mentioned in the class documentation. This fix ensures that the `NVIDIAInferenceAdapter` has the required `alias_to_provider_id_map` attribute while maintaining all existing functionality.<!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Launching llama-stack server successfully, see logs: ``` NVIDIA_API_KEY=dummy NVIDIA_BASE_URL=http://localhost:8912 llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --image-type venv & [2] 3753042 (venv) nvidia@nv-meta-H100-testing-gpu01:~/kai/llama-stack$ WARNING 2025-10-07 00:29:09,848 root:266 uncategorized: Unknown logging category: openai::conversations. Falling back to default 'root' level: 20 WARNING 2025-10-07 00:29:09,932 root:266 uncategorized: Unknown logging category: cli. Falling back to default 'root' level: 20 INFO 2025-10-07 00:29:09,937 llama_stack.core.utils.config_resolution:45 core: Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml INFO 2025-10-07 00:29:09,937 llama_stack.cli.stack.run:136 cli: Using run configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml Using virtual environment: /home/nvidia/kai/venv Virtual environment already activated + '[' -n /home/nvidia/.llama/distributions/starter/starter-run.yaml ']' + yaml_config_arg=/home/nvidia/.llama/distributions/starter/starter-run.yaml + llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --port 8321 WARNING 2025-10-07 00:29:11,432 root:266 uncategorized: Unknown logging category: openai::conversations. 
Falling back to default 'root' level: 20 WARNING 2025-10-07 00:29:11,593 root:266 uncategorized: Unknown logging category: cli. Falling back to default 'root' level: 20 INFO 2025-10-07 00:29:11,603 llama_stack.core.utils.config_resolution:45 core: Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml INFO 2025-10-07 00:29:11,604 llama_stack.cli.stack.run:136 cli: Using run configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml INFO 2025-10-07 00:29:11,624 llama_stack.cli.stack.run:155 cli: No image type or image name provided. Assuming environment packages. INFO 2025-10-07 00:29:11,625 llama_stack.core.utils.config_resolution:45 core: Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml INFO 2025-10-07 00:29:11,644 llama_stack.cli.stack.run:230 cli: HTTPS enabled with certificates: Key: None Cert: None INFO 2025-10-07 00:29:11,645 llama_stack.cli.stack.run:232 cli: Listening on ['::', '0.0.0.0']:8321 INFO 2025-10-07 00:29:11,816 llama_stack.core.utils.config_resolution:45 core: Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml INFO 2025-10-07 00:29:11,836 llama_stack.core.server.server:480 core::server: Run configuration: INFO 2025-10-07 00:29:11,845 llama_stack.core.server.server:483 core::server: apis: - agents - batches - datasetio - eval - files - inference - post_training - safety - scoring - telemetry - tool_runtime - vector_io benchmarks: [] datasets: [] image_name: starter inference_store: db_path: /home/nvidia/.llama/distributions/starter/inference_store.db type: sqlite metadata_store: db_path: /home/nvidia/.llama/distributions/starter/registry.db type: sqlite models: [] providers: agents: - config: persistence_store: db_path: /home/nvidia/.llama/distributions/starter/agents_store.db type: sqlite responses_store: db_path: /home/nvidia/.llama/distributions/starter/responses_store.db type: sqlite provider_id: meta-reference provider_type: inline::meta-reference batches: - config: kvstore: db_path: /home/nvidia/.llama/distributions/starter/batches.db type: sqlite provider_id: reference provider_type: inline::reference datasetio: - config: kvstore: db_path: /home/nvidia/.llama/distributions/starter/huggingface_datasetio.db type: sqlite provider_id: huggingface provider_type: remote::huggingface - config: kvstore: db_path: /home/nvidia/.llama/distributions/starter/localfs_datasetio.db type: sqlite provider_id: localfs provider_type: inline::localfs eval: - config: kvstore: db_path: /home/nvidia/.llama/distributions/starter/meta_reference_eval.db type: sqlite provider_id: meta-reference provider_type: inline::meta-reference files: - config: metadata_store: db_path: /home/nvidia/.llama/distributions/starter/files_metadata.db type: sqlite storage_dir: /home/nvidia/.llama/distributions/starter/files provider_id: meta-reference-files provider_type: inline::localfs inference: - config: api_key: '********' url: https://api.fireworks.ai/inference/v1 provider_id: fireworks provider_type: remote::fireworks - config: api_key: '********' url: https://api.together.xyz/v1 provider_id: together provider_type: remote::together - config: {} provider_id: bedrock provider_type: remote::bedrock - config: api_key: '********' append_api_version: true url: http://localhost:8912 provider_id: nvidia provider_type: remote::nvidia - config: api_key: '********' base_url: https://api.openai.com/v1 provider_id: openai provider_type: remote::openai - config: api_key: '********' provider_id: anthropic provider_type: 
remote::anthropic - config: api_key: '********' provider_id: gemini provider_type: remote::gemini - config: api_key: '********' url: https://api.groq.com provider_id: groq provider_type: remote::groq - config: api_key: '********' url: https://api.sambanova.ai/v1 provider_id: sambanova provider_type: remote::sambanova - config: {} provider_id: sentence-transformers provider_type: inline::sentence-transformers post_training: - config: checkpoint_format: meta provider_id: torchtune-cpu provider_type: inline::torchtune-cpu safety: - config: excluded_categories: [] provider_id: llama-guard provider_type: inline::llama-guard - config: {} provider_id: code-scanner provider_type: inline::code-scanner scoring: - config: {} provider_id: basic provider_type: inline::basic - config: {} provider_id: llm-as-judge provider_type: inline::llm-as-judge - config: openai_api_key: '********' provider_id: braintrust provider_type: inline::braintrust telemetry: - config: service_name: "\u200B" sinks: sqlite sqlite_db_path: /home/nvidia/.llama/distributions/starter/trace_store.db provider_id: meta-reference provider_type: inline::meta-reference tool_runtime: - config: api_key: '********' max_results: 3 provider_id: brave-search provider_type: remote::brave-search - config: api_key: '********' max_results: 3 provider_id: tavily-search provider_type: remote::tavily-search - config: {} provider_id: rag-runtime provider_type: inline::rag-runtime - config: {} provider_id: model-context-protocol provider_type: remote::model-context-protocol vector_io: - config: kvstore: db_path: /home/nvidia/.llama/distributions/starter/faiss_store.db type: sqlite provider_id: faiss provider_type: inline::faiss - config: db_path: /home/nvidia/.llama/distributions/starter/sqlite_vec.db kvstore: db_path: /home/nvidia/.llama/distributions/starter/sqlite_vec_registry.db type: sqlite provider_id: sqlite-vec provider_type: inline::sqlite-vec scoring_fns: [] server: port: 8321 shields: [] tool_groups: - provider_id: tavily-search toolgroup_id: builtin::websearch - provider_id: rag-runtime toolgroup_id: builtin::rag vector_dbs: [] version: 2 INFO 2025-10-07 00:29:12,138 llama_stack.providers.remote.inference.nvidia.nvidia:49 inference::nvidia: Initializing NVIDIAInferenceAdapter(http://localhost:8912)... INFO 2025-10-07 00:29:12,921 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues INFO 2025-10-07 00:29:13,524 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues ERROR 2025-10-07 00:29:13,679 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:13,681 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the provider config. 
ERROR 2025-10-07 00:29:13,682 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed with: Pass Together API Key in the header X-LlamaStack-Provider-Data as { "together_api_key": <your api key>} WARNING 2025-10-07 00:29:13,684 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider together: Pass Together API Key in the header X-LlamaStack-Provider-Data as { "together_api_key": <your api key>} Handling connection for 8912 INFO 2025-10-07 00:29:14,047 llama_stack.providers.utils.inference.openai_mixin:448 providers::utils: NVIDIAInferenceAdapter.list_provider_model_ids() returned 3 models ERROR 2025-10-07 00:29:14,062 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,063 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider openai: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the provider config. ERROR 2025-10-07 00:29:14,099 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed with: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted" WARNING 2025-10-07 00:29:14,100 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted" ERROR 2025-10-07 00:29:14,102 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,103 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the provider config. ERROR 2025-10-07 00:29:14,105 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,106 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config. ERROR 2025-10-07 00:29:14,107 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed with: API key is not set. 
Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,109 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config. INFO 2025-10-07 00:29:14,454 uvicorn.error:84 uncategorized: Started server process [3753046] INFO 2025-10-07 00:29:14,455 uvicorn.error:48 uncategorized: Waiting for application startup. INFO 2025-10-07 00:29:14,457 llama_stack.core.server.server:170 core::server: Starting up INFO 2025-10-07 00:29:14,458 llama_stack.core.stack:415 core: starting registry refresh task ERROR 2025-10-07 00:29:14,459 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,461 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the provider config. ERROR 2025-10-07 00:29:14,462 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed with: Pass Together API Key in the header X-LlamaStack-Provider-Data as { "together_api_key": <your api key>} WARNING 2025-10-07 00:29:14,463 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider together: Pass Together API Key in the header X-LlamaStack-Provider-Data as { "together_api_key": <your api key>} ERROR 2025-10-07 00:29:14,465 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,466 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider openai: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the provider config. INFO 2025-10-07 00:29:14,500 uvicorn.error:62 uncategorized: Application startup complete. ERROR 2025-10-07 00:29:14,502 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed with: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted" WARNING 2025-10-07 00:29:14,503 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: "Could not resolve authentication method. Expected either api_key or auth_token to be set. 
Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted" ERROR 2025-10-07 00:29:14,504 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,506 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the provider config. ERROR 2025-10-07 00:29:14,507 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,508 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config. ERROR 2025-10-07 00:29:14,510 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed with: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config. WARNING 2025-10-07 00:29:14,511 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config. 
INFO 2025-10-07 00:29:14,513 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ``` tested with curl model, it also works: ``` curl http://localhost:8321/v1/models {"data":[{"identifier":"bedrock/meta.llama3-1-8b-instruct-v1:0","provider_resource_id":"meta.llama3-1-8b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-70b-instruct-v1:0","provider_resource_id":"meta.llama3-1-70b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-405b-instruct-v1:0","provider_resource_id":"meta.llama3-1-405b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/bigcode/starcoder2-7b","provider_resource_id":"bigcode/starcoder2-7b","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/meta/llama-3.3-70b-instruct","provider_resource_id":"meta/llama-3.3-70b-instruct","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/nvidia/llama-3.2-nv-embedqa-1b-v2","provider_resource_id":"nvidia/llama-3.2-nv-embedqa-1b-v2","provider_id":"nvidia","type":"model","metadata":{"embedding_dimension":2048,"context_length":8192},"model_type":"embedding"},{"identifier":"sentence-transformers/all-MiniLM-L6-v2","provider_resource_id":"all-MiniLM-L6-v2","provider_id":"sentence-transformers","type":"model","metadata":{"embedding_dimension":384},"model_type":"embedding"}]}% ``` --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
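For reference, the key change described above amounts to something like the sketch below. The import paths and the `__init__` call follow the PR description, but the body is simplified and should be read as an illustration rather than the exact adapter code:

```python
from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class NVIDIAInferenceAdapter(OpenAIMixin, ModelRegistryHelper):
    # OpenAIMixin comes first so its check_model_availability() takes precedence
    # over the ModelRegistryHelper version.

    def __init__(self, config) -> None:
        # NVIDIA discovers models dynamically, so no static model entries are registered.
        ModelRegistryHelper.__init__(self, model_entries=[], allowed_models=config.allowed_models)
        self.config = config
```

Inheriting from ModelRegistryHelper is what provides the previously missing `alias_to_provider_id_map` attribute.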
||
|
c2d97a9db9
|
chore: fix flaky unit test and add proper shutdown for file batches (#3725)
# What does this PR do?
Have been running into flaky unit test failures:
|
||
|
1970b4aa4b
|
fix: improve model availability checks: Allows use of unavailable models on startup (#3717)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m28s
- Allows use of unavailable models on startup
- Add has_model method to ModelsRoutingTable for checking pre-registered models
- Update check_model_availability to check model_store before provider APIs

# What does this PR do?
Allows the stack to start even when a registered model's provider is unreachable, instead of crashing.

## Test Plan
Start llama stack and point it at an unavailable vLLM server:
```
VLLM_URL=https://my-unavailable-vllm/v1 MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```
llama stack will start without crashing, only logging the error:
```
- provider_id: rag-runtime
  toolgroup_id: builtin::rag
vector_dbs: []
version: 2
INFO 2025-10-07 06:40:41,804 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO 2025-10-07 06:40:42,066 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR 2025-10-07 06:40:58,882 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING 2025-10-07 06:40:58,883 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
[...]
INFO 2025-10-07 06:40:59,036 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO 2025-10-07 06:41:04,064 openai._base_client:1618 uncategorized: Retrying request to /models in 0.398814 seconds
INFO 2025-10-07 06:41:09,497 openai._base_client:1618 uncategorized: Retrying request to /models in 0.781908 seconds
ERROR 2025-10-07 06:41:15,282 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING 2025-10-07 06:41:15,283 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
``` |
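A rough sketch of the ordering change described in the bullets above; the names (`has_model`, `check_model_availability`, `list_provider_model_ids`) come from the PR description and logs, but the function body is illustrative:

```python
async def check_model_availability(model_id: str, model_store, provider) -> bool:
    """Prefer the locally registered model store over a live provider call."""
    # 1. Models already registered (pre-registered via run.yaml or the API) count as
    #    available even if the remote provider is currently unreachable.
    if model_store is not None and await model_store.has_model(model_id):
        return True
    # 2. Otherwise fall back to asking the provider for its current model list.
    return model_id in await provider.list_provider_model_ids()
```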