Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-04 04:04:14 +00:00. 1642 commits.

Author | SHA1 | Message | Date |
---|---|---|---|
|
f4ab154ade
|
feat: add dynamic model registration support to TGI inference (#3417)
# What does this PR do? Adds dynamic model registration support to TGI and a new `overwrite_completion_id` feature to `OpenAIMixin` to deal with TGI always returning `id=""`. ## Test Plan tgi: `docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data ghcr.io/huggingface/text-generation-inference --model-id Qwen/Qwen3-0.6B` stack: `TGI_URL=http://localhost:8080 uv run llama stack build --image-type venv --distro ci-tests --run` test: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --setup tgi --subdirs inference --pattern openai` |
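A minimal sketch of the idea behind an `overwrite_completion_id`-style hook, assuming an OpenAI-compatible mixin base class; the class and attribute names below are illustrative, not the actual `OpenAIMixin` code.

```python
import uuid


class OpenAICompatProvider:
    """Illustrative base class for OpenAI-compatible providers (hypothetical names)."""

    # Providers such as TGI return id="" on completions; subclasses opt in to
    # having a fresh identifier generated on the client side instead.
    overwrite_completion_id: bool = False

    def _ensure_completion_id(self, response: dict) -> dict:
        if self.overwrite_completion_id or not response.get("id"):
            response["id"] = f"chatcmpl-{uuid.uuid4().hex}"
        return response


class TGIAdapter(OpenAICompatProvider):
    overwrite_completion_id = True  # TGI always returns id=""


if __name__ == "__main__":
    print(TGIAdapter()._ensure_completion_id({"id": "", "choices": []}))
```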
||
|
ab321739f2
|
feat: create HTTP DELETE API endpoints to unregister ScoringFn and Benchmark resources in Llama Stack (#3371)
# What does this PR do? This PR provides functionality for users to unregister ScoringFn and Benchmark resources for the `scoring` and `eval` APIs. Closes #3051 ## Test Plan Updated integration and unit tests via CI workflow |
||
|
01bdcce4d2
|
chore(recorder): update mocks to be closer to non-mock environment (#3442)
# What does this PR do? The @required_args decorator in openai-python masks the async nature of the {AsyncCompletions,chat.AsyncCompletions}.create methods, see https://github.com/openai/openai-python/issues/996. This means two things: (0) we cannot use iscoroutine in the recorder to detect async vs non-async, and (1) our mocks inappropriately introduce identifiable async behavior. For (0), we replace the iscoroutine check with detection of /v1/models, which is the only non-async function we mock & record. For (1), we could leave everything as is and assume (0) will catch errors; to be defensive, we update the unit tests to mock below the create methods, allowing the true openai-python create() methods to be exercised. |
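A rough illustration of routing on the endpoint path instead of relying on `inspect.iscoroutine`; the function names are made up for the example and this is not the recorder's actual code.

```python
import asyncio


def replay(endpoint: str, func, *args, **kwargs):
    # /v1/models is the only non-async call that gets mocked and recorded;
    # every other endpoint is treated as a coroutine and awaited.
    if endpoint == "/v1/models":
        return func(*args, **kwargs)
    return asyncio.run(func(*args, **kwargs))


def fake_list_models():
    return ["Qwen/Qwen3-0.6B"]


async def fake_chat_completion(**kwargs):
    return {"choices": [{"message": {"content": "hi"}}]}


if __name__ == "__main__":
    print(replay("/v1/models", fake_list_models))
    print(replay("/v1/chat/completions", fake_chat_completion))
```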
||
|
b6cb817897
|
chore(ui-deps): bump @radix-ui/react-select from 2.2.5 to 2.2.6 in /llama_stack/ui (#3437)
Bumps [@radix-ui/react-select](https://github.com/radix-ui/primitives) from 2.2.5 to 2.2.6. Signed-off-by: dependabot[bot] <support@github.com> |
||
|
36fd97e306
|
chore(ui-deps): bump next from 15.3.3 to 15.5.3 in /llama_stack/ui (#3438)
Bumps [next](https://github.com/vercel/next.js) from 15.3.3 to 15.5.3. The 15.5.x releases are backported bug fixes only, covering route/type validation, linting rule adjustments, Windows path normalization, and a Turbopack symlink fix. |
||
|
6787755c0c
|
chore(recorder): add support for NOT_GIVEN (#3430)
# What does this PR do? The recorder mocks the openai-python interface, and that interface allows NOT_GIVEN as an input option. This change handles NOT_GIVEN properly. ## Test Plan ci (coverage for chat, completions, embeddings) |
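A small sketch of the handling, using the `NOT_GIVEN` sentinel exported by openai-python; the helper name is hypothetical, not the recorder's actual code.

```python
from openai import NOT_GIVEN  # sentinel openai-python uses for omitted arguments


def strip_not_given(kwargs: dict) -> dict:
    """Drop NOT_GIVEN values so recorded requests contain only real parameters."""
    return {k: v for k, v in kwargs.items() if v is not NOT_GIVEN}


# strip_not_given({"model": "gpt-4.1", "temperature": NOT_GIVEN})
# -> {"model": "gpt-4.1"}
```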
||
|
3de9ad0a87
|
chore(recorder, tests): add test for openai /v1/models (#3426)
# What does this PR do? - [x] adds a test for the recorder's handling of /v1/models - [x] adds a fix for /v1/models handling ## Test Plan ci |
||
|
f67081d2d6
|
feat: migrate to FIPS-validated cryptographic algorithms (#3423)
# What does this PR do? Migrates MD5 and SHA-1 hash algorithms to SHA-256. In particular, replaces: - MD5 in chunk ID generation. - MD5 in file verification. - SHA-1 in model identifier digests. And updates all related test expectations. Original discussion: https://github.com/llamastack/llama-stack/discussions/3413 <!-- If resolving an issue, uncomment and update the line below --> Closes #3424. ## Test Plan Unit tests from scripts/unit-tests.sh were updated to match the new hash output, and ran to verify the tests pass. Signed-off-by: Doug Edgar <dedgar@redhat.com> |
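A sketch of the SHA-256 replacement for chunk ID generation; the function signature below is an assumption for illustration, not the repository's exact helper.

```python
import hashlib
import uuid


def generate_chunk_id(document_id: str, chunk_text: str) -> str:
    # SHA-256 instead of MD5 keeps the hashing FIPS-friendly while still
    # producing a stable, deterministic identifier for the same input.
    digest = hashlib.sha256(f"{document_id}:{chunk_text}".encode()).hexdigest()
    return str(uuid.UUID(digest[:32]))  # derive a UUID-shaped ID from the digest


if __name__ == "__main__":
    print(generate_chunk_id("doc-1", "hello world"))
```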
||
|
8ef1189be7
|
chore: update the vLLM inference impl to use OpenAIMixin for openai-compat functions (#3404)
# What does this PR do? Updates the vLLM inference provider to use OpenAIMixin for the openai-compat functions. Inference recordings come from Qwen3-0.6B on vLLM 0.8.3: ``` docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host \ vllm/vllm-openai:latest \ --model Qwen/Qwen3-0.6B --enable-auto-tool-choice --tool-call-parser hermes ``` ## Test Plan ``` ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference ``` |
||
|
d15368a302
|
chore: Updating documentation, adding exception handling for Vector Stores in RAG Tool, more tests on migration, and migrate off of inference_api for context_retriever for RAG (#3367)
# What does this PR do? - Updating documentation on migration from RAG Tool to Vector Stores and Files APIs - Adding exception handling for Vector Stores in RAG Tool - Add more tests on migration from RAG Tool to Vector Stores - Migrate off of inference_api for context_retriever for RAG ## Test Plan Integration and unit tests added Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
f31bcc11bc
|
feat: add Azure OpenAI inference provider support (#3396)
# What does this PR do? Llama Stack now supports a new OpenAI-compatible endpoint with Azure OpenAI. The starter distro has been updated to add the new remote inference provider. A few tests have been modified and improved. ## Test Plan Deploy a model in the Azure portal, then: ``` $ AZURE_API_KEY=... AZURE_API_BASE=... uv run llama stack build --image-type venv --providers inference=remote::azure --run ... $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model azure/gpt-4.1 tests/integration/inference/test_openai_completion.py ``` Results: the chat-completion (streaming and non-streaming) and inference-store tests pass against azure/gpt-5-mini; the skipped cases are because the model hosted by remote::azure doesn't support OpenAI completions, vllm extra_body parameters, or chat completion calls with base64 encoded files. ``` 20 passed, 7 skipped, 2 warnings in 51.77s ``` Signed-off-by: Sébastien Han <seb@redhat.com> |
||
|
c2d281e01b
|
chore(replay): improve replay robustness with un-validated construction (#3414)
# What does this PR do? Some providers do not produce spec-compliant outputs. When this happens, the replay infra fails to construct the proper types and returns a dict to the client, which the client likely does not expect. This was discovered with TGI, which returns finish_reason="" when the valid values are "stop", "length", or "content_filter". ## Test Plan ci |
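A sketch of the fallback, assuming pydantic v2 models; `Choice` here is a stand-in type, not the project's actual schema.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Choice(BaseModel):
    finish_reason: Literal["stop", "length", "content_filter"]


def build_choice(data: dict) -> Choice:
    try:
        return Choice.model_validate(data)
    except ValidationError:
        # model_construct skips validation, so non-compliant payloads (e.g. TGI's
        # finish_reason="") still come back as typed objects rather than bare dicts.
        return Choice.model_construct(**data)


print(build_choice({"finish_reason": ""}))  # Choice(finish_reason='')
```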
||
|
2838d5a20f
|
fix: AWS Bedrock inference profile ID conversion for region-specific endpoints (#3386)
Fixes #3370 AWS switched to requiring region-prefixed inference profile IDs instead of foundation model IDs for on-demand throughput, which was causing ValidationException errors. Added auto-detection based on the boto3 client region to convert model IDs like meta.llama3-1-70b-instruct-v1:0 to us.meta.llama3-1-70b-instruct-v1:0 depending on the detected region. Also handles edge cases like ARNs, case-insensitive regions, and None regions. Tested with this request. ```json { "model_id": "meta.llama3-1-8b-instruct-v1:0", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "tell me a riddle" } ], "sampling_params": { "strategy": { "type": "top_p", "temperature": 0.7, "top_p": 0.9 }, "max_tokens": 512 } } ``` ![image](https://github.com/user-attachments/assets/0d61beec-3869-4a31-8f37-9f554c280b88) |
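A hedged sketch of the ID conversion described above; the geo-prefix handling is simplified (a real mapping needs explicit prefixes such as "apac" for ap-* regions) and the function name is illustrative, not the provider's actual code.

```python
def to_inference_profile_id(model_id: str, region: str | None) -> str:
    # Pass ARNs and unknown regions through unchanged.
    if region is None or model_id.startswith("arn:"):
        return model_id
    geo = region.lower().split("-")[0]  # "us-east-1" -> "us" (simplified; see note above)
    if model_id.startswith(f"{geo}."):
        return model_id  # already region-prefixed
    return f"{geo}.{model_id}"


print(to_inference_profile_id("meta.llama3-1-70b-instruct-v1:0", "US-EAST-1"))
# -> us.meta.llama3-1-70b-instruct-v1:0
```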
||
|
8e05c68d15
|
chore: remove openai dependency from providers (#3398)
# What does this PR do? The openai package is already a dependency of the llama-stack project itself, so let the project dictate which openai version we need and avoid potential breakage from unsatisfiable dependency resolution. Signed-off-by: Sébastien Han <seb@redhat.com> |
||
|
0c7f49490c
|
fix(inference_store): on duplicate chat completion IDs, replace (#3408)
# What does this PR do? Duplicate chat completion IDs can be generated during tests especially if they are replaying recorded responses across different tests. No need to warn or error under those circumstances. In the wild, this is not likely to happen at all (no evidence) so we aren't really hiding any problem. |
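An illustrative upsert showing the replace-on-duplicate behavior with SQLite; the table layout is made up for the example and is not the inference store's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_completions (id TEXT PRIMARY KEY, data TEXT)")


def upsert_completion(completion_id: str, data: str) -> None:
    # Writing the same ID twice replaces the earlier row instead of warning or erroring.
    conn.execute(
        "INSERT INTO chat_completions (id, data) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET data = excluded.data",
        (completion_id, data),
    )


upsert_completion("chatcmpl-1", "first write")
upsert_completion("chatcmpl-1", "replayed write")
print(conn.execute("SELECT data FROM chat_completions").fetchall())  # [('replayed write',)]
```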
||
|
d4e45cd5f1
|
chore(ui-deps): bump tailwindcss from 4.1.6 to 4.1.13 in /llama_stack/ui (#3362)
Bumps [tailwindcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss) from 4.1.6 to 4.1.13. The 4.1.7 through 4.1.13 releases are mostly fixes: variant and candidate-extraction corrections, upgrade-tool migrations of aria/data/supports theme keys to @custom-variant, and dropping exact duplicate declarations when emitting CSS. |
||
|
e980436a2e
|
chore: introduce write queue for inference_store (#3383)
# What does this PR do? Adds a write worker queue for writes to inference store. This avoids overwhelming request processing with slow inference writes. ## Test Plan Benchmark: ``` cd /docs/source/distributions/k8s-benchmark # start mock server python openai-mock-server.py --port 8000 # start stack server LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml # run benchmark script uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct ``` ## RPS from 21 -> 57 |
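A minimal sketch of the write-worker pattern with asyncio; class and method names are assumptions, not the inference store's actual implementation.

```python
import asyncio


class InferenceWriteQueue:
    """Handlers enqueue rows and return immediately; one worker drains the queue."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    def start(self) -> None:
        self._task = asyncio.create_task(self._worker())

    async def enqueue(self, row: dict) -> None:
        await self.queue.put(row)

    async def _worker(self) -> None:
        while True:
            row = await self.queue.get()
            await self._write(row)  # the slow database write happens off the request path
            self.queue.task_done()

    async def _write(self, row: dict) -> None:
        await asyncio.sleep(0.01)  # stand-in for a real INSERT
        print("stored", row["id"])


async def main() -> None:
    wq = InferenceWriteQueue()
    wq.start()
    for i in range(3):
        await wq.enqueue({"id": f"chatcmpl-{i}"})
    await wq.queue.join()  # wait for the worker to drain before exiting


asyncio.run(main())
```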
||
|
a6b1588dc6
|
revert: Fireworks chat completion broken due to telemetry (#3402)
Reverts llamastack/llama-stack#3392 |
||
|
f6bf36343d
|
chore: logging perf improvments (#3393)
# What does this PR do? - Use BackgroundLogger when logging metric events. - Reuse event loop in BackgroundLogger ## Test Plan ``` cd /docs/source/distributions/k8s-benchmark # start mock server python openai-mock-server.py --port 8000 # start stack server LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml # run benchmark script uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct ``` ### RPS from 57 -> 62 |
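A rough sketch of the reuse-the-loop idea: one background thread owns a single long-lived event loop, and metric events are handed to it without blocking the caller. This is illustrative only and not the telemetry code's actual BackgroundLogger.

```python
import asyncio
import threading
import time


class BackgroundLogger:
    """One long-lived loop/thread for metric writes instead of per-event overhead."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        threading.Thread(target=self.loop.run_forever, daemon=True).start()

    def log_metric(self, name: str, value: float) -> None:
        # Non-blocking from the request path; the background loop does the work.
        self.loop.call_soon_threadsafe(print, f"metric {name}={value}")


if __name__ == "__main__":
    logger = BackgroundLogger()
    logger.log_metric("prompt_tokens", 12)
    time.sleep(0.1)  # give the background loop a moment before the process exits
```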
||
|
935b8e28de
|
fix: Fireworks chat completion broken due to telemetry (#3392)
# What does this PR do? Fix fireworks chat completion broken due to telemetry expecting response.usage Closes https://github.com/llamastack/llama-stack/issues/3391 ## Test Plan 1. `uv run --with llama-stack llama stack build --distro starter --image-type venv --run` Try ``` curl -X POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct", "messages": [{"role": "user", "content": "Hello!"}] }' ``` ``` {"id":"chatcmpl-ee922a08-0df0-4974-b0d3-b322113e8bc0","choices":[{"message":{"role":"assistant","content":"Hello! How can I assist you today?","name":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","created":1757456375,"model":"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct"}% ``` Without fix fails as mentioned in https://github.com/llamastack/llama-stack/issues/3391 Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |
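A sketch of the defensive pattern implied by the fix: only record token usage when the provider actually returned a usage block. The helper and field names mirror the OpenAI response shape but are assumptions here, not the project's telemetry code.

```python
from types import SimpleNamespace


def emit_token_metrics(response, log_metric) -> None:
    usage = getattr(response, "usage", None)
    if usage is None:
        return  # responses without usage (as seen from Fireworks) must not break the request
    log_metric("prompt_tokens", usage.prompt_tokens)
    log_metric("completion_tokens", usage.completion_tokens)
    log_metric("total_tokens", usage.total_tokens)


if __name__ == "__main__":
    ok = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=3, completion_tokens=5, total_tokens=8))
    emit_token_metrics(ok, lambda name, value: print(name, value))
    emit_token_metrics(SimpleNamespace(usage=None), print)  # no-op, no crash
```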
||
|
c86e45496e
|
ci: Re-enable pre-commit to fail (#3399)
If pre-commit fails, the workflow must fail. --------- Signed-off-by: Sébastien Han <seb@redhat.com> |
||
|
0e27016cf2
|
chore: update the vertexai inference impl to use openai-python for openai-compat functions (#3377)
# What does this PR do? Updates the VertexAI inference provider to use openai-python for the openai-compat functions. ## Test Plan ``` $ VERTEX_AI_PROJECT=... uv run llama stack build --image-type venv --providers inference=remote::vertexai --run ... $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model vertexai/vertex_ai/gemini-2.5-flash tests/integration/inference/test_openai_completion.py ... ``` I don't have an account to test this. `get_api_key` may also need to be updated per https://cloud.google.com/vertex-ai/generative-ai/docs/start/openai --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com> |
||
|
1c23aeb937
|
feat: Add vector_db_id to chunk metadata (#3304)
# What does this PR do? When running RAG in a multi vector DB setting, it can be difficult to trace where retrieved chunks originate from. This PR adds the `vector_db_id` into each chunk’s metadata, making it easier to understand which database a given chunk came from. This is helpful for debugging and for analyzing retrieval behavior of multiple DBs. Relevant code: ```python for vector_db_id, result in zip(vector_db_ids, results): for chunk, score in zip(result.chunks, result.scores): if not hasattr(chunk, "metadata") or chunk.metadata is None: chunk.metadata = {} chunk.metadata["vector_db_id"] = vector_db_id chunks.append(chunk) scores.append(score) ``` ## Test Plan * Ran Llama Stack in debug mode. * Verified that `vector_db_id` was added to each chunk’s metadata. * Confirmed that the metadata was printed in the console when using the RAG tool. --------- Co-authored-by: are-ces <cpompeia@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |
||
|
dd1f946b3e
|
feat: include a default inference store during llama stack build (#3373)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 43s
Pre-commit / pre-commit (push) Successful in 1m14s
# What does this PR do? Enables completions storage when using `llama stack build --providers`: - GET /v1/chat/completions - GET /v1/chat/completions/{id} TODO: llama stack build and distro codegen should use the same code paths. ## Test Plan ci |
||
|
9d3a234bf3
|
chore: remove unused variable (#3389)
# What does this PR do? ## Test Plan |
||
|
28696c3f30 |
build: Bump version to 0.2.21
|
||
|
30468d0c43
|
fix(deps): bump datasets versions for all providers (#3382)
Not doing so results in errors of the kind you see in:
|
||
|
ef02b9ea10
|
fix: environment variable typo in inference recorder error message (#3374)
The error message was referencing LLAMA_STACK_INFERENCE_MODE instead of the correct LLAMA_STACK_TEST_INFERENCE_MODE environment variable. |
||
|
ad6ea7fb91
|
feat: Adding OpenAI Prompts API (#3319)
# What does this PR do? This PR adds support for OpenAI Prompts API. Note, OpenAI does not explicitly expose the Prompts API but instead makes it available in the Responses API and in the [Prompts Dashboard](https://platform.openai.com/docs/guides/prompting#create-a-prompt). I have added the following APIs: - CREATE - GET - LIST - UPDATE - Set Default Version The Set Default Version API is made available only in the Prompts Dashboard and configures which prompt version is returned in the GET (the latest version is the default). Overall, the expected functionality in Responses will look like this: ```python from openai import OpenAI client = OpenAI() response = client.responses.create( prompt={ "id": "pmpt_68b0c29740048196bd3a6e6ac3c4d0e20ed9a13f0d15bf5e", "version": "2", "variables": { "city": "San Francisco", "age": 30, } } ) ``` ### Resolves https://github.com/llamastack/llama-stack/issues/3276 ## Test Plan Unit tests added. Integration tests can be added after client generation. ## Next Steps 1. Update Responses API to support Prompt API 2. I'll enhance the UI to implement the Prompt Dashboard. 3. Add cache for lower latency --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
072dca0609
|
feat: Add Kubernetes auth provider to use SelfSubjectReview and kubernetes api server (#2559)
# What does this PR do? Add Kubernetes authentication provider support - Add KubernetesAuthProvider class for token validation using the Kubernetes SelfSubjectReview API - Add KubernetesAuthProviderConfig with configurable API server URL, TLS settings, and claims mapping - Implement authentication via POST requests to the /apis/authentication.k8s.io/v1/selfsubjectreviews endpoint - Add support for parsing the Kubernetes SelfSubjectReview response format to extract user information - Add KUBERNETES provider type to the AuthProviderType enum - Update the create_auth_provider factory function to handle the 'kubernetes' provider type - Add comprehensive unit tests for KubernetesAuthProvider functionality - Add documentation with configuration examples and usage instructions The provider validates tokens by sending SelfSubjectReview requests to the Kubernetes API server and extracts user information from the userInfo structure in the response. ## Test Plan This verifies authentication header validation, token validation against the Kubernetes SelfSubjectReview API server endpoint, error handling for invalid tokens and HTTP errors, and the request payload structure and headers. ``` python -m pytest tests/unit/server/test_auth.py -k "kubernetes" -v ``` Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com> |
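A rough sketch of the token-validation call described above, using httpx against the SelfSubjectReview endpoint; this is not the provider's actual code, and error handling and claims mapping are omitted.

```python
import httpx


async def validate_token(api_server_url: str, token: str, verify_tls: bool = True) -> dict:
    url = f"{api_server_url}/apis/authentication.k8s.io/v1/selfsubjectreviews"
    body = {"apiVersion": "authentication.k8s.io/v1", "kind": "SelfSubjectReview"}
    async with httpx.AsyncClient(verify=verify_tls) as client:
        resp = await client.post(url, json=body, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    # status.userInfo carries username and groups, which the provider maps via its claims mapping.
    return resp.json()["status"]["userInfo"]
```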
||
|
e1b81ce1fc
|
chore(ui-deps): bump @radix-ui/react-dropdown-menu from 2.1.14 to 2.1.16 in /llama_stack/ui (#3361)
Bumps [@radix-ui/react-dropdown-menu](https://github.com/radix-ui/primitives) from 2.1.14 to 2.1.16. Signed-off-by: dependabot[bot] <support@github.com> |
||
|
e508aef320
|
chore(ui-deps): bump lucide-react from 0.510.0 to 0.542.0 in /llama_stack/ui (#3363)
Bumps [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) from 0.510.0 to 0.542.0. The interim releases add new icons (e.g. list-chevrons-down-up, brick-wall-shield, kayak, rose) along with assorted icon fixes, renames, and deprecations. |
||
|
91c7c4570e
|
chore(ui-deps): bump sonner from 2.0.6 to 2.0.7 in /llama_stack/ui (#3364)
Bumps [sonner](https://github.com/emilkowalski/sonner) from 2.0.6 to 2.0.7. Release-notes highlights for v2.0.7 (sourced from sonner's releases):
- Sonner now supports multiple `<Toaster />` components (see https://sonner.emilkowal.ski/toaster#multiple-toasters).
- Add a `testId` prop for individual toast components (emilkowalski/sonner#660).
- Add support for multiple toasters with unique identifiers (emilkowalski/sonner#665).
- Fix tests (emilkowalski/sonner#677).
(Commit list truncated.) |
||
|
fe134d90e5
|
chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui (#3360)
Bumps [react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom) and [@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom); these dependencies needed to be updated together. Updates `react-dom` from 19.1.0 to 19.1.1. Release-notes/changelog highlight for 19.1.1 (July 28, 2025): fixed Owner Stacks to work with ES2015 function.name semantics (facebook/react#33680). (Commit list truncated.) |
||
|
6a35bd7bb6
|
chore: update the anthropic inference impl to use openai-python for openai-compat functions (#3366)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 38s
Pre-commit / pre-commit (push) Successful in 1m13s
# What does this PR do? update the Anthropic inference provider to use openai-python for the openai-compat endpoints ## Test Plan ci Co-authored-by: raghotham <rsm@meta.com> |
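A minimal sketch of the pattern these openai-python migrations describe: an adapter that owns an `AsyncOpenAI` client and defers all openai-compat calls to it. The class and hook names below are illustrative assumptions, not the exact llama-stack `OpenAIMixin` interface.
```python
# Sketch only: illustrative adapter, not the actual provider implementation.
from openai import AsyncOpenAI


class OpenAICompatAdapter:
    """Routes openai-compat endpoints through openai-python."""

    def __init__(self, api_key: str, base_url: str):
        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)

    async def openai_chat_completion(self, model: str, messages: list[dict], **kwargs):
        # openai-python handles request/response shapes, retries, and streaming.
        return await self._client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )


# Example: an Anthropic-backed adapter would point at Anthropic's
# OpenAI-compatible endpoint (URL shown as an assumption).
adapter = OpenAICompatAdapter(
    api_key="sk-ant-...", base_url="https://api.anthropic.com/v1/"
)
```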
||
|
d23607483f
|
chore: update the groq inference impl to use openai-python for openai-compat functions (#3348)
# What does this PR do? update Groq inference provider to use OpenAIMixin for openai-compat endpoints.

Changes on api.groq.com:
- json_schema is now supported for specific models, see https://console.groq.com/docs/structured-outputs#supported-models
- response_format with streaming is now supported for models that support response_format
- groq no longer returns a 400 error if tools are provided and tool_choice is not "required"

## Test Plan
```
$ GROQ_API_KEY=... uv run llama stack build --image-type venv --providers inference=remote::groq --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model groq/llama-3.3-70b-versatile tests/integration/inference/test_openai_completion.py -k 'not store'
...
SKIPPED [3] tests/integration/inference/test_openai_completion.py:44: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:94: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:73: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:100: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support chat completion calls with base64 encoded files.
======================= 8 passed, 11 skipped, 8 deselected, 2 warnings in 5.13s ========================
```
--------- Co-authored-by: raghotham <rsm@meta.com> |
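As a usage illustration (not part of the PR), a hedged sketch of exercising the newly supported json_schema response_format through the stack's OpenAI-compatible endpoint; the base URL and model come from the test plan above, and the schema is a hypothetical example.
```python
# Sketch: json_schema structured output via the stack's openai-compat path.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

response = client.chat.completions.create(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Give me a city and its country."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city",  # hypothetical schema for illustration
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
)
print(response.choices[0].message.content)
```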
||
|
bf02cd846f
|
chore: update the sambanova inference impl to use openai-python for openai-compat functions (#3345)
# What does this PR do? update SambaNova inference provider to use OpenAIMixin for openai-compat endpoints ## Test Plan ``` $ SAMBANOVA_API_KEY=... uv run llama stack build --image-type venv --providers inference=remote::sambanova --run ... $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model sambanova/Meta-Llama-3.3-70B-Instruct tests/integration/inference -k 'not store' ... FAILED tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=sambanova/Meta-Llama-3.3-70B-Instruct-inference:chat_completion:tool_calling_tools_absent-True] - AttributeError: 'NoneType' object has no attribute 'delta' FAILED tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=sambanova/Meta-Llama-3.3-70B-Instruct-inference:chat_completion:tool_calling_tools_absent-False] - llama_stack_client.InternalServerError: Error code: 500 - {'detail': 'Internal server error: An une... =========== 2 failed, 16 passed, 68 skipped, 8 deselected, 3 xfailed, 13 warnings in 15.85s ============ ``` the two failures also exist before this change. they are part of the deprecated inference.chat_completion tests that flow through litellm. they can be resolved later. |
||
|
d6c3b36390
|
chore: update the gemini inference impl to use openai-python for openai-compat functions (#3351)
# What does this PR do? update the Gemini inference provider to use openai-python for the openai-compat endpoints partially addresses #3349, does not address /inference/completion or /inference/chat-completion ## Test Plan ci |
||
|
7cd1c2c238
|
feat: Updating Rag Tool to use Files API and Vector Stores API (#3344)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 15s
Python Package Build Test / build (3.13) (push) Failing after 19s
Test External API and Providers / test-external (venv) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 22s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Unit Tests / unit-tests (3.13) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (push) Failing after 23s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m32s
|
||
|
df1526991f
|
feat(batches, completions): add /v1/completions support to /v1/batches (#3309)
# What does this PR do? add support for /v1/completions to the /v1/batches api ## Test Plan ci |
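A hedged sketch of what driving the new /v1/completions batch support could look like, following the OpenAI-style batch conventions; the JSONL contents, model name, and IDs are hypothetical.
```python
# Sketch: submit a batch of /v1/completions requests. Not the PR's test code.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

# Each JSONL line is one request targeting /v1/completions.
lines = [
    {
        "custom_id": "req-1",
        "method": "POST",
        "url": "/v1/completions",
        "body": {"model": "meta-llama/Llama-3.2-3B-Instruct", "prompt": "Hello", "max_tokens": 16},
    },
]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(json.dumps(line) for line in lines))

batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```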
||
|
e2fe39aee1
|
feat!: Migrate Vector DB IDs to Vector Store IDs (breaking change) (#3253)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 35s
Pre-commit / pre-commit (push) Successful in 1m15s
# What does this PR do? This change migrates the VectorDB id generation to Vector Stores. This is a breaking change for **_some users_** that may have application code using the `vector_db_id` parameter in the request of the VectorDB protocol instead of the `VectorDB.identifier` in the response. By default we will now create a Vector Store every time we register a VectorDB. The caveat with this approach is that this maps the `vector_db_id` → `vector_store.name`. This is a reasonable tradeoff to transition users towards OpenAI Vector Stores. As an added benefit, registering VectorDBs will result in them appearing in the VectorStores admin UI. ### Why? This PR makes the `POST` API call to `/v1/vector-dbs` swap the `vector_db_id` parameter in the **request body** into the VectorStore's name field and sets the `vector_db_id` to the generated vector store id (e.g., `vs_038247dd-4bbb-4dbb-a6be-d5ecfd46cfdb`). That means that users would have to do something like follows in their application code: ```python res = client.vector_dbs.register( vector_db_id='my-vector-db-id', embedding_model='ollama/all-minilm:l6-v2', embedding_dimension=384, ) vector_db_id = res.identifier ``` And then the rest of their code would behave, including `VectorIO`'s insert protocol using `vector_db_id` in the request. An alternative implementation would be to just delete the `vector_db_id` parameter in `VectorDB` but the end result would still require users having to write `vector_db_id = res.identifier` since `VectorStores.create()` generates the ID for you. So this approach felt the easiest way to migrate users towards VectorStores (subsequent PRs will be added to trigger `files.create()` and `vector_stores.files.create()`). ## Test Plan Unit tests and integration tests have been added. Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
64b2977162
|
fix: Fix locations of distribution runtime directories (#3336)
The defaults were mixed up. Signed-off-by: Derek Higgins <derekh@redhat.com> |
||
|
0b00c68d59
|
fix: use lambda pattern for bedrock config env vars (#3307)
# What does this PR do? Improved bedrock provider config to read from environment variables like AWS_ACCESS_KEY_ID. Updated all fields to use default_factory with lambda patterns like the nvidia provider does. Now the environment variables work as documented. Closes #3305

## Test Plan
Ran the new bedrock config tests:
```bash
python -m pytest tests/unit/providers/inference/bedrock/test_config.py -v
```
Verified existing provider tests still work:
```bash
python -m pytest tests/unit/providers/test_configs.py -v
``` |
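A minimal sketch of the default_factory + lambda pattern the PR describes, so environment variables are read when the config is instantiated rather than at import time; the field names are illustrative, not the exact Bedrock provider config.
```python
# Sketch of env-var-backed config fields via default_factory lambdas.
import os
from pydantic import BaseModel, Field


class BedrockBaseConfig(BaseModel):
    aws_access_key_id: str | None = Field(
        default_factory=lambda: os.getenv("AWS_ACCESS_KEY_ID"),
        description="AWS access key, read from AWS_ACCESS_KEY_ID if not set",
    )
    aws_secret_access_key: str | None = Field(
        default_factory=lambda: os.getenv("AWS_SECRET_ACCESS_KEY"),
        description="AWS secret key, read from AWS_SECRET_ACCESS_KEY if not set",
    )
    region_name: str | None = Field(
        default_factory=lambda: os.getenv("AWS_DEFAULT_REGION"),
        description="AWS region, read from AWS_DEFAULT_REGION if not set",
    )


# Environment variables are picked up whenever the config is constructed.
config = BedrockBaseConfig()
```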
||
|
55a8c5f439
|
fix: show descriptive MCP server connection errors instead of generic 500s (#3256)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 1m20s
Pre-commit / pre-commit (push) Successful in 2m37s
What does this PR do?
Fixes error handling when MCP server connections fail. Instead of returning generic 500 errors, it now provides descriptive error messages with proper HTTP status codes. Closes #3107

Test Plan
Before fix:
curl -X GET "http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server"
Returns: {"detail": "Internal server error: An unexpected error occurred."} (500)

After fix:
curl -X GET "http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server"
Returns: {"error": {"detail": "Failed to connect to MCP server at http://localhost:9999/sse: Connection refused"}} (502)

Tests:
- Added unit test for ConnectionError → 502 translation
- Manually tested with unreachable MCP servers (connection refused) |
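A minimal sketch of the error-translation idea, assuming a FastAPI-style exception handler; the exception class and handler names are illustrative, not the exact llama-stack code.
```python
# Sketch: surface MCP connection failures as descriptive 502 responses.
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import httpx

app = FastAPI()


class MCPConnectionError(Exception):
    """Illustrative exception raised when an MCP server cannot be reached."""

    def __init__(self, url: str, reason: str):
        super().__init__(f"Failed to connect to MCP server at {url}: {reason}")


@app.exception_handler(MCPConnectionError)
async def mcp_connection_error_handler(request, exc: MCPConnectionError):
    return JSONResponse(
        status_code=httpx.codes.BAD_GATEWAY,  # 502: upstream MCP server unreachable
        content={"error": {"detail": str(exc)}},
    )
```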
||
|
561d2fc6b8
|
fix: Move to older version for docker container failure [fireworks-ai] (#3338)
# What does this PR do?
Noticed the tests at
https://github.com/llamastack/llama-stack-ops/actions/workflows/test-maybe-cut.yaml
are still failing randomly.
This was fixed earlier with fireworks 0.18.0 in
https://github.com/llamastack/llama-stack/pull/3267, but local testing
may have inadvertently picked a lower version, since `<=` was assumed to
pick the latest version.
Now tested with `==` to find the version where it broke, and pinned
(with `<=`) to the version where it was passing.
## Test Plan
Tested locally with the following commands to start a container.
Build the container:
`llama stack build --distro starter --image-type container`
Start the container:
`docker run -d -p 8321:8321 --name llama-stack-test distribution-starter:0.2.20`
Check health: `http://localhost:8321/v1/health`
The above steps fail without the fix.
Tested with `==` to ensure the same version is picked in local testing
instead of anything lower.
Following here for the fix from `fireworks-ai`
|
||
|
bcc7f2c7d0
|
chore: async inference store write (#3318)
# What does this PR do?

## Test Plan
```
cd /docs/source/distributions/k8s-benchmark
# start mock server
python openai-mock-server.py --port 8000
# start stack server
uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml
# run benchmark script
uv run python3 benchmark.py --duration 30 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct
```

Before (BENCHMARK RESULTS):
Total time: 30.00s
Concurrent users: 50
Total requests: 1267
Successful requests: 1267
Failed requests: 0
Success rate: 100.0%
Requests per second: 42.23

After (BENCHMARK RESULTS):
Total time: 30.00s
Concurrent users: 50
Total requests: 1449
Successful requests: 1449
Failed requests: 0
Success rate: 100.0%
Requests per second: 48.30 |
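A minimal sketch of the fire-and-forget write pattern implied by the benchmark gain, assuming the inference store write can be queued as a background task instead of awaited on the request path; the class and method names are illustrative.
```python
# Sketch: queue store writes so they no longer block the response.
import asyncio


class InferenceStore:
    def __init__(self):
        self._background_tasks: set[asyncio.Task] = set()

    async def _write(self, record: dict) -> None:
        ...  # persist the record to the configured backend (illustrative stub)

    def write_async(self, record: dict) -> None:
        # Fire-and-forget: keep a reference so the task isn't garbage collected.
        task = asyncio.create_task(self._write(record))
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
```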
||
|
5bbca56cfc
|
fix: Make SentenceTransformer embedding operations non-blocking (#3335)
- Wrap model loading with asyncio.to_thread() to prevent blocking during model download/initialization - Wrap encoding operations with asyncio.to_thread() to run in background thread - Convert _load_sentence_transformer_model() to async method This ensures the async event loop remains responsive during embedding operations. Closes: #3332 Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |
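A minimal sketch of the pattern described above, wrapping model loading and encoding in `asyncio.to_thread` so the event loop stays responsive; the helper class is illustrative, not the provider's actual code.
```python
# Sketch: non-blocking SentenceTransformer load and encode.
import asyncio
from sentence_transformers import SentenceTransformer


class EmbeddingHelper:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self._model_name = model_name
        self._model: SentenceTransformer | None = None

    async def _load(self) -> SentenceTransformer:
        if self._model is None:
            # Model download/initialization can take a while; keep it off the loop.
            self._model = await asyncio.to_thread(SentenceTransformer, self._model_name)
        return self._model

    async def embed(self, texts: list[str]) -> list[list[float]]:
        model = await self._load()
        # encode() is CPU-bound; run it in a worker thread.
        embeddings = await asyncio.to_thread(model.encode, texts)
        return embeddings.tolist()
```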
||
|
85f33762d7
|
refactor(server): remove hardcoded 409 and 404 status codes in server.py using httpx constants (#3333)
# What does this PR do? This PR eliminates the hardcoded status codes `409` (CONFLICT) and `404` (NOT_FOUND) in `server.py`, replacing them with `httpx` built-in constants. The implementation follows the existing structure to improve readability, extensibility, and developer experience. The same approach was already implemented in #3131. ## Test Plan `./scripts/unit-tests.sh` |
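A minimal sketch of the constant swap, assuming an exception-to-status mapping like the one in `server.py`; the exception types shown are placeholders.
```python
# Sketch: named httpx status constants instead of bare 404/409 literals.
import httpx


def translate_exception_to_status(exc: Exception) -> int:
    if isinstance(exc, ValueError):            # e.g. "resource not found"
        return httpx.codes.NOT_FOUND           # 404
    if isinstance(exc, FileExistsError):       # e.g. "resource already exists"
        return httpx.codes.CONFLICT            # 409
    return httpx.codes.INTERNAL_SERVER_ERROR   # 500
```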
||
|
5d52e0d2c5
|
chore: handle missing finish_reason (#3328)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 3s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 34s
Pre-commit / pre-commit (push) Successful in 1m25s
# What does this PR do? Sometimes a stream has no chunk carrying a finish_reason (e.g. a canceled stream), which raises a pydantic validation error because OpenAIChoice.finish_reason is typed as str. ## Test Plan Observe that the error no longer occurs when benchmarking. |
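One hedged way to tolerate such chunks is to default the missing value when constructing the choice; whether the actual fix defaults it or relaxes the field type is not shown here, and the model below is illustrative rather than the stack's real OpenAIChoice.
```python
# Sketch: defaulting a missing finish_reason so validation doesn't fail.
from pydantic import BaseModel


class OpenAIChoice(BaseModel):  # illustrative stand-in for the real model
    index: int
    text: str
    finish_reason: str


def choice_from_chunk(chunk: dict) -> OpenAIChoice:
    return OpenAIChoice(
        index=chunk.get("index", 0),
        text=chunk.get("text", ""),
        # Default when the provider omits finish_reason entirely (e.g. canceled stream).
        finish_reason=chunk.get("finish_reason") or "stop",
    )
```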
||
|
c3d3a0b833
|
feat(tests): auto-merge all model list responses and unify recordings (#3320)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 8s
Python Package Build Test / build (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
UI Tests / ui-tests (22) (push) Successful in 1m7s
Pre-commit / pre-commit (push) Successful in 2m34s
One needed to specify record-replay related environment variables for running integration tests. We could not use defaults because integration tests could be run against Ollama instances which could be running different models. For example, text vs vision tests needed separate instances of Ollama because a single instance typically cannot serve both of these models if you assume the standard CI worker configuration on Github. As a result, `client.list()` as returned by the Ollama client would be different between these runs and we'd end up overwriting responses. This PR "solves" it by adding a small amount of complexity -- we store model list responses specially, keyed by the hashes of the models they return. At replay time, we merge all of them and pretend that we have the union of all models available. ## Test Plan Re-recorded all the tests using `scripts/integration-tests.sh --inference-mode record`, including the vision tests. |
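A minimal sketch of the recording scheme described above: key each model-list response by a hash of the model IDs it contains, then take the union of all stored responses at replay time. The file layout and helper names are illustrative, not the actual recorder code.
```python
# Sketch: hash-keyed model-list recordings merged at replay.
import hashlib
import json
from pathlib import Path


def models_digest(model_ids: list[str]) -> str:
    # Stable digest over the sorted model IDs returned by client.list().
    return hashlib.sha256("|".join(sorted(model_ids)).encode()).hexdigest()[:16]


def record_model_list(dirpath: Path, model_ids: list[str]) -> None:
    dirpath.mkdir(parents=True, exist_ok=True)
    path = dirpath / f"models-{models_digest(model_ids)}.json"
    path.write_text(json.dumps(sorted(model_ids)))


def replay_model_list(dirpath: Path) -> list[str]:
    # Pretend every model seen in any recording is available.
    merged: set[str] = set()
    for path in dirpath.glob("models-*.json"):
        merged.update(json.loads(path.read_text()))
    return sorted(merged)
```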