Commit graph

1690 commits

Author SHA1 Message Date
Matthew Farrellee
01bdcce4d2
chore(recorder): update mocks to be closer to non-mock environment (#3442)
# What does this PR do?

the @required_args decorator in openai-python is masking the async
nature of the {AsyncCompletions,chat.AsyncCompletions}.create method.
see https://github.com/openai/openai-python/issues/996

this means two things -

 0. we cannot use iscoroutine in the recorder to detect async vs non
 1. our mocks are inappropriately introducing identifiable async

for (0), we update the iscoroutine check w/ detection of /v1/models,
which is the only non-async function we mock & record.

for (1), we could leave everything as is and assume (0) will catch
errors. to be defensive, we update the unit tests to mock below create
methods, allowing the true openai-python create() methods to be tested.
2025-09-15 15:25:53 -04:00
dependabot[bot]
b6cb817897
chore(ui-deps): bump @radix-ui/react-select from 2.2.5 to 2.2.6 in /llama_stack/ui (#3437)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 19s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 21s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 1m39s
Bumps [@radix-ui/react-select](https://github.com/radix-ui/primitives)
from 2.2.5 to 2.2.6.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-select&package-manager=npm_and_yarn&previous-version=2.2.5&new-version=2.2.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 09:46:14 +02:00
dependabot[bot]
36fd97e306
chore(ui-deps): bump next from 15.3.3 to 15.5.3 in /llama_stack/ui (#3438)
Bumps [next](https://github.com/vercel/next.js) from 15.3.3 to 15.5.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.3</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: validation return types of pages API routes (<a
href="https://redirect.github.com/vercel/next.js/issues/83069">#83069</a>)</li>
<li>fix: relative paths in dev in validator.ts (<a
href="https://redirect.github.com/vercel/next.js/issues/83073">#83073</a>)</li>
<li>fix: remove satisfies keyword from type validation to preserve old
TS compatibility (<a
href="https://redirect.github.com/vercel/next.js/issues/83071">#83071</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/bgub"><code>@​bgub</code></a> for helping!</p>
<h2>v15.5.2</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: disable unknownatrules lint rule entirely (<a
href="https://redirect.github.com/vercel/next.js/issues/83059">#83059</a>)</li>
<li>revert: add ?dpl to fonts in /_next/static/media (<a
href="https://redirect.github.com/vercel/next.js/issues/83062">#83062</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/bgub"><code>@​bgub</code></a> and <a
href="https://github.com/ztanner"><code>@​ztanner</code></a> for
helping!</p>
<h2>v15.5.1</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: aliased navigations should apply scroll handling (<a
href="https://redirect.github.com/vercel/next.js/issues/82900">#82900</a>)</li>
<li>Turbopack: fix invalid NFT entry with file behind symlink (<a
href="https://redirect.github.com/vercel/next.js/issues/82887">#82887</a>)</li>
<li>fix: typesafe linking to route handlers and pages API routes (<a
href="https://redirect.github.com/vercel/next.js/issues/82858">#82858</a>)</li>
<li>fix: change &quot;noUnknownAtRules&quot; to &quot;warn&quot; for
Biome (<a
href="https://redirect.github.com/vercel/next.js/issues/82974">#82974</a>)</li>
<li>fix: add path normalization to getRelativePath for Windows (<a
href="https://redirect.github.com/vercel/next.js/issues/82918">#82918</a>)</li>
<li>feat: add typesafety with config.typedRoutes to redirect() and
permanentRedirect() (<a
href="https://redirect.github.com/vercel/next.js/issues/82860">#82860</a>)</li>
<li>fix: avoid importing types that will be unused (<a
href="https://redirect.github.com/vercel/next.js/issues/82856">#82856</a>)</li>
<li>fix: update the config.api.responseLimit type (<a
href="https://redirect.github.com/vercel/next.js/issues/82852">#82852</a>)</li>
<li>fix: update validation return types (<a
href="https://redirect.github.com/vercel/next.js/issues/82854">#82854</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/bgub"><code>@​bgub</code></a>, <a
href="https://github.com/mischnic"><code>@​mischnic</code></a>, and <a
href="https://github.com/ztanner"><code>@​ztanner</code></a> for
helping!</p>
<h2>v15.5.1-canary.39</h2>
<h3>Core Changes</h3>
<ul>
<li>[metadata] change the metadata routes params to promises: <a
href="https://redirect.github.com/vercel/next.js/issues/83560">#83560</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="07d1cbc9c6"><code>07d1cbc</code></a>
v15.5.3</li>
<li><a
href="db56d77595"><code>db56d77</code></a>
[backport] fix: validation return types of pages API routes (<a
href="https://redirect.github.com/vercel/next.js/issues/83069">#83069</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83580">#83580</a>)</li>
<li><a
href="7a806231f8"><code>7a80623</code></a>
[backport] fix: relative paths in dev in validator.ts (<a
href="https://redirect.github.com/vercel/next.js/issues/83073">#83073</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83190">#83190</a>)</li>
<li><a
href="fddaeb85a0"><code>fddaeb8</code></a>
[backport] fix: remove <code>satisfies</code> keyword from type
validation to preserve o...</li>
<li><a
href="497ec6aa08"><code>497ec6a</code></a>
v15.5.2</li>
<li><a
href="bc72f41a2e"><code>bc72f41</code></a>
[backport] revert: add ?dpl to fonts in <code>/_next/static/media</code>
(<a
href="https://redirect.github.com/vercel/next.js/issues/83062">#83062</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83066">#83066</a>)</li>
<li><a
href="c8faf6800b"><code>c8faf68</code></a>
[backport] fix: disable unknownatrules lint rule entirely (<a
href="https://redirect.github.com/vercel/next.js/issues/83059">#83059</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83060">#83060</a>)</li>
<li><a
href="cc68ced552"><code>cc68ced</code></a>
v15.5.1</li>
<li><a
href="1ce9857276"><code>1ce9857</code></a>
[backport] fix: update validation return types (<a
href="https://redirect.github.com/vercel/next.js/issues/82854">#82854</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83027">#83027</a>)</li>
<li><a
href="b93c894717"><code>b93c894</code></a>
[backport] fix: update the config.api.responseLimit type (<a
href="https://redirect.github.com/vercel/next.js/issues/82852">#82852</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83028">#83028</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v15.3.3...v15.5.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.3.3&new-version=15.5.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 09:46:05 +02:00
Matthew Farrellee
6787755c0c
chore(recorder): add support for NOT_GIVEN (#3430)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 4s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 18s
Python Package Build Test / build (3.12) (push) Failing after 14s
UI Tests / ui-tests (22) (push) Successful in 41s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Pre-commit / pre-commit (push) Successful in 1m31s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
# What does this PR do?

the recorder mocks the openai-python interface. the openai-python
interface allows NOT_GIVEN as an input option. this change properly
handles NOT_GIVEN.


## Test Plan

ci (coverage for chat, completions, embeddings)
2025-09-13 11:11:38 -07:00
Matthew Farrellee
3de9ad0a87
chore(recorder, tests): add test for openai /v1/models (#3426)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Test External API and Providers / test-external (venv) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m19s
# What does this PR do?

- [x] adds a test for the recorder's handling of /v1/models
- [x] adds a fix for /v1/models handling

## Test Plan

ci
2025-09-12 14:59:56 -07:00
Doug Edgar
f67081d2d6
feat: migrate to FIPS-validated cryptographic algorithms (#3423)
Some checks failed
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 16s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (push) Failing after 19s
UI Tests / ui-tests (22) (push) Successful in 33s
Pre-commit / pre-commit (push) Successful in 1m13s
# What does this PR do?
Migrates MD5 and SHA-1 hash algorithms to SHA-256.

In particular, replaces:   
   - MD5 in chunk ID generation.
   - MD5 in file verification.
   - SHA-1 in model identifier digests.

And updates all related test expectations.

Original discussion:
https://github.com/llamastack/llama-stack/discussions/3413

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3424.

## Test Plan
Unit tests from scripts/unit-tests.sh were updated to match the new hash
output, and ran to verify the tests pass.

Signed-off-by: Doug Edgar <dedgar@redhat.com>
2025-09-12 11:18:19 +02:00
Matthew Farrellee
8ef1189be7
chore: update the vLLM inference impl to use OpenAIMixin for openai-compat functions (#3404)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 31s
Pre-commit / pre-commit (push) Successful in 1m18s
# What does this PR do?

update vLLM inference provider to use OpenAIMixin for openai-compat
functions

inference recordings from Qwen3-0.6B and vLLM 0.8.3 -
```
docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen3-0.6B --enable-auto-tool-choice --tool-call-parser hermes
```

## Test Plan

```
./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference
```
2025-09-11 09:04:38 -04:00
Francisco Arceo
d15368a302
chore: Updating documentation, adding exception handling for Vector Stores in RAG Tool, more tests on migration, and migrate off of inference_api for context_retriever for RAG (#3367)
# What does this PR do?

- Updating documentation on migration from RAG Tool to Vector Stores and
Files APIs
- Adding exception handling for Vector Stores in RAG Tool
- Add more tests on migration from RAG Tool to Vector Stores
- Migrate off of inference_api for context_retriever for RAG

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
Integration and unit tests added

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-09-11 14:20:11 +02:00
Sébastien Han
f31bcc11bc
feat: add Azure OpenAI inference provider support (#3396)
# What does this PR do?

Llama-stack now supports a new OpenAI compatible endpoint with Azure
OpenAI. The starter distro has been updated to add the new remote
inference provider.

A few tests have been modified and improved.

## Test Plan

Deploy a model in the Aure portal then:

```
$ AZURE_API_KEY=... AZURE_API_BASE=... uv run llama stack build --image-type venv --providers inference=remote::azure --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model azure/gpt-4.1 tests/integration/inference/test_openai_completion.py
...

Results:

```
============================================= test session starts
============================================== platform darwin -- Python
3.12.8, pytest-8.4.1, pluggy-1.6.0 --
/Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir:
.pytest_cache
metadata: {'Python': '3.12.8', 'Platform':
'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1',
'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1',
'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0',
'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval':
'0.11.0', 'hydra-core': '1.3.2'}} rootdir:
/Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0,
json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1,
nbval-0.11.0, hydra-core-1.3.2 asyncio: mode=Mode.AUTO,
asyncio_default_fixture_loop_scope=None,
asyncio_default_test_loop_scope=function collected 27 items


tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=azure/gpt-5-mini-inference:completion:sanity]
SKIPPED [ 3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=azure/gpt-5-mini-inference:completion:suffix]
SKIPPED [ 7%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=azure/gpt-5-mini-inference:completion:sanity]
SKIPPED [ 11%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-1]
SKIPPED [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=azure/gpt-5-mini]
SKIPPED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01]
PASSED [ 22%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01]
PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01]
PASSED [ 29%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-True]
PASSED [ 33%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-True]
PASSED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=azure/gpt-5-mini]
SKIPPEDed files.) [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-0]
SKIPPED [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02]
PASSED [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02]
PASSED [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02]
PASSED [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-False]
PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-False]
PASSED [ 62%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01]
PASSED [ 66%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01]
PASSED [ 70%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01]
PASSED [ 74%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-True]
PASSED [ 77%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-True]
PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02]
PASSED [ 85%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02]
PASSED [ 88%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02]
PASSED [ 92%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-False]
PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-False]
PASSED [100%]

=========================================== short test summary info
============================================ SKIPPED [3]
tests/integration/inference/test_openai_completion.py:63: Model
azure/gpt-5-mini hosted by remote::azure doesn't support OpenAI
completions. SKIPPED [3]
tests/integration/inference/test_openai_completion.py:118: Model
azure/gpt-5-mini hosted by remote::azure doesn't support vllm extra_body
parameters. SKIPPED [1]
tests/integration/inference/test_openai_completion.py:124: Model
azure/gpt-5-mini hosted by remote::azure doesn't support chat completion
calls with base64 encoded files. ================================== 20
passed, 7 skipped, 2 warnings in 51.77s
==================================
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-11 13:48:38 +02:00
Matthew Farrellee
c2d281e01b
chore(replay): improve replay robustness with un-validated construction (#3414)
# What does this PR do?

some providers do not produce spec compliant outputs. when this happens
the replay infra will fail to construct the proper types and will return
a dict to the client. the client likely does not expect a dict.

this was discovered with tgi, which returns finish_reason="" when valid
values are "stop", "length" or "content_filter"

## Test Plan

ci
2025-09-11 13:48:19 +02:00
Sumanth Kamenani
2838d5a20f
fix: AWS Bedrock inference profile ID conversion for region-specific endpoints (#3386)
Fixes #3370

AWS switched to requiring region-prefixed inference profile IDs instead
of foundation model IDs for on-demand throughput. This was causing
ValidationException errors.

Added auto-detection based on boto3 client region to convert model IDs
like meta.llama3-1-70b-instruct-v1:0 to
us.meta.llama3-1-70b-instruct-v1:0 depending on the detected region.

Also handles edge cases like ARNs, case insensitive regions, and None
regions.

Tested with this request.
```json
{
  "model_id": "meta.llama3-1-8b-instruct-v1:0",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "tell me a riddle"
    }
  ],
  "sampling_params": {
     "strategy": {
        "type": "top_p",
        "temperature": 0.7,
        "top_p": 0.9
      },
      "max_tokens": 512
  }
}
```
<img width="1488" height="878" alt="image"
src="https://github.com/user-attachments/assets/0d61beec-3869-4a31-8f37-9f554c280b88"
/>
2025-09-11 11:41:53 +02:00
Sébastien Han
8e05c68d15
chore: remove openai dependency from providers (#3398)
# What does this PR do?

The openai package is already a dependency of the llama-stack project
itself, so let's the project dictate which openai version we need and
avoid potential breakage with unsatisfiable dependency resolution.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-11 10:19:59 +02:00
Ashwin Bharambe
0c7f49490c
fix(inference_store): on duplicate chat completion IDs, replace (#3408)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Update ReadTheDocs / update-readthedocs (push) Failing after 23s
Test External API and Providers / test-external (venv) (push) Failing after 30s
UI Tests / ui-tests (22) (push) Successful in 35s
Pre-commit / pre-commit (push) Successful in 1m45s
# What does this PR do?

Duplicate chat completion IDs can be generated during tests especially
if they are replaying recorded responses across different tests. No need
to warn or error under those circumstances. In the wild, this is not
likely to happen at all (no evidence) so we aren't really hiding any
problem.
2025-09-10 14:34:18 -07:00
dependabot[bot]
d4e45cd5f1
chore(ui-deps): bump tailwindcss from 4.1.6 to 4.1.13 in /llama_stack/ui (#3362)
Bumps
[tailwindcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss)
from 4.1.6 to 4.1.13.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/releases">tailwindcss's
releases</a>.</em></p>
<blockquote>
<h2>v4.1.13</h2>
<h3>Changed</h3>
<ul>
<li>Drop warning from browser build (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18731">#18731</a>)</li>
<li>Drop exact duplicate declarations when emitting CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18809">#18809</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Don't transition <code>visibility</code> when using
<code>transition</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18795">#18795</a>)</li>
<li>Discard matched variants with unknown named values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Discard matched variants with non-string values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Show suggestions for known <code>matchVariant</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18798">#18798</a>)</li>
<li>Replace deprecated <code>clip</code> with <code>clip-path</code> in
<code>sr-only</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18769">#18769</a>)</li>
<li>Hide internal fields from completions in <code>matchUtilities</code>
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18820">#18820</a>)</li>
<li>Ignore <code>.vercel</code> folders by default (can be overridden by
<code>@source …</code> rules) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18855">#18855</a>)</li>
<li>Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18869">#18869</a>)</li>
<li>Do not allow custom variants to start or end with a <code>-</code>
or <code>_</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18867">#18867</a>,
<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18872">#18872</a>)</li>
<li>Upgrade: Migrate <code>aria</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18815">#18815</a>)</li>
<li>Upgrade: Migrate <code>data</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18816">#18816</a>)</li>
<li>Upgrade: Migrate <code>supports</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18817">#18817</a>)</li>
</ul>
<h2>v4.1.12</h2>
<h3>Fixed</h3>
<ul>
<li>Don't consider the global important state in <code>@apply</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18404">#18404</a>)</li>
<li>Add missing suggestions for <code>flex-&lt;number&gt;</code>
utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18642">#18642</a>)</li>
<li>Fix trailing <code>)</code> from interfering with extraction in
Clojure keywords (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Detect classes inside Elixir charlist, word list, and string sigils
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18432">#18432</a>)</li>
<li>Track source locations through <code>@plugin</code> and
<code>@config</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Allow boolean values of <code>process.env.DEBUG</code> in
<code>@tailwindcss/node</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18485">#18485</a>)</li>
<li>Ignore consecutive semicolons in the CSS parser (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18532">#18532</a>)</li>
<li>Center the dropdown icon added to an input with a paired datalist by
default (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18511">#18511</a>)</li>
<li>Extract candidates in Slang templates (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18565">#18565</a>)</li>
<li>Improve error messages when encountering invalid functional utility
names (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18568">#18568</a>)</li>
<li>Discard CSS AST objects with <code>false</code> or
<code>undefined</code> properties (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18571">#18571</a>)</li>
<li>Allow users to disable URL rebasing in
<code>@tailwindcss/postcss</code> via <code>transformAssetUrls:
false</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18321">#18321</a>)</li>
<li>Fix false-positive migrations in <code>addEventListener</code> and
JavaScript variable names (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18718">#18718</a>)</li>
<li>Fix Standalone CLI showing default Bun help when run via symlink on
Windows (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18723">#18723</a>)</li>
<li>Read from <code>--border-color-*</code> theme keys in
<code>divide-*</code> utilities for backwards compatibility (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18704/">#18704</a>)</li>
<li>Don't scan <code>.hdr</code> and <code>.exr</code> files for classes
by default (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18734">#18734</a>)</li>
</ul>
<h2>v4.1.11</h2>
<h3>Fixed</h3>
<ul>
<li>Add heuristic to skip candidate migrations inside
<code>emit(…)</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18330">#18330</a>)</li>
<li>Extract candidates with variants in Clojure/ClojureScript keywords
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18338">#18338</a>)</li>
<li>Document <code>--watch=always</code> in the CLI's usage (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18337">#18337</a>)</li>
<li>Add support for Vite 7 to <code>@tailwindcss/vite</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18384">#18384</a>)</li>
</ul>
<h2>v4.1.10</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md">tailwindcss's
changelog</a>.</em></p>
<blockquote>
<h2>[4.1.13] - 2025-09-03</h2>
<h3>Changed</h3>
<ul>
<li>Drop warning from browser build (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18731">#18731</a>)</li>
<li>Drop exact duplicate declarations when emitting CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18809">#18809</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Don't transition <code>visibility</code> when using
<code>transition</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18795">#18795</a>)</li>
<li>Discard matched variants with unknown named values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Discard matched variants with non-string values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Show suggestions for known <code>matchVariant</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18798">#18798</a>)</li>
<li>Replace deprecated <code>clip</code> with <code>clip-path</code> in
<code>sr-only</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18769">#18769</a>)</li>
<li>Hide internal fields from completions in <code>matchUtilities</code>
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18820">#18820</a>)</li>
<li>Ignore <code>.vercel</code> folders by default (can be overridden by
<code>@source …</code> rules) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18855">#18855</a>)</li>
<li>Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18869">#18869</a>)</li>
<li>Do not allow custom variants to start or end with a <code>-</code>
or <code>_</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18867">#18867</a>,
<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18872">#18872</a>)</li>
<li>Upgrade: Migrate <code>aria</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18815">#18815</a>)</li>
<li>Upgrade: Migrate <code>data</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18816">#18816</a>)</li>
<li>Upgrade: Migrate <code>supports</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18817">#18817</a>)</li>
</ul>
<h2>[4.1.12] - 2025-08-13</h2>
<h3>Fixed</h3>
<ul>
<li>Don't consider the global important state in <code>@apply</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18404">#18404</a>)</li>
<li>Add missing suggestions for <code>flex-&lt;number&gt;</code>
utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18642">#18642</a>)</li>
<li>Fix trailing <code>)</code> from interfering with extraction in
Clojure keywords (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Detect classes inside Elixir charlist, word list, and string sigils
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18432">#18432</a>)</li>
<li>Track source locations through <code>@plugin</code> and
<code>@config</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Allow boolean values of <code>process.env.DEBUG</code> in
<code>@tailwindcss/node</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18485">#18485</a>)</li>
<li>Ignore consecutive semicolons in the CSS parser (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18532">#18532</a>)</li>
<li>Center the dropdown icon added to an input with a paired datalist by
default (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18511">#18511</a>)</li>
<li>Extract candidates in Slang templates (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18565">#18565</a>)</li>
<li>Improve error messages when encountering invalid functional utility
names (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18568">#18568</a>)</li>
<li>Discard CSS AST objects with <code>false</code> or
<code>undefined</code> properties (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18571">#18571</a>)</li>
<li>Allow users to disable URL rebasing in
<code>@tailwindcss/postcss</code> via <code>transformAssetUrls:
false</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18321">#18321</a>)</li>
<li>Fix false-positive migrations in <code>addEventListener</code> and
JavaScript variable names (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18718">#18718</a>)</li>
<li>Fix Standalone CLI showing default Bun help when run via symlink on
Windows (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18723">#18723</a>)</li>
<li>Read from <code>--border-color-*</code> theme keys in
<code>divide-*</code> utilities for backwards compatibility (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18704/">#18704</a>)</li>
<li>Don't scan <code>.hdr</code> and <code>.exr</code> files for classes
by default (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18734">#18734</a>)</li>
</ul>
<h2>[4.1.11] - 2025-06-26</h2>
<h3>Fixed</h3>
<ul>
<li>Add heuristic to skip candidate migrations inside
<code>emit(…)</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18330">#18330</a>)</li>
<li>Extract candidates with variants in Clojure/ClojureScript keywords
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18338">#18338</a>)</li>
<li>Document <code>--watch=always</code> in the CLI's usage (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18337">#18337</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1334c99db8"><code>1334c99</code></a>
Prepare v4.1.13 release (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18868">#18868</a>)</li>
<li><a
href="65dc530f05"><code>65dc530</code></a>
Do not allow variants to end with <code>-</code> or <code>_</code> (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18872">#18872</a>)</li>
<li><a
href="54c3f308e9"><code>54c3f30</code></a>
Do not allow variants to start with <code>-</code> (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18867">#18867</a>)</li>
<li><a
href="494051ca08"><code>494051c</code></a>
Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18869">#18869</a>)</li>
<li><a
href="c318329a1e"><code>c318329</code></a>
chore: remove redundant words (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18853">#18853</a>)</li>
<li><a
href="ddc84b079b"><code>ddc84b0</code></a>
update test after prettier change</li>
<li><a
href="f1331a857a"><code>f1331a8</code></a>
run prettier</li>
<li><a
href="e5513b6c75"><code>e5513b6</code></a>
Fix missing code block delimiters in comment blocks (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18837">#18837</a>)</li>
<li><a
href="5e2a160d8b"><code>5e2a160</code></a>
Drop exact duplicate declarations from output CSS within a style rule
(<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18809">#18809</a>)</li>
<li><a
href="b1fb02a2d7"><code>b1fb02a</code></a>
Hide internal fields from completions in <code>matchUtilities</code> (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18820">#18820</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/tailwindlabs/tailwindcss/commits/v4.1.13/packages/tailwindcss">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tailwindcss&package-manager=npm_and_yarn&previous-version=4.1.6&new-version=4.1.13)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-10 13:18:14 -07:00
ehhuang
e980436a2e
chore: introduce write queue for inference_store (#3383)
# What does this PR do?
Adds a write worker queue for writes to inference store. This avoids
overwhelming request processing with slow inference writes.

## Test Plan

Benchmark:
```
cd /docs/source/distributions/k8s-benchmark
# start mock server
python openai-mock-server.py --port 8000
# start stack server
LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml
# run benchmark script
uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct
```
## RPS from 21 -> 57
2025-09-10 11:57:42 -07:00
Francisco Arceo
a6b1588dc6
revert: Fireworks chat completion broken due to telemetry (#3402)
Reverts llamastack/llama-stack#3392
2025-09-10 11:53:38 -07:00
ehhuang
f6bf36343d
chore: logging perf improvments (#3393)
# What does this PR do?
- Use BackgroundLogger when logging metric events.
- Reuse event loop in BackgroundLogger

## Test Plan
```
cd /docs/source/distributions/k8s-benchmark
# start mock server
python openai-mock-server.py --port 8000
# start stack server
LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml
# run benchmark script
uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct
```
### RPS from 57 -> 62
2025-09-10 11:52:23 -07:00
slekkala1
935b8e28de
fix: Fireworks chat completion broken due to telemetry (#3392)
# What does this PR do?
Fix fireworks chat completion broken due to telemetry expecting
response.usage
 Closes https://github.com/llamastack/llama-stack/issues/3391

## Test Plan
1. `uv run --with llama-stack llama stack build --distro starter
--image-type venv --run`
Try 

```
curl -X POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
```
```
{"id":"chatcmpl-ee922a08-0df0-4974-b0d3-b322113e8bc0","choices":[{"message":{"role":"assistant","content":"Hello! How can I assist you today?","name":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","created":1757456375,"model":"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct"}%   
```

Without fix fails as mentioned in
https://github.com/llamastack/llama-stack/issues/3391

Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-09-10 08:48:01 -07:00
Sébastien Han
c86e45496e
ci: Re-enable pre-commit to fail (#3399)
Some checks failed
Python Package Build Test / build (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 58s
Pre-commit / pre-commit (push) Successful in 1m14s
If pre-commit fails, the workflow must fail.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-10 10:00:46 -04:00
Matthew Farrellee
0e27016cf2
chore: update the vertexai inference impl to use openai-python for openai-compat functions (#3377)
# What does this PR do?

update VertexAI inference provider to use openai-python for
openai-compat functions

## Test Plan

```
$ VERTEX_AI_PROJECT=... uv run llama stack build --image-type venv --providers inference=remote::vertexai --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model vertexai/vertex_ai/gemini-2.5-flash tests/integration/inference/test_openai_completion.py
...
```

i don't have an account to test this. `get_api_key` may also need to be
updated per
https://cloud.google.com/vertex-ai/generative-ai/docs/start/openai

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-09-10 15:39:29 +02:00
Cesare Pompeiano
1c23aeb937
feat: Add vector_db_id to chunk metadata (#3304)
# What does this PR do?

When running RAG in a multi vector DB setting, it can be difficult to
trace where retrieved chunks originate from. This PR adds the
`vector_db_id` into each chunk’s metadata, making it easier to
understand which database a given chunk came from. This is helpful for
debugging and for analyzing retrieval behavior of multiple DBs.

Relevant code:

```python
for vector_db_id, result in zip(vector_db_ids, results):
    for chunk, score in zip(result.chunks, result.scores):
        if not hasattr(chunk, "metadata") or chunk.metadata is None:
            chunk.metadata = {}
        chunk.metadata["vector_db_id"] = vector_db_id

        chunks.append(chunk)
        scores.append(score)
```

## Test Plan

* Ran Llama Stack in debug mode.
* Verified that `vector_db_id` was added to each chunk’s metadata.
* Confirmed that the metadata was printed in the console when using the
RAG tool.

---------

Co-authored-by: are-ces <cpompeia@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-09-10 11:19:21 +02:00
Matthew Farrellee
dd1f946b3e
feat: include a default inference store during llama stack build (#3373)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 43s
Pre-commit / pre-commit (push) Successful in 1m14s
# What does this PR do?

enables completions storage when using `llama stack build --providers` -
 - GET /v1/chat/completions
 - GET /v1/chat/completions/{id}

todo: llama stack build and distro codegen should use the same code
paths

## Test Plan

ci
2025-09-09 15:54:58 -07:00
ehhuang
9d3a234bf3
chore: remove unused variable (#3389)
# What does this PR do?


## Test Plan
2025-09-09 15:51:20 -07:00
github-actions[bot]
28696c3f30 build: Bump version to 0.2.21
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 8s
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Update ReadTheDocs / update-readthedocs (push) Failing after 2s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 41s
UI Tests / ui-tests (22) (push) Successful in 37s
Test External API and Providers / test-external (venv) (push) Failing after 41s
Pre-commit / pre-commit (push) Successful in 2m0s
2025-09-08 22:30:03 +00:00
Ashwin Bharambe
30468d0c43
fix(deps): bump datasets versions for all providers (#3382)
Not doing so results in errors of the kind you see in:
4989026435
2025-09-08 15:13:42 -07:00
Derek Higgins
ef02b9ea10
fix: environment variable typo in inference recorder error message (#3374)
The error message was referencing LLAMA_STACK_INFERENCE_MODE instead of
the correct LLAMA_STACK_TEST_INFERENCE_MODE environment variable.
2025-09-08 17:51:38 +02:00
Francisco Arceo
ad6ea7fb91
feat: Adding OpenAI Prompts API (#3319)
# What does this PR do?
This PR adds support for OpenAI Prompts API.

Note, OpenAI does not explicitly expose the Prompts API but instead
makes it available in the Responses API and in the [Prompts
Dashboard](https://platform.openai.com/docs/guides/prompting#create-a-prompt).

I have added the following APIs:
- CREATE
- GET
- LIST
- UPDATE
- Set Default Version

The Set Default Version API is made available only in the Prompts
Dashboard and configures which prompt version is returned in the GET
(the latest version is the default).

Overall, the expected functionality in Responses will look like this:

```python
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
  prompt={
    "id": "pmpt_68b0c29740048196bd3a6e6ac3c4d0e20ed9a13f0d15bf5e",
    "version": "2",
    "variables": {
        "city": "San Francisco",
        "age": 30,
    }
  }
)
```

### Resolves https://github.com/llamastack/llama-stack/issues/3276


## Test Plan
Unit tests added. Integration tests can be added after client
generation.

## Next Steps
1. Update Responses API to support Prompt API
2. I'll enhance the UI to implement the Prompt Dashboard. 
3. Add cache for lower latency

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-09-08 11:05:13 -04:00
Akram Ben Aissi
072dca0609
feat: Add Kubernetes auth provider to use SelfSubjectReview and kubernetes api server (#2559)
# What does this PR do?
Add Kubernetes authentication provider support
- Add KubernetesAuthProvider class for token validation using Kubernetes
SelfSubjectReview API
- Add KubernetesAuthProviderConfig with configurable API server URL, TLS
settings, and claims mapping
- Implement authentication via POST requests to
/apis/authentication.k8s.io/v1/selfsubjectreviews endpoint
- Add support for parsing Kubernetes SelfSubjectReview response format
to extract user information
- Add KUBERNETES provider type to AuthProviderType enum
- Update create_auth_provider factory function to handle 'kubernetes'
provider type
- Add comprehensive unit tests for KubernetesAuthProvider functionality
- Add documentation with configuration examples and usage instructions

The provider validates tokens by sending SelfSubjectReview requests to
the Kubernetes API server and extracts user information from the
userInfo structure in the response.


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
What This Verifies:
Authentication header validation
Token validation with Kubernetes SelfSubjectReview and kubernetes server
API endpoint
Error handling for invalid tokens and HTTP errors
Request payload structure and headers

```
python -m pytest tests/unit/server/test_auth.py -k "kubernetes" -v
```

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
2025-09-08 11:25:10 +02:00
dependabot[bot]
e1b81ce1fc
chore(ui-deps): bump @radix-ui/react-dropdown-menu from 2.1.14 to 2.1.16 in /llama_stack/ui (#3361)
Bumps
[@radix-ui/react-dropdown-menu](https://github.com/radix-ui/primitives)
from 2.1.14 to 2.1.16.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-dropdown-menu&package-manager=npm_and_yarn&previous-version=2.1.14&new-version=2.1.16)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:59:44 +02:00
dependabot[bot]
e508aef320
chore(ui-deps): bump lucide-react from 0.510.0 to 0.542.0 in /llama_stack/ui (#3363)
Bumps
[lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react)
from 0.510.0 to 0.542.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/lucide-icons/lucide/releases">lucide-react's
releases</a>.</em></p>
<blockquote>
<h2>Version 0.542.0</h2>
<h2>What's Changed</h2>
<ul>
<li>feat(docs): add MDN Web Docs &amp; Nuxt to showcase by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3590">lucide-icons/lucide#3590</a></li>
<li>feat(icons): added <code>list-chevrons-down-up</code> icon by <a
href="https://github.com/juliankellydesign"><code>@​juliankellydesign</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3492">lucide-icons/lucide#3492</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/juliankellydesign"><code>@​juliankellydesign</code></a>
made their first contribution in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3492">lucide-icons/lucide#3492</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.541.0...0.542.0">https://github.com/lucide-icons/lucide/compare/0.541.0...0.542.0</a></p>
<h2>Version 0.541.0</h2>
<h2>What's Changed</h2>
<ul>
<li>feat(packages/lucide): added support for providing a custom root
element by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3543">lucide-icons/lucide#3543</a></li>
<li>fix(icons): optimized <code>chrome</code> icon &amp; renamed to
<code>chromium</code> by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3572">lucide-icons/lucide#3572</a></li>
<li>fix(icons): changed <code>wallpaper</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3566">lucide-icons/lucide#3566</a></li>
<li>fix(icons): optimized <code>cog</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3548">lucide-icons/lucide#3548</a></li>
<li>fix(icons): changed <code>building</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3510">lucide-icons/lucide#3510</a></li>
<li>feat(dpi-preview): add previous version for easier comparison by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3532">lucide-icons/lucide#3532</a></li>
<li>feat(icons): added 'panel-dashed' variants + update tags on existing
icons by <a
href="https://github.com/irvineacosta"><code>@​irvineacosta</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3500">lucide-icons/lucide#3500</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.540.0...0.541.0">https://github.com/lucide-icons/lucide/compare/0.540.0...0.541.0</a></p>
<h2>Version 0.540.0</h2>
<h2>What's Changed</h2>
<ul>
<li>fix(license): add full text of Feather license by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3530">lucide-icons/lucide#3530</a></li>
<li>fix(icons): changed <code>umbrella</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3490">lucide-icons/lucide#3490</a></li>
<li>docs(site): added official statement on brand logos in Lucide by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3541">lucide-icons/lucide#3541</a></li>
<li>fix(icons): changed <code>camera</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3539">lucide-icons/lucide#3539</a></li>
<li>feat(icons): added <code>rose</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/1972">lucide-icons/lucide#1972</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.539.0...0.540.0">https://github.com/lucide-icons/lucide/compare/0.539.0...0.540.0</a></p>
<h2>Version 0.539.0</h2>
<h2>What's Changed</h2>
<ul>
<li>feat(icons): added <code>brick-wall-shield</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3476">lucide-icons/lucide#3476</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.538.0...0.539.0">https://github.com/lucide-icons/lucide/compare/0.538.0...0.539.0</a></p>
<h2>Version 0.538.0</h2>
<h2>What's Changed</h2>
<ul>
<li>fix(icons): changed <code>apple</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3505">lucide-icons/lucide#3505</a></li>
<li>fix(icons): changed <code>store</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3501">lucide-icons/lucide#3501</a></li>
<li>fix(icons): changed <code>mic-off</code> icon by <a
href="https://github.com/lieonlion"><code>@​lieonlion</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/2823">lucide-icons/lucide#2823</a></li>
<li>chore(deps): bump astro from 5.5.2 to 5.12.8 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3523">lucide-icons/lucide#3523</a></li>
<li>fix(icons): deprecate rail-symbol by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/2862">lucide-icons/lucide#2862</a></li>
<li>feat(icons): added <code>kayak</code> icon by <a
href="https://github.com/jpjacobpadilla"><code>@​jpjacobpadilla</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3054">lucide-icons/lucide#3054</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e71198d9b3"><code>e71198d</code></a>
chore: icon alias improvements (<a
href="https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react/issues/2861">#2861</a>)</li>
<li><a
href="3e644fda2d"><code>3e644fd</code></a>
chore(scripts): Refactor scripts to typescript (<a
href="https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react/issues/3316">#3316</a>)</li>
<li><a
href="19fa01b5fc"><code>19fa01b</code></a>
build(deps-dev): bump vite from 6.3.2 to 6.3.4 (<a
href="https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react/issues/3181">#3181</a>)</li>
<li>See full diff in <a
href="https://github.com/lucide-icons/lucide/commits/0.542.0/packages/lucide-react">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=lucide-react&package-manager=npm_and_yarn&previous-version=0.510.0&new-version=0.542.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:59:24 +02:00
dependabot[bot]
91c7c4570e
chore(ui-deps): bump sonner from 2.0.6 to 2.0.7 in /llama_stack/ui (#3364)
Bumps [sonner](https://github.com/emilkowalski/sonner) from 2.0.6 to
2.0.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/emilkowalski/sonner/releases">sonner's
releases</a>.</em></p>
<blockquote>
<h2>v2.0.7</h2>
<p>Sonner now supports multiple <code>&lt;Toaster /&gt;</code>
components, see more <a
href="https://sonner.emilkowal.ski/toaster#multiple-toasters">here</a>.</p>
<h2>What's Changed</h2>
<ul>
<li>feat: add testId prop for individual toast components by <a
href="https://github.com/b-like-bahar"><code>@​b-like-bahar</code></a>
in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/660">emilkowalski/sonner#660</a></li>
<li>feat(toaster): add support for multiple toasters with unique
identifiers by <a
href="https://github.com/taroj1205"><code>@​taroj1205</code></a> in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/665">emilkowalski/sonner#665</a></li>
<li>fix: tests by <a
href="https://github.com/emilkowalski"><code>@​emilkowalski</code></a>
in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/677">emilkowalski/sonner#677</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/b-like-bahar"><code>@​b-like-bahar</code></a>
made their first contribution in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/660">emilkowalski/sonner#660</a></li>
<li><a href="https://github.com/taroj1205"><code>@​taroj1205</code></a>
made their first contribution in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/665">emilkowalski/sonner#665</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/emilkowalski/sonner/compare/v2.0.6...v2.0.7">https://github.com/emilkowalski/sonner/compare/v2.0.6...v2.0.7</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3ba7aa17ab"><code>3ba7aa1</code></a>
v2.0.7</li>
<li><a
href="0604827063"><code>0604827</code></a>
fix: tests (<a
href="https://redirect.github.com/emilkowalski/sonner/issues/677">#677</a>)</li>
<li><a
href="c50fe92dfb"><code>c50fe92</code></a>
fix tests</li>
<li><a
href="0600a5cb40"><code>0600a5c</code></a>
feat(toaster): add support for multiple toasters with unique identifiers
(<a
href="https://redirect.github.com/emilkowalski/sonner/issues/665">#665</a>)</li>
<li><a
href="c14bf44a03"><code>c14bf44</code></a>
feat: add testId prop for individual toast components (<a
href="https://redirect.github.com/emilkowalski/sonner/issues/660">#660</a>)</li>
<li>See full diff in <a
href="https://github.com/emilkowalski/sonner/compare/v2.0.6...v2.0.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=sonner&package-manager=npm_and_yarn&previous-version=2.0.6&new-version=2.0.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:59:02 +02:00
dependabot[bot]
fe134d90e5
chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui (#3360)
Bumps
[react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom)
and
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom).
These dependencies needed to be updated together.
Updates `react-dom` from 19.1.0 to 19.1.1
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/releases">react-dom's
releases</a>.</em></p>
<blockquote>
<h2>19.1.1 (July 28, 2025)</h2>
<h3>React</h3>
<ul>
<li>Fixed Owner Stacks to work with ES2015 function.name semantics (<a
href="https://redirect.github.com/facebook/react/pull/33680">#33680</a>
by <a href="https://github.com/hoxyq"><code>@​hoxyq</code></a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/blob/main/CHANGELOG.md">react-dom's
changelog</a>.</em></p>
<blockquote>
<h2>19.1.1 (July 28, 2025)</h2>
<h3>React</h3>
<ul>
<li>Fixed Owner Stacks to work with ES2015 function.name semantics (<a
href="https://redirect.github.com/facebook/react/pull/33680">#33680</a>
by <a href="https://github.com/hoxyq"><code>@​hoxyq</code></a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="87e33ca2b7"><code>87e33ca</code></a>
Set release versions to 19.1.1</li>
<li><a
href="b793948e15"><code>b793948</code></a>
Bump next prerelease version numbers (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/32782">#32782</a>)</li>
<li>See full diff in <a
href="https://github.com/facebook/react/commits/v19.1.1/packages/react-dom">compare
view</a></li>
</ul>
</details>
<br />

Updates `@types/react-dom` from 19.1.5 to 19.1.9
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:58:45 +02:00
Matthew Farrellee
6a35bd7bb6
chore: update the anthropic inference impl to use openai-python for openai-compat functions (#3366)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 38s
Pre-commit / pre-commit (push) Successful in 1m13s
# What does this PR do?

update the Anthropic inference provider to use openai-python for the
openai-compat endpoints

## Test Plan

ci

Co-authored-by: raghotham <rsm@meta.com>
2025-09-07 14:00:42 -07:00
Matthew Farrellee
d23607483f
chore: update the groq inference impl to use openai-python for openai-compat functions (#3348)
# What does this PR do?

update Groq inference provider to use OpenAIMixin for openai-compat
endpoints

changes on api.groq.com -
- json_schema is now supported for specific models, see
https://console.groq.com/docs/structured-outputs#supported-models
- response_format with streaming is now supported for models that
support response_format
- groq no longer returns a 400 error if tools are provided and
tool_choice is not "required"


## Test Plan

```
$ GROQ_API_KEY=... uv run llama stack build --image-type venv --providers inference=remote::groq --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model groq/llama-3.3-70b-versatile tests/integration/inference/test_openai_completion.py -k 'not store'
...
SKIPPED [3] tests/integration/inference/test_openai_completion.py:44: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:94: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:73: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py💯 Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support chat completion calls with base64 encoded files.
======================= 8 passed, 11 skipped, 8 deselected, 2 warnings in 5.13s ========================
```

---------

Co-authored-by: raghotham <rsm@meta.com>
2025-09-06 15:36:27 -07:00
Matthew Farrellee
bf02cd846f
chore: update the sambanova inference impl to use openai-python for openai-compat functions (#3345)
# What does this PR do?

update SambaNova inference provider to use OpenAIMixin for openai-compat
endpoints

## Test Plan

```
$ SAMBANOVA_API_KEY=... uv run llama stack build --image-type venv --providers inference=remote::sambanova --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model sambanova/Meta-Llama-3.3-70B-Instruct tests/integration/inference -k 'not store'
...
FAILED tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=sambanova/Meta-Llama-3.3-70B-Instruct-inference:chat_completion:tool_calling_tools_absent-True] - AttributeError: 'NoneType' object has no attribute 'delta'
FAILED tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=sambanova/Meta-Llama-3.3-70B-Instruct-inference:chat_completion:tool_calling_tools_absent-False] - llama_stack_client.InternalServerError: Error code: 500 - {'detail': 'Internal server error: An une...
=========== 2 failed, 16 passed, 68 skipped, 8 deselected, 3 xfailed, 13 warnings in 15.85s ============
```

the two failures also exist before this change. they are part of the
deprecated inference.chat_completion tests that flow through litellm.
they can be resolved later.
2025-09-06 12:25:13 -07:00
Matthew Farrellee
d6c3b36390
chore: update the gemini inference impl to use openai-python for openai-compat functions (#3351)
# What does this PR do?

update the Gemini inference provider to use openai-python for the
openai-compat endpoints

partially addresses #3349, does not address /inference/completion or
/inference/chat-completion

## Test Plan

ci
2025-09-06 12:22:20 -07:00
Francisco Arceo
7cd1c2c238
feat: Updating Rag Tool to use Files API and Vector Stores API (#3344)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 15s
Python Package Build Test / build (3.13) (push) Failing after 19s
Test External API and Providers / test-external (venv) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 22s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Unit Tests / unit-tests (3.13) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (push) Failing after 23s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m32s
2025-09-06 07:26:34 -06:00
Matthew Farrellee
df1526991f
feat(batches, completions): add /v1/completions support to /v1/batches (#3309)
# What does this PR do?

add support for /v1/completions to the /v1/batches api


## Test Plan

ci
2025-09-05 11:59:57 -07:00
Francisco Arceo
e2fe39aee1
feat!: Migrate Vector DB IDs to Vector Store IDs (breaking change) (#3253)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 35s
Pre-commit / pre-commit (push) Successful in 1m15s
# What does this PR do?
This change migrates the VectorDB id generation to Vector Stores.

This is a breaking change for **_some users_** that may have application
code using the `vector_db_id` parameter in the request of the VectorDB
protocol instead of the `VectorDB.identifier` in the response.

By default we will now create a Vector Store every time we register a
VectorDB. The caveat with this approach is that this maps the
`vector_db_id` → `vector_store.name`. This is a reasonable tradeoff to
transition users towards OpenAI Vector Stores.

As an added benefit, registering VectorDBs will result in them appearing
in the VectorStores admin UI.

### Why?
This PR makes the `POST` API call to `/v1/vector-dbs` swap the
`vector_db_id` parameter in the **request body** into the VectorStore's
name field and sets the `vector_db_id` to the generated vector store id
(e.g., `vs_038247dd-4bbb-4dbb-a6be-d5ecfd46cfdb`).

That means that users would have to do something like follows in their
application code:

```python
res = client.vector_dbs.register(
    vector_db_id='my-vector-db-id', 
    embedding_model='ollama/all-minilm:l6-v2', 
    embedding_dimension=384,
)
vector_db_id = res.identifier
```

And then the rest of their code would behave, including `VectorIO`'s
insert protocol using `vector_db_id` in the request.

An alternative implementation would be to just delete the `vector_db_id`
parameter in `VectorDB` but the end result would still require users
having to write `vector_db_id = res.identifier` since
`VectorStores.create()` generates the ID for you.

So this approach felt the easiest way to migrate users towards
VectorStores (subsequent PRs will be added to trigger `files.create()`
and `vector_stores.files.create()`).

## Test Plan
Unit tests and integration tests have been added.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-09-05 15:40:34 +02:00
Derek Higgins
64b2977162
fix: Fix locations of distrubution runtime directories (#3336)
The defaults were mixed up

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-09-05 14:09:36 +02:00
Sumanth Kamenani
0b00c68d59
fix: use lambda pattern for bedrock config env vars (#3307)
# What does this PR do?

Improved bedrock provider config to read from environment variables like
AWS_ACCESS_KEY_ID. Updated all
fields to use default_factory with lambda patterns like the nvidia
provider does.

  Now the environment variables work as documented.

  Closes #3305

  ## Test Plan

  Ran the new bedrock config tests:
  ```bash
python -m pytest tests/unit/providers/inference/bedrock/test_config.py
-v

Verified existing provider tests still work:
  python -m pytest tests/unit/providers/test_configs.py -v
2025-09-05 10:45:11 +02:00
Sumanth Kamenani
55a8c5f439
fix: show descriptive MCP server connection errors instead of generic 500s (#3256)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 1m20s
Pre-commit / pre-commit (push) Successful in 2m37s
What does this PR do?

Fixes error handling when MCP server connections fail. Instead of
returning generic 500 errors, now provides
   descriptive error messages with proper HTTP status codes.

  Closes #3107

  Test Plan

  Before fix:
curl -X GET
"http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server"
Returns: {"detail": "Internal server error: An unexpected error
occurred."} (500)

  After fix:
curl -X GET
"http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server"
Returns: {"error": {"detail": "Failed to connect to MCP server at
http://localhost:9999/sse: Connection
  refused"}} (502)

  Tests:
  - Added unit test for ConnectionError → 502 translation
  - Manually tested with unreachable MCP servers (connection refused)
2025-09-04 13:25:02 -07:00
slekkala1
561d2fc6b8
fix: Move to older version for docker container failure [fireworks-ai] (#3338)
# What does this PR do?
Noticed the test
https://github.com/llamastack/llama-stack-ops/actions/workflows/test-maybe-cut.yaml
are still failing randomly.

Earlier fixed this with 0.18.0 of fireworks here
https://github.com/llamastack/llama-stack/pull/3267, the local testing
may have inadvertently picked a lower version with `<=` which I assumed
picks latest version.
Now tested with `==` to find the version where it broke and pinning to
version(`<=`) where it was passing.


## Test Plan
Tested locally with the following commands to start a container

Build container
`llama stack build --distro starter --image-type container`
start container `docker run -d -p 8321:8321 --name llama-stack-test
distribution-starter:0.2.20`
check health `http://localhost:8321/v1/health`
Above steps fails without the fix

Tested with `==` to ensure the same version is picked in local testing
instead of anything lower.

Following here for the fix from `fireworks-ai`
1410674695

https://github.com/llamastack/llama-stack/issues/3273
2025-09-04 11:47:46 -07:00
ehhuang
bcc7f2c7d0
chore: async inference store write (#3318)
# What does this PR do?


## Test Plan
```
cd /docs/source/distributions/k8s-benchmark
# start mock server
python openai-mock-server.py --port 8000
# start stack server
uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml
# run benchmark script
uv run python3 benchmark.py --duration 30 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct
```
Before:

============================================================
BENCHMARK RESULTS
============================================================
Total time: 30.00s
Concurrent users: 50
Total requests: 1267
Successful requests: 1267
Failed requests: 0
Success rate: 100.0%
Requests per second: 42.23


After:

============================================================
BENCHMARK RESULTS
============================================================
Total time: 30.00s
Concurrent users: 50
Total requests: 1449
Successful requests: 1449
Failed requests: 0
Success rate: 100.0%
Requests per second: 48.30
2025-09-04 11:37:46 -07:00
Derek Higgins
5bbca56cfc
fix: Make SentenceTransformer embedding operations non-blocking (#3335)
- Wrap model loading with asyncio.to_thread() to prevent blocking during
model download/initialization
- Wrap encoding operations with asyncio.to_thread() to run in background
thread
- Convert _load_sentence_transformer_model() to async method

This ensures the async event loop remains responsive during embedding
operations.

Closes: #3332

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-09-04 13:58:41 -04:00
IAN MILLER
85f33762d7
refactor(server): remove hardcoded 409 and 404 status codes in server.py using httpx constants (#3333)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is eliminating hardcoded status codes: `409` CONFLICT and `404`
NOT_FOUND in `server.py` using `httpx` built-in constants. This
implementation will follow the existing structure to improve
readability, extensibility and developer experience. This is already was
implemented in #3131

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
`./scripts/unit-tests.sh`
2025-09-04 18:15:13 +02:00
ehhuang
5d52e0d2c5
chore: handle missing finish_reason (#3328)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 3s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 34s
Pre-commit / pre-commit (push) Successful in 1m25s
# What does this PR do?
Sometimes the stream don't have chunks with finish_reason, e.g. canceled
stream, which throws a pydantic error as OpenAIChoice.finish_reason: str

## Test Plan
observe no more such error when benchmarking
2025-09-04 13:23:18 +02:00
Ashwin Bharambe
c3d3a0b833
feat(tests): auto-merge all model list responses and unify recordings (#3320)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 8s
Python Package Build Test / build (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
UI Tests / ui-tests (22) (push) Successful in 1m7s
Pre-commit / pre-commit (push) Successful in 2m34s
One needed to specify record-replay related environment variables for
running integration tests. We could not use defaults because integration
tests could be run against Ollama instances which could be running
different models. For example, text vs vision tests needed separate
instances of Ollama because a single instance typically cannot serve
both of these models if you assume the standard CI worker configuration
on Github. As a result, `client.list()` as returned by the Ollama client
would be different between these runs and we'd end up overwriting
responses.

This PR "solves" it by adding a small amount of complexity -- we store
model list responses specially, keyed by the hashes of the models they
return. At replay time, we merge all of them and pretend that we have
the union of all models available.

## Test Plan

Re-recorded all the tests using `scripts/integration-tests.sh
--inference-mode record`, including the vision tests.
2025-09-03 11:33:03 -07:00
ehhuang
d948e63340
chore: Improve error message for missing provider dependencies (#3315)
Generated with CC:

Replace cryptic KeyError with clear, actionable error message that
shows:
- Which API the failing provider belongs to
- The provider ID and type that's failing
- Which dependency is missing
- Clear instructions on how to fix the issue


## Test plan
Use a run config with Agents API and no safety provider

Before: KeyError: <Api.safety: 'safety'>
After: Failed to resolve 'agents' provider 'meta-reference' of type
'inline::meta-reference': required dependency 'safety' is not available.
Please add a 'safety' provider to your configuration or check if the
provider is properly configured.
2025-09-03 16:11:59 +02:00
Cesare Pompeiano
ccaf6aaa51
chore(python-deps): replace ibm_watson_machine_learning with ibm_watsonx_ai (#3302)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 6s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 3s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
Test External API and Providers / test-external (venv) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 13s
UI Tests / ui-tests (22) (push) Successful in 1m23s
Pre-commit / pre-commit (push) Successful in 3m5s
# What does this PR do?

This PR updates the Watsonx provider dependencies from
`ibm_watson_machine_learning` to `ibm_watsonx_ai`.

The old package `ibm_watson_machine_learning` is in **deprecation mode**
([[PyPI
link](https://pypi.org/project/ibm-watson-machine-learning/)](https://pypi.org/project/ibm-watson-machine-learning/))
and relies on older versions of dependencies such as `pandas`. Updating
to `ibm_watsonx_ai` ensures compatibility with current dependency
versions and ongoing support.

## Test Plan

I verified the update by running an inference using a model provided by
Watsonx. The model ran successfully, confirming that the new dependency
works as expected.

Co-authored-by: are-ces <cpompeia@redhat.com>
2025-09-03 11:33:35 +02:00