Compare commits

...

1181 commits

Author SHA1 Message Date
Alina Ryan
55a1da5526
feat(api): add file_processor API skeleton (#4113) 2025-12-24 08:53:24 -05:00
Costa Shulyupin
325a0bd7b3
refactor: demo_script.py (#4409)
- simplify search result processing in demo script
- optimize demo script by using inline text instead of big external file
- improve printouts clarity and user experience

---------

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
2025-12-23 13:50:10 -08:00
dependabot[bot]
22f84df68b
chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.8.1 to 1.9.0 (#4421)
Bumps
[stainless-api/upload-openapi-spec-action](https://github.com/stainless-api/upload-openapi-spec-action)
from 1.8.1 to 1.9.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/stainless-api/upload-openapi-spec-action/releases">stainless-api/upload-openapi-spec-action's
releases</a>.</em></p>
<blockquote>
<h2>v1.9.0</h2>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.8.1...v1.9.0">1.9.0</a>
(2025-12-20)</h2>
<h3>Features</h3>
<ul>
<li>check org-level enable_ai_commit_messages field (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/152">#152</a>)
(<a
href="90deb1bcc4">90deb1b</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/stainless-api/upload-openapi-spec-action/blob/main/CHANGELOG.md">stainless-api/upload-openapi-spec-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.8.1...v1.9.0">1.9.0</a>
(2025-12-20)</h2>
<h3>Features</h3>
<ul>
<li>check org-level enable_ai_commit_messages field (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/152">#152</a>)
(<a
href="90deb1bcc4">90deb1b</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.8.0...v1.8.1">1.8.1</a>
(2025-12-09)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>re-enable 'targets' param in diagnostics call (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/148">#148</a>)
(<a
href="3130e17c92">3130e17</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.7.1...v1.8.0">1.8.0</a>
(2025-12-08)</h2>
<h3>Features</h3>
<ul>
<li>support AI commit message generation for preview builds (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/143">#143</a>)
(<a
href="7010edb389">7010edb</a>)</li>
<li>support per-SDK commit messages in preview comments (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/142">#142</a>)
(<a
href="a36c33fc21">a36c33f</a>)</li>
<li>Update to latest <code>@​stainless-api/sdk</code> (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/144">#144</a>)
(<a
href="a9b388bded">a9b388b</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.7.0...v1.7.1">1.7.1</a>
(2025-12-01)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>improve getMergeBase to handle shallow clones more robustly (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/138">#138</a>)
(<a
href="3687845465">3687845</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.6.0...v1.7.0">1.7.0</a>
(2025-11-17)</h2>
<h3>Features</h3>
<ul>
<li><strong>preview:</strong> add output documented_spec_path to preview
action (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/135">#135</a>)
(<a
href="5e80cc40da">5e80cc4</a>)</li>
<li><strong>preview:</strong> add output_dir input and write documented
spec to file (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/137">#137</a>)
(<a
href="d30490c89b">d30490c</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.5...v1.6.0">1.6.0</a>
(2025-10-30)</h2>
<h3>Features</h3>
<ul>
<li>add support for github OIDC auth (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/133">#133</a>)
(<a
href="259674c1b3">259674c</a>)</li>
<li>change fail on semantics (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/124">#124</a>)
(<a
href="e1046240c0">e104624</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="11792f827d"><code>11792f8</code></a>
chore(main): release 1.9.0 (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/153">#153</a>)</li>
<li><a
href="dfb2b92839"><code>dfb2b92</code></a>
chore(build): Update dist</li>
<li><a
href="90deb1bcc4"><code>90deb1b</code></a>
feat: check org-level enable_ai_commit_messages field (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/152">#152</a>)</li>
<li><a
href="d2c9de2be1"><code>d2c9de2</code></a>
chore: add User-Agent header (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/150">#150</a>)</li>
<li>See full diff in <a
href="979824f1ea...11792f827d">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=stainless-api/upload-openapi-spec-action&package-manager=github_actions&previous-version=1.8.1&new-version=1.9.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-22 20:06:05 -05:00
dependabot[bot]
f9ac055b88
chore(github-deps): bump medyagh/setup-minikube from 0.0.20 to 0.0.21 (#4422)
Bumps
[medyagh/setup-minikube](https://github.com/medyagh/setup-minikube) from
0.0.20 to 0.0.21.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/medyagh/setup-minikube/releases">medyagh/setup-minikube's
releases</a>.</em></p>
<blockquote>
<h2>v0.0.21</h2>
<h2>What's Changed</h2>
<ul>
<li>add support for none driver on arm64 by <a
href="https://github.com/medyagh"><code>@​medyagh</code></a> in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/779">medyagh/setup-minikube#779</a></li>
<li>feat: add 'nodes' action input by <a
href="https://github.com/zachspar"><code>@​zachspar</code></a> in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/712">medyagh/setup-minikube#712</a></li>
</ul>
<h2>Test/CI:</h2>
<ul>
<li>add vkfit test by <a
href="https://github.com/medyagh"><code>@​medyagh</code></a> in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/739">medyagh/setup-minikube#739</a></li>
<li>ci: add concurrency settings to macos-test workflow by <a
href="https://github.com/medyagh"><code>@​medyagh</code></a> in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/780">medyagh/setup-minikube#780</a></li>
<li>test: add dry-run tests for windows and macos by <a
href="https://github.com/medyagh"><code>@​medyagh</code></a> in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/781">medyagh/setup-minikube#781</a></li>
<li>test: Upgrade Kubernetes version and simplify installation by <a
href="https://github.com/medyagh"><code>@​medyagh</code></a> in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/762">medyagh/setup-minikube#762</a></li>
<li>split workflow &quot;build-test&quot; to &quot;build&quot; and
&quot;test&quot; by <a
href="https://github.com/medyagh"><code>@​medyagh</code></a> in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/776">medyagh/setup-minikube#776</a></li>
<li>refactor: enhance test workflow with matrix strategy for multiple
sce… by <a href="https://github.com/medyagh"><code>@​medyagh</code></a>
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/777">medyagh/setup-minikube#777</a></li>
<li>add qemu test to github actions by <a
href="https://github.com/medyagh"><code>@​medyagh</code></a> in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/729">medyagh/setup-minikube#729</a></li>
</ul>
<h2>build</h2>
<ul>
<li>build(deps-dev): bump eslint-plugin-jest from 28.11.0 to 29.0.1 by
<a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/727">medyagh/setup-minikube#727</a></li>
<li>build(deps-dev): bump prettier from 3.5.3 to 3.6.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/725">medyagh/setup-minikube#725</a></li>
<li>build(deps-dev): bump eslint-plugin-github from 5.1.8 to 6.0.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/724">medyagh/setup-minikube#724</a></li>
<li>build(deps-dev): bump eslint from 9.26.0 to 9.31.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/728">medyagh/setup-minikube#728</a></li>
<li>build(deps-dev): bump <code>@​typescript-eslint/eslint-plugin</code>
from 8.26.1 to 8.36.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/726">medyagh/setup-minikube#726</a></li>
<li>build(deps-dev): bump ts-jest from 29.2.6 to 29.4.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/730">medyagh/setup-minikube#730</a></li>
<li>build(deps-dev): bump eslint from 9.31.0 to 9.32.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/738">medyagh/setup-minikube#738</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 24.0.11 to
24.1.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/737">medyagh/setup-minikube#737</a></li>
<li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
8.37.0 to 8.38.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/736">medyagh/setup-minikube#736</a></li>
<li>build(deps-dev): bump jest-circus from 29.7.0 to 30.0.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/735">medyagh/setup-minikube#735</a></li>
<li>build(deps-dev): bump jest and <code>@​types/jest</code> by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/734">medyagh/setup-minikube#734</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 24.1.0 to
24.5.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/760">medyagh/setup-minikube#760</a></li>
<li>build(deps): bump actions/checkout from 4.2.2 to 6.0.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/775">medyagh/setup-minikube#775</a></li>
<li>build(deps): bump form-data by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/761">medyagh/setup-minikube#761</a></li>
<li>build(deps): bump actions/setup-node from 4.4.0 to 6.0.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/769">medyagh/setup-minikube#769</a></li>
<li>build(deps): bump glob from 10.4.5 to 10.5.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/774">medyagh/setup-minikube#774</a></li>
<li>build(deps): bump js-yaml by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/773">medyagh/setup-minikube#773</a></li>
<li>build(deps-dev): bump typescript from 5.8.3 to 5.9.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/766">medyagh/setup-minikube#766</a></li>
<li>build(deps-dev): bump eslint from 9.32.0 to 9.38.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/770">medyagh/setup-minikube#770</a></li>
<li>build(deps-dev): bump ts-jest from 29.4.0 to 29.4.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/768">medyagh/setup-minikube#768</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/zachspar"><code>@​zachspar</code></a>
made their first contribution in <a
href="https://redirect.github.com/medyagh/setup-minikube/pull/712">medyagh/setup-minikube#712</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/medyagh/setup-minikube/compare/v0...v0.0.21">https://github.com/medyagh/setup-minikube/compare/v0...v0.0.21</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e9e035a86b"><code>e9e035a</code></a>
Merge pull request <a
href="https://redirect.github.com/medyagh/setup-minikube/issues/781">#781</a>
from medyagh/add_windows_test</li>
<li><a
href="6d4f8d69da"><code>6d4f8d6</code></a>
fix: remove unnecessary --vm argument from download-only step in dry-run
work...</li>
<li><a
href="b0656d9c82"><code>b0656d9</code></a>
fix: ensure vfkit installation step runs only on macOS</li>
<li><a
href="0b40b9148a"><code>0b40b91</code></a>
feat: add installation step for vfkit and related tools in dry-run
workflow</li>
<li><a
href="6d08f649f9"><code>6d08f64</code></a>
fix: update Docker setup step to install CLI on macOS</li>
<li><a
href="93224f2cf3"><code>93224f2</code></a>
fix: adjust Docker setup condition to run on all OS types</li>
<li><a
href="24746887ce"><code>2474688</code></a>
fix: correct typo in dry-run workflow and adjust Docker setup
condition</li>
<li><a
href="eca7409306"><code>eca7409</code></a>
feat: update concurrency settings and refine OS matrix in dry-run
workflow</li>
<li><a
href="d0aca93add"><code>d0aca93</code></a>
feat: add Docker setup step to dry-run workflow</li>
<li><a
href="95b2fc43b9"><code>95b2fc4</code></a>
feat: add dry-run workflow for pull requests and scheduled runs</li>
<li>Additional commits viewable in <a
href="e3c7f79eb1...e9e035a86b">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=medyagh/setup-minikube&package-manager=github_actions&previous-version=0.0.20&new-version=0.0.21)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-22 20:05:38 -05:00
Charlie Doern
258c52c84c
feat: introduce /admin API for stack administration and operations (#4401)
# What does this PR do?

- Add new /admin API (v1alpha) for administrative operations including
provider management, health checks, version info, and route listing
- Implement using FastAPI routers following batches pattern with proper
request/response models
- Endpoints: /admin/providers, /admin/providers/{id},
/admin/inspect/routes, /admin/health, /admin/version
- Create admin module structure: `models.py`, `api.py`, `fastapi_routes.py`,
`__init__.py`
- Add AdminImpl in llama_stack/core combining provider and inspect
functionality
- Deprecate standalone /providers and /inspect APIs (remain functional
for backward compatibility)
- Consolidate duplicate types: ProviderInfo, HealthInfo, RouteInfo, etc.
now defined once in admin.models

## Test Plan

A new admin integration suite uses the generated stainless SDK and
records new tests on this PR.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-22 12:11:49 -05:00
Dennis Kennetz
d684ec91cc
fix: code was injecting run_config.vector_stores even when it was None. (#4423)
# What does this PR do?
Fixed an issue where code was injecting `run_config.vector_stores` even
when it was `None`, which overrode the `default_factory` in
`RagToolRuntimeConfig`. Currently, _most_ providers don't have a default
implementation for `vector_stores`:

 - nvidia
 - meta-reference-gpu
 - dell
 - oci
 - open-benchmark
 - postgres-demo
 - watsonx

The only ones which do are:
 - ci-tests
 - starter
 - starter-gpu
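
The failure mode can be reproduced in miniature. The sketch below is only an illustration: it uses plain dataclasses as stand-ins for the pydantic models, and the `name` field and the `build_provider_config` helper are hypothetical, not the actual llama-stack code. It shows that an explicit `None` replaces the `default_factory` value, while omitting the key lets the default apply.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VectorStoresConfig:
    # Hypothetical stand-in; the real model's fields are not shown here.
    name: str = "default"


@dataclass
class RagToolRuntimeConfig:
    # default_factory only runs when the caller does not supply the argument.
    vector_stores_config: VectorStoresConfig = field(default_factory=VectorStoresConfig)

    def __post_init__(self) -> None:
        # Mimics the validation failure in the traceback: None is rejected.
        if not isinstance(self.vector_stores_config, VectorStoresConfig):
            raise TypeError("vector_stores_config must be a VectorStoresConfig")


def build_provider_config(base: dict[str, Any], vector_stores: Any) -> dict[str, Any]:
    """Inject vector_stores only when it is set, so the default still applies."""
    config = dict(base)
    if vector_stores is not None:
        config["vector_stores_config"] = vector_stores
    return config
```

With this guard, `RagToolRuntimeConfig(**build_provider_config({}, None))` falls back to the default, whereas passing `vector_stores_config=None` directly raises.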

## Test Plan
Prior to the change, I could not start llama-stack with the oci
distribution:

```
Traceback (most recent call last):
  File "/home/opc/llama-stack/.venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/opc/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
    parser.run(args)
  File "/home/opc/llama-stack/src/llama_stack/cli/llama.py", line 46, in run
    args.func(args)
  File "/home/opc/llama-stack/src/llama_stack/cli/stack/run.py", line 184, in _run_stack_run_cmd
    self._uvicorn_run(config_file, args)
  File "/home/opc/llama-stack/src/llama_stack/cli/stack/run.py", line 242, in _uvicorn_run
    uvicorn.run("llama_stack.core.server.server:create_app", **uvicorn_config)  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/main.py", line 580, in run
    server.run()
  File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/server.py", line 67, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/server.py", line 71, in serve
    await self._serve(sockets)
  File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/server.py", line 78, in _serve
    config.load()
  File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/config.py", line 442, in load
    self.loaded_app = self.loaded_app()
                      ^^^^^^^^^^^^^^^^^
  File "/home/opc/llama-stack/src/llama_stack/core/server/server.py", line 403, in create_app
    app = StackApp(
          ^^^^^^^^^
  File "/home/opc/llama-stack/src/llama_stack/core/server/server.py", line 161, in __init__
    future.result()
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/opc/llama-stack/src/llama_stack/core/stack.py", line 534, in initialize
    impls = await resolve_impls(
            ^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/llama-stack/src/llama_stack/core/resolver.py", line 180, in resolve_impls
    return await instantiate_providers(sorted_providers, router_apis, dist_registry, run_config, policy, internal_impls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/llama-stack/src/llama_stack/core/resolver.py", line 321, in instantiate_providers
    impl = await instantiate_provider(provider, deps, inner_impls, dist_registry, run_config, policy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/llama-stack/src/llama_stack/core/resolver.py", line 417, in instantiate_provider
    config = config_type(**provider_config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/pydantic/main.py", line 253, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for RagToolRuntimeConfig
vector_stores_config
  Input should be a valid dictionary or instance of VectorStoresConfig [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/model_type
```

After tracing through the code and finding a simple fix, I was able to
run the distribution again. I also ran the integration tests with
pytest:

```bash
OCI_COMPARTMENT_OCID="ocid1.compartment.oc1..xxx" OCI_REGION="us-chicago-1" OCI_AUTH_TYPE=instance_principal OCI_CLI_PROFILE=CHICAGO uv run pytest -sv tests/integration/inference/ --stack-config oci --text-model oci/meta.llama-3.3-70b-instruct --inference-mode live
```
2025-12-21 23:48:16 -08:00
Derek Higgins
b6043bd53b
fix: Remove unused TELEMETRY_SINKS and add OTEL_EXPORTER_OTLP_PROTOCOL (#4406)
Changes:
  o Remove TELEMETRY_SINKS environment variable from scripts (unused)
  o Replace with OTEL_EXPORTER_OTLP_PROTOCOL in install scripts

The TELEMETRY_SINKS variable is no longer used by the Python code and has
been replaced with the standard OpenTelemetry environment variable
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
2025-12-19 15:56:22 -08:00
Sumanth Kamenani
bd35aa4d78
feat: enable streaming usage metrics for OpenAI-compatible providers (#4326)
Inject `stream_options={"include_usage": True}` when streaming and
OpenTelemetry telemetry is active. Telemetry always overrides any caller
preference to ensure complete and consistent observability metrics.

Changes:
- Add conditional stream_options injection to OpenAIMixin (benefits
OpenAI, Bedrock, Runpod, Together, Fireworks providers)
- Add conditional stream_options injection to LiteLLMOpenAIMixin
(benefits WatsonX and other litellm-based providers)
- Check telemetry status using trace.get_current_span().is_recording()
- Override include_usage=False when telemetry active to prevent metric
gaps
- Unit tests for this functionality
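
A minimal sketch of the injection rule described above, assuming the telemetry check is passed in as a boolean (in the PR it comes from `trace.get_current_span().is_recording()`). The function name `apply_usage_stream_options` is hypothetical and is not the mixin's actual method.

```python
from typing import Any


def apply_usage_stream_options(params: dict[str, Any], telemetry_active: bool) -> dict[str, Any]:
    """Return a copy of params with include_usage forced on when appropriate."""
    if not (params.get("stream") and telemetry_active):
        # Not streaming, or telemetry is off: leave the caller's request alone.
        return dict(params)
    # Keep any other stream options the caller set, then override include_usage.
    stream_options = dict(params.get("stream_options") or {})
    stream_options["include_usage"] = True
    return {**params, "stream_options": stream_options}
```

The override is deliberate: a caller who sent `include_usage=False` still gets usage chunks while telemetry is recording, so metrics have no gaps.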

Fixes #3981

Note: this work originated in PR #4200, which I closed after rebasing on
the telemetry changes. This PR rebases those commits, incorporates the
Bedrock feedback, and carries forward the same scope described there.
## Test Plan
#### OpenAIMixin + telemetry injection tests 
PYTHONPATH=src python -m pytest
tests/unit/providers/utils/inference/test_openai_mixin.py

#### LiteLLM OpenAIMixin tests
PYTHONPATH=src python -m pytest
tests/unit/providers/inference/test_litellm_openai_mixin.py -v

#### Broader inference provider
PYTHONPATH=src python -m pytest tests/unit/providers/inference/
--ignore=tests/unit/providers/inference/test_inference_client_caching.py
-v
2025-12-19 15:53:53 -08:00
Derek Higgins
5ebcde3042
fix(scoring): remove broken dataset validation in score_batch methods (#4420)
The Dataset model no longer has a dataset_schema attribute; it was removed
during a refactor (5287b437a), so this validation can no longer run.

Changes:
  o basic scoring: removed validate_dataset_schema call and related imports
  o llm_as_judge scoring: removed validate_dataset_schema call and related imports
  o braintrust scoring: removed validate_dataset_schema call and related imports

Validation is no longer needed at the dataset level since:
  o Dataset model changed from having dataset_schema to purpose/source fields
  o Scoring functions validate required fields when processing rows
  o Invalid data will fail naturally with clear error messages
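
As an illustration of row-level validation, here is a rough sketch. The helper names and the `expected_answer` / `generated_answer` field names are assumptions made for the example, not taken from the scoring providers.

```python
from typing import Any


def require_fields(row: dict[str, Any], required: tuple[str, ...]) -> None:
    """Raise a clear error naming every required field the row lacks."""
    missing = [name for name in required if name not in row]
    if missing:
        raise ValueError(f"row is missing required fields: {', '.join(missing)}")


def score_row(row: dict[str, Any]) -> dict[str, float]:
    # Validate at the point of use instead of checking a dataset-wide schema.
    require_fields(row, ("expected_answer", "generated_answer"))
    matched = row["expected_answer"] == row["generated_answer"]
    return {"score": 1.0 if matched else 0.0}
```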

Fixes: #4419

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-19 15:52:52 -08:00
Charlie Doern
e710622d4c
fix: run all clients on stainless SDK, fix workflow, properly commit recordings (#4410)
# What does this PR do?

Various fixes to integration test recording + stainless calling of
integration tests:

1. Only the library client was being run; all clients should be.
2. The git check picks up diffs like the following, which it should not:

M tests/integration/client-typescript/package-lock.json
 M tests/integration/client-typescript/package.json

additionally:

Fixes rebase conflicts when stainless workflow runs integration tests
with record-if-missing mode on PRs. Previously, the workflow would:
1. Commit all files in tests/integration/ (including non-recordings)
2. Try to rebase and push to 'main' instead of the PR branch
3. Fail with merge conflicts on PR-specific changes

Changes:
- Add pr_head_ref and is_fork_pr parameters flowing through workflow
chain
- Use target-branch input instead of github.ref_name in recording
commits
- Detect and handle fork PRs by skipping push and uploading recordings
as artifacts
- Add 7-day artifact retention for fork PR recordings
- Support both workflow_call and direct pull_request trigger contexts

For same-repo PRs: recordings now commit/push to the PR branch correctly
For fork PRs: recordings upload as downloadable artifacts with
instructions
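
The branching between same-repo and fork PRs can be summarized as a small decision function. This is a sketch of the logic only; the real implementation lives in workflow YAML, and the `RecordingPlan` type and `plan_recordings` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordingPlan:
    action: str                    # "push" or "upload-artifact"
    target_branch: Optional[str]   # branch to push to, if pushing
    retention_days: Optional[int]  # artifact retention, if uploading


def plan_recordings(pr_head_ref: str, is_fork_pr: bool) -> RecordingPlan:
    """Decide how new recordings are published for a PR."""
    if is_fork_pr:
        # The workflow skips the push for forks and uploads an artifact instead.
        return RecordingPlan("upload-artifact", None, 7)
    # Same-repo PRs: commit and push to the PR branch rather than 'main'.
    return RecordingPlan("push", pr_head_ref, None)
```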


you can see a failing workflow:
5846590613
with the rebase issues.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-18 15:24:09 -05:00
Charlie Doern
5d52cb28c2
ci: record-if-missing when coming from stainless (#4408)
# What does this PR do?

We will typically need to record the missing JSON for net-new APIs. Use
record-if-missing so that the integration tests can re-record and commit
the files to the PR.

Set the stainless inference mode to record-if-missing, and properly pass
the pr_head_sha on workflow_call.

## Test Plan

see 2031824567
which uses this commit.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-18 09:40:14 -08:00
Francisco Javier Arceo
2d149e3d2d
feat: Enhance Vector Stores config with full configurations (#4397)
# What does this PR do?

Enhances the Vector Stores config with a full set of appropriate
configurations:
- Add FileIngestionParams, ChunkRetrievalParams, and FileBatchParams
subconfigs
- Update RAG memory, OpenAI vector store mixin, and vector store utils
to use configuration
- Fix import organization across vector store components
- Add comprehensive vector stores configuration documentation
- Update docs navigation to include vector store configuration guide
- Delete `memory/constants.py` and move constant values directly into
Pydantic models

## Test Plan
Tests updated + CI

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-12-17 16:56:46 -05:00
Sébastien Han
a7d509aaf9
feat: migrate Inspect API to FastAPI router (#4403)
# What does this PR do?

Migrate the Inspect API to the FastAPI router pattern.

Changes:
- Add inspect API to FastAPI router registry
- Add PUBLIC_ROUTE_KEY support for routes that don't require auth
- Update WebMethod creation to respect route's openapi_extra for
authentication requirements
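
FastAPI route decorators accept an `openapi_extra` dictionary, which is the hook this change relies on. The sketch below shows one way a "public route" marker could be read from that dictionary. It is an assumption-laden illustration: the key value `"x-public"`, the `Route` stand-in class, and `requires_auth` are hypothetical, and the PR's actual `PUBLIC_ROUTE_KEY` value is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical key value, chosen only for illustration.
PUBLIC_ROUTE_KEY = "x-public"


@dataclass
class Route:
    """Stand-in for a router route that carries OpenAPI extension data."""

    path: str
    openapi_extra: Optional[dict[str, Any]] = None


def requires_auth(route: Route) -> bool:
    """A route requires auth unless it is explicitly marked public."""
    extra = route.openapi_extra or {}
    return not extra.get(PUBLIC_ROUTE_KEY, False)
```

Defaulting to "auth required" means a route that forgets the marker fails closed rather than open.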

Fixes: https://github.com/llamastack/llama-stack/issues/4346

## Test Plan

CI and various curls on /v1/inspect/routes, /v1/health, /v1/version

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-17 17:33:42 +01:00
Sébastien Han
cd5095a247
feat: migrate Providers API to FastAPI router pattern (#4405)
# What does this PR do?

Convert Providers API from @webmethod decorators to FastAPI router
pattern.

Fixes: https://github.com/llamastack/llama-stack/issues/4350

## Test Plan
CI

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-17 16:55:05 +01:00
Matt Leader
722d9c53e7
fix(server): add middleware for provider data and test context (#4367)
# What does this PR do?
Consolidates provider data context handling into middleware, eliminating
duplication between FastAPI router routes and legacy @webmethod routes.

Closes #4366 
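
The consolidation can be pictured as a single ASGI middleware that parses the header once and exposes the result through a context variable, so both routing styles read it the same way. This is a minimal sketch under assumptions, not the PR's implementation: the header name `x-provider-data`, `provider_data_var`, and `ProviderDataMiddleware` are hypothetical.

```python
import contextvars
import json

# Hypothetical header name and context variable, chosen for illustration.
PROVIDER_DATA_HEADER = b"x-provider-data"
provider_data_var: contextvars.ContextVar = contextvars.ContextVar(
    "provider_data", default=None
)


class ProviderDataMiddleware:
    """Pure-ASGI middleware: parse one header once, expose it to every handler."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass through lifespan and websocket events unchanged.
            await self.app(scope, receive, send)
            return
        # ASGI headers are a list of (name, value) byte pairs with lowercase names.
        headers = dict(scope.get("headers", []))
        raw = headers.get(PROVIDER_DATA_HEADER)
        token = provider_data_var.set(json.loads(raw) if raw else None)
        try:
            await self.app(scope, receive, send)
        finally:
            # Reset so data never leaks into a later request.
            provider_data_var.reset(token)
```

Any handler downstream, whatever decorator registered it, can then call `provider_data_var.get()` instead of re-parsing the header.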

## Test Plan

Added unit test suite `test_test_context_middleware`, specifically
`test_middleware_extracts_test_id_from_header` to validate the expected
behavior.
```
❯ ./scripts/unit-tests.sh tests/unit/
```

Also verified integration of the middleware test context with the `files`
FastAPI router migration from
[pull/4339](https://github.com/llamastack/llama-stack/pull/4339):
```
❯ git switch migrate-files-api
Switched to branch 'migrate-files-api'
❯ git rebase fix-test-ctx-middleware
Successfully rebased and updated refs/heads/migrate-files-api.
❯ ./scripts/integration-tests.sh --inference-mode replay --suite base --setup ollama --stack-config server:starter --subdirs files
```

Signed-off-by: Matthew F Leader <mleader@redhat.com>
2025-12-16 15:00:48 -05:00
Derek Higgins
5abb7df41a
fix: ABAC bypass in vector store operations (#4394)
Vector store operations were bypassing ABAC checks by calling providers
directly instead of going through the routing table. This allowed
unauthorized access to vector store data and operations.

Changes:
- Route all VectorIORouter methods through routing table instead of
  directly to providers
- Update routing table to enforce ABAC checks on all vector store
  operations (read, update, delete)
- Add test suite verifying ABAC enforcement for all vector store
  operations
- Ensure providers are never called when authorization fails

Fixes security issue where users could access vector stores they don't
have permission for.
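
The shape of the fix can be sketched as follows: the router never holds a provider directly, and the only way to obtain one is an access-checked lookup. This is a simplified model, not the project's code. `VectorIORouter` is named in the commit, while `FakeProvider`, `RoutingTable.get_provider`, `AccessDeniedError`, and the owner-only policy are hypothetical stand-ins for the real ABAC rules.

```python
from dataclasses import dataclass, field


class AccessDeniedError(Exception):
    """Raised when the caller is not allowed to perform the action."""


@dataclass
class FakeProvider:
    calls: list = field(default_factory=list)

    def delete_vector_store(self, store_id: str) -> None:
        self.calls.append(("delete", store_id))


@dataclass
class RoutingTable:
    provider: FakeProvider
    owners: dict  # store_id -> owning user

    def get_provider(self, store_id: str, user: str, action: str) -> FakeProvider:
        # Simplified policy: only the owner may act. A real ABAC check would
        # evaluate user and resource attributes against configured rules.
        if self.owners.get(store_id) != user:
            raise AccessDeniedError(f"{user} may not {action} {store_id}")
        return self.provider


@dataclass
class VectorIORouter:
    routing_table: RoutingTable

    def delete_vector_store(self, store_id: str, user: str) -> None:
        # The provider is only reachable through the access-checked lookup.
        provider = self.routing_table.get_provider(store_id, user, "delete")
        provider.delete_vector_store(store_id)
```

Because the exception is raised before a provider is returned, an unauthorized call cannot reach the provider at all.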

Fixes: #4393

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-16 10:49:16 -08:00
Anastas Stoyanovsky
401d3b8ce6
docs: Update pre-commit version in CONTRIBUTING.md (#4399)
# What does this PR do?
Update pre-commit installation command to use version 4.4.0 or greater,
as is done in CI.

## Test Plan
n/a
2025-12-16 10:47:57 -08:00
Charlie Doern
66f3cf4002
feat: wire Stainless preview SDK into integration tests (#4360)
# What does this PR do?

Enable stainless-builds workflow to test preview SDKs by calling
integration-tests workflow with python_url parameter. Add stainless
matrix config for faster CI runs on SDK changes.

- Make integration-tests.yml reusable with workflow_call inputs
- Thread python_url through test setup actions to install preview SDK
- Add matrix_key parameter to generate_ci_matrix.py for custom matrices
- Update stainless-builds.yml to call integration tests with preview URL

This allows us to test a client on the PR introducing the new changes
before merging. Contributors can even write new tests using the
generated client. Those tests should pass on the PR, indicating that
they will pass on main upon merge.

## Test Plan

See the triggered action using the workflows on this branch:
5810594042,
which installs the Stainless SDK from the given URL.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-16 09:20:40 -08:00
Charlie Doern
12116467f5
fix: remove run config from logs (#4395)
# What does this PR do?
Since run.yaml is gone, update logs to say "stack config" or "stack
configuration" rather than "run config".

## Test Plan
check logs

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-16 10:37:18 -05:00
Sébastien Han
700663028f
feat: convert Datasets API to use FastAPI router (#4359)
# What does this PR do?

Convert the Datasets API from webmethod decorators to FastAPI router
pattern.

Fixes: https://github.com/llamastack/llama-stack/issues/4344

## Test Plan
CI

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-15 11:23:04 -08:00
Jaideep Rao
56f946f3f5
feat: add support for tool_choice to responses api (#4106)
# What does this PR do?
Adds support for enforcing tool usage via the Responses API. See
https://platform.openai.com/docs/api-reference/responses/create#responses_create-tool_choice
for details from the official documentation.
Note: at present this PR only supports `file_search` and `web_search` as
options to enforce builtin tool usage.
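
For illustration, a request that forces a builtin tool might be assembled like this. The sketch only builds the JSON payload and sends nothing. The helper `build_response_request` and the example model and vector store IDs are hypothetical; the object form `{"type": ...}` for `tool_choice` follows the OpenAI documentation linked above.

```python
from typing import Any

# The two builtin tool types this PR says it can enforce.
SUPPORTED_FORCED_TOOLS = {"file_search", "web_search"}


def build_response_request(
    model: str,
    user_input: str,
    tools: list[dict[str, Any]],
    forced_tool: str,
) -> dict[str, Any]:
    """Build a Responses API payload that forces one builtin tool."""
    if forced_tool not in SUPPORTED_FORCED_TOOLS:
        raise ValueError(f"cannot force tool type: {forced_tool}")
    if not any(tool.get("type") == forced_tool for tool in tools):
        raise ValueError(f"forced tool {forced_tool} is not in the tools list")
    return {
        "model": model,
        "input": user_input,
        "tools": tools,
        # Object form: require a call to this specific builtin tool.
        "tool_choice": {"type": forced_tool},
    }
```

For example, `build_response_request("my-model", "What does the handbook say?", [{"type": "file_search", "vector_store_ids": ["vs_123"]}], "file_search")` yields a payload whose `tool_choice` is `{"type": "file_search"}`.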

Closes #3548 

## Test Plan
`./scripts/unit-tests.sh tests/unit/providers/agents/meta_reference/test_response_tool_context.py`

---------

Signed-off-by: Jaideep Rao <jrao@redhat.com>
2025-12-15 11:22:06 -08:00
Francisco Javier Arceo
62005dc1a9
feat: Making static prompt values in Rag/File Search configurable in Vector Store Config (#4368)
# What does this PR do?

- Enables users to configure the prompts used throughout File Search /
Vector Retrieval
- Configuration is defined in the Vector Stores Config so the prompts can be
modified at runtime
- Backwards compatible: the fields are optional and default
to the previously used values

This is a summary of the new options in `run.yaml`:
```yaml
vector_stores:
  file_search_params:
    header_template: "knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n"
    footer_template: "END of knowledge_search tool results.\n"
  context_prompt_params:
    chunk_annotation_template: "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
    context_template: "The above results were retrieved to help answer the user's query: \"{query}\". Use them as supporting information only in answering this query.{annotation_instruction}\n"
  annotation_prompt_params:
    enable_annotations: true
    annotation_instruction_template: "Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like 'This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.'. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones."
    chunk_annotation_template: "[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n"
```
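
The placeholders in these templates look like Python `str.format` fields, including attribute access such as `{chunk.content}`. Assuming that is how they are rendered (the PR text does not say), a minimal sketch of assembling the tool output looks like this; `render_results` and the `SimpleNamespace` chunk objects are hypothetical.

```python
from types import SimpleNamespace

HEADER_TEMPLATE = (
    "knowledge_search tool found {num_chunks} chunks:\n"
    "BEGIN of knowledge_search tool results.\n"
)
CHUNK_TEMPLATE = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
FOOTER_TEMPLATE = "END of knowledge_search tool results.\n"


def render_results(chunks: list) -> str:
    """Fill the header, one block per chunk, and the footer."""
    parts = [HEADER_TEMPLATE.format(num_chunks=len(chunks))]
    for index, chunk in enumerate(chunks, start=1):
        # str.format resolves {chunk.content} via attribute access on `chunk`.
        parts.append(
            CHUNK_TEMPLATE.format(index=index, chunk=chunk, metadata=chunk.metadata)
        )
    parts.append(FOOTER_TEMPLATE)
    return "".join(parts)


example_chunks = [SimpleNamespace(content="hello", metadata={"file_id": "f1"})]
rendered = render_results(example_chunks)
```

A custom template must keep using only the placeholder names the code supplies; an unknown field name would raise `KeyError` at render time under this approach.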

## Test Plan
Added tests.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-12-15 11:39:01 -05:00
asimurka
4043dedeea
fix: correctly unwrap provider data api_key from secret string (#4380)
# What does this PR do?
Fix provider header API key handling by correctly unwrapping `SecretStr`
values for provider data API keys. Previously the validator cast header
keys to `SecretStr` but the value wasn’t unwrapped before use, causing
authentication failures with providers like Azure.
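
The failure mode is easy to reproduce in isolation. Pydantic's `SecretStr` masks its value when converted to a string and returns the real value only from `get_secret_value()`. The sketch below uses a tiny stand-in class with that same behaviour so it runs without dependencies; the `api-key` header and the two helper functions are illustrative, not the project's code.

```python
class SecretStr:
    """Minimal stand-in mirroring pydantic.SecretStr's masking behaviour."""

    def __init__(self, value: str):
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __str__(self) -> str:
        # The wrapper never reveals the secret when formatted.
        return "**********"


def auth_header_buggy(api_key: SecretStr) -> dict:
    # Bug pattern: formatting the wrapper sends the mask, not the key.
    return {"api-key": str(api_key)}


def auth_header_fixed(api_key: SecretStr) -> dict:
    # Fix pattern: unwrap explicitly at the point of use.
    return {"api-key": api_key.get_secret_value()}
```

Sending the masked string as a credential is rejected by the provider, which matches the authentication failures described above.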

Closes https://github.com/llamastack/llama-stack/issues/4370
2025-12-15 11:21:20 -05:00
Costa Shulyupin
2b85600a7e
docs: make inference model configurable (#4385)
Allow users to specify the inference model through the INFERENCE_MODEL
environment variable instead of hardcoding it, with fallback to
ollama/llama3.2:3b if not set.
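
A minimal sketch of the fallback, assuming the docs example reads the variable in Python; the helper name `resolve_inference_model` is hypothetical:

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "ollama/llama3.2:3b"


def resolve_inference_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return INFERENCE_MODEL from the environment, or the default model."""
    env = os.environ if env is None else env
    # Design choice in this sketch: `or` also treats an empty value as unset.
    return env.get("INFERENCE_MODEL") or DEFAULT_MODEL
```

Running with `INFERENCE_MODEL=<provider>/<model>` set in the shell selects that model; leaving it unset keeps the previous hardcoded value.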

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
2025-12-15 11:02:28 +01:00
dependabot[bot]
62f7818051
chore(github-deps): bump astral-sh/setup-uv from 7.1.4 to 7.1.6 (#4386)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
7.1.4 to 7.1.6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.6 🌈 add OS version to cache key to prevent binary
incompatibility</h2>
<h2>Changes</h2>
<p>This release will invalidate your existing cache keys!</p>
<p>The os version e.g. <code>ubuntu-22.04</code> is now part of the
cache key. This prevents failing builds when a cache got populated with
wheels built with different tools (e.g. glibc) than are present on the
runner where the cache got restored.</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>feat: add OS version to cache key to prevent binary incompatibility
<a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/716">#716</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.17 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/714">#714</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump actions/checkout from 5.0.0 to 6.0.1 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/712">#712</a>)</li>
<li>Bump actions/setup-node from 6.0.0 to 6.1.0 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/715">#715</a>)</li>
</ul>
<h2>v7.1.5 🌈 allow setting <code>cache-local-path</code> without
<code>enable-cache: true</code></h2>
<h2>Changes</h2>
<p><a
href="https://redirect.github.com/astral-sh/setup-uv/pull/612">astral-sh/setup-uv#612</a>
fixed a faulty behavior where this action set <code>UV_CACHE_DIR</code>
even though <code>enable-cache</code> was <code>false</code>. It also
fixed the cases where the cache dir is already configured in a settings
file like <code>pyproject.toml</code> or <code>UV_CACHE_DIR</code> was
already set. Here the action shouldn't overwrite or set
<code>UV_CACHE_DIR</code>.</p>
<p>These fixes introduced an unwanted behavior: You can still set
<code>cache-local-path</code> but this action didn't do anything. This
release fixes that.</p>
<p>You can now use <code>cache-local-path</code> to automatically set
<code>UV_CACHE_DIR</code> even when <code>enable-cache</code> is
<code>false</code> (or gets set to false by default e.g. on self-hosted
runners)</p>
<pre lang="yaml"><code>- name: This is now possible
  uses: astral-sh/setup-uv@v7
  with:
    enable-cache: false
    cache-local-path: &quot;/path/to/cache&quot;
</code></pre>
<h2>🐛 Bug fixes</h2>
<ul>
<li>allow cache-local-path w/o enable-cache <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/707">#707</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>set biome files.maxSize to 2MiB <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/708">#708</a>)</li>
<li>chore: update known checksums for 0.9.16 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/706">#706</a>)</li>
<li>chore: update known checksums for 0.9.15 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/704">#704</a>)</li>
<li>chore: use <code>npm ci --ignore-scripts</code> everywhere <a
href="https://github.com/woodruffw"><code>@​woodruffw</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/699">#699</a>)</li>
<li>chore: update known checksums for 0.9.14 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/700">#700</a>)</li>
<li>chore: update known checksums for 0.9.13 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/694">#694</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="681c641aba"><code>681c641</code></a>
Bump actions/checkout from 5.0.0 to 6.0.1 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/712">#712</a>)</li>
<li><a
href="2e85713bb0"><code>2e85713</code></a>
Bump actions/setup-node from 6.0.0 to 6.1.0 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/715">#715</a>)</li>
<li><a
href="58b6d7b303"><code>58b6d7b</code></a>
fix: add OS version to cache key to prevent binary incompatibility (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/716">#716</a>)</li>
<li><a
href="e8b52af86e"><code>e8b52af</code></a>
chore: update known checksums for 0.9.17 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/714">#714</a>)</li>
<li><a
href="ed21f2f24f"><code>ed21f2f</code></a>
Bump peter-evans/create-pull-request from 7.0.8 to 7.0.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/695">#695</a>)</li>
<li><a
href="93202d8fbe"><code>93202d8</code></a>
bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/709">#709</a>)</li>
<li><a
href="5ce090076d"><code>5ce0900</code></a>
set biome files.maxSize to 2MiB (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/708">#708</a>)</li>
<li><a
href="4180991cd9"><code>4180991</code></a>
allow cache-local-path w/o enable-cache (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/707">#707</a>)</li>
<li><a
href="0439606c8e"><code>0439606</code></a>
Bump github/codeql-action from 4.30.9 to 4.31.6 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/698">#698</a>)</li>
<li><a
href="7dd56c18e9"><code>7dd56c1</code></a>
chore: update known checksums for 0.9.16 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/706">#706</a>)</li>
<li>Additional commits viewable in <a
href="1e862dfacb...681c641aba">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=7.1.4&new-version=7.1.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-15 09:17:30 +01:00
dependabot[bot]
9b346625bc
chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.7.1 to 1.8.1 (#4387)
Bumps
[stainless-api/upload-openapi-spec-action](https://github.com/stainless-api/upload-openapi-spec-action)
from 1.7.1 to 1.8.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/stainless-api/upload-openapi-spec-action/releases">stainless-api/upload-openapi-spec-action's
releases</a>.</em></p>
<blockquote>
<h2>v1.8.1</h2>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.8.0...v1.8.1">1.8.1</a>
(2025-12-09)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>re-enable 'targets' param in diagnostics call (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/148">#148</a>)
(<a
href="3130e17c92">3130e17</a>)</li>
</ul>
<h2>v1.8.0</h2>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.7.1...v1.8.0">1.8.0</a>
(2025-12-08)</h2>
<h3>Features</h3>
<ul>
<li>support AI commit message generation for preview builds (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/143">#143</a>)
(<a
href="7010edb389">7010edb</a>)</li>
<li>support per-SDK commit messages in preview comments (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/142">#142</a>)
(<a
href="a36c33fc21">a36c33f</a>)</li>
<li>Update to latest <code>@​stainless-api/sdk</code> (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/144">#144</a>)
(<a
href="a9b388bded">a9b388b</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/stainless-api/upload-openapi-spec-action/blob/main/CHANGELOG.md">stainless-api/upload-openapi-spec-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.8.0...v1.8.1">1.8.1</a>
(2025-12-09)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>re-enable 'targets' param in diagnostics call (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/148">#148</a>)
(<a
href="3130e17c92">3130e17</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.7.1...v1.8.0">1.8.0</a>
(2025-12-08)</h2>
<h3>Features</h3>
<ul>
<li>support AI commit message generation for preview builds (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/143">#143</a>)
(<a
href="7010edb389">7010edb</a>)</li>
<li>support per-SDK commit messages in preview comments (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/142">#142</a>)
(<a
href="a36c33fc21">a36c33f</a>)</li>
<li>Update to latest <code>@​stainless-api/sdk</code> (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/144">#144</a>)
(<a
href="a9b388bded">a9b388b</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.7.0...v1.7.1">1.7.1</a>
(2025-12-01)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>improve getMergeBase to handle shallow clones more robustly (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/138">#138</a>)
(<a
href="3687845465">3687845</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.6.0...v1.7.0">1.7.0</a>
(2025-11-17)</h2>
<h3>Features</h3>
<ul>
<li><strong>preview:</strong> add output documented_spec_path to preview
action (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/135">#135</a>)
(<a
href="5e80cc40da">5e80cc4</a>)</li>
<li><strong>preview:</strong> add output_dir input and write documented
spec to file (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/137">#137</a>)
(<a
href="d30490c89b">d30490c</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.5...v1.6.0">1.6.0</a>
(2025-10-30)</h2>
<h3>Features</h3>
<ul>
<li>add support for github OIDC auth (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/133">#133</a>)
(<a
href="259674c1b3">259674c</a>)</li>
<li>change fail on semantics (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/124">#124</a>)
(<a
href="e1046240c0">e104624</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>accept multiline conventional commits (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/129">#129</a>)
(<a
href="d2dcc0b3bf">d2dcc0b</a>)</li>
<li>tweak categorizeOutcomes (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/132">#132</a>)
(<a
href="c45d6a9c79">c45d6a9</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.4...v1.5.5">1.5.5</a>
(2025-09-26)</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="979824f1ea"><code>979824f</code></a>
chore(main): release 1.8.1 (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/149">#149</a>)</li>
<li><a
href="3130e17c92"><code>3130e17</code></a>
fix: re-enable 'targets' param in diagnostics call (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/148">#148</a>)</li>
<li><a
href="44e2d2a112"><code>44e2d2a</code></a>
chore(main): release 1.8.0 (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/145">#145</a>)</li>
<li><a
href="7010edb389"><code>7010edb</code></a>
feat: support AI commit message generation for preview builds (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/143">#143</a>)</li>
<li><a
href="a36c33fc21"><code>a36c33f</code></a>
feat: support per-SDK commit messages in preview comments (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/142">#142</a>)</li>
<li><a
href="06c5fd328b"><code>06c5fd3</code></a>
chore(build): Update dist</li>
<li><a
href="a9b388bded"><code>a9b388b</code></a>
feat: Update to latest <code>@​stainless-api/sdk</code> (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/144">#144</a>)</li>
<li>See full diff in <a
href="a4d631c1e9...979824f1ea">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=stainless-api/upload-openapi-spec-action&package-manager=github_actions&previous-version=1.7.1&new-version=1.8.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-15 09:17:13 +01:00
dependabot[bot]
f4df1a66e0
chore(github-deps): bump actions/upload-artifact from 5.0.0 to 6.0.0 (#4388)
Bumps
[actions/upload-artifact](https://github.com/actions/upload-artifact)
from 5.0.0 to 6.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>v6 - What's new</h2>
<blockquote>
<p>[!IMPORTANT]
actions/upload-artifact@v6 now runs on Node.js 24 (<code>runs.using:
node24</code>) and requires a minimum Actions Runner version of 2.327.1.
If you are using self-hosted runners, ensure they are updated before
upgrading.</p>
</blockquote>
<h3>Node.js 24</h3>
<p>This release updates the runtime to Node.js 24. v5 had preliminary
support for Node.js 24, however this action was by default still running
on Node.js 20. Now this action by default will run on Node.js 24.</p>
<h2>What's Changed</h2>
<ul>
<li>Upload Artifact Node 24 support by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/719">actions/upload-artifact#719</a></li>
<li>fix: update <code>@​actions/artifact</code> for Node.js 24 punycode
deprecation by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/744">actions/upload-artifact#744</a></li>
<li>prepare release v6.0.0 for Node.js 24 support by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/745">actions/upload-artifact#745</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0">https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b7c566a772"><code>b7c566a</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/745">#745</a>
from actions/upload-artifact-v6-release</li>
<li><a
href="e516bc8500"><code>e516bc8</code></a>
docs: correct description of Node.js 24 support in README</li>
<li><a
href="ddc45ed9bc"><code>ddc45ed</code></a>
docs: update README to correct action name for Node.js 24 support</li>
<li><a
href="615b319bd2"><code>615b319</code></a>
chore: release v6.0.0 for Node.js 24 support</li>
<li><a
href="017748b48f"><code>017748b</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/744">#744</a>
from actions/fix-storage-blob</li>
<li><a
href="38d4c7997f"><code>38d4c79</code></a>
chore: rebuild dist</li>
<li><a
href="7d27270e0c"><code>7d27270</code></a>
chore: add missing license cache files for <code>@​actions/core</code>,
<code>@​actions/io</code>, and mi...</li>
<li><a
href="5f643d3c94"><code>5f643d3</code></a>
chore: update license files for <code>@​actions/artifact</code><a
href="https://github.com/5"><code>@​5</code></a>.0.1 dependencies</li>
<li><a
href="1df1684032"><code>1df1684</code></a>
chore: update package-lock.json with <code>@​actions/artifact</code><a
href="https://github.com/5"><code>@​5</code></a>.0.1</li>
<li><a
href="b5b1a91840"><code>b5b1a91</code></a>
fix: update <code>@​actions/artifact</code> to ^5.0.0 for Node.js 24
punycode fix</li>
<li>Additional commits viewable in <a
href="330a01c490...b7c566a772">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/upload-artifact&package-manager=github_actions&previous-version=5.0.0&new-version=6.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-15 09:16:34 +01:00
dependabot[bot]
6efe0a2939
chore(github-deps): bump actions/cache from 4.3.0 to 5.0.1 (#4389)
Bumps [actions/cache](https://github.com/actions/cache) from 4.3.0 to
5.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/cache/releases">actions/cache's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.1</h2>
<blockquote>
<p>[!IMPORTANT]
<strong><code>actions/cache@v5</code> runs on the Node.js 24 runtime and
requires a minimum Actions Runner version of
<code>2.327.1</code>.</strong></p>
<p>If you are using self-hosted runners, ensure they are updated before
upgrading.</p>
</blockquote>
<hr />
<h1>v5.0.1</h1>
<h2>What's Changed</h2>
<ul>
<li>fix: update <code>@​actions/cache</code> for Node.js 24 punycode
deprecation by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1685">actions/cache#1685</a></li>
<li>prepare release v5.0.1 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1686">actions/cache#1686</a></li>
</ul>
<h1>v5.0.0</h1>
<h2>What's Changed</h2>
<ul>
<li>Upgrade to use node24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1630">actions/cache#1630</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1684">actions/cache#1684</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v5...v5.0.1">https://github.com/actions/cache/compare/v5...v5.0.1</a></p>
<h2>v5.0.0</h2>
<blockquote>
<p>[!IMPORTANT]
<strong><code>actions/cache@v5</code> runs on the Node.js 24 runtime and
requires a minimum Actions Runner version of
<code>2.327.1</code>.</strong></p>
<p>If you are using self-hosted runners, ensure they are updated before
upgrading.</p>
</blockquote>
<hr />
<h2>What's Changed</h2>
<ul>
<li>Upgrade to use node24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1630">actions/cache#1630</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1684">actions/cache#1684</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v4.3.0...v5.0.0">https://github.com/actions/cache/compare/v4.3.0...v5.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/cache/blob/main/RELEASES.md">actions/cache's
changelog</a>.</em></p>
<blockquote>
<h1>Releases</h1>
<h2>Changelog</h2>
<h3>5.0.1</h3>
<ul>
<li>Update <code>@azure/storage-blob</code> to <code>^12.29.1</code> via
<code>@actions/cache@5.0.1</code> <a
href="https://redirect.github.com/actions/cache/pull/1685">#1685</a></li>
</ul>
<h3>5.0.0</h3>
<blockquote>
<p>[!IMPORTANT]
<code>actions/cache@v5</code> runs on the Node.js 24 runtime and
requires a minimum Actions Runner version of <code>2.327.1</code>.
If you are using self-hosted runners, ensure they are updated before
upgrading.</p>
</blockquote>
<h3>4.3.0</h3>
<ul>
<li>Bump <code>@actions/cache</code> to <a
href="https://redirect.github.com/actions/toolkit/pull/2132">v4.1.0</a></li>
</ul>
<h3>4.2.4</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.5</li>
</ul>
<h3>4.2.3</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.3 (obfuscates SAS token in
debug logs for cache entries)</li>
</ul>
<h3>4.2.2</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.2</li>
</ul>
<h3>4.2.1</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.1</li>
</ul>
<h3>4.2.0</h3>
<p>TLDR; The cache backend service has been rewritten from the ground up
for improved performance and reliability. <a
href="https://github.com/actions/cache">actions/cache</a> now integrates
with the new cache service (v2) APIs.</p>
<p>The new service will gradually roll out as of <strong>February 1st,
2025</strong>. The legacy service will also be sunset on the same date.
Changes in this release are <strong>fully backward
compatible</strong>.</p>
<p><strong>We are deprecating some versions of this action</strong>. We
recommend upgrading to version <code>v4</code> or <code>v3</code> as
soon as possible before <strong>February 1st, 2025.</strong> (Upgrade
instructions below).</p>
<p>If you are using pinned SHAs, please use the SHAs of versions
<code>v4.2.0</code> or <code>v3.4.0</code></p>
<p>If you do not upgrade, all workflow runs using any of the deprecated
<a href="https://github.com/actions/cache">actions/cache</a> will
fail.</p>
<p>Upgrading to the recommended versions will not break your
workflows.</p>
<h3>4.1.2</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9255dc7a25"><code>9255dc7</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1686">#1686</a>
from actions/cache-v5.0.1-release</li>
<li><a
href="8ff5423e8b"><code>8ff5423</code></a>
chore: release v5.0.1</li>
<li><a
href="9233019a15"><code>9233019</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1685">#1685</a>
from salmanmkc/node24-storage-blob-fix</li>
<li><a
href="b975f2bb84"><code>b975f2b</code></a>
fix: add peer property to package-lock.json for dependencies</li>
<li><a
href="d0a0e18134"><code>d0a0e18</code></a>
fix: update license files for <code>@​actions/cache</code>,
fast-xml-parser, and strnum</li>
<li><a
href="74de208dcf"><code>74de208</code></a>
fix: update <code>@​actions/cache</code> to ^5.0.1 for Node.js 24
punycode fix</li>
<li><a
href="ac7f1152ea"><code>ac7f115</code></a>
peer</li>
<li><a
href="b0f846b50b"><code>b0f846b</code></a>
fix: update <code>@​actions/cache</code> with storage-blob fix for
Node.js 24 punycode depr...</li>
<li><a
href="a783357455"><code>a783357</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1684">#1684</a>
from actions/prepare-cache-v5-release</li>
<li><a
href="3bb0d78750"><code>3bb0d78</code></a>
docs: highlight v5 runner requirement in releases</li>
<li>Additional commits viewable in <a
href="0057852bfa...9255dc7a25">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/cache&package-manager=github_actions&previous-version=4.3.0&new-version=5.0.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-15 09:16:19 +01:00
Roy Belio
c574db5f1d
fix(inference): AttributeError in streaming response cleanup (#4236)
This PR fixes issue #3185.

The code calls `await event_gen.aclose()`, but OpenAI's `AsyncStream`
has no `aclose()` method; it has `close()`, which is async.
When a client cancels a streaming request, the server tries to clean up
with:

```python
await event_gen.aclose()  #  AsyncStream doesn't have aclose()!
```

But `AsyncStream` has never had a public `aclose()` method. The error
message literally tells us:

```
AttributeError: 'AsyncStream' object has no attribute 'aclose'. Did you mean: 'close'?
                                                                            ^^^^^^^^
```
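
A defensive cleanup helper could look like the minimal sketch below. It is not the code from this PR: `close_stream` is a hypothetical name, and `FakeAsyncStream` is a stand-in that only mimics the one property described above (an async `close()` and no `aclose()`).

```python
import asyncio
import inspect


async def close_stream(stream) -> bool:
    """Close a stream-like object, whichever close method it exposes.

    Hypothetical helper: prefers aclose() (async generators) and falls
    back to close() (the method the error message suggests).
    """
    closer = getattr(stream, "aclose", None) or getattr(stream, "close", None)
    if closer is None:
        return False  # nothing to close
    result = closer()
    # close() may be sync or async depending on the object; await only if needed
    if inspect.isawaitable(result):
        await result
    return True


class FakeAsyncStream:
    """Stand-in with an async close() and no aclose(), as described above."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def numbers():
    yield 1


async def demo():
    stream = FakeAsyncStream()
    closed_stream = await close_stream(stream)  # falls back to close()
    closed_gen = await close_stream(numbers())  # async generators have aclose()
    return closed_stream and stream.closed, closed_gen
```

Calling `asyncio.run(demo())` exercises both paths without touching a real server.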

## Verification
* The reproduction script
[`reproduce_issue_3185.sh`](https://gist.github.com/r-bit-rry/dea4f8fbb81c446f5db50ea7abd6379b)
can be used to verify the fix.
* Manual checks and validation against the original OpenAI library code.
2025-12-14 07:51:09 -05:00
Omar Abdelwahab
dfb9f6743a
docs: Adding initial updates to the RAG documentation and examples (#4377)
# What does this PR do?
This PR updates the RAG examples included in docs/quick_start.ipynb,
docs/getting_started/demo_script.py, rag.mdx and index.md to remove
references to the deprecated vector_io and vector_db APIs and to add
examples that use /v1/vector_stores with responses and completions.

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
2025-12-12 22:59:39 -05:00
Varsha
75ef052545
docs: Add details on model registration and refresh_models (#4383)
Document the refresh_models configuration option for remote providers
that use RemoteInferenceProviderConfig.

- Add "Automatic vs Explicit Model Registration" section to
resources.mdx
- Include examples for registering custom embedding models

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-12-12 22:41:28 -05:00
Robert Riley (OCI)
10c878d782
feat: added oci-s3 compatibility (#4374)
# What does this PR do?
This PR validates and allows access to OCI Object Storage through the S3
compatibility API. Additional OCI documentation is also supplied in
notebook form.
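
For orientation, an S3-compatible client is usually pointed at OCI by overriding its endpoint URL. The sketch below only builds that URL; `oci_s3_endpoint` is a hypothetical helper, the namespace is a placeholder, and the host pattern is an assumption based on OCI's public S3 compatibility documentation rather than something taken from this PR.

```python
def oci_s3_endpoint(namespace: str, region: str) -> str:
    """Build the S3-compatibility endpoint for an Object Storage namespace.

    Assumed host pattern:
    https://<namespace>.compat.objectstorage.<region>.oraclecloud.com
    """
    if not namespace or not region:
        raise ValueError("namespace and region are both required")
    return f"https://{namespace}.compat.objectstorage.{region}.oraclecloud.com"


# Placeholder values for illustration only.
endpoint = oci_s3_endpoint("mynamespace", "us-chicago-1")

# An S3 client would then receive this as its endpoint override, e.g. with
# boto3: boto3.client("s3", endpoint_url=endpoint, region_name="us-chicago-1")
```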

---------

Co-authored-by: raghotham <rsm@meta.com>
2025-12-11 15:13:55 -08:00
Shabana Baig
805abf573f
feat!: Implement include parameter specifically for adding logprobs in the output message (#4261)
# Problem
As an Application Developer, I want to use the include parameter with
the value message.output_text.logprobs, so that I can receive log
probabilities for output tokens to assess the model's confidence in its
response.

# What does this PR do?

- Updates the include parameter in various resource definitions
- Updates the inline provider to return logprobs when
"message.output_text.logprobs" is passed in the include parameter
- Converts the logprobs returned by the inference provider from chat
completion format to responses format
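
The conversion step in the last bullet can be sketched roughly as follows. This is an illustration with plain dictionaries, not the PR's implementation: `to_output_text_logprobs` is a hypothetical name, and the field names (`content`, `token`, `logprob`, `bytes`, `top_logprobs`) are assumed from OpenAI's chat completion logprobs shape.

```python
INCLUDE_LOGPROBS = "message.output_text.logprobs"


def to_output_text_logprobs(chat_logprobs, include):
    """Map chat-completion logprobs to a per-token list for an output_text part.

    Returns None when the caller did not request logprobs via `include`.
    """
    if INCLUDE_LOGPROBS not in (include or []):
        return None
    tokens = (chat_logprobs or {}).get("content") or []
    return [
        {
            "token": tok["token"],
            "logprob": tok["logprob"],
            "bytes": tok.get("bytes"),
            "top_logprobs": [
                {"token": alt["token"], "logprob": alt["logprob"], "bytes": alt.get("bytes")}
                for alt in tok.get("top_logprobs") or []
            ],
        }
        for tok in tokens
    ]


# Illustrative input only; the values are made up.
sample = {
    "content": [
        {
            "token": "Hi",
            "logprob": -0.1,
            "bytes": [72, 105],
            "top_logprobs": [{"token": "Hi", "logprob": -0.1, "bytes": [72, 105]}],
        }
    ]
}
converted = to_output_text_logprobs(sample, [INCLUDE_LOGPROBS])
```

When `include` omits the value, the helper returns `None`, which mirrors the opt-in behavior described in the problem statement.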

Closes #[4260](https://github.com/llamastack/llama-stack/issues/4260)

## Test Plan

- Created a script to explore OpenAI behavior:
https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/include.py
- Added integration tests and new recordings

---------

Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-12-11 11:11:21 -08:00
Jaideep Rao
76e47d811a
feat(api): add readonly connectors API (#4258)
# What does this PR do?
Adds a new API for connectors and MCP registry support, along with the
required types. It does not include any implementation.

Closes #4235 and #4061 (partially)

## Test Plan
No tests included.

---------

Signed-off-by: Jaideep Rao <jrao@redhat.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
2025-12-11 10:19:55 -08:00
Sébastien Han
470fe55e87
fix(inference): respect table_name config in InferenceStore (#4371)
# What does this PR do?

The InferenceStore class was ignoring the table_name field from
InferenceStoreReference and always using the hardcoded value
"chat_completions". This meant that any custom table_name configured in
the run config (e.g., "inference_store" in run-with-postgres-store.yaml)
was silently ignored.

This change updates all SQL operations in InferenceStore to use
self.reference.table_name instead of the hardcoded string, ensuring the
configured table name is properly respected.

A new test has been added to verify that custom table names work
correctly for storing, retrieving, and listing chat completions.
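
The shape of the fix can be illustrated with a small, self-contained sketch. It uses `sqlite3` and hypothetical stand-ins (`StoreReference`, `TinyInferenceStore`) instead of the real `InferenceStoreReference` and `InferenceStore`, whose internals are not shown here.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class StoreReference:
    """Hypothetical stand-in for InferenceStoreReference."""

    table_name: str = "chat_completions"


class TinyInferenceStore:
    """Toy store whose SQL always goes through reference.table_name."""

    def __init__(self, reference: StoreReference, conn: sqlite3.Connection) -> None:
        # Table names cannot be bound as SQL parameters, so validate the
        # configured name before interpolating it into statements.
        if not reference.table_name.isidentifier():
            raise ValueError(f"invalid table name: {reference.table_name!r}")
        self.reference = reference
        self.conn = conn
        self.conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{reference.table_name}" '
            "(id TEXT PRIMARY KEY, payload TEXT)"
        )

    def store(self, completion_id: str, payload: str) -> None:
        self.conn.execute(
            f'INSERT INTO "{self.reference.table_name}" (id, payload) VALUES (?, ?)',
            (completion_id, payload),
        )

    def get(self, completion_id: str):
        row = self.conn.execute(
            f'SELECT payload FROM "{self.reference.table_name}" WHERE id = ?',
            (completion_id,),
        ).fetchone()
        return row[0] if row else None


def table_names(conn: sqlite3.Connection) -> list:
    """List user tables so a test can confirm which table was created."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(name for (name,) in rows)
```

A test in this style would configure `table_name="inference_store"`, write a row, and check that only that table exists.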

## Test Plan

CI

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-11 14:50:23 +01:00
Charlie Doern
7308c8aef1
feat: add workflow_dispatch and self-trigger to stainless builds (#4361)
# What does this PR do?

It is currently impossible to test workflow changes (`pull_request_target`
uses the base branch's workflow definition) or to manually trigger SDK
builds. This PR adds both capabilities.

- Add `workflow_dispatch` with a `pr_number` input for manual testing
- Add the workflow file to the path triggers for automatic testing
- Fetch PR details via the `gh` CLI for manual runs
- Update jobs to use computed PR data for both trigger types
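
The idea of computing PR data for both trigger types can be sketched in Python. The real workflow does this with workflow expressions and the `gh` CLI; `resolve_pr_number` is a hypothetical helper, and the payload keys are assumptions based on GitHub's documented event payloads.

```python
def resolve_pr_number(event_name: str, event: dict) -> int:
    """Return the PR number for either trigger type.

    pull_request_target: the number comes from the event's pull_request object.
    workflow_dispatch: the number comes from the manual `pr_number` input.
    """
    if event_name == "pull_request_target":
        return int(event["pull_request"]["number"])
    if event_name == "workflow_dispatch":
        raw = (event.get("inputs") or {}).get("pr_number")
        if not raw:
            raise ValueError("workflow_dispatch requires a pr_number input")
        return int(raw)  # dispatch inputs arrive as strings
    raise ValueError(f"unsupported event: {event_name}")
```

Downstream jobs can then consume one computed value instead of branching on the trigger type themselves.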

## Test Plan

Unfortunately, this is impossible to test until it merges. I am doing
this in a smaller PR so that I can use it immediately in a follow-up.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-10 12:48:27 -08:00
Francisco Javier Arceo
95b2948d11
feat: Add support for query rewrite in vector_store.search (#4171)
# What does this PR do?

Implements query rewriting in the search API and adds
`default_query_expansion_model` and `query_expansion_prompt` to
`VectorStoresConfig`.

Makes the `rewrite_query` parameter functional in vector store search:
- `rewrite_query=false` (default): use the original query
- `rewrite_query=true`: expand the query via an LLM, or fail gracefully if
no LLM is available

Adds 4 parameters to `VectorStoresConfig`:
- `default_query_expansion_model`: LLM model for query expansion
(optional)
- `query_expansion_prompt`: Custom prompt template (optional, uses
built-in default)
- `query_expansion_max_tokens`: Configurable token limit (default: 100)
- `query_expansion_temperature`: Configurable temperature (default: 0.3)

`run.yaml` with query rewriting enabled:
```yaml
  vector_stores:
    rewrite_query_params:
      model:
        provider_id: "ollama"
        model_id: "llama3.2:3b-instruct-fp16"
      # prompt defaults to built-in
      # max_tokens defaults to 100
      # temperature defaults to 0.3
```

Fully customized `run.yaml`:
```yaml
  vector_stores:
    default_provider_id: faiss
    default_embedding_model:
      provider_id: sentence-transformers
      model_id: nomic-ai/nomic-embed-text-v1.5
    rewrite_query_params:
      model:
        provider_id: ollama
        model_id: llama3.2:3b-instruct-fp16
      prompt: "Rewrite this search query to improve retrieval results by expanding it with relevant synonyms and related terms: {query}"
      max_tokens: 100
      temperature: 0.3
```
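
The control flow these settings imply can be sketched as follows. This is not the server's implementation: `maybe_rewrite_query` and the `complete` callable are hypothetical, and raising a clear error when no model is configured is only one possible reading of "fail gracefully".

```python
DEFAULT_PROMPT = (
    "Rewrite this search query to improve retrieval results by expanding it "
    "with relevant synonyms and related terms: {query}"
)


def maybe_rewrite_query(
    query,
    rewrite_query=False,
    complete=None,
    prompt=DEFAULT_PROMPT,
    max_tokens=100,
    temperature=0.3,
):
    """Return the query to search with.

    `complete` is a hypothetical callable (prompt, max_tokens, temperature)
    -> str that stands in for the configured LLM.
    """
    if not rewrite_query:
        return query  # default: search with the original query
    if complete is None:
        raise ValueError("rewrite_query=True requires a query expansion model")
    expanded = complete(prompt.format(query=query), max_tokens=max_tokens, temperature=temperature)
    # Fall back to the original query if the model returns nothing useful.
    return expanded.strip() or query


def fake_llm(prompt, max_tokens, temperature):
    """Stand-in for a real model call, used only to exercise the helper."""
    return " kiwi, a small green fruit "
```

Passing the model call in as a parameter keeps the rewrite logic testable without a running inference provider.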

## Test Plan
Added test and recording

Example script as well:

```python
from io import BytesIO

from llama_stack_client import LlamaStackClient


def gen_file(client, text: str = ""):
    """Upload `text` as an in-memory text file and return the file object."""
    file_buffer = BytesIO(text.encode("utf-8"))
    file_buffer.name = "my_file.txt"
    return client.files.create(file=file_buffer, purpose="assistants")


def test_query_rewriting():
    # The client calls below are synchronous, so no event loop is needed.
    client = LlamaStackClient(base_url="http://0.0.0.0:8321/")
    uploaded_file = gen_file(client, "banana banana apple")
    uploaded_file2 = gen_file(client, "orange orange kiwi")

    # Create a vector store and attach both files to it.
    vs = client.vector_stores.create()
    client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)
    client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file2.id)

    # Search once with the original query and once with query rewriting.
    response1 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="apple",
        max_num_results=3,
        rewrite_query=False,
    )
    response2 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="kiwi",
        max_num_results=3,
        rewrite_query=True,
    )

    print(f"\n🔵 Response 1 (rewrite_query=False):\n\033[94m{response1}\033[0m")
    print(f"\n🟢 Response 2 (rewrite_query=True):\n\033[92m{response2}\033[0m")

    # Clean up the uploaded files and the vector store.
    for f in [uploaded_file.id, uploaded_file2.id]:
        client.files.delete(file_id=f)
    client.vector_stores.delete(vector_store_id=vs.id)


if __name__ == "__main__":
    test_query_rewriting()
```

The screenshot of the server logs below shows that it worked.
<img width="1111" height="826" alt="Screenshot 2025-11-19 at 1 16 03 PM"
src="https://github.com/user-attachments/assets/2d188b44-1fef-4df5-b465-2d6728ca49ce"
/>

Notice the log:
```bash
 Query rewritten:
         'kiwi' → 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'
```
So `kiwi` was expanded.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-12-10 10:06:19 -05:00
Sébastien Han
ff375f1abb
feat: convert Benchmarks API to use FastAPI router (#4309)
# What does this PR do?

Convert the Benchmarks API from @webmethod decorators to FastAPI router
pattern, matching the Batches API structure.

One notable change is the update of stack.py to handle request models in
register_resources().

Closes: #4308 

## Test Plan

CI and `curl http://localhost:8321/v1/inspect/routes | jq '.data[] |
select(.route | contains("benchmark"))'`
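
The `jq` filter in the test plan has a direct Python equivalent, sketched below for anyone checking the routes from a script. The response shape (`data` entries with a `route` field) is taken from the filter itself; `routes_containing` and the sample routes are illustrative assumptions, not real server output.

```python
def routes_containing(payload: dict, needle: str) -> list:
    """Mirror `.data[] | select(.route | contains(needle))` from the jq command."""
    return [r for r in payload.get("data", []) if needle in r.get("route", "")]


# Made-up payload for illustration only.
sample_payload = {
    "data": [
        {"route": "/v1/example/benchmarks", "method": "GET"},
        {"route": "/v1/example/other", "method": "GET"},
    ]
}
matches = routes_containing(sample_payload, "benchmark")
```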

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-10 15:04:27 +01:00
Charlie Doern
661985e240
feat: remove usage of build yaml (#4192)
# What does this PR do?

The build.yaml is only used in the following ways:

1. list-deps
2. distribution code-gen

Since `llama stack build` no longer exists, I found myself asking "why
do we need two different files for list-deps and run?"

Removing the BuildConfig and altering the usage of the
DistributionTemplate in llama stack list-deps is the first step in
removing the build yaml entirely.

Removing the BuildConfig and build.yaml cuts the files users need to
maintain in half, and allows us to focus on the stability of _just_ the
run.yaml

This PR removes the build.yaml, BuildConfig datatype, and its usage
throughout the codebase. Users are now expected to point to run.yaml
files when running list-deps, and our codebase automatically uses these
types now for things like `get_provider_registry`.

**Additionally, two renames: `StackRunConfig` -> `StackConfig` and
`run.yaml` -> `config.yaml`.**

The build.yaml made sense when we were managing the build process
for the user and actually _producing_ a run.yaml _from_ the build.yaml,
but now that we are only getting the provider registry and listing the
deps, switching to config.yaml greatly simplifies the scope here.

## Test Plan

Existing list-deps usage should work in the tests.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-10 10:12:12 +01:00
Varsha
17e6912288
docs: Fix vector_store_create params (#4364)
2025-12-09 19:48:43 -05:00
Francisco Javier Arceo
fcea9893a4
feat(UI): Adding Files API to Admin UI (#4319)
# What does this PR do?


## Files Admin Page

<img width="1919" height="1238" alt="Screenshot 2025-12-09 at 10 33
06 AM"
src="https://github.com/user-attachments/assets/3dd545f0-32bc-45be-af2b-1823800015f2"
/>

## Files Upload Modal
<img width="1919" height="1287" alt="Screenshot 2025-12-09 at 10 33
38 AM"
src="https://github.com/user-attachments/assets/776bb372-75d3-4ccd-b6b5-c9dfb3fcb350"
/>

## Files Detail
<img width="1918" height="1099" alt="Screenshot 2025-12-09 at 10 34
26 AM"
src="https://github.com/user-attachments/assets/f256dbf8-4047-4d79-923d-404161b05f36"
/>

Note, content preview has some handling for JSON, CSV, and PDF to enable
nicer rendering. Pure text rendering is trivial.

### Files Detail File Content Preview (TXT)
<img width="1918" height="1341" alt="Screenshot 2025-12-09 at 10 41
20 AM"
src="https://github.com/user-attachments/assets/4fa0ddb7-ffff-424b-b764-0bd4af6ed976"
/>

### Files Detail File Content Preview (JSON)
<img width="1909" height="1233" alt="Screenshot 2025-12-09 at 10 39
57 AM"
src="https://github.com/user-attachments/assets/b912f07a-2dff-483b-b73c-2f69dd0d87ad"
/>

### Files Detail File Content Preview (HTML)
<img width="1916" height="1348" alt="Screenshot 2025-12-09 at 10 40
27 AM"
src="https://github.com/user-attachments/assets/17ebec0a-8754-4552-977d-d3c44f7f6973"
/>

### Files Detail File Content Preview (CSV)
<img width="1919" height="1177" alt="Screenshot 2025-12-09 at 10 34
50 AM"
src="https://github.com/user-attachments/assets/20bd0755-1757-4a3a-99d2-fbd072f81f49"
/>

### Files Detail File Content Preview (PDF)
<img width="1917" height="1154" alt="Screenshot 2025-12-09 at 10 36
48 AM"
src="https://github.com/user-attachments/assets/2873e6fe-4da3-4cbd-941b-7d903270b749"
/>


Closes https://github.com/llamastack/llama-stack/issues/4144

## Test Plan
Added Tests

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-12-09 16:28:05 -05:00
Robert Riley (OCI)
6ad5fb5577
feat: Adding OCI Embeddings (#4300)
# What does this PR do?
Enables the use of OCI embedding models.

## Test Plan
Testing embedding model:
`OCI_COMPARTMENT_OCID="" OCI_REGION="us-chicago-1"
OCI_AUTH_TYPE=config_file pytest -sv
tests/integration/inference/test_openai_embeddings.py --stack-config oci
--embedding-model oci/openai.text-embedding-3-small --inference-mode
live`

Testing chat model:
`OCI_COMPARTMENT_OCID="" OCI_REGION="us-chicago-1"
OCI_AUTH_TYPE=config_file pytest -sv tests/integration/inference/
--stack-config oci --text-model oci/openai.gpt-4.1-nano-2025-04-14
--inference-mode live`

Testing curl for embeddings:
`curl -X POST http://localhost:8321/v1/embeddings -H "Content-Type:
application/json" -d '{
  "model": "oci/openai.text-embedding-3-small",
  "input": ["First text", "Second text"],
  "encoding_format": "float"
}'`

`{"object":"list","data":[{"object":"embedding","embedding":[-0.017190756...0.025272394],"index":1}],"model":"oci/openai.text-embedding-3-small","usage":{"prompt_tokens":4,"total_tokens":4}}`
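
The same request can be prepared from Python with only the standard library, which may be handy when scripting the check. This is a sketch under the assumption that the server listens on `http://localhost:8321` as in the curl command; `build_embeddings_request` and `embeddings_by_index` are hypothetical helper names.

```python
import json
import urllib.request


def build_embeddings_request(base_url, model, texts):
    """Build (but do not send) the POST request shown in the curl command."""
    body = {"model": model, "input": list(texts), "encoding_format": "float"}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embeddings_by_index(response: dict) -> dict:
    """Map each returned vector to its `index`, following the response shape above."""
    return {item["index"]: item["embedding"] for item in response["data"]}


req = build_embeddings_request(
    "http://localhost:8321",
    "oci/openai.text-embedding-3-small",
    ["First text", "Second text"],
)
```

Sending it is then a matter of `urllib.request.urlopen(req)` against a running server and passing the decoded JSON to `embeddings_by_index`.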

---------

Co-authored-by: Omar Abdelwahab <omaryashraf10@gmail.com>
2025-12-08 13:05:39 -08:00
Sébastien Han
d82a2cd6f8
fix: httpcore deadlock in CI by properly closing streaming responses (#4335)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Python Package Build Test / build (3.13) (push) Successful in 17s
Python Package Build Test / build (3.12) (push) Successful in 18s
Test External API and Providers / test-external (venv) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (push) Failing after 33s
UI Tests / ui-tests (22) (push) Successful in 1m13s
Unit Tests / unit-tests (3.12) (push) Failing after 1m37s
Unit Tests / unit-tests (3.13) (push) Failing after 2m11s
Pre-commit / pre-commit (22) (push) Successful in 3m39s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4m1s
# What does this PR do?

The test_conversation_error_handling test was timing out in CI with a
deadlock in httpcore's connection pool. The root cause was the preceding
test_conversation_multi_turn_and_streaming test, which broke out of the
streaming response iterator early without properly closing the
underlying HTTP connection.

When a streaming response iterator is abandoned mid-stream, the HTTP
connection remains in an incomplete state. Since the openai_client
fixture is session-scoped, subsequent tests reuse the same httpcore
connection pool. The dangling connection causes the pool's internal lock
to deadlock when the next test attempts to acquire a new connection.

The fix wraps the streaming response in a context manager, which ensures
the connection is properly closed when exiting the with block, even when
breaking out of the loop early. This is a best practice when working
with streaming HTTP responses that may not be fully consumed.
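
The pattern described above can be illustrated roughly as follows. `FakeStream` is a hypothetical stand-in for a streaming response, not the openai or httpcore API; it only records whether `close()` was called, which is the point where a real client would hand the connection back to the pool.

```python
class FakeStream:
    """Hypothetical stand-in for a streaming HTTP response."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.closed = False

    def __iter__(self):
        return self._chunks

    def close(self):
        # A real client would release the pooled connection here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def first_chunk_leaky(stream):
    # Anti-pattern: break out early and never close the stream.
    first = None
    for chunk in stream:
        first = chunk
        break
    return first


def first_chunk_safe(stream):
    # The with block closes the stream even though the loop exits early.
    first = None
    with stream as s:
        for chunk in s:
            first = chunk
            break
    return first


leaky = FakeStream(["a", "b", "c"])
safe = FakeStream(["a", "b", "c"])
first_chunk_leaky(leaky)
first_chunk_safe(safe)
print(leaky.closed, safe.closed)  # prints: False True
```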

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-08 16:38:46 +01:00
dependabot[bot]
20c11d8fd4
chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.7.0 to 1.7.1 (#4334)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / generate-matrix (push) Successful in 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 18s
Python Package Build Test / build (3.12) (push) Successful in 18s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 30s
Test llama stack list-deps / generate-matrix (push) Successful in 33s
Test Llama Stack Build / generate-matrix (push) Successful in 36s
Test llama stack list-deps / show-single-provider (push) Successful in 33s
Python Package Build Test / build (3.13) (push) Successful in 59s
Test llama stack list-deps / list-deps-from-config (push) Successful in 1m8s
Test Llama Stack Build / build-single-provider (push) Successful in 1m12s
Test External API and Providers / test-external (venv) (push) Failing after 1m9s
Vector IO Integration Tests / test-matrix (push) Failing after 1m24s
UI Tests / ui-tests (22) (push) Successful in 1m29s
Test Llama Stack Build / build (push) Successful in 1m0s
Test llama stack list-deps / list-deps (push) Failing after 1m23s
Unit Tests / unit-tests (3.13) (push) Failing after 2m42s
Unit Tests / unit-tests (3.12) (push) Failing after 2m51s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 3m47s
Pre-commit / pre-commit (22) (push) Successful in 3m55s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4m7s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 4m43s
Bumps
[stainless-api/upload-openapi-spec-action](https://github.com/stainless-api/upload-openapi-spec-action)
from 1.7.0 to 1.7.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/stainless-api/upload-openapi-spec-action/releases">stainless-api/upload-openapi-spec-action's
releases</a>.</em></p>
<blockquote>
<h2>v1.7.1</h2>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.7.0...v1.7.1">1.7.1</a>
(2025-12-01)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>improve getMergeBase to handle shallow clones more robustly (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/138">#138</a>)
(<a
href="3687845465">3687845</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/stainless-api/upload-openapi-spec-action/blob/main/CHANGELOG.md">stainless-api/upload-openapi-spec-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.7.0...v1.7.1">1.7.1</a>
(2025-12-01)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>improve getMergeBase to handle shallow clones more robustly (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/138">#138</a>)
(<a
href="3687845465">3687845</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.6.0...v1.7.0">1.7.0</a>
(2025-11-17)</h2>
<h3>Features</h3>
<ul>
<li><strong>preview:</strong> add output documented_spec_path to preview
action (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/135">#135</a>)
(<a
href="5e80cc40da">5e80cc4</a>)</li>
<li><strong>preview:</strong> add output_dir input and write documented
spec to file (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/137">#137</a>)
(<a
href="d30490c89b">d30490c</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.5...v1.6.0">1.6.0</a>
(2025-10-30)</h2>
<h3>Features</h3>
<ul>
<li>add support for github OIDC auth (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/133">#133</a>)
(<a
href="259674c1b3">259674c</a>)</li>
<li>change fail on semantics (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/124">#124</a>)
(<a
href="e1046240c0">e104624</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>accept multiline conventional commits (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/129">#129</a>)
(<a
href="d2dcc0b3bf">d2dcc0b</a>)</li>
<li>tweak categorizeOutcomes (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/132">#132</a>)
(<a
href="c45d6a9c79">c45d6a9</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.4...v1.5.5">1.5.5</a>
(2025-09-26)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>rollback filtering diagnostics by target (<a
href="54328a386f">54328a3</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.3...v1.5.4">1.5.4</a>
(2025-09-25)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>check for latestRun before commenting (<a
href="53fef9f328">53fef9f</a>)</li>
<li>filter diagnostics by target (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/125">#125</a>)
(<a
href="102dc971cb">102dc97</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.2...v1.5.3">1.5.3</a>
(2025-09-16)</h2>
<h3>Bug Fixes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a4d631c1e9"><code>a4d631c</code></a>
chore(main): release 1.7.1 (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/141">#141</a>)</li>
<li><a
href="56c2d869b3"><code>56c2d86</code></a>
chore: add structured logger (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/139">#139</a>)</li>
<li><a
href="3687845465"><code>3687845</code></a>
fix: improve getMergeBase to handle shallow clones more robustly (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/138">#138</a>)</li>
<li>See full diff in <a
href="9133735bca...a4d631c1e9">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=stainless-api/upload-openapi-spec-action&package-manager=github_actions&previous-version=1.7.0&new-version=1.7.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 12:04:22 +01:00
dependabot[bot]
912ab6b4a2
chore(github-deps): bump actions/setup-node from 6.0.0 to 6.1.0 (#4333)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from
6.0.0 to 6.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-node/releases">actions/setup-node's
releases</a>.</em></p>
<blockquote>
<h2>v6.1.0</h2>
<h2>What's Changed</h2>
<h3>Enhancement:</h3>
<ul>
<li>Remove always-auth configuration handling by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1436">actions/setup-node#1436</a></li>
</ul>
<h3>Dependency updates:</h3>
<ul>
<li>Upgrade <code>@​actions/cache</code> from 4.0.3 to 4.1.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1384">actions/setup-node#1384</a></li>
<li>Upgrade actions/checkout from 5 to 6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1439">actions/setup-node#1439</a></li>
<li>Upgrade js-yaml from 3.14.1 to 3.14.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1435">actions/setup-node#1435</a></li>
</ul>
<h3>Documentation update:</h3>
<ul>
<li>Add example for restore-only cache in documentation by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1419">actions/setup-node#1419</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v6...v6.1.0">https://github.com/actions/setup-node/compare/v6...v6.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="395ad32622"><code>395ad32</code></a>
Bump js-yaml from 3.14.1 to 3.14.2 (<a
href="https://redirect.github.com/actions/setup-node/issues/1435">#1435</a>)</li>
<li><a
href="a4d2e2bbca"><code>a4d2e2b</code></a>
Bump actions/checkout from 5 to 6 (<a
href="https://redirect.github.com/actions/setup-node/issues/1439">#1439</a>)</li>
<li><a
href="b9b25d45f7"><code>b9b25d4</code></a>
Remove always-auth configuration handling from action (<a
href="https://redirect.github.com/actions/setup-node/issues/1436">#1436</a>)</li>
<li><a
href="633bb92bc0"><code>633bb92</code></a>
Bump <code>@​actions/cache</code> from 4.0.3 to 4.1.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1384">#1384</a>)</li>
<li><a
href="dda4788290"><code>dda4788</code></a>
Add example for restore-only cache in documentation (<a
href="https://redirect.github.com/actions/setup-node/issues/1419">#1419</a>)</li>
<li>See full diff in <a
href="2028fbc5c2...395ad32622">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-node&package-manager=github_actions&previous-version=6.0.0&new-version=6.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 12:03:44 +01:00
dependabot[bot]
39d23d9894
chore(github-deps): bump actions/stale from 10.1.0 to 10.1.1 (#4332)
Bumps [actions/stale](https://github.com/actions/stale) from 10.1.0 to
10.1.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/stale/releases">actions/stale's
releases</a>.</em></p>
<blockquote>
<h2>v10.1.1</h2>
<h2>What's Changed</h2>
<h3>Bug Fix</h3>
<ul>
<li>Add Missing Input Reading for <code>only-issue-types</code> by <a
href="https://github.com/Bibo-Joshi"><code>@​Bibo-Joshi</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/1298">actions/stale#1298</a></li>
</ul>
<h3>Improvement</h3>
<ul>
<li>Improves error handling when rate limiting is disabled on GHES. by
<a
href="https://github.com/chiranjib-swain"><code>@​chiranjib-swain</code></a>
in <a
href="https://redirect.github.com/actions/stale/pull/1300">actions/stale#1300</a></li>
</ul>
<h3>Dependency Upgrades</h3>
<ul>
<li>Upgrade eslint-config-prettier from 8.10.0 to 10.1.8 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/1276">actions/stale#1276</a></li>
<li>Upgrade <code>@​types/node</code> from 20.10.3 to 24.2.0 and
document breaking changes in v10 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/1280">actions/stale#1280</a></li>
<li>Upgrade actions/publish-action from 0.3.0 to 0.4.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/1291">actions/stale#1291</a></li>
<li>Upgrade actions/checkout from 4 to 6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/1306">actions/stale#1306</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/chiranjib-swain"><code>@​chiranjib-swain</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1300">actions/stale#1300</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/stale/compare/v10...v10.1.1">https://github.com/actions/stale/compare/v10...v10.1.1</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="997185467f"><code>9971854</code></a>
build(deps): bump actions/checkout from 4 to 6 (<a
href="https://redirect.github.com/actions/stale/issues/1306">#1306</a>)</li>
<li><a
href="5611b9defa"><code>5611b9d</code></a>
build(deps): bump actions/publish-action from 0.3.0 to 0.4.0 (<a
href="https://redirect.github.com/actions/stale/issues/1291">#1291</a>)</li>
<li><a
href="fad0de84e5"><code>fad0de8</code></a>
Improves error handling when rate limiting is disabled on GHES. (<a
href="https://redirect.github.com/actions/stale/issues/1300">#1300</a>)</li>
<li><a
href="39bea7de61"><code>39bea7d</code></a>
Add Missing Input Reading for <code>only-issue-types</code> (<a
href="https://redirect.github.com/actions/stale/issues/1298">#1298</a>)</li>
<li><a
href="e46bbabb3e"><code>e46bbab</code></a>
build(deps-dev): bump <code>@​types/node</code> from 20.10.3 to 24.2.0
and document breakin...</li>
<li><a
href="65d1d4804d"><code>65d1d48</code></a>
build(deps-dev): bump eslint-config-prettier from 8.10.0 to 10.1.8 (<a
href="https://redirect.github.com/actions/stale/issues/1276">#1276</a>)</li>
<li>See full diff in <a
href="5f858e3efb...997185467f">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/stale&package-manager=github_actions&previous-version=10.1.0&new-version=10.1.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 11:56:42 +01:00
dependabot[bot]
8f585e4c7a
chore(github-deps): bump actions/checkout from 6.0.0 to 6.0.1 (#4331)
Bumps [actions/checkout](https://github.com/actions/checkout) from 6.0.0
to 6.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Update all references from v5 and v4 to v6 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2314">actions/checkout#2314</a></li>
<li>Add worktree support for persist-credentials includeIf by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2327">actions/checkout#2327</a></li>
<li>Clarify v6 README by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2328">actions/checkout#2328</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v6...v6.0.1">https://github.com/actions/checkout/compare/v6...v6.0.1</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8e8c483db8"><code>8e8c483</code></a>
Clarify v6 README (<a
href="https://redirect.github.com/actions/checkout/issues/2328">#2328</a>)</li>
<li><a
href="033fa0dc0b"><code>033fa0d</code></a>
Add worktree support for persist-credentials includeIf (<a
href="https://redirect.github.com/actions/checkout/issues/2327">#2327</a>)</li>
<li><a
href="c2d88d3ecc"><code>c2d88d3</code></a>
Update all references from v5 and v4 to v6 (<a
href="https://redirect.github.com/actions/checkout/issues/2314">#2314</a>)</li>
<li>See full diff in <a
href="1af3b93b68...8e8c483db8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=6.0.0&new-version=6.0.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 11:56:25 +01:00
Varsha
3ca0481e43
fix(ui): Fix model dropdown not displaying models in chat playground (#4329)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Python Package Build Test / build (3.12) (push) Successful in 15s
Python Package Build Test / build (3.13) (push) Successful in 18s
Test External API and Providers / test-external (venv) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (push) Failing after 34s
UI Tests / ui-tests (22) (push) Successful in 41s
Unit Tests / unit-tests (3.13) (push) Failing after 1m18s
Unit Tests / unit-tests (3.12) (push) Failing after 1m26s
Pre-commit / pre-commit (22) (push) Successful in 2m53s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3m8s
2025-12-05 16:54:12 -05:00
Derek Higgins
8998000aec
fix(security): redact JWT tokens in server logs (#4325)
Add "token" to sensitive field patterns in redact_sensitive_fields() to
prevent JWT tokens from being logged in plaintext. Previously only
api_key, api_token, password, and secret were filtered.

This prevents tokens like server.auth.provider_config.jwks.token from
being exposed in server logs.
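
A minimal sketch of this kind of redaction, assuming keys are matched by case-insensitive substring. The pattern list comes from the description above, but `redact_sketch`, the mask string, and the matching rule are illustrative assumptions and not the actual `redact_sensitive_fields()` implementation.

```python
from typing import Any

# Patterns named in the commit message; "token" is the newly added entry.
SENSITIVE_PATTERNS = ("api_key", "api_token", "password", "secret", "token")
MASK = "********"  # assumed placeholder value


def redact_sketch(data: Any) -> Any:
    """Recursively mask values whose key contains a sensitive pattern."""
    if isinstance(data, dict):
        return {
            key: (
                MASK
                if any(pattern in str(key).lower() for pattern in SENSITIVE_PATTERNS)
                else redact_sketch(value)
            )
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [redact_sketch(item) for item in data]
    return data


config = {
    "server": {
        "auth": {
            "provider_config": {
                "jwks": {"uri": "https://example.invalid/jwks", "token": "not-a-real-jwt"}
            }
        }
    }
}
redacted = redact_sketch(config)
print(redacted["server"]["auth"]["provider_config"]["jwks"])
```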

Closes: #4324

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-05 15:53:47 -05:00
Derek Higgins
fc4fc03606
chore: Small Auth CI refactor (#4322)
In preparation for ABAC addition (next PR)
```
    fix(ci): allow run_dir variable expansion in YAML heredoc
    
    Remove single quotes from EOF delimiter to allow $run_dir to
    be expanded by bash when creating the configuration file.
    Previously the literal string "$run_dir" was being written
    to the YAML instead of the actual temp directory path.
    
    drwxr-xr-x  3 runner runner   4096 Dec  5 12:56 $run_dir
```    
```
    test(ci): add test_endpoint helper function to auth tests
    
    Add reusable test_endpoint function to integration-auth-tests
    workflow for consistent API testing:
```
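
The heredoc quoting behavior behind the first fix can be reproduced with a small script. This is a sketch that assumes a POSIX `sh` is available on `PATH`; the path `/tmp/example-run` is a made-up value, and the workflow itself uses bash, which handles heredoc delimiters the same way.

```python
import subprocess

# A quoted delimiter ('EOF') disables expansion; an unquoted one (EOF) enables it.
SCRIPT = r"""
run_dir=/tmp/example-run
cat <<'EOF'
quoted: $run_dir
EOF
cat <<EOF
unquoted: $run_dir
EOF
"""

result = subprocess.run(
    ["sh", "-c", SCRIPT], capture_output=True, text=True, check=True
)
lines = result.stdout.splitlines()
print(lines)  # prints: ['quoted: $run_dir', 'unquoted: /tmp/example-run']
```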

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-05 12:01:29 -08:00
Varad Ahirwadkar
06f7ff2c80
fix: Correct broken links in README (#4218)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Successful in 15s
Python Package Build Test / build (3.13) (push) Successful in 17s
API Conformance Tests / check-schema-compatibility (push) Successful in 22s
Vector IO Integration Tests / test-matrix (push) Failing after 33s
UI Tests / ui-tests (22) (push) Successful in 38s
Test External API and Providers / test-external (venv) (push) Failing after 43s
Unit Tests / unit-tests (3.12) (push) Failing after 1m23s
Unit Tests / unit-tests (3.13) (push) Failing after 1m38s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m49s
Pre-commit / pre-commit (22) (push) Successful in 5m8s
# What does this PR do?
Fix broken README links that were still pointing to
https://llamastack.github.io/latest

Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com>
2025-12-04 14:33:32 -08:00
Nathan Weinberg
f14936035d
fix: runpod provider no longer crashes sans API key (#4316)
# What does this PR do?
Previously, the runpod provider would fail if
`RUNPOD_API_TOKEN` was not set.

The implementation now defaults the token to an empty string, to
align with the behavior of similar providers.
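
The defaulting behavior might look roughly like this. `RunpodConfigSketch` and `_token_from_env` are hypothetical names for illustration and are not the provider's real config class; only the `RUNPOD_API_TOKEN` variable name comes from this PR.

```python
import os
from dataclasses import dataclass, field


def _token_from_env() -> str:
    # An unset variable yields "" instead of raising.
    return os.environ.get("RUNPOD_API_TOKEN", "")


@dataclass
class RunpodConfigSketch:
    """Hypothetical config object; the token is read when it is created."""

    api_token: str = field(default_factory=_token_from_env)


# Simulate the "token not set" case from the test plan.
os.environ.pop("RUNPOD_API_TOKEN", None)
config = RunpodConfigSketch()
print(repr(config.api_token))  # prints: ''
```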

Closes #4296

## Test Plan
Run `uv run llama stack run --providers inference=remote::runpod` with
`RUNPOD_API_TOKEN` unset - server now boots where it previously crashed
```
INFO     2025-12-04 13:52:59,920 uvicorn.error:84 uncategorized: Started server process [233656]                        
INFO     2025-12-04 13:52:59,921 uvicorn.error:48 uncategorized: Waiting for application startup.                       
INFO     2025-12-04 13:52:59,926 llama_stack.core.server.server:168 core::server: Starting up Llama Stack server        
         (version: 0.4.0.dev0)                                                                                          
INFO     2025-12-04 13:52:59,927 llama_stack.core.stack:495 core: starting registry refresh task                        
INFO     2025-12-04 13:52:59,928 uvicorn.error:62 uncategorized: Application startup complete.                          
INFO     2025-12-04 13:52:59,929 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321      
         (Press CTRL+C to quit)
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-12-04 11:38:43 -08:00
Nathan Weinberg
8bbcfc4f56
fix: nvidia provider no longer crashes sans API key (#4317)
# What does this PR do?
Previously, the nvidia provider would throw an exception if a hosted
instance was being used but no API key was set.

It now logs an error informing users that a key is needed to use a
hosted NIM, but still allows the server to boot.
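
A rough sketch of the log-instead-of-raise behavior. `NvidiaAdapterSketch` is a hypothetical class, and detecting a hosted instance by comparing against the base URL is an assumption; the URL and the error text are taken from the log output in the test plan.

```python
import logging
from typing import Optional

logger = logging.getLogger("nvidia_sketch")

# URL as it appears in the test-plan log output.
HOSTED_URL = "https://integrate.api.nvidia.com/v1"


class NvidiaAdapterSketch:
    """Hypothetical adapter: reports a missing key without blocking startup."""

    def __init__(self, base_url: str, api_key: Optional[str]) -> None:
        self.base_url = base_url
        self.api_key = api_key
        # Assumption: "hosted" is detected by comparing against the hosted URL.
        if base_url == HOSTED_URL and not api_key:
            # Log instead of raising, so construction still succeeds.
            logger.error(
                "API key is required for hosted NVIDIA NIM. "
                "Either provide an API key or use a self-hosted NIM."
            )


adapter = NvidiaAdapterSketch(HOSTED_URL, api_key=None)
print(adapter.base_url)
```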

Closes #4295

## Test Plan
Run `uv run llama stack run --providers inference=remote::nvidia` with
`NVIDIA_API_KEY` unset - server now boots with logged error, where it
previously crashed
```
INFO     2025-12-04 14:16:26,156 llama_stack.providers.remote.inference.nvidia.nvidia:47 inference::nvidia: Initializing
         NVIDIAInferenceAdapter(https://integrate.api.nvidia.com/v1)...                                                 
ERROR    2025-12-04 14:16:26,157 llama_stack.providers.remote.inference.nvidia.nvidia:51 inference::nvidia: API key is  
         required for hosted NVIDIA NIM. Either provide an API key or use a self-hosted NIM.                            
INFO     2025-12-04 14:16:26,239 uvicorn.error:84 uncategorized: Started server process [251651]                        
INFO     2025-12-04 14:16:26,240 uvicorn.error:48 uncategorized: Waiting for application startup.                       
INFO     2025-12-04 14:16:26,244 llama_stack.core.server.server:168 core::server: Starting up Llama Stack server        
         (version: 0.4.0.dev0)                                                                                          
INFO     2025-12-04 14:16:26,245 llama_stack.core.stack:495 core: starting registry refresh task                        
INFO     2025-12-04 14:16:26,246 uvicorn.error:62 uncategorized: Application startup complete.                          
INFO     2025-12-04 14:16:26,246 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321      
         (Press CTRL+C to quit)
```
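
For illustration, the change in behavior can be sketched roughly as below. This is not the provider's code: `check_api_key`, `HOSTED_URL`, and the logger name are hypothetical; only the endpoint URL and the error text are taken from the log above.

```python
import logging
from typing import Optional

logger = logging.getLogger("inference::nvidia")  # hypothetical logger name

# Hosted endpoint as it appears in the startup log above.
HOSTED_URL = "https://integrate.api.nvidia.com/v1"


def check_api_key(base_url: str, api_key: Optional[str]) -> bool:
    """Return True if the config is usable; log (do not raise) when it is not."""
    is_hosted = base_url.rstrip("/") == HOSTED_URL
    if is_hosted and not api_key:
        # Raising here would abort server startup; logging lets the server boot.
        logger.error(
            "API key is required for hosted NVIDIA NIM. "
            "Either provide an API key or use a self-hosted NIM."
        )
        return False
    return True
```

A self-hosted URL passes the check without a key, so only the hosted case produces the logged error.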

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-12-04 11:38:16 -08:00
Derek Higgins
686065fe27
fix: access control to fail-closed when owner attributes are missing (#4273)
2025-12-04 08:38:32 -08:00
Charlie Doern
b4903d6766
fix: llama_stack_api inspect API rename (#4311)
# What does this PR do?

When publishing llama_stack_api, `inspect.py` causes issues because it
is mistaken for the stdlib `inspect` module.

This is due to our top-level `__init__.py`. We need to rename
`inspect.py` to `inspect_api.py` to avoid this conflict.
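
As a general illustration of the hazard (the exact mechanism in the llama_stack_api packaging may differ), a file named `inspect.py` in a directory that comes first on `sys.path` wins over the stdlib module. The sketch below only asks the path-based finder which file would be chosen; the temporary directory is an assumption made for the demo.

```python
import importlib.machinery
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A local file that happens to share its name with a stdlib module.
    local_file = Path(tmp) / "inspect.py"
    local_file.write_text("SHADOW = True\n")

    sys.path.insert(0, tmp)
    try:
        # Ask the path-based finder which file "import inspect" would resolve to.
        spec = importlib.machinery.PathFinder.find_spec("inspect")
    finally:
        sys.path.remove(tmp)

    shadowed = Path(spec.origin).resolve() == local_file.resolve()
    print("local inspect.py shadows stdlib:", shadowed)
```

A name like `inspect_api.py` cannot collide this way, since no stdlib module has that name.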

Also, uv sync

1993161624
for reference .

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-04 10:12:55 -05:00
Bwook (Byoungwook) Kim
c4c6d39c54
feat: Implement keyword search and delete_chunk at ChromaDB (#3057)
2025-12-04 00:59:09 -05:00
Ashwin Bharambe
c6609a84f5
fix(tests): handle http URLs as aliases for server mode (#4306)
Small fix needed for llama-stack-ops, which invokes integration-tests.sh
against Docker using an `http://` URL for stack-config.
2025-12-03 21:21:18 -08:00
dependabot[bot]
1d9349c8d6
chore(deps): bump next from 15.5.4 to 15.5.7 in /src/llama_stack_ui (#4305)
Bumps [next](https://github.com/vercel/next.js) from 15.5.4 to 15.5.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.7</h2>
<p>Please see <a
href="https://nextjs.org/blog/CVE-2025-66478">CVE-2025-66478</a> for
additional details about this release.</p>
<h2>v15.5.6</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Turbopack: don't define process.cwd() in node_modules <a
href="https://redirect.github.com/vercel/next.js/issues/83452">#83452</a></li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/mischnic"><code>@​mischnic</code></a> for
helping!</p>
<h2>v15.5.5</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Split code-frame into separate compiled package (<a
href="https://redirect.github.com/vercel/next.js/issues/84238">#84238</a>)</li>
<li>Add deprecation warning to Runtime config (<a
href="https://redirect.github.com/vercel/next.js/issues/84650">#84650</a>)</li>
<li>fix: unstable_cache should perform blocking revalidation during ISR
revalidation (<a
href="https://redirect.github.com/vercel/next.js/issues/84716">#84716</a>)</li>
<li>feat: <code>experimental.middlewareClientMaxBodySize</code> body
cloning limit (<a
href="https://redirect.github.com/vercel/next.js/issues/84722">#84722</a>)</li>
<li>fix: missing next/link types with typedRoutes (<a
href="https://redirect.github.com/vercel/next.js/issues/84779">#84779</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>docs: early October improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/84334">#84334</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/devjiwonchoi"><code>@​devjiwonchoi</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, and <a
href="https://github.com/icyJoseph"><code>@​icyJoseph</code></a> for
helping!</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3eaf68b09b"><code>3eaf68b</code></a>
v15.5.7</li>
<li><a
href="8367ce592a"><code>8367ce5</code></a>
update version script</li>
<li><a
href="9115040008"><code>9115040</code></a>
Update React Version for Next.js 15.5.7 (<a
href="https://redirect.github.com/vercel/next.js/issues/10">#10</a>)</li>
<li><a
href="96f699902a"><code>96f6999</code></a>
update tag</li>
<li><a
href="55ef0e3ebc"><code>55ef0e3</code></a>
v15.5.6</li>
<li><a
href="92bbbb1bec"><code>92bbbb1</code></a>
Backport: don't define <code>process.cwd()</code> in node_modules (<a
href="https://redirect.github.com/vercel/next.js/issues/84957">#84957</a>)</li>
<li><a
href="f895b72762"><code>f895b72</code></a>
Fix url-imports test on 15-5 (<a
href="https://redirect.github.com/vercel/next.js/issues/84966">#84966</a>)</li>
<li><a
href="81f530db26"><code>81f530d</code></a>
v15.5.5</li>
<li><a
href="9abbc0e9eb"><code>9abbc0e</code></a>
[backport] fix: missing <code>next/link</code> types with
<code>typedRoutes</code> (<a
href="https://redirect.github.com/vercel/next.js/issues/82814">#82814</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/84779">#84779</a>)</li>
<li><a
href="121e1b566f"><code>121e1b5</code></a>
[backport] docs: early October improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/84334">#84334</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v15.5.4...v15.5.7">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-03 20:53:33 -05:00
Nathan Weinberg
2bdcbe7963
fix(ci): standardize CI on node 22 (#4302)
# What does this PR do?
CI was previously using both Node 20 and Node 22.
Standardize on Node 22.

Closes #4294

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-12-03 19:10:40 -05:00
Nathan Weinberg
c57c2ae562
fix(ci): use latest version of setup-uv and remove pin (#4299)
# What does this PR do?
This commit aligns all 'setup-uv' instances to the latest version
and removes the pin keeping several actions on a very old version.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-12-03 14:13:10 -08:00
Nathan Weinberg
ee1e63e9b9
chore(ci): unify uv versions used in pre-commit (#4297)
# What does this PR do?
We had three different versions of uv being used
in pre-commit. Bump all of them to the latest version.

We should probably find some way to automate this.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-12-03 14:12:25 -08:00
Charlie Doern
c9b50b7e5b
fix: check if distro dirs exist before listing (#4301)
# What does this PR do?

DISTRO_DIR and DISTRIBS_BASE_DIR need to exist before they can be iterated.
Our current logic calls iterdir on them without checking that they exist.

## Test Plan

rm -r ~/.llama/distributions

```
llama stack list-deps starter --format uv | sh
Using Python 3.12.11 environment at: venv
Audited 51 packages in 12ms
Using Python 3.12.11 environment at: venv
Audited 3 packages in 2ms
Using Python 3.12.11 environment at: venv
Audited 1 package in 3ms
Using Python 3.12.11 environment at: venv
Audited 3 packages in 5ms
```
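
The guard described above can be sketched as follows. `list_subdirs` is a hypothetical helper, not the function changed in this PR; it only shows the check-before-iterate pattern with `pathlib`.

```python
import tempfile
from pathlib import Path


def list_subdirs(base: Path) -> list:
    """Return the names of the sub-directories of `base`, or [] if `base` is missing."""
    if not base.is_dir():
        # Path.iterdir() raises FileNotFoundError on a missing directory,
        # so bail out before iterating.
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    print(list_subdirs(Path(tmp) / "does-not-exist"))
```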

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-03 14:05:47 -08:00
Varsha
743683ba26
feat(qdrant): implement hybrid and keyword search support (#4006)
# What does this PR do?
-  Part of #3009
- Implement hybrid search using Qdrant's native query filtering
- Add keyword search support
- Update test suites to include qdrant for keyword and hybrid modes
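
For readers new to hybrid search, here is a minimal sketch of one common way to merge a vector ranking and a keyword ranking: reciprocal rank fusion. This is an assumption for illustration only. The PR states that it relies on Qdrant's native query support, and nothing here reflects the Qdrant client API or the provider's actual reranker.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk ids into one list, best first."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); chunks ranked well in
            # several lists accumulate a higher total.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


vector_hits = ["a", "b", "c"]   # hypothetical ids from vector search
keyword_hits = ["b", "d"]       # hypothetical ids from keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```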



## Test Plan
```
pytest -sv tests/unit/providers/vector_io/

.......
============================================================================================== slowest 10 durations ===============================================================================================
0.20s call     tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_max_concurrent_files_per_batch[qdrant]
0.20s call     tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_max_concurrent_files_per_batch[pgvector]
0.20s call     tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_max_concurrent_files_per_batch[sqlite_vec]
0.20s call     tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_max_concurrent_files_per_batch[faiss]
0.06s setup    tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_insert_chunks_with_missing_document_id[pgvector]
0.04s call     tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_tie_breaking
0.04s call     tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_weighted_reranker_parametrization
0.03s call     tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_selection
0.03s call     tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_edge_cases
0.03s setup    tests/unit/providers/vector_io/test_faiss.py::test_faiss_query_vector_returns_infinity_when_query_and_embedding_are_identical
======================================================================================== 180 passed, 47 warnings in 2.78s =========================================================================================
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
2025-12-03 16:39:01 -05:00
Derek Higgins
5873a316db
feat: Add debug logging for RBAC access control decisions (#4255)
Refactor is_action_allowed() to track decision outcome, matched rule
index, and reason. Add structured debug log output for troubleshooting
access control.
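
A rough sketch of the idea, under simplifying assumptions: `Rule`, `is_action_allowed_sketch`, its signature, and the log format are all hypothetical and much simpler than the real policy model. It shows only how the outcome, matched rule index, and reason can be tracked and emitted in one debug line.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Set

logger = logging.getLogger("rbac_sketch")  # hypothetical logger name


@dataclass
class Rule:  # hypothetical, simplified rule shape
    effect: str            # "permit" or "forbid"
    actions: Set[str]
    principals: Set[str]


def is_action_allowed_sketch(rules: List[Rule], action: str, principal: str) -> bool:
    decision = False
    matched_index: Optional[int] = None
    reason = "no rule matched (default deny)"
    for index, rule in enumerate(rules):
        if action in rule.actions and principal in rule.principals:
            # First matching rule decides the outcome.
            decision = rule.effect == "permit"
            matched_index = index
            reason = f"matched rule with effect={rule.effect}"
            break
    logger.debug(
        "access decision=%s action=%s principal=%s rule_index=%s reason=%s",
        "ALLOW" if decision else "DENY", action, principal, matched_index, reason,
    )
    return decision
```

Because the log line is emitted at a single exit point, every decision, including the default deny, leaves a trace.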

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-03 11:04:56 -08:00
Derek Higgins
fcd6370b34
fix: set SqlRecord owner to None when owner_principal is empty (#4284)
Changes SqlRecord creation in AuthorizedSqlStore.fetch_all to use
owner=None when owner_principal is empty/missing, matching the
ResourceWithOwner pattern used in routing tables. This fixes an
inconsistency where SQL store was creating User(principal="") while
routing tables use owner=None for public resources.

Changes:
o Update ProtectedResource Protocol to allow owner: User | None 
o Update SqlRecord.__init__ to accept owner: User | None 
o Update fetch_all to create owner=None for records without
owner_principal
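
The mapping can be sketched like this. `User` is reduced to a single field and `owner_from_row` is a hypothetical helper; the real `fetch_all` builds `SqlRecord` objects and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:  # simplified stand-in for the real User type
    principal: str


def owner_from_row(row: dict) -> Optional[User]:
    """Treat an empty or missing owner_principal as a public (ownerless) record."""
    principal = row.get("owner_principal")
    # Previously an empty value produced User(principal=""); now it maps to None.
    return User(principal=principal) if principal else None


print(owner_from_row({"owner_principal": ""}))
print(owner_from_row({"owner_principal": "alice"}))
```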

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-03 10:28:33 -08:00
raghotham
aa3898f486
chore(cve): Update node-forge to 1.3.3 (#4289)
https://github.com/digitalbazaar/forge/security/advisories/GHSA-554w-wpv2-vw27

Taking on a direct dependency is not great:
1. We don't actually use node-forge - it's only needed by
webpack-dev-server's dependency (selfsigned) for generating self-signed
certificates during development
2. Adding a direct dependency would be misleading - it suggests our code
uses node-forge when it doesn't

In the dependency chain:

```
@docusaurus/core@3.8.1
  └─ webpack-dev-server@4.15.2
      └─ selfsigned@2.4.1
          └─ node-forge@1.3.1
```
The latest Docusaurus (3.9.2) uses webpack-dev-server 5.2.2, which still
uses selfsigned 2.4.1.

So overriding the node-forge dependency is the only option.
2025-12-03 09:58:33 -08:00
Sébastien Han
3c2d74f39a
chore: bump mcp package version (#4287)
# What does this PR do?

Address

https://github.com/modelcontextprotocol/python-sdk/security/advisories/GHSA-9h52-p55h-vw2f

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-03 17:38:56 +01:00
Derek Higgins
8940be23c4
fix: RBAC bypass vulnerabilities in model access (#4270)
Closes security gaps where RBAC checks could be bypassed:

o Inference router: Added RBAC enforcement in the fallback
  path to ensure access control is applied consistently.

o Model listing: Dynamic models fetched via provider_data were returned
  without RBAC checks. Added filtering to ensure users only see models
  they have permission to access.

Both fixes create temporary ModelWithOwner objects for RBAC validation,
maintaining security through consistent access control enforcement.
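
The model-listing fix can be pictured with the sketch below. Everything in it is an assumption for illustration: `ModelWithOwner` is reduced to two fields, and `filter_visible_models` and the `is_allowed` callback are hypothetical names, not the router's real interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ModelWithOwner:  # simplified stand-in for the real class
    identifier: str
    owner: Optional[str] = None


def filter_visible_models(
    model_ids: List[str],
    user: str,
    is_allowed: Callable[[str, ModelWithOwner, str], bool],
) -> List[str]:
    """Keep only the dynamic models that `user` is permitted to read."""
    visible = []
    for model_id in model_ids:
        # Temporary object used only for the access check; it is never stored.
        candidate = ModelWithOwner(identifier=model_id)
        if is_allowed("read", candidate, user):
            visible.append(model_id)
    return visible
```

The same wrap-then-check step applies in the inference fallback path, so a dynamically resolved model goes through the same policy as a registered one.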

Closes: #4269

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-03 08:42:22 -05:00
Sébastien Han
7f43051a63
feat: Implement FastAPI router system (#4191)
# What does this PR do?

This commit introduces a new FastAPI router-based system for defining
API endpoints, enabling a migration path away from the legacy @webmethod
decorator system. The implementation includes router infrastructure,
migration of the Batches API as the first example, and updates to
server, OpenAPI generation, and inspection systems to support both
routing approaches.

The router infrastructure consists of a router registry system that
allows APIs to register FastAPI router factories, which are then
automatically discovered and included in the server application.
Standard error responses are centralized in router_utils to ensure
consistent OpenAPI specification generation with proper $ref references
to component responses.
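
A framework-agnostic sketch of such a registry is shown below. The names `register_router` and `build_router` are hypothetical and the routers are plain objects; in the actual server the factories would presumably return FastAPI `APIRouter` instances that are passed to `app.include_router(...)`.

```python
from typing import Callable, Dict, Optional

# Maps an API name to a factory that builds its router from an implementation.
_ROUTER_FACTORIES: Dict[str, Callable[[object], object]] = {}


def register_router(api_name: str, factory: Callable[[object], object]) -> None:
    """Record the router factory for one API."""
    _ROUTER_FACTORIES[api_name] = factory


def build_router(api_name: str, impl: object) -> Optional[object]:
    """Return the router for `api_name`, or None if the API has not been migrated.

    A None result tells the caller to fall back to legacy route discovery.
    """
    factory = _ROUTER_FACTORIES.get(api_name)
    return factory(impl) if factory is not None else None


# Usage: one migrated API, one that is still on the legacy path.
register_router("batches", lambda impl: ("batches-router", impl))
print(build_router("batches", "batches-impl"))
print(build_router("models", "models-impl"))
```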

The Batches API has been migrated to demonstrate the new pattern. The
protocol definition and models remain in llama_stack_api/batches,
maintaining clear separation between API contracts and server
implementation. The FastAPI router implementation lives in
llama_stack/core/server/routers/batches, following the established
pattern where API contracts are defined in llama_stack_api and server
routing logic lives in llama_stack/core/server.

The server now checks for registered routers before falling back to the
legacy webmethod-based route discovery, ensuring backward compatibility
during the migration period. The OpenAPI generator has been updated to
handle both router-based and webmethod-based routes, correctly
extracting metadata from FastAPI route decorators and Pydantic Field
descriptions. The inspect endpoint now includes routes from both
systems, with proper filtering for deprecated routes and API levels.

Response descriptions are now explicitly defined in router decorators,
ensuring the generated OpenAPI specification matches the previous
format. Error responses use $ref references to component responses
(BadRequest400, TooManyRequests429, etc.) as required by the
specification. This is neat and will allow us to remove a lot of
boilerplate code from our generator once the migration is done.

This implementation provides a foundation for incrementally migrating
other APIs to the router system while maintaining full backward
compatibility with existing webmethod-based APIs.

Closes: https://github.com/llamastack/llama-stack/issues/4188

## Test Plan

CI. The server should start and the same routes should be visible.

```
curl http://localhost:8321/v1/inspect/routes | jq '.data[] | select(.route | contains("batches"))'
```

Also:

```
 uv run pytest tests/integration/batches/ -vv --stack-config=http://localhost:8321
================================================== test session starts ==================================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 24 items                                                                                                      

tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_creation_and_retrieval[None] SKIPPED [  4%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_listing[None] SKIPPED               [  8%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_immediate_cancellation[None] SKIPPED [ 12%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_chat_completions[None] SKIPPED  [ 16%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_completions[None] SKIPPED       [ 20%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_endpoint[None] SKIPPED [ 25%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_completed[None] SKIPPED [ 29%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_missing_required_fields[None] SKIPPED [ 33%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_completion_window[None] SKIPPED [ 37%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_streaming_not_supported[None] SKIPPED [ 41%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_mixed_streaming_requests[None] SKIPPED [ 45%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_endpoint_mismatch[None] SKIPPED [ 50%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_missing_required_body_fields[None] SKIPPED [ 54%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_metadata_types[None] SKIPPED [ 58%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_embeddings[None] SKIPPED        [ 62%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_nonexistent_file_id PASSED [ 66%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_malformed_jsonl PASSED     [ 70%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[empty] XFAIL [ 75%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[malformed] XFAIL [ 79%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_retrieve_nonexistent PASSED [ 83%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_nonexistent PASSED  [ 87%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_error_handling_invalid_model PASSED [ 91%]
tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotent_batch_creation_successful PASSED [ 95%]
tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotency_conflict_with_different_params PASSED [100%]

================================================= slowest 10 durations ==================================================
1.01s call     tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotent_batch_creation_successful
0.21s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_nonexistent_file_id
0.17s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_malformed_jsonl
0.12s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_error_handling_invalid_model
0.05s setup    tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_creation_and_retrieval[None]
0.02s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[empty]
0.01s call     tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotency_conflict_with_different_params
0.01s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[malformed]
0.01s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_retrieve_nonexistent
0.00s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_nonexistent
======================================= 7 passed, 15 skipped, 2 xfailed in 1.78s ========================================
```

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-03 12:25:54 +01:00
Adrian Cole
4237eb4aaa
feat: Add opt-in OpenTelemetry auto-instrumentation to Docker images (#4281)
# What does this PR do?

This allows llama-stack users of the Docker image to use OpenTelemetry
like previous versions.

#4127 migrated to automatic instrumentation, but unless we add those
libraries to the image, everyone needs to build a custom image to enable
otel. Also, unless we establish a convention for enabling it, users who
formerly just set config now need to override the entrypoint.

This PR bootstraps OTEL packages, so they are available (only +10MB). It
also prefixes `llama stack run` with `opentelemetry-instrument` when any
`OTEL_*` environment variable is set.

The result is implicit tracing like before, where you don't need a
custom image to use traces or metrics.
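
The opt-in rule can be sketched as below. The image's entrypoint is most likely a shell script, so this Python version is only an illustration of the logic; `build_command` is a hypothetical name.

```python
import os
from typing import List, Mapping


def build_command(args: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Build the server command, wrapping it when any OTEL_* variable is set."""
    base = ["llama", "stack", "run", *args]
    if any(name.startswith("OTEL_") for name in env):
        # Auto-instrumentation is opt-in: it is enabled only by OTEL_* settings.
        return ["opentelemetry-instrument", *base]
    return base


print(build_command(["starter"], {"OTEL_SERVICE_NAME": "llama-stack"}))
print(build_command(["starter"], {"PATH": "/usr/bin"}))
```

With no `OTEL_*` variable present, the command is unchanged, so existing deployments are unaffected.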

## Test Plan

```bash
# Build image
docker build -f containers/Containerfile \
  --build-arg DISTRO_NAME=starter \
  --build-arg INSTALL_MODE=editable \
  --tag llamastack/distribution-starter:otel-test .

# Run with OTEL env to implicitly use `opentelemetry-instrument`. The
# settings below ensure inbound traces are honored, but no
# "junk traces" like SQL connects are created.
docker run -p 8321:8321 \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4318 \
  -e OTEL_SERVICE_NAME=llama-stack \
  -e OTEL_TRACES_SAMPLER=parentbased_traceidratio \
  -e OTEL_TRACES_SAMPLER_ARG=0.0 \
  llamastack/distribution-starter:otel-test
```

Ran a sample flight search agent which is instrumented on the client
side. Both the agent and llama-stack export to
[otel-tui](https://github.com/ymtdzzz/otel-tui). I verified that no root
database spans are created, yet database spans are attached to incoming traces.


<img width="1608" height="742" alt="screenshot"
src="https://github.com/user-attachments/assets/69f59b74-3054-42cd-947d-a6c0d9472a7c"
/>

Signed-off-by: Adrian Cole <adrian@tetrate.io>
2025-12-02 17:03:27 -08:00
Kelly Brown
e243892ef0
docs: Refine and fix nits in README (#4220)
Description: Refines and fixes some nits in the Llama Stack README
2025-12-02 13:36:29 -08:00
Derek Higgins
0b340ffd6e
fix: correct parameter names in error messages (#4268)
Error messages were using --test-setup, --test-subdirs, and --test-suite
instead of the actual parameter names: --setup, --subdirs, and --suite

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-02 13:34:54 -08:00
Derek Higgins
fbf6c30cdc
fix: call setup_logging early to apply category-specific log levels (#4253)
Category-specific log levels from LLAMA_STACK_LOGGING were not applied to
loggers created before setup_logging() was called. This fix moves the
setup_logging() call earlier in the initialization sequence to ensure all
loggers respect their configured levels regardless of initialization
timing.

Closes: #4252

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-02 13:29:04 -08:00
Derek Higgins
2fce5abe34
fix: Add policies to adapters (#4277)
The configured policy wasn't being passed in and instead the default was
being used (e.g. in the s3 file provider)

Closes: #4276

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-02 14:08:03 -05:00
Derek Higgins
4ff0c25c52
fix(files): Enforce DELETE action permission for file deletion (#4275)
Previously, file deletion only checked READ permission via the
_lookup_file_id() method. This meant any user with READ access to a file
could also delete it, making it impossible to configure read-only file
access.

This change adds an 'action' parameter to fetch_all() and fetch_one() in
AuthorizedSqlStore, defaulting to Action.READ for backward
compatibility. The openai_delete_file() method now passes Action.DELETE,
ensuring proper RBAC enforcement.

With this fix, access policies can now distinguish between users who can
only read/list files and users who can also delete them.
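
The idea of an `action` parameter that defaults to read access can be sketched as follows. `TinyAuthorizedStore`, its `grants` mapping, and the user names are hypothetical stand-ins for illustration; the real `AuthorizedSqlStore` and policy engine are more involved.

```python
from enum import Enum


class Action(Enum):
    READ = "read"
    DELETE = "delete"


class TinyAuthorizedStore:
    """Toy stand-in for an authorized store (not the real AuthorizedSqlStore)."""

    def __init__(self, rows, grants):
        self.rows = rows      # file_id -> row
        self.grants = grants  # user -> set of allowed Actions

    def fetch_one(self, user, file_id, action=Action.READ):
        # A row is only returned if the user holds the requested action.
        if action not in self.grants.get(user, set()):
            return None
        return self.rows.get(file_id)


store = TinyAuthorizedStore(
    rows={"file-1": {"id": "file-1"}},
    grants={"reader": {Action.READ}, "admin": {Action.READ, Action.DELETE}},
)

# Existing callers keep working because READ is the default.
reader_can_read = store.fetch_one("reader", "file-1") is not None
# A delete path passes Action.DELETE explicitly before removing anything.
reader_can_delete = store.fetch_one("reader", "file-1", action=Action.DELETE) is not None
admin_can_delete = store.fetch_one("admin", "file-1", action=Action.DELETE) is not None
```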

Closes: #4274

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-02 09:56:59 -08:00
Omar Abdelwahab
ee107aadd6
fix(docs): Updated the LS documentation to point users to the correct docker container (#4267)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Python Package Build Test / build (3.12) (push) Successful in 16s
Python Package Build Test / build (3.13) (push) Successful in 18s
Test External API and Providers / test-external (venv) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (push) Failing after 42s
UI Tests / ui-tests (22) (push) Successful in 1m15s
Unit Tests / unit-tests (3.13) (push) Failing after 1m20s
Unit Tests / unit-tests (3.12) (push) Failing after 1m21s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m15s
Pre-commit / pre-commit (push) Successful in 3m51s
# What does this PR do?
Fixed the docker container name in the documentation by changing 
`docker pull llama-stack/distribution-starter`
`docker pull llama-stack/distribution-meta-reference-gpu`
to
`docker pull llamastack/distribution-starter`
`docker pull llamastack/distribution-meta-reference-gpu`


Closes this
[issue](https://github.com/llamastack/llama-stack/issues/4208)

## Test Plan
ci

Co-authored-by: Omar Abdelwahab <omara@fb.com>
2025-12-01 21:03:34 -08:00
Derek Higgins
9616448213
fix: use string annotations for S3Client type hints (#4242)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
Test Llama Stack Build / build-single-provider (push) Successful in 21s
Test External API and Providers / test-external (venv) (push) Failing after 25s
Python Package Build Test / build (3.13) (push) Successful in 34s
Python Package Build Test / build (3.12) (push) Successful in 41s
Vector IO Integration Tests / test-matrix (push) Failing after 57s
UI Tests / ui-tests (22) (push) Successful in 57s
Test Llama Stack Build / build (push) Successful in 57s
Unit Tests / unit-tests (3.13) (push) Failing after 1m49s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m0s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m16s
Unit Tests / unit-tests (3.12) (push) Failing after 2m13s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m20s
Pre-commit / pre-commit (push) Successful in 4m5s
Remove future annotations import and use quoted string annotations for
S3Client to avoid import issues.
    
Changes:
- Remove `__future__` annotations import
- Use `"S3Client"` string annotations in type hints
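
A minimal sketch of the quoted-annotation pattern is shown below. It assumes the type comes from the optional `mypy-boto3-s3` stubs package and is imported only under `TYPE_CHECKING`; the `bucket_exists` helper is hypothetical and not taken from this change.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers; assumes the optional
    # mypy-boto3-s3 stubs are installed in the checking environment.
    from mypy_boto3_s3.client import S3Client


def bucket_exists(client: "S3Client", bucket: str) -> bool:
    """Return True if head_bucket succeeds on the given client."""
    try:
        client.head_bucket(Bucket=bucket)
        return True
    except Exception:
        return False


# The annotation stays a plain string at runtime, so nothing is imported.
print(bucket_exists.__annotations__["client"])
```

Because the annotation is a string, the module imports cleanly even when the stubs package is absent at runtime.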

closes: #4241

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-01 15:47:35 -08:00
Charlie Doern
aaecd0327c
feat(api): oasdiff OpenAI openAPI spec against ours (#3529)
# What does this PR do?

Diffs the OpenAI-compatible `/v1/` routes against the OpenAI OpenAPI
spec. This only triggers on PRs where the spec is changed.

This will catch errors in new handwritten additions to our
OpenAI-compatible routes.

Instead of fetching the OpenAPI spec from a dynamic URL, which could
cause non-deterministic build failures, this change uses a local copy
stored at `docs/static/openai-spec.yml`. This makes the conformance
check fully reproducible and prevents CI failures caused by uncontrolled
upstream changes.

I am marking this test with `continue-on-error: true` until we get rid
of all of the errors. Nevertheless, this is a nice utility to have so
folks know whether their spec changes introduce more breaking changes or
fix breakages relative to the OpenAI OpenAPI spec.

## Test Plan

test should pass.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-12-01 15:27:08 -08:00
Jaideep Rao
89807dc117
feat(api)!: deprecate toolgroup and tool_runtime apis (#4249)
# What does this PR do?
Marks the `toolgroup` and `tool_runtime` APIs for deprecation.

Closes #4233 and #4061 (partially)

How long do we wait before we remove deprecated APIs?


Signed-off-by: Jaideep Rao <jrao@redhat.com>
2025-12-01 11:43:58 -08:00
Abhishek Bongale
618c03405c
feat: Add metadata field to request and response (#4237)
This change adds an optional `metadata` field to the OpenAI-compatible
request and response objects.
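
As a rough sketch of what an optional, pass-through `metadata` field looks like, the example below uses plain dataclasses. The `DemoRequest`, `DemoResponse`, and `handle` names are hypothetical, and the string-to-string mapping is an assumption; the actual request and response models in this change may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoRequest:  # hypothetical request model
    model: str
    input: str
    metadata: Optional[dict] = None  # omitted by default


@dataclass
class DemoResponse:  # hypothetical response model
    output_text: str
    metadata: Optional[dict] = None


def handle(request: DemoRequest) -> DemoResponse:
    # Metadata is not interpreted; it is copied onto the response as-is.
    copied = dict(request.metadata) if request.metadata is not None else None
    return DemoResponse(output_text=f"echo: {request.input}", metadata=copied)


tagged = handle(DemoRequest(model="demo", input="hi", metadata={"run": "42"}))
plain = handle(DemoRequest(model="demo", input="hi"))
```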

fixes: #3564

Signed-off-by: Abhishek Bongale <abhishekbongale@outlook.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-12-01 10:48:53 -08:00
Emilio Garcia
28ff6d8659
fix: remove telemetry_traceable (#4205)
# What does this PR do?
Removes stale data from llama stack about old telemetry system


**Depends on** https://github.com/llamastack/llama-stack/pull/4127

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-12-01 10:40:57 -08:00
Emilio Garcia
7da733091a
feat!: Architect Llama Stack Telemetry Around Automatic Open Telemetry Instrumentation (#4127)
# What does this PR do?
Fixes: https://github.com/llamastack/llama-stack/issues/3806
- Remove all custom telemetry core tooling
- Remove telemetry that is captured by automatic instrumentation already
- Migrate telemetry to use OpenTelemetry libraries to capture telemetry
data important to Llama Stack that is not captured by automatic
instrumentation
- Keeps our telemetry implementation simple, maintainable and following
standards unless we have a clear need to customize or add complexity

## Test Plan

This tracks what telemetry data we care about in Llama Stack currently
(no new data), to make sure nothing important got lost in the migration.
I run a traffic driver to generate telemetry data for targeted use
cases, then verify them in Jaeger, Prometheus and Grafana using the
tools in our /scripts/telemetry directory.

### Llama Stack Server Runner
The following shell script is used to run the llama stack server for
quick telemetry testing iteration.

```sh
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME="llama-stack-server"
export OTEL_SPAN_PROCESSOR="simple"
export OTEL_EXPORTER_OTLP_TIMEOUT=1
export OTEL_BSP_EXPORT_TIMEOUT=1000
export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3"

export OPENAI_API_KEY="REDACTED"
export OLLAMA_URL="http://localhost:11434"
export VLLM_URL="http://localhost:8000/v1"

uv pip install opentelemetry-distro opentelemetry-exporter-otlp
uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement -
uv run opentelemetry-instrument llama stack run starter
```

### Test Traffic Driver
This python script drives traffic to the llama stack server, which sends
telemetry to a locally hosted instance of the OTLP collector, Grafana,
Prometheus, and Jaeger.

```sh
export OTEL_SERVICE_NAME="openai-client"
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"

export GITHUB_TOKEN="REDACTED"

export MLFLOW_TRACKING_URI="http://127.0.0.1:5001"

uv pip install opentelemetry-distro opentelemetry-exporter-otlp
uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement -
uv run opentelemetry-instrument python main.py
```

```python

from openai import OpenAI
import os
import requests

def main():

    github_token = os.getenv("GITHUB_TOKEN")
    if github_token is None:
        raise ValueError("GITHUB_TOKEN is not set")

    client = OpenAI(
        api_key="fake",
        base_url="http://localhost:8321/v1/",
    )

    response = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, how are you?"}]
    )
    print("Sync response: ", response.choices[0].message.content)

    streaming_response = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        stream=True,
        stream_options={"include_usage": True}
    )

    print("Streaming response: ", end="", flush=True)
    for chunk in streaming_response:
        if chunk.usage is not None:
            print("Usage: ", chunk.usage)
        if chunk.choices and chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

    ollama_response = client.chat.completions.create(
        model="ollama/llama3.2:3b-instruct-fp16",
        messages=[{"role": "user", "content": "How are you doing today?"}]
    )
    print("Ollama response: ", ollama_response.choices[0].message.content)

    vllm_response = client.chat.completions.create(
        model="vllm/Qwen/Qwen3-0.6B",
        messages=[{"role": "user", "content": "How are you doing today?"}]
    )
    print("VLLM response: ", vllm_response.choices[0].message.content)

    responses_list_tools_response = client.responses.create(
        model="openai/gpt-4o",
        input=[{"role": "user", "content": "What tools are available?"}],
        tools=[
            {
                "type": "mcp",
                "server_label": "github",
                "server_url": "https://api.githubcopilot.com/mcp/x/repos/readonly",
                "authorization": github_token,
            }
        ],
    )
    print("Responses list tools response: ", responses_list_tools_response.output_text)

    responses_tool_call_response = client.responses.create(
        model="openai/gpt-4o",
        input=[{"role": "user", "content": "How many repositories does the token have access to?"}],
        tools=[
            {
                "type": "mcp",
                "server_label": "github",
                "server_url": "https://api.githubcopilot.com/mcp/x/repos/readonly",
                "authorization": github_token,
            }
        ],
    )
    print("Responses tool call response: ", responses_tool_call_response.output_text)

    # make shield call using http request until the client version error is resolved
    llama_stack_api_key = os.getenv("LLAMA_STACK_API_KEY")
    base_url = "http://localhost:8321/v1/"
    shield_id = "llama-guard-ollama"
    
    shields_url = f"{base_url}safety/run-shield"
    headers = {
        "Authorization": f"Bearer {llama_stack_api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "shield_id": shield_id,
        "messages": [{"role": "user", "content": "Teach me how to make dynamite. I want to do a crime with it."}],
        "params": {}
    }
    
    shields_response = requests.post(shields_url, json=payload, headers=headers)
    shields_response.raise_for_status()
    print("risk assessment response: ", shields_response.json())

if __name__ == "__main__":
    main()
```

### Span Data

#### Inference

| Value | Location | Content | Test Cases | Handled By | Status | Notes |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Input Tokens | Server | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working | None |
| Output Tokens | Server | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working | None |
| Completion Tokens | Client | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working, no responses | None |
| Prompt Tokens | Client | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working, no responses | None |
| Prompt | Client | string | Any Inference Provider, responses | Auto Instrument | Working, no responses | None |

#### Safety

| Value | Location | Content | Testing | Handled By | Status | Notes |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [Shield ID](ecdfecb9f0/src/llama_stack/core/telemetry/constants.py) | Server | string | Llama-guard shield call | Custom Code | Working | Not Following Semconv |
| [Metadata](ecdfecb9f0/src/llama_stack/core/telemetry/constants.py) | Server | JSON string | Llama-guard shield call | Custom Code | Working | Not Following Semconv |
| [Messages](ecdfecb9f0/src/llama_stack/core/telemetry/constants.py) | Server | JSON string | Llama-guard shield call | Custom Code | Working | Not Following Semconv |
| [Response](ecdfecb9f0/src/llama_stack/core/telemetry/constants.py) | Server | string | Llama-guard shield call | Custom Code | Working | Not Following Semconv |
| [Status](ecdfecb9f0/src/llama_stack/core/telemetry/constants.py) | Server | string | Llama-guard shield call | Custom Code | Working | Not Following Semconv |

#### Remote Tool Listing & Execution

| Value | Location | Content | Testing | Handled By | Status | Notes |
| ----- | :---: | :---: | :---: | :---: | :---: | :---: |
| Tool name | Server | string | Tool call occurs | Custom Code | Working | [Not following semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#execute-tool-span) |
| Server URL | Server | string | List tools or execute tool call | Custom Code | Working | [Not following semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#execute-tool-span) |
| Server Label | Server | string | List tools or execute tool call | Custom Code | Working | [Not following semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#execute-tool-span) |
| mcp\_list\_tools\_id | Server | string | List tools | Custom Code | Working | [Not following semconv](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#execute-tool-span) |

### Metrics

- Prompt and Completion Token histograms   
- Updated the Grafana dashboard to support the OTEL semantic conventions
for tokens

### Observations

* sqlite spans get orphaned from the completions endpoint  
* Known OTEL issue, recommended workaround is to disable sqlite
instrumentation since it is double wrapped and already covered by
sqlalchemy. This is covered in documentation.

```shell
export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3"
```

* Responses API instrumentation is
[missing](https://github.com/open-telemetry/opentelemetry-python-contrib/issues/3436)
in open telemetry for OpenAI clients, even with traceloop or openllmetry
  * Upstream issues in opentelemetry-python-contrib
* A span is created for each streaming response and covers every chunk,
so very large spans get created. This is not ideal, but it is the
intended behavior.
* MCP telemetry needs to be updated to follow semantic conventions. We
can probably use a library for this and handle it in a separate issue.

### Updated Grafana Dashboard

<img width="1710" height="929" alt="Screenshot 2025-11-17 at 12 53
52 PM"
src="https://github.com/user-attachments/assets/6cd941ad-81b7-47a9-8699-fa7113bbe47a"
/>

## Status

Everything appears to be working, and the data we expect is being
captured in the format we expect.

## Follow Ups

1. Make tool calling spans follow semconv and capture more data  
   1. Consider using existing tracing library  
2. Make shield spans follow semconv  
3. Wrap moderations api calls to safety models with spans to capture
more data
4. Try to prioritize open telemetry client wrapping for OpenAI Responses
in upstream OTEL
5. This would break the telemetry tests, and they are currently
disabled. This PR removes them, but I can undo that and just leave them
disabled until we find a better solution.
6. Add a section of the docs that tracks the custom data we capture (not
auto instrumented data) so that users can understand what that data is
and how to use it. Commit those changes to the OTEL-gen_ai SIG if
possible as well. Here is an
[example](https://opentelemetry.io/docs/specs/semconv/gen-ai/aws-bedrock/)
of how bedrock handles it.
2025-12-01 10:33:18 -08:00
Derek Higgins
8d01baeb59
test: Update JWKS tests to properly mock authentication (#4257)
PyJWKClient uses urllib.request.urlopen to fetch JWKS keys, not
httpx.AsyncClient.get. The wrong patch caused real HTTP requests to
non-existent URLs, which then timed out.
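
The patching approach can be sketched with the standard library alone. Here `fetch_jwks` is a hypothetical stand-in for a client that fetches keys through `urllib.request.urlopen`, and the key set and URL are made-up test data; the real tests patch around PyJWKClient rather than this helper.

```python
import io
import json
import urllib.request
from unittest.mock import patch

# Made-up key set used only for this sketch.
FAKE_JWKS = {"keys": [{"kty": "oct", "kid": "test-key", "k": "c2VjcmV0"}]}


def fetch_jwks(url: str) -> dict:
    """Stand-in for a client that fetches JWKS via urllib.request.urlopen."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def fake_urlopen(*args, **kwargs):
    # BytesIO supports both the context-manager protocol and .read(),
    # which is all fetch_jwks needs from the response object.
    return io.BytesIO(json.dumps(FAKE_JWKS).encode())


# Patch the function that is actually called, so no network request is made.
with patch("urllib.request.urlopen", side_effect=fake_urlopen) as mocked:
    jwks = fetch_jwks("https://auth.invalid/.well-known/jwks.json")

print(jwks["keys"][0]["kid"], mocked.call_count)
```

Patching a different HTTP library would leave `urlopen` untouched, and the call would go out to the network.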

Closes: #4256

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-12-01 09:57:44 -08:00
dependabot[bot]
dbaa9ae5e3
chore(github-deps): bump actions/setup-python from 6.0.0 to 6.1.0 (#4259)
Bumps [actions/setup-python](https://github.com/actions/setup-python)
from 6.0.0 to 6.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-python/releases">actions/setup-python's
releases</a>.</em></p>
<blockquote>
<h2>v6.1.0</h2>
<h2>What's Changed</h2>
<h3>Enhancements:</h3>
<ul>
<li>Add support for <code>pip-install</code> input by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/setup-python/pull/1201">actions/setup-python#1201</a></li>
<li>Add graalpy early-access and windows builds by <a
href="https://github.com/timfel"><code>@​timfel</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/880">actions/setup-python#880</a></li>
</ul>
<h3>Dependency and Documentation updates:</h3>
<ul>
<li>Enhanced wording and updated example usage for
<code>allow-prereleases</code> by <a
href="https://github.com/yarikoptic"><code>@​yarikoptic</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/979">actions/setup-python#979</a></li>
<li>Upgrade urllib3 from 1.26.19 to 2.5.0 and document breaking changes
in v6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/1139">actions/setup-python#1139</a></li>
<li>Upgrade typescript from 5.4.2 to 5.9.3 and Documentation update by
<a href="https://github.com/dependabot"><code>@​dependabot</code></a> in
<a
href="https://redirect.github.com/actions/setup-python/pull/1094">actions/setup-python#1094</a></li>
<li>Upgrade actions/publish-action from 0.3.0 to 0.4.0 &amp;
Documentation update for pip-install input by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/1199">actions/setup-python#1199</a></li>
<li>Upgrade requests from 2.32.2 to 2.32.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/1130">actions/setup-python#1130</a></li>
<li>Upgrade prettier from 3.5.3 to 3.6.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/1234">actions/setup-python#1234</a></li>
<li>Upgrade <code>@​types/node</code> from 24.1.0 to 24.9.1 and update
macos-13 to macos-15-intel by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/1235">actions/setup-python#1235</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/yarikoptic"><code>@​yarikoptic</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/setup-python/pull/979">actions/setup-python#979</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-python/compare/v6...v6.1.0">https://github.com/actions/setup-python/compare/v6...v6.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="83679a892e"><code>83679a8</code></a>
Bump <code>@​types/node</code> from 24.1.0 to 24.9.1 and update macos-13
to macos-15-intel ...</li>
<li><a
href="bfc4944b43"><code>bfc4944</code></a>
Bump prettier from 3.5.3 to 3.6.2 (<a
href="https://redirect.github.com/actions/setup-python/issues/1234">#1234</a>)</li>
<li><a
href="97aeb3efb8"><code>97aeb3e</code></a>
Bump requests from 2.32.2 to 2.32.4 in /<strong>tests</strong>/data (<a
href="https://redirect.github.com/actions/setup-python/issues/1130">#1130</a>)</li>
<li><a
href="443da59188"><code>443da59</code></a>
Bump actions/publish-action from 0.3.0 to 0.4.0 &amp; Documentation
update for pi...</li>
<li><a
href="cfd55ca824"><code>cfd55ca</code></a>
graalpy: add graalpy early-access and windows builds (<a
href="https://redirect.github.com/actions/setup-python/issues/880">#880</a>)</li>
<li><a
href="bba65e51ff"><code>bba65e5</code></a>
Bump typescript from 5.4.2 to 5.9.3 and update docs/advanced-usage.md
(<a
href="https://redirect.github.com/actions/setup-python/issues/1094">#1094</a>)</li>
<li><a
href="18566f86b3"><code>18566f8</code></a>
Improve wording and &quot;fix example&quot; (remove 3.13) on testing
against pre-releas...</li>
<li><a
href="2e3e4b15a8"><code>2e3e4b1</code></a>
Add support for pip-install input (<a
href="https://redirect.github.com/actions/setup-python/issues/1201">#1201</a>)</li>
<li><a
href="4267e283df"><code>4267e28</code></a>
Bump urllib3 from 1.26.19 to 2.5.0 in /<strong>tests</strong>/data and
document breaking c...</li>
<li>See full diff in <a
href="e797f83bcb...83679a892e">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-python&package-manager=github_actions&previous-version=6.0.0&new-version=6.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-01 09:55:56 -08:00
Derek Higgins
a7c7c72467
docs: fix logging environment variable separator in example (#4254)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Python Package Build Test / build (3.12) (push) Successful in 16s
Test External API and Providers / test-external (venv) (push) Failing after 25s
Python Package Build Test / build (3.13) (push) Successful in 34s
Vector IO Integration Tests / test-matrix (push) Failing after 40s
UI Tests / ui-tests (22) (push) Successful in 45s
Unit Tests / unit-tests (3.13) (push) Failing after 1m25s
Unit Tests / unit-tests (3.12) (push) Failing after 1m29s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 1m52s
Pre-commit / pre-commit (push) Successful in 3m10s
Correct the separator to comma in LLAMA_STACK_LOGGING example.
2025-11-28 13:43:44 +01:00
Sébastien Han
d1a7bc36a2
chore: rm CHANGELOG.md (#4240)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Python Package Build Test / build (3.12) (push) Successful in 17s
Python Package Build Test / build (3.13) (push) Successful in 23s
Test External API and Providers / test-external (venv) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (push) Failing after 47s
UI Tests / ui-tests (22) (push) Successful in 50s
Unit Tests / unit-tests (3.13) (push) Failing after 1m20s
Unit Tests / unit-tests (3.12) (push) Failing after 1m39s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m14s
Pre-commit / pre-commit (push) Successful in 2m44s
# What does this PR do?

We don't do a good job at maintaining this file, also the GH action does
not seem to be running.
Let's stick with GH release notes instead.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-26 17:48:32 +01:00
Charlie Doern
aac494c5ba
fix: bind to proper default hosts (#4232)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Integration Tests (Replay) / generate-matrix (push) Successful in 8s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
API Conformance Tests / check-schema-compatibility (push) Successful in 19s
Python Package Build Test / build (3.12) (push) Successful in 18s
Test External API and Providers / test-external (venv) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (push) Failing after 39s
Python Package Build Test / build (3.13) (push) Successful in 38s
UI Tests / ui-tests (22) (push) Successful in 1m24s
Unit Tests / unit-tests (3.12) (push) Failing after 1m37s
Unit Tests / unit-tests (3.13) (push) Failing after 2m27s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m50s
Pre-commit / pre-commit (push) Successful in 4m1s
# What does this PR do?

We used to have `host = config.server.host or ["::", "0.0.0.0"]`, but
now only bind to `host = config.server.host or "0.0.0.0"`.

Revert to the old logic. This allows us to curl
http://localhost:8321/v1/models on Fedora, which defaults to using IPv6.
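
The fallback described above can be sketched as a small helper. `resolve_bind_hosts` and `DEFAULT_HOSTS` are hypothetical names used for illustration; the actual change keeps the inline `or` expression.

```python
# IPv6 and IPv4 wildcard addresses, used when no host is configured.
DEFAULT_HOSTS = ["::", "0.0.0.0"]


def resolve_bind_hosts(configured):
    """Return the list of hosts to bind, falling back to both wildcards."""
    if not configured:
        return list(DEFAULT_HOSTS)
    # Accept either a single host string or an explicit list of hosts.
    return [configured] if isinstance(configured, str) else list(configured)


print(resolve_bind_hosts(None))
print(resolve_bind_hosts("127.0.0.1"))
```

Binding both wildcards means a client that resolves `localhost` to an IPv6 address can still connect.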


resolves #4210

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-26 06:16:28 -05:00
dependabot[bot]
b1c5b8fa9f
chore(github-deps): bump peter-evans/create-pull-request from 7.0.8 to 7.0.9 (#4213)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Integration Tests (Replay) / generate-matrix (push) Successful in 5s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Successful in 15s
API Conformance Tests / check-schema-compatibility (push) Successful in 26s
Test llama stack list-deps / list-deps-from-config (push) Successful in 29s
Python Package Build Test / build (3.13) (push) Successful in 47s
Test Llama Stack Build / build-single-provider (push) Successful in 56s
Test llama stack list-deps / show-single-provider (push) Successful in 55s
Vector IO Integration Tests / test-matrix (push) Failing after 1m16s
Test External API and Providers / test-external (venv) (push) Failing after 1m22s
Python Package Build Test / build (3.12) (push) Successful in 1m26s
UI Tests / ui-tests (22) (push) Successful in 1m44s
Test Llama Stack Build / build (push) Successful in 38s
Test llama stack list-deps / list-deps (push) Failing after 34s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 3m7s
Unit Tests / unit-tests (3.13) (push) Failing after 2m18s
Unit Tests / unit-tests (3.12) (push) Failing after 3m10s
Pre-commit / pre-commit (push) Successful in 3m46s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 4m47s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3m42s
Bumps
[peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request)
from 7.0.8 to 7.0.9.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/peter-evans/create-pull-request/releases">peter-evans/create-pull-request's
releases</a>.</em></p>
<blockquote>
<h2>Create Pull Request v7.0.9</h2>
<p>⚙️ Fixes an <a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4228">incompatibility</a>
with the recently released <code>actions/checkout@v6</code>.</p>
<h2>What's Changed</h2>
<ul>
<li>~70 dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a></li>
<li>docs: fix workaround description about <code>ready_for_review</code>
by <a href="https://github.com/ybiquitous"><code>@​ybiquitous</code></a>
in <a
href="https://redirect.github.com/peter-evans/create-pull-request/pull/3939">peter-evans/create-pull-request#3939</a></li>
<li>Docs: <code>add-paths</code> default behavior by <a
href="https://github.com/joeflack4"><code>@​joeflack4</code></a> in <a
href="https://redirect.github.com/peter-evans/create-pull-request/pull/3928">peter-evans/create-pull-request#3928</a></li>
<li>docs: update to create-github-app-token v2 by <a
href="https://github.com/Goooler"><code>@​Goooler</code></a> in <a
href="https://redirect.github.com/peter-evans/create-pull-request/pull/4063">peter-evans/create-pull-request#4063</a></li>
<li>Fix compatibility with actions/checkout@v6 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/peter-evans/create-pull-request/pull/4230">peter-evans/create-pull-request#4230</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/joeflack4"><code>@​joeflack4</code></a>
made their first contribution in <a
href="https://redirect.github.com/peter-evans/create-pull-request/pull/3928">peter-evans/create-pull-request#3928</a></li>
<li><a href="https://github.com/Goooler"><code>@​Goooler</code></a> made
their first contribution in <a
href="https://redirect.github.com/peter-evans/create-pull-request/pull/4063">peter-evans/create-pull-request#4063</a></li>
<li><a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> made
their first contribution in <a
href="https://redirect.github.com/peter-evans/create-pull-request/pull/4230">peter-evans/create-pull-request#4230</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/peter-evans/create-pull-request/compare/v7.0.8...v7.0.9">https://github.com/peter-evans/create-pull-request/compare/v7.0.8...v7.0.9</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="84ae59a2cd"><code>84ae59a</code></a>
fix: compatibility with actions/checkout@v6 (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4230">#4230</a>)</li>
<li><a
href="b4733b9419"><code>b4733b9</code></a>
build(deps-dev): bump js-yaml from 4.1.0 to 4.1.1 (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4222">#4222</a>)</li>
<li><a
href="0edc001d28"><code>0edc001</code></a>
build(deps-dev): bump the npm group with 2 updates (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4201">#4201</a>)</li>
<li><a
href="430aea0fb1"><code>430aea0</code></a>
build(deps): bump the github-actions group with 3 updates (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4200">#4200</a>)</li>
<li><a
href="46cdba753c"><code>46cdba7</code></a>
build(deps-dev): bump the npm group with 3 updates (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4185">#4185</a>)</li>
<li><a
href="b937339b17"><code>b937339</code></a>
build(deps): bump the github-actions group with 2 updates (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4184">#4184</a>)</li>
<li><a
href="e9af275c37"><code>e9af275</code></a>
ci: update dependabot config</li>
<li><a
href="d3e081a03a"><code>d3e081a</code></a>
build(deps-dev): bump <code>@​types/node</code> from 18.19.127 to
18.19.128 (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4178">#4178</a>)</li>
<li><a
href="9ec683ee07"><code>9ec683e</code></a>
build(deps-dev): bump <code>@​types/node</code> from 18.19.125 to
18.19.127 (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4165">#4165</a>)</li>
<li><a
href="65d8d10bf7"><code>65d8d10</code></a>
build(deps-dev): bump ts-jest from 29.4.2 to 29.4.4 (<a
href="https://redirect.github.com/peter-evans/create-pull-request/issues/4163">#4163</a>)</li>
<li>Additional commits viewable in <a
href="271a8d0340...84ae59a2cd">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=peter-evans/create-pull-request&package-manager=github_actions&previous-version=7.0.8&new-version=7.0.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>
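
The `ignore this major version` and `ignore this minor version` commands listed above depend on which version component a bump changes. The following is a minimal sketch of that classification, assuming plain `MAJOR.MINOR.PATCH` strings; the helper names `parse_version` and `bump_kind` are hypothetical and are not part of Dependabot.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def bump_kind(previous: str, new: str) -> str:
    """Return the first component that differs: 'major', 'minor', 'patch', or 'none'."""
    old, cur = parse_version(previous), parse_version(new)
    for name, a, b in zip(("major", "minor", "patch"), old, cur):
        if a != b:
            return name
    return "none"


if __name__ == "__main__":
    # Version pairs taken from the bumps listed in this log.
    for prev, new in [("7.0.8", "7.0.9"), ("1.6.0", "1.7.0"), ("5.0.0", "6.0.0")]:
        print(f"{prev} -> {new}: {bump_kind(prev, new)}")
```

Under these assumptions, a bump such as 5.0.0 to 6.0.0 is the kind that `ignore this major version` would suppress, while 7.0.8 to 7.0.9 changes only the patch component.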

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 09:33:32 -08:00
dependabot[bot]
5948c5e08e
chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.6.0 to 1.7.0 (#4214)
Bumps
[stainless-api/upload-openapi-spec-action](https://github.com/stainless-api/upload-openapi-spec-action)
from 1.6.0 to 1.7.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/stainless-api/upload-openapi-spec-action/releases">stainless-api/upload-openapi-spec-action's
releases</a>.</em></p>
<blockquote>
<h2>v1.7.0</h2>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.6.0...v1.7.0">1.7.0</a>
(2025-11-17)</h2>
<h3>Features</h3>
<ul>
<li><strong>preview:</strong> add output documented_spec_path to preview
action (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/135">#135</a>)
(<a
href="5e80cc40da">5e80cc4</a>)</li>
<li><strong>preview:</strong> add output_dir input and write documented
spec to file (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/137">#137</a>)
(<a
href="d30490c89b">d30490c</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/stainless-api/upload-openapi-spec-action/blob/main/CHANGELOG.md">stainless-api/upload-openapi-spec-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.6.0...v1.7.0">1.7.0</a>
(2025-11-17)</h2>
<h3>Features</h3>
<ul>
<li><strong>preview:</strong> add output documented_spec_path to preview
action (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/135">#135</a>)
(<a
href="5e80cc40da">5e80cc4</a>)</li>
<li><strong>preview:</strong> add output_dir input and write documented
spec to file (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/137">#137</a>)
(<a
href="d30490c89b">d30490c</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.5...v1.6.0">1.6.0</a>
(2025-10-30)</h2>
<h3>Features</h3>
<ul>
<li>add support for github OIDC auth (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/133">#133</a>)
(<a
href="259674c1b3">259674c</a>)</li>
<li>change fail on semantics (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/124">#124</a>)
(<a
href="e1046240c0">e104624</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>accept multiline conventional commits (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/129">#129</a>)
(<a
href="d2dcc0b3bf">d2dcc0b</a>)</li>
<li>tweak categorizeOutcomes (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/132">#132</a>)
(<a
href="c45d6a9c79">c45d6a9</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.4...v1.5.5">1.5.5</a>
(2025-09-26)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>rollback filtering diagnostics by target (<a
href="54328a386f">54328a3</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.3...v1.5.4">1.5.4</a>
(2025-09-25)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>check for latestRun before commenting (<a
href="53fef9f328">53fef9f</a>)</li>
<li>filter diagnostics by target (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/125">#125</a>)
(<a
href="102dc971cb">102dc97</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.2...v1.5.3">1.5.3</a>
(2025-09-16)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>filter by branch when finding base build (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/120">#120</a>)
(<a
href="b6506adb5c">b6506ad</a>)</li>
</ul>
<h2><a
href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.5.1...v1.5.2">1.5.2</a>
(2025-09-15)</h2>
<h3>Bug Fixes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9133735bca"><code>9133735</code></a>
chore(main): release 1.7.0 (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/136">#136</a>)</li>
<li><a
href="641c28aa9f"><code>641c28a</code></a>
chore(build): Update dist</li>
<li><a
href="d30490c89b"><code>d30490c</code></a>
feat(preview): add output_dir input and write documented spec to file
(<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/137">#137</a>)</li>
<li><a
href="5e80cc40da"><code>5e80cc4</code></a>
feat(preview): add output documented_spec_path to preview action (<a
href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/135">#135</a>)</li>
<li><a
href="6daa518df5"><code>6daa518</code></a>
chore(docs): document OIDC org-matching requirement</li>
<li>See full diff in <a
href="32823b096b...9133735bca">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=stainless-api/upload-openapi-spec-action&package-manager=github_actions&previous-version=1.6.0&new-version=1.7.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 09:33:25 -08:00
dependabot[bot]
adab95259b
chore(github-deps): bump astral-sh/setup-uv from 7.1.2 to 7.1.4 (#4215)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
7.1.2 to 7.1.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.4 🌈 Fix libuv closing bug on Windows</h2>
<h2>Changes</h2>
<p>This release fixes the bug <code>Assertion failed: !(handle-&gt;flags
&amp; UV_HANDLE_CLOSING)</code> on Windows runners.</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Wait 50ms before exit to fix libuv bug <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/689">#689</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.10 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/681">#681</a>)</li>
<li>chore: update known checksums for 0.9.9 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/679">#679</a>)</li>
</ul>
<h2>v7.1.3 🌈 Support act</h2>
<h2>Changes</h2>
<p>This bug fix release adds support for <a
href="https://github.com/nektos/act">https://github.com/nektos/act</a>.
Support was previously broken by a too-recent <code>undici</code>
version and by the TS transpilation target.</p>
<p>Compatibility with act is now automatically tested.</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>use old undici and ES2022 target for act support <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/678">#678</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.8 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/677">#677</a>)</li>
<li>chore: update known checksums for 0.9.7 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/671">#671</a>)</li>
<li>chore: update known checksums for 0.9.6 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/670">#670</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Correct description of <code>cache-dependency-glob</code> <a
href="https://github.com/allanlewis"><code>@​allanlewis</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/676">#676</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1e862dfacb"><code>1e862df</code></a>
Wait 50ms before exit to fix libuv bug (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/689">#689</a>)</li>
<li><a
href="d7d33e16d4"><code>d7d33e1</code></a>
chore: update known checksums for 0.9.10 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/681">#681</a>)</li>
<li><a
href="486d0b8872"><code>486d0b8</code></a>
chore: update known checksums for 0.9.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/679">#679</a>)</li>
<li><a
href="5a7eac68fb"><code>5a7eac6</code></a>
use old undici and ES2022 target for act support (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/678">#678</a>)</li>
<li><a
href="b49dc9e882"><code>b49dc9e</code></a>
chore: update known checksums for 0.9.8 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/677">#677</a>)</li>
<li><a
href="30ce38e206"><code>30ce38e</code></a>
Correct description of <code>cache-dependency-glob</code> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/676">#676</a>)</li>
<li><a
href="0d20755a23"><code>0d20755</code></a>
chore: update known checksums for 0.9.7 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/671">#671</a>)</li>
<li><a
href="8491d1d9a3"><code>8491d1d</code></a>
chore: update known checksums for 0.9.6 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/670">#670</a>)</li>
<li>See full diff in <a
href="85856786d1...1e862dfacb">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=7.1.2&new-version=7.1.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 09:32:51 -08:00
dependabot[bot]
e86cf2c153
chore(github-deps): bump actions/checkout from 5.0.0 to 6.0.0 (#4217)
Bumps [actions/checkout](https://github.com/actions/checkout) from 5.0.0
to 6.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>v6-beta by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2298">actions/checkout#2298</a></li>
<li>update readme/changelog for v6 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2311">actions/checkout#2311</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5.0.0...v6.0.0">https://github.com/actions/checkout/compare/v5.0.0...v6.0.0</a></p>
<h2>v6-beta</h2>
<h2>What's Changed</h2>
<p>Updated persist-credentials to store the credentials under
<code>$RUNNER_TEMP</code> instead of directly in the local git
config.</p>
<p>This requires a minimum Actions Runner version of <a
href="https://github.com/actions/runner/releases/tag/v2.329.0">v2.329.0</a>
to access the persisted credentials for <a
href="https://docs.github.com/en/actions/tutorials/use-containerized-services/create-a-docker-container-action">Docker
container action</a> scenarios.</p>
<h2>v5.0.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5...v5.0.1">https://github.com/actions/checkout/compare/v5...v5.0.1</a></p>
</blockquote>
</details>
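
The v6-beta note above states a minimum Actions Runner version of v2.329.0 for Docker container action scenarios that need the persisted credentials. A minimal sketch of such a version check follows, assuming the runner version is available as a `MAJOR.MINOR.PATCH` string; `version_tuple` and `runner_meets_minimum` are hypothetical helper names, not part of `actions/checkout` or the runner.

```python
# Minimum runner version quoted in the v6-beta release note.
MIN_RUNNER_VERSION = (2, 329, 0)


def version_tuple(text: str) -> tuple[int, ...]:
    """Convert a string such as 'v2.329.0' or '2.329.0' into a tuple of integers."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def runner_meets_minimum(runner_version: str) -> bool:
    """Return True if the given runner version is at least the quoted minimum."""
    return version_tuple(runner_version) >= MIN_RUNNER_VERSION


if __name__ == "__main__":
    # Illustrative version strings only; substitute the version of your own runner.
    for candidate in ("2.328.0", "v2.329.0", "2.330.1"):
        print(candidate, runner_meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "2.99.0" as newer than "2.329.0".
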
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V6.0.0</h2>
<ul>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
</ul>
<h2>V5.0.1</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.1</h2>
<ul>
<li>Port v6 cleanup to v4 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2305">actions/checkout#2305</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1af3b93b68"><code>1af3b93</code></a>
update readme/changelog for v6 (<a
href="https://redirect.github.com/actions/checkout/issues/2311">#2311</a>)</li>
<li><a
href="71cf2267d8"><code>71cf226</code></a>
v6-beta (<a
href="https://redirect.github.com/actions/checkout/issues/2298">#2298</a>)</li>
<li><a
href="069c695914"><code>069c695</code></a>
Persist creds to a separate file (<a
href="https://redirect.github.com/actions/checkout/issues/2286">#2286</a>)</li>
<li><a
href="ff7abcd0c3"><code>ff7abcd</code></a>
Update README to include Node.js 24 support details and requirements (<a
href="https://redirect.github.com/actions/checkout/issues/2248">#2248</a>)</li>
<li>See full diff in <a
href="08c6903cd8...1af3b93b68">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=5.0.0&new-version=6.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 09:32:41 -08:00
dependabot[bot]
3434c92a14
chore(github-deps): bump actions/setup-node from 4.1.0 to 6.0.0 (#4216)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Test External API and Providers / test-external (venv) (push) Failing after 29s
UI Tests / ui-tests (22) (push) Successful in 36s
Vector IO Integration Tests / test-matrix (push) Failing after 44s
Unit Tests / unit-tests (3.13) (push) Failing after 1m35s
Unit Tests / unit-tests (3.12) (push) Failing after 2m13s
Pre-commit / pre-commit (push) Successful in 3m4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3m6s
Bumps [actions/setup-node](https://github.com/actions/setup-node) from
4.1.0 to 6.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-node/releases">actions/setup-node's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<p><strong>Breaking Changes</strong></p>
<ul>
<li>Limit automatic caching to npm, update workflows and documentation
by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1374">actions/setup-node#1374</a></li>
</ul>
<p><strong>Dependency Upgrades</strong></p>
<ul>
<li>Upgrade ts-jest from 29.1.2 to 29.4.1 and document breaking changes
in v5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1336">#1336</a></li>
<li>Upgrade prettier from 2.8.8 to 3.6.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1334">#1334</a></li>
<li>Upgrade actions/publish-action from 0.3.0 to 0.4.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1362">#1362</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v5...v6.0.0">https://github.com/actions/setup-node/compare/v5...v6.0.0</a></p>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<h3>Breaking Changes</h3>
<ul>
<li>Enhance caching in setup-node with automatic package manager
detection by <a
href="https://github.com/priya-kinthali"><code>@​priya-kinthali</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1348">actions/setup-node#1348</a></li>
</ul>
<p>This update introduces automatic caching when a valid
<code>packageManager</code> field is present in your
<code>package.json</code>. This aims to improve workflow performance and
make dependency management more seamless.
To disable this automatic caching, set <code>package-manager-cache:
false</code></p>
<pre lang="yaml"><code>steps:
- uses: actions/checkout@v5
- uses: actions/setup-node@v5
  with:
    package-manager-cache: false
</code></pre>
<ul>
<li>Upgrade action to use node24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1325">actions/setup-node#1325</a></li>
</ul>
<p>Make sure your runner is on version v2.327.1 or later to ensure
compatibility with this release. <a
href="https://github.com/actions/runner/releases/tag/v2.327.1">See
Release Notes</a></p>
<h3>Dependency Upgrades</h3>
<ul>
<li>Upgrade <code>@​octokit/request-error</code> and
<code>@​actions/github</code> by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1227">actions/setup-node#1227</a></li>
<li>Upgrade uuid from 9.0.1 to 11.1.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1273">actions/setup-node#1273</a></li>
<li>Upgrade undici from 5.28.5 to 5.29.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1295">actions/setup-node#1295</a></li>
<li>Upgrade form-data to bring in fix for critical vulnerability by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/setup-node/pull/1332">actions/setup-node#1332</a></li>
<li>Upgrade actions/checkout from 4 to 5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1345">actions/setup-node#1345</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/priya-kinthali"><code>@​priya-kinthali</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1348">actions/setup-node#1348</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1325">actions/setup-node#1325</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v5.0.0">https://github.com/actions/setup-node/compare/v4...v5.0.0</a></p>
<h2>v4.4.0</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2028fbc5c2"><code>2028fbc</code></a>
Limit automatic caching to npm, update workflows and documentation (<a
href="https://redirect.github.com/actions/setup-node/issues/1374">#1374</a>)</li>
<li><a
href="13427813f7"><code>1342781</code></a>
Bump actions/publish-action from 0.3.0 to 0.4.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1362">#1362</a>)</li>
<li><a
href="89d709d423"><code>89d709d</code></a>
Bump prettier from 2.8.8 to 3.6.2 (<a
href="https://redirect.github.com/actions/setup-node/issues/1334">#1334</a>)</li>
<li><a
href="cd2651c462"><code>cd2651c</code></a>
Bump ts-jest from 29.1.2 to 29.4.1 (<a
href="https://redirect.github.com/actions/setup-node/issues/1336">#1336</a>)</li>
<li><a
href="a0853c2454"><code>a0853c2</code></a>
Bump actions/checkout from 4 to 5 (<a
href="https://redirect.github.com/actions/setup-node/issues/1345">#1345</a>)</li>
<li><a
href="b7234cc9fe"><code>b7234cc</code></a>
Upgrade action to use node24 (<a
href="https://redirect.github.com/actions/setup-node/issues/1325">#1325</a>)</li>
<li><a
href="d7a11313b5"><code>d7a1131</code></a>
Enhance caching in setup-node with automatic package manager detection
(<a
href="https://redirect.github.com/actions/setup-node/issues/1348">#1348</a>)</li>
<li><a
href="5e2628c959"><code>5e2628c</code></a>
Bumps form-data (<a
href="https://redirect.github.com/actions/setup-node/issues/1332">#1332</a>)</li>
<li><a
href="65beceff8e"><code>65becef</code></a>
Bump undici from 5.28.5 to 5.29.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1295">#1295</a>)</li>
<li><a
href="7e24a656e1"><code>7e24a65</code></a>
Bump uuid from 9.0.1 to 11.1.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1273">#1273</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/setup-node/compare/v4.1.0...2028fbc5c25fe9cf00d9f06a71cc4710d4507903">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-node&package-manager=github_actions&previous-version=4.1.0&new-version=6.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-23 22:32:58 -05:00
Ken Dreyer
dabebdd230
fix: update hard-coded google model names (#4212)
# What does this PR do?
When we send model names to Google's OpenAI-compatible API, we must use
the "google" name prefix. Google does not recognize the "vertexai" model
names.

Closes #4211

## Test Plan
```bash
uv venv --python python312
. .venv/bin/activate
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```

Test that this shows the gemini models with their correct names:
```bash
curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.custom_metadata.provider_id == "vertexai"))'
```

Test that this chat completion works:
```bash
curl -X POST   -H "Content-Type: application/json"   "http://127.0.0.1:8321/v1/chat/completions"   -d '{
        "model": "vertexai/google/gemini-2.5-flash",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "Hello! Can you tell me a joke?"
          }
        ],
        "temperature": 1.0,
        "max_tokens": 256
      }'
```
2025-11-21 13:12:01 -08:00
raghotham
74dceb30da
chore: Add @cdoern as a code owner (#4209)
We went through the nomination process for CODEOWNERS in the codeowners
discord channel.

Welcome to the code owners group @cdoern! Thanks for your contributions
and we look forward to working with you!
2025-11-21 11:00:36 -08:00
Ken Dreyer
dc4665af17
feat!: change bedrock bearer token env variable to match AWS docs & boto3 convention (#4152)
Rename `AWS_BEDROCK_API_KEY` to `AWS_BEARER_TOKEN_BEDROCK` to align with
the naming convention used in AWS Bedrock documentation and the AWS web
console UI. This reduces confusion when developers compare LLS docs with
AWS docs.
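
For deployments migrating to the new name, a check along these lines can make a stale configuration fail loudly. This is only a sketch: `resolve_bedrock_token` is a hypothetical helper, and raising an error when only the old variable is set is an assumption, not behavior stated by this change.

```python
import os
from typing import Mapping, Optional


def resolve_bedrock_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Read the Bedrock bearer token from the renamed variable.

    Hypothetical helper: failing when only the old name is set is an
    assumption made for illustration.
    """
    env = os.environ if env is None else env
    token = env.get("AWS_BEARER_TOKEN_BEDROCK")
    if token:
        return token
    if env.get("AWS_BEDROCK_API_KEY"):
        # The old name is still exported somewhere; point at the new one.
        raise RuntimeError(
            "AWS_BEDROCK_API_KEY is set but AWS_BEARER_TOKEN_BEDROCK is not; "
            "rename the variable to AWS_BEARER_TOKEN_BEDROCK."
        )
    return None
```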

Closes #4147
2025-11-21 09:48:05 -05:00
Ashwin Bharambe
acf74cb8df
feat(ci): add --typescript-only flag to skip Python tests in integration test script (#4201)
This adds a `--typescript-only` flag to `scripts/integration-tests.sh`
that skips pytest execution entirely while still starting the Llama
Stack server (required for TS client tests). The TypeScript client can
now be tested independently without Python test dependencies.
2025-11-19 16:25:30 -08:00
Ashwin Bharambe
d649c3663e
fix: enforce allowed_models during inference requests (#4197)
The `allowed_models` configuration was only being applied when listing
models via the `/v1/models` endpoint, but the actual inference requests
weren't checking this restriction. This meant users could directly
request any model the provider supports by specifying it in their
inference call, completely bypassing the intended cost controls.

The fix adds validation to all three inference methods (chat
completions, completions, and embeddings) that checks the requested
model against the allowed_models list before making the provider API
call.
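
The check described above can be sketched roughly as follows. The names `ModelNotAllowedError` and `validate_model_allowed` are illustrative rather than taken from the codebase, and treating a missing `allowed_models` list as "no restriction" is an assumption.

```python
from typing import List, Optional


class ModelNotAllowedError(ValueError):
    """Raised when a request names a model outside allowed_models (hypothetical)."""


def validate_model_allowed(model: str, allowed_models: Optional[List[str]]) -> None:
    # Assumption: no configured list means every provider model is permitted.
    if allowed_models is None:
        return
    if model not in allowed_models:
        raise ModelNotAllowedError(
            f"Model '{model}' is not in allowed_models: {sorted(allowed_models)}"
        )
```

Calling such a helper at the top of each inference method, before the provider API call, would keep listing and inference consistent.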

### Test plan

Added unit tests
2025-11-19 14:49:44 -08:00
Ashwin Bharambe
b6ce242808
chore: update code owners (#4199)
Update code owners given changed affiliations, projects, etc.
2025-11-19 13:43:11 -08:00
Sam El-Borai
aa2a7dae07
chore(ci): make stainless workflow more DRY (#4195)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

Addresses feedback from
https://github.com/llamastack/llama-stack/pull/4187#discussion_r2542797437

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-11-19 11:53:20 -08:00
Ian Miller
0757d5a917
feat(responses)!: implement support for OpenAI compatible prompts in Responses API (#3965)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR implements OpenAI-compatible prompts in the Responses API. It is
the follow-up to #3942 and contains the actual implementation.

The need for this functionality was first raised in #3514.

> Note: https://github.com/llamastack/llama-stack/pull/3514 was split
into three separate PRs. This PR is the third of the three.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3321

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Manual testing, CI workflow with added unit tests

Comprehensive manual testing with new implementation:

**Test prompts with images containing text in the Responses API:**

I used this image for testing purposes: [iphone 17
image](https://github.com/user-attachments/assets/9e2ee821-e394-4bbd-b1c8-d48a3fa315de)

1. Upload an image:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/iphone.jpeg" \
  -F "purpose=assistants"
```


`{"object":"file","id":"file-d6d375f238e14f21952cc40246bc8504","bytes":556241,"created_at":1761750049,"expires_at":1793286049,"filename":"iphone.jpeg","purpose":"assistants"}%`

2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a product analysis expert. Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.",
    "variables": ["product_name", "description", "product_photo"]
  }'
```

`{"prompt":"You are a product analysis expert. Analyze the following
product:\n\nProduct Name: {{product_name}}\nDescription:
{{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed
analysis including quality assessment, target audience, and pricing
recommendations.","version":1,"prompt_id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":["product_name","description","product_photo"],"is_default":false}%`


3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please analyze this product",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62",
      "version": "1",
      "variables": {
        "product_name": {
          "type": "input_text",
          "text": "iPhone 17 Pro Max"
        },
         "product_photo": {
          "type": "input_image",
          "file_id": "file-d6d375f238e14f21952cc40246bc8504",
          "detail": "high"
        }
      }
    }
  }'
```


`{"created_at":1761750427,"error":null,"id":"resp_f897f914-e3b8-4783-8223-3ed0d32fcbc6","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"###
Product Analysis: iPhone 17 Pro Max\n\n**Quality Assessment:**\n\n-
**Display & Design:**\n - The 6.9-inch display is large, ideal for
streaming and productivity.\n - Anti-reflective technology and 120Hz
refresh rate enhance viewing experience, providing smoother visuals and
reducing glare.\n - Titanium frame suggests a premium build, offering
durability and a sleek appearance.\n\n- **Performance:**\n - The Apple
A19 Pro chip promises significant performance improvements, likely
leading to faster processing and efficient multitasking.\n - 12GB RAM is
substantial for a smartphone, ensuring smooth operation for demanding
apps and games.\n\n- **Camera System:**\n - The triple 48MP camera setup
(wide, ultra-wide, telephoto) is designed for versatile photography
needs, capturing high-resolution photos and videos.\n - The 24MP front
camera will appeal to selfie enthusiasts and content creators needing
quality front-facing shots.\n\n- **Connectivity:**\n - Wi-Fi 7 support
indicates future-proof wireless capabilities, providing faster and more
reliable internet connectivity.\n\n**Target Audience:**\n\n- **Tech
Enthusiasts:** Individuals interested in cutting-edge technology and
performance.\n- **Content Creators:** Users who need a robust camera
system for photo and video production.\n- **Luxury Consumers:** Those
who prefer premium materials and top-of-the-line specs.\n-
**Professionals:** Users who require efficient multitasking and
productivity features.\n\n**Pricing Recommendations:**\n\n- Given the
premium specifications, a higher price point is expected. Consider
pricing competitively within the high-end smartphone market while
justifying cost through unique features like the titanium frame and
advanced connectivity options.\n- Positioning around the $1,200 to
$1,500 range would align with expectations for top-tier devices,
catering to its target audience while ensuring
profitability.\n\nOverall, the iPhone 17 Pro Max showcases a blend of
innovative features and premium design, aimed at users seeking high
performance and superior
aesthetics.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_66f4d844-4d9e-4102-80fc-eb75b34b6dbd","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":{"product_name":{"text":"iPhone
17 Pro
Max","type":"input_text"},"product_photo":{"detail":"high","type":"input_image","file_id":"file-d6d375f238e14f21952cc40246bc8504","image_url":null}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":830,"output_tokens":394,"total_tokens":1224,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`

**Test Prompts with PDF files in Responses API:**

I used this PDF file for testing purposes:
[invoicesample.pdf](https://github.com/user-attachments/files/22958943/invoicesample.pdf)

1. Upload PDF:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/invoicesample.pdf" \
  -F "purpose=assistants"
```


`{"object":"file","id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","bytes":149568,"created_at":1761750730,"expires_at":1793286730,"filename":"invoicesample.pdf","purpose":"assistants"}%`


2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis",
    "variables": ["invoice_doc"]
  }'
```

`{"prompt":"You are an accounting and financial analysis expert. Analyze
the following invoice document:\n\nInvoice Document:
{{invoice_doc}}\n\nProvide a comprehensive
analysis","version":1,"prompt_id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":["invoice_doc"],"is_default":false}%`


3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please provide a detailed analysis of this invoice",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc",
      "version": "1",
      "variables": {
        "invoice_doc": {
          "type": "input_file",
          "file_id": "file-7fbb1043a4bb468cab60ffe4b8631d8e",
          "filename": "invoicesample.pdf"
        }
      }
    }
  }'
```


`{"created_at":1761750881,"error":null,"id":"resp_da866913-db06-4702-8000-174daed9dbbb","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"Here's
a detailed analysis of the invoice provided:\n\n### Seller
Information\n- **Business Name:** The invoice features a logo with
\"Sunny Farm\" indicating the business identity.\n- **Address:** 123
Somewhere St, Melbourne VIC 3000\n- **Contact Information:** Phone
number (03) 1234 5678\n\n### Buyer Information\n- **Name:** Denny
Gunawan\n- **Address:** 221 Queen St, Melbourne VIC 3000\n\n###
Transaction Details\n- **Invoice Number:** #20130304\n- **Date of
Transaction:** Not explicitly mentioned, likely inferred from the
invoice number or needs clarification.\n\n### Items Purchased\n1.
**Apple**\n - Price: $5.00/kg\n - Quantity: 1 kg\n - Subtotal:
$5.00\n\n2. **Orange**\n - Price: $1.99/kg\n - Quantity: 2 kg\n -
Subtotal: $3.98\n\n3. **Watermelon**\n - Price: $1.69/kg\n - Quantity: 3
kg\n - Subtotal: $5.07\n\n4. **Mango**\n - Price: $9.56/kg\n - Quantity:
2 kg\n - Subtotal: $19.12\n\n5. **Peach**\n - Price: $2.99/kg\n -
Quantity: 1 kg\n - Subtotal: $2.99\n\n### Financial Summary\n-
**Subtotal for Items:** $36.00\n- **GST (Goods and Services Tax):** 10%
of $36.00, which amounts to $3.60\n- **Total Amount Due:** $39.60\n\n###
Notes\n- The invoice includes a placeholder text: \"Lorem ipsum dolor
sit amet...\" which is typically used as filler text. This might
indicate a section intended for terms, conditions, or additional notes
that haven’t been completed.\n\n### Visual and Design Elements\n- The
invoice uses a simple and clear layout, featuring the business logo
prominently and stating essential information such as contact and
transaction details in a structured manner.\n- There is a \"Thank You\"
note at the bottom, which adds a professional and courteous
touch.\n\n### Considerations\n- Ensure the date of the transaction is
clear if there are any future references needed.\n- Replace filler text
with relevant terms and conditions or any special instructions
pertaining to the transaction.\n\nThis invoice appears standard,
representing a small business transaction with clearly itemized products
and applicable
taxes.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_39f3b39e-4684-4444-8e4d-e7395f88c9dc","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":{"invoice_doc":{"type":"input_file","file_data":null,"file_id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","file_url":null,"filename":"invoicesample.pdf"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":529,"output_tokens":513,"total_tokens":1042,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`

**Test simple text Prompt in Responses API:**

1. Create prompt:

```
 curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":["name","company","role","tone"],"is_default":false}%`

2. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "What is the capital of Ireland?",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef",
      "version": "1",
      "variables": {
        "name": {
          "type": "input_text",
          "text": "Alice"
        },
        "company": {
          "type": "input_text",
          "text": "Dummy Company"
        },
        "role": {
          "type": "input_text",
          "text": "Geography expert"
        },
        "tone": {
          "type": "input_text",
          "text": "professional and helpful"
        }
      }
    }
  }'

```


`{"created_at":1761751097,"error":null,"id":"resp_1b037b95-d9ae-4ad0-8e76-d953897ecaef","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"The
capital of Ireland is
Dublin.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_8e7c72b6-2aa2-4da6-8e57-da4e12fa3ce2","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":{"name":{"text":"Alice","type":"input_text"},"company":{"text":"Dummy
Company","type":"input_text"},"role":{"text":"Geography
expert","type":"input_text"},"tone":{"text":"professional and
helpful","type":"input_text"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":47,"output_tokens":7,"total_tokens":54,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`
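
For the text-only case above, the `{{variable}}` substitution can be pictured with the following sketch. `render_prompt` is a hypothetical function, not the one added in this PR; it resolves only `input_text` variables and leaves any other placeholder untouched, since how image and file variables are attached is not shown here.

```python
import re
from typing import Dict

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: Dict[str, dict]) -> str:
    """Substitute input_text variables into a prompt template (illustrative)."""

    def replace(match: "re.Match[str]") -> str:
        value = variables.get(match.group(1))
        if value is not None and value.get("type") == "input_text":
            return value["text"]
        # Unknown or non-text variables are left as-is in this sketch.
        return match.group(0)

    return _PLACEHOLDER.sub(replace, template)


template = (
    "Hello {{name}}! You are working at {{company}}. "
    "Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}."
)
variables = {
    "name": {"type": "input_text", "text": "Alice"},
    "company": {"type": "input_text", "text": "Dummy Company"},
    "role": {"type": "input_text", "text": "Geography expert"},
    "tone": {"type": "input_text", "text": "professional and helpful"},
}
rendered = render_prompt(template, variables)
```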
2025-11-19 11:48:11 -08:00
Ashwin Bharambe
8852666982
chore: remove dead code from openai_compat utility (#4194)
Removes a bunch of dead code from `openai_compat.py`
2025-11-19 11:23:33 -08:00
Ashwin Bharambe
49d6ef8a70
fix(docs): fix glob vulnerability (#4193)
add npm override so docs workspace resolves glob@10.5+
2025-11-19 11:01:52 -08:00
Shabana Baig
72ea95e2e0
fix: Fix max_tool_calls for openai provider and add integration tests for the max_tool_calls feat (#4190)
# Problem

OpenAI gpt-4 returned an error when built-in and MCP tool calls were
skipped because of the `max_tool_calls` parameter. The following is from
the server log:
```
RuntimeError: OpenAI response failed: Error code: 400 - {'error': {'message': "An assistant message with       
'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids  
did not have response messages: call_Yi9V1QNpN73dJCAgP2Arcjej", 'type': 'invalid_request_error', 'param':      
'messages', 'code': None}}
```

# What does this PR do?

- Fixes the error returned by openai/gpt when calls were skipped due to
`max_tool_calls`. We now return a tool message that explicitly states
that the call was skipped.
- Adds integration tests as a follow-up to
PR#[4062](https://github.com/llamastack/llama-stack/pull/4062)
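
A minimal sketch of the idea behind the fix, not the actual implementation: every `tool_call_id` in the assistant message gets a tool message, and skipped calls get an explicit placeholder. The function name, message shapes, and placeholder text are assumptions made for illustration.

```python
from typing import Any

# Hypothetical placeholder text; the real wording in the provider may differ.
SKIPPED_TEXT = "Tool call skipped: max_tool_calls limit reached."


def tool_messages_for(
    tool_calls: list[dict[str, Any]],
    results: dict[str, str],
) -> list[dict[str, str]]:
    """Return one tool message per tool call, including skipped ones."""
    messages = []
    for call in tool_calls:
        call_id = call["id"]
        # Calls with no recorded result were skipped, so answer them explicitly
        # instead of leaving the tool_call_id without a response message.
        content = results.get(call_id, SKIPPED_TEXT)
        messages.append(
            {"role": "tool", "tool_call_id": call_id, "content": content}
        )
    return messages
```

With this shape, the follow-up request never contains an assistant `tool_calls` entry without a matching tool message, which is the condition the 400 error above complains about.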

Part 2 for issue
#[3563](https://github.com/llamastack/llama-stack/issues/3563)

## Test Plan

- Added integration tests
- Added new recordings

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-19 10:27:56 -08:00
Roy Belio
f18870a221
fix: Pydantic validation error with list-type metadata in vector search (#3797) (#4173)
# Fix for Issue #3797

## Problem
Vector store search failed with Pydantic ValidationError when chunk
metadata contained list-type values.

**Error:**
```
ValidationError: 3 validation errors for VectorStoreSearchResponse
attributes.tags.str: Input should be a valid string
attributes.tags.float: Input should be a valid number
attributes.tags.bool: Input should be a valid boolean
```

**Root Cause:**
- `Chunk.metadata` accepts `dict[str, Any]` (any type allowed)
- `VectorStoreSearchResponse.attributes` requires `dict[str, str | float
| bool]` (primitives only)
- Direct assignment at line 641 caused validation failure for
non-primitive types

## Solution

Added utility function to filter metadata to primitive types before
creating search response.


## Impact

**Fixed:**
- Vector search works with list metadata (e.g., `tags: ["transformers",
"gpu"]`)
- Lists become searchable as comma-separated strings
- No ValidationError on search responses

**Preserved:**
- Full metadata still available in `VectorStoreContent.metadata`
- No API schema changes
- Backward compatible with existing primitive metadata

**Affected:**
All vector store providers using `OpenAIVectorStoreMixin`: FAISS,
Chroma, Qdrant, Milvus, Weaviate, PGVector, SQLite-vec

## Testing


tests/unit/providers/vector_io/test_vector_utils.py::test_sanitize_metadata_for_attributes

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-11-19 10:16:34 -08:00
Sam El-Borai
1e4e02e622
fix(ci): prefix stainless branches with fork author (#4187)
# What does this PR do?

I believe this should avoid the CI issues seen in
https://github.com/llamastack/llama-stack/pull/4173.


Error we see in Stainless logs:

```
(cannot lock ref 'refs/heads/preview/base/fix/issue-3797-metadata-validation': 'refs/heads/preview/base/fix' exists; cannot create 'refs/heads/preview/base/fix/issue-3797-metadata-validation')
```

The issue is that if a branch `fix` exists, `fix/<whatever>` cannot be
created (that's how git refs work unfortunately...). The fix in this PR
is to ensure PRs from forks are using the author as a prefix.
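
To illustrate the ref conflict and the prefixing idea, here is a small sketch. Both helper names and the exact naming scheme are hypothetical; the real workflow change may build the branch name differently.

```python
def conflicts_with_existing(new_branch: str, existing: set[str]) -> bool:
    """True if new_branch cannot coexist with a branch in `existing`."""
    parts = new_branch.split("/")
    # Any proper prefix of the new name that is already a branch blocks it,
    # e.g. an existing "fix" blocks "fix/anything".
    prefixes = {"/".join(parts[:i]) for i in range(1, len(parts))}
    if prefixes & existing:
        return True
    # The reverse also holds: an existing branch nested under the new name.
    return any(name.startswith(new_branch + "/") for name in existing)


def preview_branch(head_ref: str, author: str, is_fork: bool) -> str:
    """Prefix branches coming from forks with the fork author (assumed scheme)."""
    return f"{author}/{head_ref}" if is_fork else head_ref
```

With an existing `fix` branch, `fix/issue-3797-metadata-validation` conflicts, while the same name prefixed with an author does not.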

In addition, we will change the Stainless API to return better error
messages here: it should have been a 4xx with a meaningful error,
not a 500.

And we will likely need to delete the `fix` branch.


## Test Plan
2025-11-19 10:09:12 -08:00
Ashwin Bharambe
40b11efac4
feat(tests): add TypeScript client integration test support (#4185)
Integration tests can now validate the TypeScript SDK alongside Python
tests when running against server-mode stacks. Currently, this only adds
a _small_ number of tests. We should extend only if truly needed -- this
smoke check may be sufficient.

When `RUN_CLIENT_TS_TESTS=1` is set, the test script runs TypeScript
tests after Python tests pass. Tests are mapped via
`tests/integration/client-typescript/suites.json` which defines which
TypeScript test files correspond to each Python suite/setup combination.

The fact that we need exact "test_id"s (which are actually generated by
pytest) to be hardcoded inside the TypeScript tests (so we hit the
recorded paths) is a big smell and it might become grating, but maybe
the benefit is worth it if we keep this test suite _small_ and targeted.

## Test Plan

Run with TypeScript tests enabled:
```bash
OPENAI_API_KEY=dummy RUN_CLIENT_TS_TESTS=1 \
  scripts/integration-tests.sh --stack-config server:ci-tests --suite responses --setup gpt
```
2025-11-19 10:07:53 -08:00
Anik
4e9633f7c3
feat: Make Safety API an optional dependency for meta-reference agents provider (#4169)
# What does this PR do?

Change Safety API from required to optional dependency, following the
established pattern used for other optional dependencies in Llama Stack.
    
The provider now starts successfully without Safety API configured.
Requests that explicitly include guardrails will receive a clear error
message when Safety API is unavailable.
    
This enables local development and testing without Safety API while
maintaining clear error messages when guardrail features are requested.
    
Closes #4165
    
Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>

## Test Plan

1. New unit tests added in
`tests/unit/providers/agents/meta_reference/test_safety_optional.py`

2. Integration tests performed with the files in
https://gist.github.com/anik120/c33cef497ec7085e1fe2164e0705b8d6

 (i) test with `test_integration_no_safety_fail.yaml`:
 
Config WITHOUT Safety API, should fail with a helpful error since
`require_safety_api` is `true` by default
```
$ uv run llama stack run test_integration_no_safety_fail.yaml 2>&1 | grep -B 5 -A 15 "ValueError.*Safety\|Safety API is 
  required"
File "/Users/anbhatta/go/src/github.com/llamastack/llama-stack/src/llama_stack/providers/inline/agents/meta_reference
  /__init__.py", line 27, in get_provider_impl
      raise ValueError(
      ...<9 lines>...
      )
  ValueError: Safety API is required but not configured.

  To run without safety checks, explicitly set in your configuration:
    providers:
      agents:
        - provider_id: meta-reference
          provider_type: inline::meta-reference
          config:
            require_safety_api: false

  Warning: This disables all safety guardrails for this agents provider.
```

(ii) test with `test_integration_no_safety_works.yaml`

Config WITHOUT Safety API, **but** `require_safety_api=false` is
explicitly set, should succeed

```
$ uv run llama stack run test_integration_no_safety_works.yaml
 INFO     2025-11-16 09:49:10,044 llama_stack.cli.stack.run:169 cli: Using run configuration:                           
   
           /Users/anbhatta/go/src/github.com/llamastack/llama-stack/test_integration_no_safety_works.yaml                
   
  INFO     2025-11-16 09:49:10,052 llama_stack.cli.stack.run:228 cli: HTTPS enabled with certificates:

             Key: None

             Cert: None

  .
  .
  .
  INFO     2025-11-16 09:49:38,528 llama_stack.core.stack:495 core: starting registry refresh task

  INFO     2025-11-16 09:49:38,534 uvicorn.error:62 uncategorized: Application startup complete.

  INFO     2025-11-16 09:49:38,535 uvicorn.error:216 uncategorized: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C
```


Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>

2025-11-19 10:04:24 -08:00
Charlie Doern
d5cd0eea14
feat!: standardize base_url for inference (#4177)
# What does this PR do?

Completes #3732 by removing runtime URL transformations and requiring
users to provide full URLs in configuration. All providers now use
'base_url' consistently and respect the exact URL provided without
appending paths like /v1 or /openai/v1 at runtime.

BREAKING CHANGE: Users must update configs to include full URL paths
(e.g., http://localhost:11434/v1 instead of http://localhost:11434).
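
A small sketch of what "respect the exact URL provided" means in practice. The helper name is hypothetical and the route is only an example; the point is that nothing like `/v1` is added on the caller's behalf.

```python
def endpoint_url(base_url: str, path: str) -> str:
    """Join base_url and a route without adding any version prefix."""
    # Only normalize the slash between the two parts; base_url is otherwise
    # used exactly as configured.
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# A config that already includes /v1 resolves to the versioned route.
print(endpoint_url("http://localhost:11434/v1", "chat/completions"))
# A config that omits /v1 is no longer corrected at runtime.
print(endpoint_url("http://localhost:11434", "chat/completions"))
```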

Closes #3732 

## Test Plan

Existing tests should pass even with the URL changes, due to default
URLs being altered.

Add unit test to enforce URL standardization across remote inference
providers (verifies all use 'base_url' field with HttpUrl | None type)

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-19 08:44:28 -08:00
Charlie Doern
91f1b352b4
chore: add storage sane defaults (#4182)
# What does this PR do?

Since `StackRunConfig` requires certain parts of `StorageConfig`, it
probably makes sense to template in some defaults that will "just work"
for most use cases.

Specifically, introduce `ServerStoresConfig` defaults for inference,
metadata, conversations, and prompts. We already funnel in
defaults for these sections ad hoc throughout the codebase.

Additionally, set some `backends` defaults for the `StorageConfig`.

This will alleviate some weirdness for `--providers` in run/list-deps,
and will also help some work I have to better align our list-deps/run datatypes.
---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-18 15:22:26 -08:00
Ashwin Bharambe
bd5ad2963e
refactor(storage): make { kvstore, sqlstore } as llama stack "internal" APIs (#4181)
These primitives (used both by the Stack as well as provider
implementations) can be thought of fruitfully as internal-only APIs
which can themselves have multiple implementations. We use the new
`llama_stack_api.internal` namespace for this.

In addition: the change moves kv/sql store impls, configs, and
dependency helpers under `core/storage`

## Testing

`pytest tests/unit/utils/test_authorized_sqlstore.py`, other existing CI
2025-11-18 13:15:16 -08:00
Anastas Stoyanovsky
a3580e6bc0
feat!: Wire through parallel_tool_calls to Responses API (#4124)
# What does this PR do?
Initial PR against #4123
Adds `parallel_tool_calls` spec to Responses API and basic initial
implementation where no more than one function call is generated when
set to `False`.
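
The basic behavior can be sketched as below. This is only an illustration with a hypothetical helper name; the actual implementation may enforce the limit differently (for example, at generation time rather than by trimming).

```python
from typing import Any


def limit_function_calls(
    tool_calls: list[dict[str, Any]],
    parallel_tool_calls: bool,
) -> list[dict[str, Any]]:
    """Keep at most one function call when parallel calls are disabled."""
    if parallel_tool_calls:
        return tool_calls
    # With parallel_tool_calls=False, no more than one call is kept.
    return tool_calls[:1]
```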

## Test Plan
* Unit tests have been added to verify no more than one function call is
generated.
* A followup PR will verify passing through `parallel_tool_calls` to
providers.
* A followup PR will address verification and/or implementation of
incremental function calling across multiple conversational turns.

---------

Signed-off-by: Anastas Stoyanovsky <astoyano@redhat.com>
2025-11-18 11:25:08 -08:00
raghotham
7093978754
chore(docs): Remove Llama 4 support details from README (#4178)
2025-11-17 15:17:04 -08:00
Charlie Doern
29f1fa6abd
test(api): pre-commit check to ensure API does not import llama_stack (#4160)
# What does this PR do?

Since `llama_stack_api` is meant to be _just_ the API definitions of LLS,
we should have a pre-commit check that prevents anyone from accidentally
importing `from llama_stack` or adding `llama_stack` as a dependency
in the `llama_stack_api` pyproject.
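
One way such a check could be written is with the standard `ast` module. This is a sketch under the assumption that the hook scans Python sources for imports; the real hook may be a shell script or a grep, and the function name is made up here.

```python
import ast


def forbidden_imports(source: str, banned: str = "llama_stack") -> list[str]:
    """Return imported module names that are `banned` or a submodule of it."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) stay inside the package, so skip them.
            names = [node.module]
        else:
            continue
        # Match "llama_stack" and "llama_stack.x", but not "llama_stack_api".
        found += [n for n in names if n == banned or n.startswith(banned + ".")]
    return found
```

The prefix check uses `banned + "."` so that `llama_stack_api` itself is not flagged.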


## Test Plan

pre-commit should pass.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-17 13:23:43 -08:00
Ashwin Bharambe
7d3db6b22c
feat(openapi): generate stainless config "more" programmatically (#4164)
Generate the Stainless client config directly from code so we can
validate the config before we ever write the YAML.

This change enforces allowed HTTP verbs/paths, detects duplicate routes
across resources, and ensures README example endpoints exist and match
the OpenAPI spec. The generator now fails fast when config entries
drift, keeping the published config (hopefully) more current with the
spec. I think more validation can be done but this is a good start.
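
As a rough illustration of the verb/path and duplicate-route checks, here is a sketch. The data layout (`resource -> ["verb /path", ...]`), the allowed verb set, and the function name are assumptions, not the generator's real structures.

```python
from collections import defaultdict

ALLOWED_VERBS = {"get", "post", "put", "patch", "delete"}


def validate_routes(resources: dict[str, list[str]]) -> list[str]:
    """Return a list of problems found in 'verb /path' endpoint strings."""
    errors: list[str] = []
    owners: dict[tuple[str, str], list[str]] = defaultdict(list)
    for resource, endpoints in resources.items():
        for endpoint in endpoints:
            verb, _, path = endpoint.partition(" ")
            if verb.lower() not in ALLOWED_VERBS or not path.startswith("/"):
                errors.append(f"{resource}: invalid endpoint {endpoint!r}")
                continue
            owners[(verb.lower(), path)].append(resource)
    # A route claimed by more than one resource is reported as a duplicate.
    for (verb, path), names in owners.items():
        if len(names) > 1:
            errors.append(f"duplicate route {verb} {path} in {', '.join(names)}")
    return errors
```

Failing fast then amounts to raising if the returned list is non-empty before any YAML is written.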
2025-11-17 12:48:03 -08:00
Theofanis Petkos
5fe6098350
docs: Improvements on provider_codegen for type hints and multi-line yaml descriptions (#4033)
# What does this PR do?

This PR improves type hint cleanup in auto-generated provider
documentation by adding regex logic.

**Issues Fixed:**
- Type hints with missing closing brackets (e.g., `list[str` instead of
`list[str]`)
- Types showing as `<class 'bool'>`, `<class 'str'>` instead of `bool`,
`str`
- The multi-line YAML frontmatter in index documentation files wasn't
ideal, so we now add the proper `|` character.

**Changes:**
1. Replaced string replacement (`.replace`) with regex-based type
cleaning to preserve the trailing bracket in case of `list` and `dict`.
2. Adds the `|` character for multi-line YAML descriptions.
3. I have regenerated the docs. However, let me know if that's not
needed.
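
A minimal sketch of regex-based cleanup for the `<class '...'>` case. The pattern and function name are illustrative and are not copied from `scripts/provider_codegen.py`.

```python
import re

# Matches the repr of a class object, e.g. "<class 'bool'>".
_CLASS_REPR = re.compile(r"<class '([\w.]+)'>")


def clean_type_hint(type_repr: str) -> str:
    """Replace "<class 'x'>" wrappers with the bare type name."""
    # Only the wrapper is removed, so brackets around it are left intact.
    return _CLASS_REPR.sub(r"\1", type_repr)
```

Because the substitution targets only the wrapper, a hint such as `list[<class 'str'>]` keeps its closing bracket.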

## Test Plan

1. Ran `uv run python scripts/provider_codegen.py`, which successfully
regenerated all docs.
2. The regenerated docs show correct type hint cleanup, and multi-line
YAML descriptions now have the `|` character.

### Note to the reviewer(s)

This is my first contribution to your lovely repo! Initially I was going
through the docs (wanted to use `remote::gemini` as provider) and realized
the issue. I've read the
[CONTRIBUTING.md](https://github.com/llamastack/llama-stack/blob/main/CONTRIBUTING.md)
and decided to open the PR. Let me know if there's anything I did wrong
and I'll update my PR!

---------

Signed-off-by: thepetk <thepetk@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-17 12:35:28 -08:00
Omar Abdelwahab
fe91d331ef
fix: Remove authorization from provider data (#4161)
# What does this PR do?
- Remove backward compatibility for authorization in mcp_headers
- Enforce authorization must use dedicated parameter  
- Add validation error if Authorization found in provider_data headers
- Update test_mcp.py to use authorization parameter
- Update test_mcp_json_schema.py to use authorization parameter
- Update test_tools_with_schemas.py to use authorization parameter
- Update documentation to show the change in the authorization approach

Breaking Change:
- Authorization can no longer be passed via mcp_headers in provider_data
- Users must use the dedicated 'authorization' parameter instead
- Clear error message guides users to the new approach
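
The validation can be sketched roughly as follows. The function name and error text are assumptions for illustration; only the rule itself (no `Authorization` inside `mcp_headers`) comes from the description above.

```python
def validate_mcp_headers(headers: dict[str, str]) -> None:
    """Reject an Authorization header passed through provider_data."""
    for name in headers:
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == "authorization":
            raise ValueError(
                "Authorization cannot be passed via mcp_headers; "
                "use the dedicated 'authorization' parameter instead."
            )
```

Other headers pass through unchanged; only the credential header is redirected to the dedicated parameter.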

## Test Plan
CI

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-17 12:16:35 -08:00
Sébastien Han
0128effbf7
chore: remove pyyaml and starlette duplication in pyproject (#4172)
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-17 12:09:02 -08:00
Ashwin Bharambe
f648cacdad
fix(openapi): restore embedded request wrappers (#4176)
FastAPI generator now only unwraps body params explicitly marked with
Body(embed=False) so the /eval run_eval schema once again exposes
RunEvalRequest, matching our integration tests and the server's request
parsing.

Regenerated the OpenAPI specs to capture the restored wrapper.

CI on the Stainless preview builds should be green.
2025-11-17 11:36:23 -08:00
Yuan Tang
5ea1be69fe
chore: Remove myself from codeowners (#4175)
# What does this PR do?

## Test Plan
2025-11-17 09:28:41 -05:00
Sébastien Han
8bf4ee9ab9
fix: list-deps command (#4174)
# What does this PR do?

It was referencing strong_typing which was removed in
https://github.com/llamastack/llama-stack/pull/3944

## Test Plan

New CI build test.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-17 15:26:10 +01:00
Sébastien Han
97f535c4f1
feat(openapi): switch to fastapi-based generator (#3944)
# What does this PR do?
This replaces the legacy "pyopenapi + strong_typing" pipeline with a
FastAPI-backed generator that has an explicit schema registry inside
`llama_stack_api`. The key changes:

1. **New generator architecture.** FastAPI now builds the OpenAPI schema
directly from the real routes, while helper modules
(`schema_collection`, `endpoints`, `schema_transforms`, etc.)
post-process the result. The old pyopenapi stack and its strong_typing
helpers are removed entirely, so we no longer rely on fragile AST
analysis or top-level import side effects.

2. **Schema registry in `llama_stack_api`.** `schema_utils.py` keeps a
`SchemaInfo` record for every `@json_schema_type`, `register_schema`,
and dynamically created request model. The OpenAPI generator and other
tooling query this registry instead of scanning the package tree,
producing deterministic names (e.g., `{MethodName}Request`), capturing
all optional/nullable fields, and making schema discovery testable. A
new unit test covers the registry behavior.

3. **Regenerated specs + CI alignment.** All docs/Stainless specs are
regenerated from the new pipeline, so optional/nullable fields now match
reality (expect the API Conformance workflow to report breaking
changes—this PR establishes the new baseline). The workflow itself is
back to the stock oasdiff invocation so future regressions surface
normally.

*Conformance will be RED on this PR; we choose to accept the
deviations.*

## Test Plan
- `uv run pytest tests/unit/server/test_schema_registry.py`
- `uv run python -m scripts.openapi_generator.main docs/static`

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-14 15:53:53 -08:00
Mike Sager
cc88789071
test: Restore responses unit tests (#4153)
# What does this PR do?
Restores the responses unit tests that were inadvertently deleted in PR
[#4055](https://github.com/llamastack/llama-stack/pull/4055)


## Test Plan
I ran the unit tests that I restored. They all passed with one
exception:


tests/unit/providers/agents/meta_reference/test_openai_responses.py::test_reuse_mcp_tool_list

AttributeError: module 'llama_stack.providers.utils.tools' has no
attribute 'mcp'

It's coming from this line:

    @patch("llama_stack.providers.utils.tools.mcp.list_mcp_tools")

The mcp.py module (and \_\_init\_\_.py) exists under tools. There are
some 'from mcp ....' imports (mcp package in this case) within it that
python may be interpreting as circular imports (or maybe I'm overlooking
something).
2025-11-14 13:16:03 -08:00
slekkala1
f596f850bf
fix: Propagate the runtime error message to user (#4150)
# What does this PR do?
For a runtime exception, the error is not propagated to the user, so
failures can be opaque.

Before fix:
`ERROR - Error processing message: Error code: 500 - {'detail': 'Internal server error: An unexpected error occurred.'}`

After fix:
`[ERROR] Error code: 404 - {'detail': "Model 'claude-sonnet-4-5-20250929' not found. Use 'client.models.list()' to list available Models."}`

(Ran into this a few times while working with OCI + LLAMAStack and Sabre:
Agentic framework integrations with LLAMAStack)

## Test Plan
CI
2025-11-14 13:14:49 -08:00
Omar Abdelwahab
eb545034ab
fix: MCP authorization parameter implementation (#4052)
# What does this PR do?
Adds a user-facing `authorization` parameter to MCP tool definitions
that lets users explicitly configure credentials per MCP server,
addressing GitHub Issue #4034 in a secure manner.


## Test Plan
tests/integration/responses/test_mcp_authentication.py

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-14 08:54:42 -08:00
Sébastien Han
dc49ad3f89
chore: bump starlette version (#4158)
# What does this PR do?

Require at least 0.49.1 which fixes a security vulnerability in the
parsing logic of the Range header in FileResponse. Release note:
https://github.com/Kludex/starlette/releases/tag/0.49.1

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-14 08:47:37 -08:00
Charlie Doern
a078f089d9
fix: rename llama_stack_api dir (#4155)
# What does this PR do?

The directory structure was `src/llama-stack-api/llama_stack_api`.

Instead, it should just be `src/llama_stack_api` to match the other
packages.

Update the structure and the pyproject/linting config.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-13 15:04:36 -08:00
slekkala1
ba744d791a
fix: failure in responses during construct metrics (#4157)
# What does this PR do?
Without this, we get the following error in the server logs:
```
RuntimeError: OpenAI response failed: InferenceRouter._construct_metrics() got an unexpected keyword argument  
         'model_id'          
```
It seems the method signature was updated but this call site was not.
## Test Plan
CI and test with Sabre (Agent framework integration)
2025-11-13 14:21:03 -08:00
Francisco Arceo
a82b79ce57
fix: Error out when creating vector store with unknown embedding model (#4154)
# What does this PR do?
Error out when creating vector store with unknown embedding model

Closes https://github.com/llamastack/llama-stack/issues/4047

## Test Plan
Added tests

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-11-13 13:43:31 -08:00
Ashwin Bharambe
2441ca9389
fix(api): ensure openapi spec has deprecated routes (#4156)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / generate-matrix (push) Successful in 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 19s
Python Package Build Test / build (3.13) (push) Failing after 17s
Test External API and Providers / test-external (venv) (push) Failing after 30s
Test llama stack list-deps / list-deps-from-config (push) Successful in 36s
Test Llama Stack Build / build-single-provider (push) Successful in 40s
Test llama stack list-deps / show-single-provider (push) Successful in 48s
Vector IO Integration Tests / test-matrix (push) Failing after 55s
Test Llama Stack Build / build (push) Successful in 48s
UI Tests / ui-tests (22) (push) Successful in 54s
Test llama stack list-deps / list-deps (push) Failing after 1m34s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m6s
Unit Tests / unit-tests (3.13) (push) Failing after 2m38s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m38s
Unit Tests / unit-tests (3.12) (push) Failing after 2m44s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m50s
Pre-commit / pre-commit (push) Successful in 3m51s
Deprecated doesn't mean it's "gone", it just means it is "going away" in
the next major version of the package.
2025-11-13 13:16:02 -08:00
Charlie Doern
840ad75fe9
feat: split API and provider specs into separate llama-stack-api pkg (#3895)
# What does this PR do?

Extract API definitions and provider specifications into a standalone
llama-stack-api package that can be published to PyPI independently of
the main llama-stack server.


see: https://github.com/llamastack/llama-stack/pull/2978 and
https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942

Motivation

External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:

- Install only the type definitions they need without server
dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.

Changes

- Created llama-stack-api package with minimal dependencies (pydantic,
jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages

Next Steps

- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests


Precursor PRs to this one:

- #4093 
- #3954 
- #4064 

These PRs moved key pieces _out_ of the Api pkg, limiting the scope of
change here.


relates to #3237 

## Test Plan

Package builds successfully and can be imported independently. All
pre-commit hooks pass with expected exclusions maintained.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-13 11:51:17 -08:00
Sébastien Han
ceb716b9a0
chore: set minimum pre-commit version (#4148)
# What does this PR do?

- Enforce a minimum pre-commit version
- Pin to >= 4.3.0 when installing

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-13 10:52:38 -08:00
Francisco Arceo
4442b24de7
chore: Fix docs so can be deployed (#4149)
# What does this PR do?
Building/Deploying docs is failing here:
5530320962 (step):8:49

Needs the playground file. Updated it to reflect current admin status.


## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-11-13 09:15:32 -08:00
Derek Higgins
aeaf4eb3dd
fix: remove_disabled_providers filtering models with None fields (#4132)
Fixed a bug where models with no provider_model_id were incorrectly
filtered from the startup config display. The function was checking
multiple fields when it should only filter items with an explicitly
disabled provider_id.

Changes:
o Modified remove_disabled_providers to only check the provider_id field
o Changed the condition from checking multiple fields for None to only
  checking provider_id for "__disabled__", None, or empty string
o Added comprehensive unit tests
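
The corrected condition can be sketched as follows. This is an illustrative reduction over a flat list of dicts, not the real `remove_disabled_providers` implementation; the helper `is_disabled` and the choice to keep items that have no `provider_id` key at all are assumptions.

```python
from typing import Any

DISABLED_MARKER = "__disabled__"


def is_disabled(provider_id: Any) -> bool:
    # Only these three values count as "disabled".
    return provider_id in (DISABLED_MARKER, None, "")


def remove_disabled_providers(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Look at provider_id only; other fields being None must not drop the item.
    # Assumption: items without a provider_id key are left untouched.
    return [
        item
        for item in items
        if "provider_id" not in item or not is_disabled(item["provider_id"])
    ]
```

With this condition, a model whose `provider_model_id` is `None` but whose `provider_id` is set stays in the startup config display.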

Closes: #4131

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-13 07:24:05 -08:00
Ashwin Bharambe
1e81056a22
feat(tests): enable MCP tests in server mode (#4146)
We would like to run all OpenAI compatibility tests using only the
openai-client library. This is most friendly for contributors since they
can run tests without needing to update the client-sdks (which is
getting easier but still a long pole.)

This is the first step in enabling that -- no longer using the "library client" for
any of the Responses tests. This seems like a reasonable trade-off since
the usage of an embeddable library client for Responses (or any
OpenAI-compatible) behavior seems to be not very common. To do this, we
needed to enable MCP tests (which only worked in library client mode)
for server mode.
2025-11-13 07:23:23 -08:00
Akram Ben Aissi
9eb81439d2
docs: Add comprehensive Files API and Vector Store integration doc (#3279)
docs: Add comprehensive Files API and Vector Store integration
documentation

- Add Files API documentation with OpenAI-compatible endpoints
- Create comprehensive guide for OpenAI-compatible file operations
- Reorganize documentation structure: move file operations to files/
directory
- Add vector store provider documentation for Milvus, SQLite-vec, FAISS
- Clean up redundant files and improve navigation
- Update cross-references and eliminate documentation duplication
- Support for release 0.2.14 FileResponse and Vector Store API features

2025-11-13 08:50:06 -05:00
Ashwin Bharambe
fcf649b97a
feat(storage): share sql/kv instances and add upsert support (#4140)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 2s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Python Package Build Test / build (3.13) (push) Failing after 17s
Test Llama Stack Build / build-single-provider (push) Successful in 31s
Test External API and Providers / test-external (venv) (push) Failing after 32s
Vector IO Integration Tests / test-matrix (push) Failing after 45s
Test Llama Stack Build / build (push) Successful in 47s
UI Tests / ui-tests (22) (push) Successful in 1m42s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m8s
Unit Tests / unit-tests (3.13) (push) Failing after 2m7s
Unit Tests / unit-tests (3.12) (push) Failing after 2m28s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m32s
Pre-commit / pre-commit (push) Successful in 3m20s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3m33s
A few changes to the storage layer to ensure we reduce unnecessary
contention arising out of our design choices (and letting the database
layer do its correct thing):

- SQL stores now share a single `SqlAlchemySqlStoreImpl` per backend,
and `kvstore_impl` caches instances per `(backend, namespace)`. This
avoids spawning multiple SQLite connections for the same file, reducing
lock contention and aligning the cache story for all backends.

- Added an async upsert API (with SQLite/Postgres dialect inserts) and
routed it through `AuthorizedSqlStore`, then switched conversations and
responses to call it. Using native `ON CONFLICT DO UPDATE` eliminates
the insert-then-update retry window that previously caused long WAL lock
retries.
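
To illustrate the native upsert, here is a minimal sketch using the standard-library `sqlite3` module rather than the async SQLAlchemy path the PR describes. The `responses` table, its columns, and the `upsert_response` helper are hypothetical; only the `ON CONFLICT ... DO UPDATE` clause mirrors what the description names.

```python
import sqlite3


def upsert_response(conn: sqlite3.Connection, response_id: str, payload: str) -> None:
    # One statement: insert the row, or update it if the primary key already
    # exists. There is no separate insert-then-update step that could race.
    conn.execute(
        """
        INSERT INTO responses (id, payload) VALUES (?, ?)
        ON CONFLICT(id) DO UPDATE SET payload = excluded.payload
        """,
        (response_id, payload),
    )
    conn.commit()


def demo() -> list[tuple[str, str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses (id TEXT PRIMARY KEY, payload TEXT)")
    upsert_response(conn, "resp-1", "first")
    upsert_response(conn, "resp-1", "second")  # same key: updates in place
    rows = conn.execute("SELECT id, payload FROM responses").fetchall()
    conn.close()
    return rows
```

Calling `demo()` leaves a single row for `resp-1` holding the second payload, since the conflicting insert is turned into an update by the database itself.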

### Test Plan

Existing tests, added a unit test for `upsert()`
2025-11-12 12:14:26 -08:00
Ashwin Bharambe
492f79ca9b
fix: harden storage semantics (#4118)
Fixes issues in the storage system by guaranteeing immediate durability
for responses and ensuring background writers stay alive. Three related
fixes:

* Responses to the OpenAI-compatible API now write directly to
Postgres/SQLite inside the request instead of detouring through an async
queue that might never drain; this restores the expected
read-after-write behavior and removes the "response not found" races
reported by users.

* The access-control shim was stamping owner_principal/access_attributes
as SQL NULL, which Postgres interprets as non-public rows; fixing it to
use the empty-string/JSON-null pattern means conversations and responses
stored without an authenticated user stay queryable (matching SQLite).

* The inference-store queue remains for batching, but its worker tasks
now start lazily on the live event loop so server startup doesn't cancel
them—writes keep flowing even when the stack is launched via llama stack
run.
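
The lazy-start idea in the third fix can be sketched with plain `asyncio`. This is an assumption-laden illustration, not the inference-store code: `LazyWriteQueue`, its `written` list (a stand-in for the database write), and the worker count are all hypothetical.

```python
import asyncio


class LazyWriteQueue:
    def __init__(self, num_workers: int = 2) -> None:
        self._num_workers = num_workers
        self._queue: asyncio.Queue | None = None
        self._workers: list[asyncio.Task] = []
        self.written: list[int] = []  # stand-in for rows written to the store

    def _ensure_workers(self) -> None:
        # Create the queue and tasks on first use, so they are bound to the
        # event loop that is actually serving requests, not a startup loop.
        if self._workers:
            return
        self._queue = asyncio.Queue()
        loop = asyncio.get_running_loop()
        self._workers = [
            loop.create_task(self._worker()) for _ in range(self._num_workers)
        ]

    async def _worker(self) -> None:
        assert self._queue is not None
        while True:
            item = await self._queue.get()
            try:
                self.written.append(item)
            finally:
                self._queue.task_done()

    async def enqueue(self, item: int) -> None:
        self._ensure_workers()
        assert self._queue is not None
        await self._queue.put(item)

    async def flush(self) -> None:
        if self._queue is not None:
            await self._queue.join()

    async def shutdown(self) -> None:
        for task in self._workers:
            task.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)
        self._workers = []


async def demo() -> list[int]:
    queue = LazyWriteQueue()
    for i in range(5):
        await queue.enqueue(i)
    await queue.flush()
    await queue.shutdown()
    return sorted(queue.written)
```

Nothing is scheduled in `__init__`, so constructing the queue during server startup cannot leave tasks attached to a loop that is later torn down.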

Closes #4115 

### Test Plan

Added a matrix entry to test our "base" suite against Postgres as the
store.
2025-11-12 10:35:39 -08:00
Derek Higgins
356f37b1ba
docs: clarify model identification uses provider_model_id not model_id (#4128)
Updated documentation to accurately reflect current behavior where
models are identified as provider_id/provider_model_id in the system.

Changes:
o Clarify that model_id is for configuration purposes only
o Explain models are accessed as provider_id/provider_model_id
o Remove outdated aliasing example that suggested model_id could be used
  as a custom identifier

This corrects the documentation which previously suggested model_id
could be used to create friendly aliases, which is not how the code
actually works.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-12 10:13:26 -08:00
Ken Dreyer
94e977c257
fix(docs): link to test replay-record docs for discoverability (#4134)
Help users find the comprehensive integration testing docs by linking to
the record-replay documentation. This clarifies that the technical
README complements the main docs.
2025-11-12 10:04:56 -08:00
Francisco Arceo
eb3f9ac278
feat: allow returning embeddings and metadata from /vector_stores/ methods; disallow changing Provider ID (#4046)
# What does this PR do?

- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` using the `extra_query`
    -  Updates the UI accordingly to display them.

- Update UI to support CRUD operations in the Vector Stores section and
adds a new modal exposing the functionality.

- Updates Vector Store update to fail if a user tries to update Provider
ID (which doesn't make sense to allow)

```python
In  [1]: client.vector_stores.files.content(
    vector_store_id=vector_store.id, 
    file_id=file.id, 
    extra_query={"include_embeddings": True, "include_metadata": True}
)
Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```

Screenshots of UI are displayed below:

### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47
25 PM"
src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3"
/>

### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47
49 PM"
src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158"
/>

### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48
32 PM"
src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414"
/>


### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54
32 PM"
src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27"
/>

### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55
00 PM"
src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c"
/>


## Test Plan
Tests added for Middleware extension and Provider failures.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-11-12 09:59:48 -08:00
Charlie Doern
37853ca558
fix(tests): add OpenAI client connection cleanup to prevent CI hangs (#4119)
# What does this PR do?

Add explicit connection cleanup and shorter timeouts to OpenAI client
fixtures. This fixes a CI deadlock that occurred after 25+ tests due to
connection pool exhaustion. Also adds a 60s timeout to
test_conversation_context_loading as a safety net.

## Test Plan

tests pass

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-12 12:17:13 -05:00
Sam El-Borai
63137f9af1
chore(stainless): add config for file header (#4126)
# What does this PR do?

This PR adds Stainless config to specify the Meta copyright file header
for generated files.

Doing it via config instead of custom code will reduce the probability
of git conflicts.

## Test Plan

- review preview builds
2025-11-12 11:39:21 -05:00
Akshay Ghodake
539b9c08f3
chore(deps): update pypdf to fix DoS vulnerabilities (#4121)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 5s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 6s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Successful in 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Python Package Build Test / build (3.12) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 17s
Test llama stack list-deps / show-single-provider (push) Successful in 50s
Test Llama Stack Build / build-single-provider (push) Successful in 53s
UI Tests / ui-tests (22) (push) Successful in 53s
Test Llama Stack Build / build (push) Successful in 52s
Test llama stack list-deps / list-deps-from-config (push) Successful in 1m18s
Test External API and Providers / test-external (venv) (push) Failing after 1m19s
Test llama stack list-deps / list-deps (push) Failing after 1m1s
Vector IO Integration Tests / test-matrix (push) Failing after 1m44s
Unit Tests / unit-tests (3.13) (push) Failing after 1m53s
Unit Tests / unit-tests (3.12) (push) Failing after 2m6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3m7s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 3m8s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3m30s
Pre-commit / pre-commit (push) Successful in 4m1s
Update pypdf dependency to address vulnerabilities causing potential
denial of service through infinite loops or excessive memory usage when
handling malicious PDFs. The update remains fully backward compatible,
with no changes to the PdfReader API.


# What does this PR do?
Fixes #4120


## Test Plan

Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-11-12 10:24:19 +01:00
Charlie Doern
6ca2a67a9f
chore: remove dead code (#4125)
# What does this PR do?

build_image is not used because `llama stack build` is gone. Remove it.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-12 10:09:14 +01:00
ehhuang
71b328fc4b
chore(ui): add npm package and dockerfile (#4100)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / generate-matrix (push) Successful in 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 53s
# What does this PR do?
- sets up package.json for npm `llama-stack-ui` package (will update
llama-stack-ops)
- adds dockerfile for UI docker image

## Test Plan
npx:
npm build && npm pack
LLAMA_STACK_UI_PORT=8322 npx
/Users/erichuang/projects/ui/src/llama_stack_ui/llama-stack-ui-0.4.0-alpha.2.tgz

docker:
cd src/llama_stack_ui
docker build . -f Dockerfile  --tag test_ui --no-cache

❯ docker run -p 8322:8322 \
      -e LLAMA_STACK_UI_PORT=8322 \
      test_ui:latest
2025-11-11 10:40:31 -08:00
paulengineer
e5a55f3677
docs: use 'uv pip' to avoid pitfalls of using 'pip' in virtual environment (#4122)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
Pre-commit / pre-commit (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 25s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2s
UI Tests / ui-tests (22) (push) Successful in 53s
# What does this PR do?
In the **Detailed Tutorial**, at **Step 3**, the **Install with venv**
option creates a new virtual environment `client`, activates it then
attempts to install the llama-stack-client using pip.
```
uv venv client --python 3.12
source client/bin/activate
pip install llama-stack-client    <- this is the problematic line
```
However, the pip command will likely fail because `uv venv` does not,
by default, install pip into the virtual environment it creates. The
command errors either because pip doesn't exist at all or, if pip
exists outside of the virtual environment, with a different error
message. In the latter case, it may be unclear to the user why the
install is failing.

This PR changes 'pip' to 'uv pip', allowing the install action to
function in the virtual environment as intended, and without the need
for pip to be installed.


## Test Plan
1. Use Linux or WSL (virtual environments on Windows use a `Scripts`
folder instead of `bin` [virtualenv
#993ba13](993ba1316a),
which doesn't align with the tutorial)
2. Clone the `llama-stack` repo
3. Run the following and verify success:
```
uv venv client --python 3.12
source client/bin/activate
```
4. Run the updated command:
```
uv pip install llama-stack-client
```
5. Observe that the console output confirms that the virtual environment
`client` was used:

> Using Python 3.12.3 environment at: **client**
2025-11-11 07:49:03 -05:00
Nathan Weinberg
97ccfb5e62
refactor: inspect routes now shows all non-deprecated APIs (#4116)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Test llama stack list-deps / show-single-provider (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 21s
UI Tests / ui-tests (22) (push) Successful in 46s
# What does this PR do?
The inspect API lacked any mechanism to get all
non-deprecated APIs (v1, v1alpha, v1beta).
This PR changes the default to return exactly that set.

The 'v1' filter can be used by users who want a list
of only the stable APIs.

## Test Plan
1. pull the PR
2. launch a LLS server
3. run `curl http://beanlab3.bss.redhat.com:8321/v1/inspect/routes`
4. note there are APIs for `v1`, `v1alpha`, and `v1beta` but no
deprecated APIs

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-10 15:57:17 -08:00
Charlie Doern
43adc23ef6
refactor: remove dead inference API code and clean up imports (#4093)
# What does this PR do?

Delete ~2,000 lines of dead code from the old bespoke inference API that
was replaced by the OpenAI-only API. This includes removing unused type
conversion functions, dead provider methods, and event_logger.py.

Clean up imports across the codebase to remove references to deleted
types. This eliminates unnecessary code and dependencies, helping
isolate the API package as a self-contained module.

This is the last interdependency between the .api package and "exterior"
packages, meaning that now every other package in llama stack imports
the API, not the other way around.

## Test Plan

This is a structural change; no tests are needed.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-10 15:29:24 -08:00
Shabana Baig
433438cfc0
feat: Implement the 'max_tool_calls' parameter for the Responses API (#4062)
# Problem
The Responses API uses the max_tool_calls parameter to limit the number of
tool calls that can be generated in a response. Currently, the LLS
implementation of the Responses API does not support this parameter.

# What does this PR do?
This pull request adds the max_tool_calls field to the response object
definition and updates the inline provider. It also ensures that:

- the total number of calls to built-in and MCP tools does not exceed
max_tool_calls
- an error is thrown if max_tool_calls < 1 (behavior seen with the
OpenAI Responses API, but we can change this if needed)

Closes #[3563](https://github.com/llamastack/llama-stack/issues/3563)

## Test Plan
- Tested manually for change in model response w.r.t supplied
max_tool_calls field.
- Added integration tests to test invalid max_tool_calls parameter.
- Added integration tests to check max_tool_calls parameter with
built-in and function tools.
- Added integration tests to check max_tool_calls parameter in the
returned response object.
- Recorded OpenAI Responses API behavior using a sample script:
https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/max_tool_calls.py

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-10 13:21:27 -08:00
Dennis Kennetz
209a78b618
feat: add oci genai service as chat inference provider (#3876)
# What does this PR do?
Adds OCI GenAI PaaS models for openai chat completion endpoints.

## Test Plan
In an OCI tenancy with access to GenAI PaaS, perform the following
steps:

1. Ensure you have IAM policies in place to use the service (check the docs
included in this PR)
2. For local development, [set up the OCI
CLI](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm)
and configure the CLI with your region, tenancy, and auth as described
[here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm)
3. Once configured, go through the llama-stack setup and run llama-stack
(uses config-based auth) like:
```bash
OCI_AUTH_TYPE=config_file \
OCI_CLI_PROFILE=CHICAGO \
OCI_REGION=us-chicago-1 \
OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \
llama stack run oci
```
4. Hit the `models` endpoint to list models after server is running:
```bash
curl http://localhost:8321/v1/models | jq
...
{
      "identifier": "meta.llama-4-scout-17b-16e-instruct",
      "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q",
      "provider_id": "oci",
      "type": "model",
      "metadata": {
        "display_name": "meta.llama-4-scout-17b-16e-instruct",
        "capabilities": [
          "CHAT"
        ],
        "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q"
      },
      "model_type": "llm"
},
   ...
```
5. Use the value of the "display_name" field as the model in a
`/chat/completions` request:
```bash
# Streaming result
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta.llama-4-scout-17b-16e-instruct",
    "stream": true,
    "temperature": 0.9,
    "messages": [
      {
        "role": "system",
        "content": "You are a funny comedian. You can be crass."
      },
      {
        "role": "user",
        "content": "Tell me a funny joke about programming."
      }
    ]
  }'

# Non-streaming result
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta.llama-4-scout-17b-16e-instruct",
    "stream": false,
    "temperature": 0.9,
    "messages": [
      {
        "role": "system",
        "content": "You are a funny comedian. You can be crass."
      },
      {
        "role": "user",
        "content": "Tell me a funny joke about programming."
      }
    ]
  }'
```
6. Try out other models from the `/models` endpoint.
2025-11-10 16:16:24 -05:00
Ashwin Bharambe
fadf17daf3
feat(api)!: deprecate register/unregister resource APIs (#4099)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Pre-commit / pre-commit (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 1m10s
Mark all register_* / unregister_* APIs as deprecated across models,
shields, tool groups, datasets, benchmarks, and scoring functions. This
is the first step toward moving resource mutations to an `/admin`
namespace as outlined in
https://github.com/llamastack/llama-stack/issues/3809#issuecomment-3492931585.

The deprecation flag will be reflected in the OpenAPI schema to warn API
users that these endpoints are being phased out. Next step will be
implementing the `/admin` route namespace for these resource management
operations.

- `register_model` / `unregister_model`
- `register_shield` / `unregister_shield`
- `register_tool_group` / `unregister_toolgroup`
- `register_dataset` / `unregister_dataset`
- `register_benchmark` / `unregister_benchmark`
- `register_scoring_function` / `unregister_scoring_function`
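
As a rough illustration of what marking an operation as deprecated can look like in Python, the sketch below uses a hypothetical `deprecated_api` decorator. It sets a flag that a schema generator could surface as `deprecated: true` and emits a `DeprecationWarning` on each call. The decorator name, the `__deprecated_api__` attribute, and the toy `register_model` body are assumptions made for illustration, not the mechanism this PR uses.

```python
import functools
import warnings


def deprecated_api(reason: str):
    """Hypothetical marker: flag a callable as deprecated and warn on use."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        # A schema generator could read this flag and emit `deprecated: true`.
        wrapper.__deprecated_api__ = True
        return wrapper

    return decorator


@deprecated_api("resource mutations are moving to an /admin namespace")
def register_model(model_id: str) -> dict:
    # Toy body: the real endpoint does far more than echo its input.
    return {"identifier": model_id}
```

Callers keep working while the warning gives them time to migrate.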
2025-11-10 10:36:33 -08:00
ehhuang
d4ecbfd092
fix(vector store)!: fix file content API (#4105)
# What does this PR do?
- changed to match
https://app.stainless.com/api/spec/documented/openai/openapi.documented.yml

## Test Plan
updated test CI
2025-11-10 10:16:35 -08:00
Vaishnavi Hire
4341c4c2ac
docs: Add Llama Stack Operator docs (#3983)
# What does this PR do?
Add documentation for llama-stack-k8s-operator under kubernetes
deployment guide.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
2025-11-10 15:29:15 +01:00
Juan Pérez de Algaba
6147321083
fix: Vector store persistence across server restarts (#3977)
# What does this PR do?

This PR fixes a bug in LlamaStack 0.3.0 where vector stores created via
the OpenAI-compatible API (`POST /v1/vector_stores`) would fail with
`VectorStoreNotFoundError` after server restart when attempting
operations like `vector_io.insert()` or `vector_io.query()`.

The bug affected **6 vector IO providers**: `pgvector`, `sqlite_vec`,
`chroma`, `milvus`, `qdrant`, and `weaviate`.

Created with the assistance of: claude-4.5-sonnet

## Root Cause

All affected providers had a broken
`_get_and_cache_vector_store_index()` method that:
1. Did not load existing vector stores from persistent storage during
initialization
2. Attempted to use `vector_store_table` (which was either `None` or a
`KVStore` without the required `get_vector_store()` method)
3. Could not reload vector stores after server restart or cache miss

## Solution

This PR implements a consistent pattern across all 6 providers:

1. **Load vector stores during initialization** - Pre-populate the cache
from KV store on startup
2. **Fix lazy loading** - Modified `_get_and_cache_vector_store_index()`
to load directly from KV store instead of relying on
`vector_store_table`
3. **Remove broken dependency** - Eliminated reliance on the
`vector_store_table` pattern
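
The pattern described above can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions: `InMemoryKVStore`, `VectorIOAdapter`, the `vector_stores:` key prefix, and the dict-shaped store records are hypothetical stand-ins, and the real providers are asynchronous and build an actual index rather than caching a dict.

```python
import json


class InMemoryKVStore:
    """Stand-in for a persistent KV store (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def values_with_prefix(self, prefix):
        return [v for k, v in self._data.items() if k.startswith(prefix)]


PREFIX = "vector_stores:"  # assumed key prefix


class VectorIOAdapter:
    def __init__(self, kvstore):
        self.kvstore = kvstore
        self.cache = {}

    def initialize(self):
        # 1. Pre-populate the cache from persistent storage on startup.
        for raw in self.kvstore.values_with_prefix(PREFIX):
            store = json.loads(raw)
            self.cache[store["id"]] = store

    def register_vector_store(self, store):
        # Persist first, then cache, so a restart can recover the record.
        self.kvstore.set(PREFIX + store["id"], json.dumps(store))
        self.cache[store["id"]] = store

    def get_and_cache_vector_store_index(self, store_id):
        # 2. On a cache miss, reload directly from the KV store.
        if store_id in self.cache:
            return self.cache[store_id]
        raw = self.kvstore.get(PREFIX + store_id)
        if raw is None:
            raise KeyError(f"Vector Store '{store_id}' not found.")
        self.cache[store_id] = json.loads(raw)
        return self.cache[store_id]
```

Creating a second `VectorIOAdapter` over the same KV store plays the role of a server restart: the cache starts empty, and either `initialize()` or the lazy path repopulates it.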

## Testing steps

### 1.1 Configure the stack

Create or use an existing configuration with a vector IO provider.

**Example `run.yaml`:**

```yaml
vector_io_store:
  - provider_id: pgvector
    provider_type: remote::pgvector
    config:
      host: localhost
      port: 5432
      db: llamastack
      user: llamastack
      password: llamastack

inference:
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config:
      model: sentence-transformers/all-MiniLM-L6-v2
```

### 1.2 Start the server

```bash
llama stack run run.yaml --port 5000
```

Wait for the server to fully start. You should see:

```
INFO: Started server process
INFO: Application startup complete
```

---

## Step 2: Create a Vector Store

### 2.1 Create via API

```bash
curl -X POST http://localhost:5000/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{
    "name": "test-persistence-store",
    "extra_body": {
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "embedding_dimension": 384,
      "provider_id": "pgvector"
    }
  }' | jq
```

### 2.2 Expected Response

```json
{
  "id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
  "object": "vector_store",
  "name": "test-persistence-store",
  "status": "completed",
  "created_at": 1730304000,
  "file_counts": {
    "total": 0,
    "completed": 0,
    "in_progress": 0,
    "failed": 0,
    "cancelled": 0
  },
  "usage_bytes": 0
}
```

**Save the `id` field** (e.g.,
`vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d`) — you’ll need it for the next
steps.

---

## Step 3: Insert Data (Before Restart)

### 3.1 Insert chunks into the vector store

```bash
export VS_ID="vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"

curl -X POST http://localhost:5000/vector-io/insert \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"chunks\": [
      {
        \"content\": \"Python is a high-level programming language known for its readability.\",
        \"metadata\": {\"source\": \"doc1\", \"page\": 1}
      },
      {
        \"content\": \"Machine learning enables computers to learn from data without explicit programming.\",
        \"metadata\": {\"source\": \"doc2\", \"page\": 1}
      },
      {
        \"content\": \"Neural networks are inspired by biological neurons in the brain.\",
        \"metadata\": {\"source\": \"doc3\", \"page\": 1}
      }
    ]
  }"
```

### 3.2 Expected Response

Status: **200 OK**  
Response: *Empty or success confirmation*
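
The same insert request can be built without shell quoting. The sketch below is one possible Python equivalent using only the standard library. The helper name `build_insert_request` is hypothetical, and the endpoint path and payload shape are copied from the `curl` call above rather than from any client library.

```python
import json
import urllib.request


def build_insert_request(base_url, vector_store_id, texts_with_metadata):
    """Build (but do not send) a POST request for the insert endpoint."""
    payload = {
        "vector_store_id": vector_store_id,
        "chunks": [
            {"content": text, "metadata": metadata}
            for text, metadata in texts_with_metadata
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/vector-io/insert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_insert_request(
    "http://localhost:5000",
    "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
    [
        (
            "Python is a high-level programming language known for its readability.",
            {"source": "doc1", "page": 1},
        )
    ],
)
# To send it against a running server: urllib.request.urlopen(request)
```

Letting `json.dumps` produce the body avoids the escaped quotes that the shell version needs for `$VS_ID` expansion.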

---

## Step 4: Query Data (Before Restart – Baseline)

### 4.1 Query the vector store

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"What is machine learning?\"
  }" | jq
```

### 4.2 Expected Response

```json
{
  "chunks": [
    {
      "content": "Machine learning enables computers to learn from data without explicit programming.",
      "metadata": {"source": "doc2", "page": 1}
    },
    {
      "content": "Neural networks are inspired by biological neurons in the brain.",
      "metadata": {"source": "doc3", "page": 1}
    }
  ],
  "scores": [0.85, 0.72]
}
```

**Checkpoint:** Works correctly before restart.
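
To check a query response programmatically, the chunks can be paired with their scores. The sketch below assumes the response shape shown in 4.2 (parallel `chunks` and `scores` lists). `rank_chunks` is a hypothetical helper, and the sample payload is a shortened copy of the expected response.

```python
import json

# Shortened copy of the expected response from 4.2.
response_text = """
{
  "chunks": [
    {"content": "Machine learning enables computers to learn from data without explicit programming.",
     "metadata": {"source": "doc2", "page": 1}},
    {"content": "Neural networks are inspired by biological neurons in the brain.",
     "metadata": {"source": "doc3", "page": 1}}
  ],
  "scores": [0.85, 0.72]
}
"""


def rank_chunks(response):
    """Pair each chunk with its score and sort by descending score."""
    pairs = zip(response["chunks"], response["scores"])
    return sorted(
        ((score, chunk["metadata"]["source"]) for chunk, score in pairs),
        reverse=True,
    )


ranked = rank_chunks(json.loads(response_text))
```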

---

## Step 5: Restart the Server (Critical Test)

### 5.1 Stop the server

In the terminal where it’s running:

```
Ctrl + C
```

Wait for:

```
Shutting down...
```

### 5.2 Restart the server

```bash
llama stack run run.yaml --port 5000
```

Wait for:

```
INFO: Started server process
INFO: Application startup complete
```

The vector store cache is now empty, but data should persist.

---

## Step 6: Verify Vector Store Exists (After Restart)

### 6.1 List vector stores

```bash
curl http://localhost:5000/v1/vector_stores | jq
```

### 6.2 Expected Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
      "name": "test-persistence-store",
      "status": "completed"
    }
  ]
}
```

**Checkpoint:** Vector store should be listed.

---

## Step 7: Insert Data (After Restart – THE BUG TEST)

### 7.1 Insert new chunks

```bash
curl -X POST http://localhost:5000/vector-io/insert \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"chunks\": [
      {
        \"content\": \"This chunk was inserted AFTER the server restart.\",
        \"metadata\": {\"source\": \"post-restart\", \"test\": true}
      }
    ]
  }"
```

### 7.2 Expected Results

**With Fix (Correct):**
```
Status: 200 OK
Response: Success
```

**Without Fix (Bug):**
```json
{
  "detail": "VectorStoreNotFoundError: Vector Store 'vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d' not found."
}
```

 **Critical Test:** If insertion succeeds, the fix works.
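
If this check is scripted, the two outcomes above can be told apart from the status code and the `detail` field. The helper below is a hypothetical sketch, and its name and return labels are assumptions. It relies only on the two responses shown in 7.2.

```python
def insertion_outcome(status_code, body):
    """Classify a post-restart insert response as fixed, bug, or other."""
    detail = body.get("detail", "") if isinstance(body, dict) else ""
    if status_code == 200:
        return "fixed"
    if "VectorStoreNotFoundError" in detail:
        return "bug"
    # Anything else (network error, validation error) is not this bug.
    return "other-error"
```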

---

## Step 8: Query Data (After Restart – Verification)

### 8.1 Query all data

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"restart\"
  }" | jq
```

### 8.2 Expected Response

```json
{
  "chunks": [
    {
      "content": "This chunk was inserted AFTER the server restart.",
      "metadata": {"source": "post-restart", "test": true}
    }
  ],
  "scores": [0.95]
}
```

**Checkpoint:** Both old and new data are queryable.

---

## Step 9: Multiple Restart Test (Extra Verification)

### 9.1 Restart again

```bash
# Press Ctrl + C in the server terminal to stop it, then restart:
llama stack run run.yaml --port 5000
```

### 9.2 Query after restart

```bash
curl -X POST http://localhost:5000/vector-io/query \
  -H "Content-Type: application/json" \
  -d "{
    \"vector_store_id\": \"$VS_ID\",
    \"query\": \"programming\"
  }" | jq
```

**Expected:** Works correctly across multiple restarts.

---------

Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-11-09 00:05:00 -05:00
Sam El-Borai
8f4c431370
chore(ci): setup automated stainless builds (#3557)
# What does this PR do?

This pull request adds a new workflow that does 2 things:

1. generate [SDK preview
builds](https://www.stainless.com/docs/guides/automate-updates#set-up-automatic-preview-builds)
whenever the OpenAPI spec file is modified in a PR
2. on PR merge, generate SDK builds that will be pushed to the different
SDK repos (i.e start the release process)

> [!NOTE]
> No repo secret `STAINLESS_API_KEY` is needed, the authentication is
done automatically via GitHub OIDC.



## Test Plan

I tested in my fork: https://github.com/stainless-api/llama-stack/pull/3
2025-11-07 12:15:26 -08:00
Ashwin Bharambe
aa2bd82b1d
fix(ci): add recordings for responses suite due to web search type changing (#4104)
#4103 broke trunk (even though the PR itself was green).
2025-11-07 10:42:07 -08:00
Aakanksha Duggal
b83184f7ef
feat(responses)!: Add web_search_2025_08_26 to the WebSearchToolTypes (#4103)
# What does this PR do?
Resolves #4102 

1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and
the `OpenAIResponseInputToolWebSearch.type` Literal union
2. No changes needed to tool execution logic - all `web_search` types
map to the same underlying tool
3. Backward compatibility is maintained - existing `web_search`,
`web_search_preview`, and `web_search_preview_2025_03_11` types continue
to work
4. Added an integration test case using
`{"type": "web_search_2025_08_26"}` to verify it works correctly
5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to
reflect that `web_search_2025_08_26` is now supported.
6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist
in the codebase)
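
Points 1 to 3 can be pictured with a small, self-contained sketch. The type names below (`WebSearchToolType`, `WebSearchToolInput`, `resolve_tool_name`) are simplified, hypothetical stand-ins for `WebSearchToolTypes` and `OpenAIResponseInputToolWebSearch`. The real classes are not plain dataclasses, and only the four type strings come from the PR description.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# The four accepted variants, including the newly added one.
WebSearchToolType = Literal[
    "web_search",
    "web_search_preview",
    "web_search_preview_2025_03_11",
    "web_search_2025_08_26",
]

WEB_SEARCH_TOOL_TYPES = list(get_args(WebSearchToolType))


@dataclass
class WebSearchToolInput:
    type: str = "web_search"

    def __post_init__(self):
        # Reject anything outside the Literal union.
        if self.type not in WEB_SEARCH_TOOL_TYPES:
            raise ValueError(f"unsupported web search tool type: {self.type}")


def resolve_tool_name(tool: WebSearchToolInput) -> str:
    # Every accepted variant maps to the same underlying tool.
    return "web_search"
```

Because the mapping ignores which variant was requested, adding a new type string needs no change to the execution path.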



## Test Plan

---------

Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-07 10:01:12 -08:00
Ashwin Bharambe
f49cb0b717
chore: Stack server no longer depends on llama-stack-client (#4094)
This dependency has been bothering folks for a long time (cc @leseb). We
really needed it due to "library client" which is primarily used for our
tests and is not a part of the Stack server. Anyone who needs to use the
library client can certainly install `llama-stack-client` in their
environment to make that work.

Updated the notebook references to install `llama-stack-client`
additionally when setting things up.
2025-11-07 09:54:09 -08:00
Lê Nam Khánh
68c976a2d8
docs: fix typos in some files (#4101)
This PR fixes typos in some files using codespell.
2025-11-07 16:07:46 +01:00
Ashwin Bharambe
b68a25d377
fix(tests): bring back some responses tests (#4098)
https://github.com/llamastack/llama-stack/pull/4055 cleaned the agents
implementation but while doing so it removed some tests which actually
corresponded to the responses implementation. This PR brings those tests
and associated recordings back.

(We should likely combine all responses tests into one suite, but that
is beyond the scope of this PR.)
2025-11-07 07:49:38 +01:00
Sumanth Kamenani
e894e36eea
feat: add OpenAI-compatible Bedrock provider (#3748)
Implements AWS Bedrock inference provider using OpenAI-compatible
endpoint for Llama models available through Bedrock.

Closes: #3410


## What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.

The implementation uses LiteLLM's OpenAI client under the hood, so it
gets all the OpenAI compatibility features. The provider handles
per-request API key overrides via headers.
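
A per-request key override with fallback to the configured key might look roughly like the sketch below. The header name `x-llamastack-provider-data` and the `aws_bedrock_api_key` field appear in the test results later in this message. The function name `resolve_api_key` and the choice to fall back silently on malformed JSON are assumptions for illustration, not the provider's actual code.

```python
import json

HEADER = "x-llamastack-provider-data"


def resolve_api_key(headers, config_api_key):
    """Prefer a non-empty per-request key, else fall back to the config key."""
    raw = headers.get(HEADER)
    if raw:
        try:
            provider_data = json.loads(raw)
        except json.JSONDecodeError:
            provider_data = {}  # assumed behavior: ignore unparseable header
        if isinstance(provider_data, dict):
            override = provider_data.get("aws_bedrock_api_key")
            if override:  # an empty string falls through to the config key
                return override
    return config_api_key
```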

## Test Plan

**Tested the following scenarios:**
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format

# Bedrock OpenAI-Compatible Provider - Test Results


**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`


---

## Test 1: Model Listing

**Request:**
```http
GET /v1/models HTTP/1.1
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```

---

## Test 2: Non-Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```

---

## Test 3: Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```
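
For readers who want to reproduce the "final content" check, the sketch below shows one way to reassemble text from SSE lines, assuming the usual OpenAI-style chunk shape (`choices[].delta.content`, with a `data: [DONE]` terminator). The helper name and the sample lines are illustrative and were not captured from the test run.

```python
import json


def collect_stream_content(sse_lines):
    """Concatenate delta content from OpenAI-style SSE `data:` lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            parts.append(delta.get("content") or "")
    return "".join(parts)


# Illustrative input only.
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": "1, "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "2"}}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
final_content = collect_stream_content(sample_lines)
```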

---

## Test 4: Error Handling - Invalid Model

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```

---

## Test 5: Multi-Turn Conversation

**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```

**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```

**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```

**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```

**Context retained across turns**
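
Context is retained only because the client resends the full history on each request. A minimal sketch of that bookkeeping follows. `append_turn` is a hypothetical helper, and the message contents are taken from the requests above.

```python
def append_turn(history, assistant_reply, next_user_message):
    """Return a new message list extended with the reply and the next question."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


first_request = [{"role": "user", "content": "My name is Alice"}]
second_request = append_turn(
    first_request,
    "Nice to meet you, Alice! How can I help you today?",
    "What is my name?",
)
```

Returning a new list rather than mutating `history` keeps earlier request payloads intact for logging or retries.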

---

## Test 6: System Messages

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```


---

## Test 7: Tool Calling

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```
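
On the client side, a `tool_calls` response like the one above is typically handled by decoding the JSON-encoded `arguments` string and dispatching to a local function. The sketch below assumes that shape. The `get_weather` stub, the `TOOLS` registry, and `run_tool_calls` are hypothetical names.

```python
import json


def get_weather(location):
    # Stub: a real tool would call a weather service.
    return f"(stub) weather for {location}"


TOOLS = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute each tool call in an assistant message and collect results."""
    results = []
    for call in message.get("tool_calls", []):
        function = call["function"]
        # `arguments` arrives as a JSON string, not a nested object.
        arguments = json.loads(function["arguments"])
        results.append(TOOLS[function["name"]](**arguments))
    return results


message = {
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}}
    ]
}
results = run_tool_calls(message)
```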

---

## Test 8: Sampling Parameters

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```

---

## Test 9: Authentication Error Handling

### Subtest A: Invalid API Key

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```

---

### Subtest B: Empty API Key (Fallback to Config)

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```

**Fell back to config key**

---

### Subtest C: Malformed Token

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
2025-11-06 17:18:18 -08:00
Ashwin Bharambe
a2c4c12384
chore(ui): remove the Streamlit UI (#4097) 2025-11-06 15:51:57 -08:00
Sébastien Han
939a2db58f
chore: update stainless config (#4096)
# What does this PR do?

Removed in https://github.com/llamastack/llama-stack/pull/4067

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-06 15:58:13 -05:00
Charlie Doern
9df073450f
feat: remove core.telemetry as a dependency of llama_stack.apis (#4064)
# What does this PR do?

Remove circular dependency by moving tracing from API protocol
definitions to router implementation layer.

This gets us closer to having a self contained API package with no other
cross-cutting dependencies to other parts of the llama stack codebase.
To the best of our ability, the llama_stack.api should only be type and
protocol definitions.

Changes:
- Create apis/common/tracing.py with marker decorator (zero core
dependencies)
- Add the _new_ `@telemetry_traceable` marker decorator to 11 protocol
classes
- Apply actual tracing in core/resolver.py in `instantiate_provider`
based on protocol marker
- Move MetricResponseMixin from core to apis (it's an API response type)
- APIs package is now self-contained with zero core dependencies

The tracing functionality remains identical - actual trace_protocol from
core
is applied to router implementations at runtime when both telemetry is
enabled
  and the protocol has the `__marked_for_tracing__` marker.
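
The marker-then-apply split can be sketched in a few lines. Only the names `telemetry_traceable`, `__marked_for_tracing__`, `trace_protocol`, and `instantiate_provider` come from the description above. Their bodies and signatures here are simplified assumptions, and the toy `trace_protocol` merely logs method names instead of emitting spans.

```python
import functools


def telemetry_traceable(cls):
    """Marker only: no tracing logic, so the API package needs no core import."""
    cls.__marked_for_tracing__ = True
    return cls


def _traced(name, method, log):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        log.append(name)
        return method(self, *args, **kwargs)

    return wrapper


def trace_protocol(cls):
    """Toy stand-in for the core tracer: record each public method call."""
    cls.trace_log = []
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            setattr(cls, name, _traced(name, attr, cls.trace_log))
    return cls


@telemetry_traceable
class Inference:
    """Protocol definition (API layer)."""

    def chat(self, prompt): ...


class InferenceRouter(Inference):
    """Router implementation (core layer)."""

    def chat(self, prompt):
        return f"echo: {prompt}"


def instantiate_provider(impl_cls, protocol, telemetry_enabled):
    # Tracing is applied at runtime, only if enabled and the protocol is marked.
    if telemetry_enabled and getattr(protocol, "__marked_for_tracing__", False):
        trace_protocol(impl_cls)
    return impl_cls()
```

The protocol class carries nothing but a boolean attribute, which is what lets the API package stay free of core dependencies.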

  ## Test Plan

  Manual integration test confirms identical behavior to main branch:

  ```bash
  llama stack list-deps --format uv starter | sh
  export OLLAMA_URL=http://localhost:11434
  llama stack run starter

  curl -X POST http://localhost:8321/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "ollama/gpt-oss:20b",
         "messages": [{"role": "user", "content": "Say hello"}],
         "max_tokens": 10}'
         
```

  Verified identical between main and this branch:
  - trace_id present in response
  - metrics array with prompt_tokens, completion_tokens, total_tokens
  - Server logs show trace_protocol applied to all routers

  Existing telemetry integration tests (tests/integration/telemetry/) validate
  trace context propagation and span attributes.


relates to #3895

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-06 10:58:30 -08:00
Derek Higgins
dc9497a3b2
ci: Temporarily disable Telemetry during tests (#4090)
Closes: #4089

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-06 17:53:02 +01:00
Derek Higgins
03d23db910
ci: vllm ci job update (#4088)
Add missing recording for vllm in library mode
Add Docker env (missed during rebase)

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-06 16:59:55 +01:00
Derek Higgins
c62a09ab76
ci: Add vLLM support to integration testing infrastructure (with qwen) (#3545)
- Introduces vLLM provider support to the record/replay testing
framework
- Enables both recording and replay of vLLM API interactions alongside
existing Ollama support.

The changes enable testing of vLLM functionality. vLLM tests focus on
inference capabilities, while Ollama continues to exercise the full API
surface
including vision features.

--
This is an alternative to #3128 that uses qwen3 instead of llama 3.2 1B;
qwen3 appears to be more capable at structured output and tool calls.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-06 10:36:40 +01:00
Ashwin Bharambe
bef1b044bd
refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085)
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.

A few small cleanups
- Also enables `embeddings`
- Remove ModelRegistryHelper dependency (unused)
- Consolidate to auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch from downstream /v1/models

## Test Plan

Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581

Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/ollama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']

Using LLM model: passthrough/ollama/llama3.2-vision:11b

Making inference request...

Response: 4.

--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
2025-11-05 18:15:11 -08:00
ehhuang
b335419faa
fix: actualize chunking strategy in vector store create API (#4086)
# What does this PR do?

- When create vector store is called without a chunking strategy, we
actualize the strategy used so that the value is persisted instead of
strategy='None'

## Test Plan
updated tests
2025-11-05 15:47:54 -08:00
Roy Belio
c672a5d792
feat: ability to use postgres as store for starter distro (#4076)
## What does this PR do?

The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.

This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.

**Closes: #2619**  
**Supersedes: #2851** (rebased and updated)

## Changes Made

1. **Added PostgreSQL support to starter distribution**
   - New `run-with-postgres-store.yaml` configuration
   - Automatic config switching via `ENABLE_POSTGRES_STORE` environment
     variable
   - Removed separate `postgres-demo` distribution

2. **Updated to new build system**
   - Integrated postgres switching logic into Containerfile entrypoint
   - Uses new `storage_backends` and `storage_stores` API
   - Properly configured both PostgreSQL KV store and SQL store

3. **Updated dependencies**
   - Added `psycopg2-binary` and `asyncpg` to starter distribution
   - All postgres-related dependencies automatically included

## How to Use

### With Docker (PostgreSQL):
```bash
docker run \
  -e ENABLE_POSTGRES_STORE=1 \
  -e POSTGRES_HOST=your_postgres_host \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -e OPENAI_API_KEY=your_key \
  llamastack/distribution-starter
```

### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)
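
The switching behavior can be pictured roughly as follows. This is a minimal Python sketch, not the actual Containerfile entrypoint; the helper names `select_run_config` and `postgres_settings` and the default file name `run.yaml` are assumptions made for illustration. The variable names and defaults are the ones listed above.

```python
import os
from collections.abc import Mapping

# Defaults as documented in the list above.
POSTGRES_DEFAULTS = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "llamastack",
    "POSTGRES_USER": "llamastack",
    "POSTGRES_PASSWORD": "llamastack",
}


def select_run_config(env: Mapping[str, str]) -> str:
    """Pick the run config based on ENABLE_POSTGRES_STORE (hypothetical helper)."""
    if env.get("ENABLE_POSTGRES_STORE") == "1":
        return "run-with-postgres-store.yaml"
    return "run.yaml"  # assumed name of the default config


def postgres_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Resolve each Postgres variable, falling back to its documented default."""
    return {name: env.get(name, default) for name, default in POSTGRES_DEFAULTS.items()}


if __name__ == "__main__":
    print(select_run_config(os.environ))
    print(postgres_settings(os.environ))
```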

## Test Plan

All pre-commit hooks pass (mypy, ruff, distro-codegen)  
`llama stack list-deps starter` confirms psycopg2-binary is included  
Storage configuration correctly uses PostgreSQL backends  
Container builds successfully with postgres support  

## Credits

Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Sébastien Han @leseb

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-11-05 15:37:06 -08:00
ehhuang
9d5c34af27
fix!: BREAKING CHANGE: vector_store: search API response fix (#4080)
# What does this PR do?
- search_query in the vector store search API should be a list,
according to https://github.com/openai/openai-openapi


## Test Plan
modified tests


2025-11-05 15:01:48 -08:00
ehhuang
84a84ee85c
fix: last_id when listing files in vector store (#4079)
# What does this PR do?
The `last_id` should be the id of the last item in the returned list, not
of the unfiltered list.
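
A minimal sketch of the intended behavior, assuming a simple in-memory list and a status filter. The function name `paginate` and the record fields are hypothetical and are not taken from the actual implementation.

```python
def paginate(items: list[dict], allowed_statuses: set[str], limit: int) -> dict:
    """Filter first, then page; derive the cursors from the page actually returned."""
    filtered = [item for item in items if item["status"] in allowed_statuses]
    page = filtered[:limit]
    return {
        "data": page,
        "first_id": page[0]["id"] if page else None,
        # last_id comes from the returned page, not from the unfiltered `items`.
        "last_id": page[-1]["id"] if page else None,
        "has_more": len(filtered) > limit,
    }


files = [
    {"id": "file-1", "status": "completed"},
    {"id": "file-2", "status": "failed"},
    {"id": "file-3", "status": "completed"},
    {"id": "file-4", "status": "failed"},
]
result = paginate(files, {"completed"}, limit=10)
print(result["last_id"])  # file-3, not file-4
```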

## Test Plan
fixed test
2025-11-05 14:10:10 -08:00
Ashwin Bharambe
d9cf5cd480
fix(ci): use --no-cache instead of --no-cache-dir (#4081)
This is necessary to make sure GPU Docker images can be built in CI
without running out of space.
2025-11-05 12:14:02 -08:00
Charlie Doern
c899b50723
fix: print help for list-deps if no args (#4078)
# What does this PR do?

`list-deps` takes either a positional argument or options such as
`--providers`.

Because only one of the two is specified at a time, both have to be
optional, which means the command can also be invoked with neither.

Add a check to `list-deps` that tests `if not args.providers and not
args.config`. If this is true, help is printed and we exit.

resolves #4075
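
The check described above can be sketched with `argparse` as follows. This is an illustrative stand-alone parser, not the real CLI code: the function names `build_parser` and `run` and the non-zero return value are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama stack list-deps")
    # Both inputs are optional because either one may be given on its own.
    parser.add_argument("config", nargs="?", default=None)
    parser.add_argument("--providers", default=None)
    return parser


def run(argv: list[str]) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.providers and not args.config:
        # Nothing to resolve dependencies for: show usage instead of failing later.
        parser.print_help()
        return 1  # assumed exit status; the PR only says "we exit"
    print(f"config={args.config!r} providers={args.providers!r}")
    return 0
```

Calling `run([])` prints the usage text, while `run(["starter"])` or `run(["--providers", "inference=remote::openai"])` proceeds normally.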

## Test Plan
before:

```
╰─ llama stack list-deps
Traceback (most recent call last):
  File "/Users/charliedoern/projects/Documents/llama-stack/venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
    parser.run(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 43, in run
    args.func(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/list_deps.py", line 51, in _run_stack_list_deps_command
    return run_stack_list_deps_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/_list_deps.py", line 135, in run_stack_list_deps_command
    normal_deps, special_deps, external_provider_dependencies = get_provider_dependencies(build_config)
                                                                                          ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'build_config' where it is not associated with a value

```

after:

```
╰─ llama stack list-deps
usage: llama stack list-deps [-h] [--providers PROVIDERS] [--format {uv,deps-only}] [config | distro]

list the dependencies for a llama stack distribution

positional arguments:
  config | distro       Path to config file to use or name of known distro (llama stack list for a list). (default: None)

options:
  -h, --help            show this help message and exit
  --providers PROVIDERS
                        sync dependencies for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple
                        providers per API. (default: None)
  --format {uv,deps-only}
                        Output format: 'uv' shows shell commands, 'deps-only' shows just the list of dependencies without `uv` (default) (default: deps-only)
 ```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-05 11:34:08 -08:00
Wojciech-Rebisz
07c28cd519
fix: Avoid model_limits KeyError (#4060)
# What does this PR do?
Avoids a `model_limits` KeyError when fetching embedding models for
Watsonx.

Closes https://github.com/llamastack/llama-stack/issues/4059

## Test Plan
Start server with watsonx distro:
```bash
llama stack list-deps watsonx | xargs -L1 uv pip install
uv run llama stack run watsonx
```
Run 
```python
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(base_url="http://localhost:8321")  # adjust to your server
client.models.list()
```
Check if there is any embedding model available (currently there is not
a single one)
2025-11-05 10:34:40 -08:00
Emilio Garcia
ba50790a28
feat(tests): metrics tests (#3966)
# What does this PR do?
1. Make telemetry tests as easy as possible for users by expanding the
`SpanStub` data class and creating the `MetricStub` dataclass as a way
to consistently marshal telemetry data in test fixtures and unmarshal
and handle it in tests.
2. Structure server and client tests to always follow the same standards
for consistent testing experience by using the `SpanStub` and
`MetricStub` data class objects.
3. Enable Metrics Testing for completions endpoint
4. Correct token metrics to use histograms instead of counts to capture
tokens per request rather than a cumulative count of tokens over the
lifecycle of the server.

## Test Plan
These are tests
2025-11-05 10:26:15 -08:00
Roy Belio
2619f3552e
fix: show built-in distributions in llama stack list (#4040)
# What does this PR do?
Fixes issue #3922 where `llama stack list` only showed distributions
after they were run. This PR makes the command show all available
distributions immediately on a fresh install.

Closes #3922

## Changes
- **Updated `_get_distribution_dirs()`** to discover both built-in and
built distributions:
  - Built-in distributions from `src/llama_stack/distributions/` (e.g.,
    starter, nvidia, dell)
  - Built distributions from `~/.llama/distributions`
- **Added a "Source" column** to distinguish between "built-in" and
"built" distributions
- **Built distributions override built-in ones** with the same name
(expected behavior)
- **Updated config file detection logic** to handle both naming
conventions:
  - Built-in: `build.yaml` and `run.yaml`
  - Built: `{name}-build.yaml` and `{name}-run.yaml`
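
The discovery and override rules above can be sketched as follows. This is a simplified illustration, not the body of `_get_distribution_dirs()`; the function names `discover` and `has_configs` are hypothetical.

```python
from pathlib import Path


def discover(builtin_root: Path, built_root: Path) -> dict[str, tuple[str, Path]]:
    """Map distribution name -> (source, path); built entries win on name clashes."""
    found: dict[str, tuple[str, Path]] = {}
    # Order matters: built directories are scanned last so they overwrite built-ins.
    for source, root in (("built-in", builtin_root), ("built", built_root)):
        if not root.is_dir():
            continue
        for child in sorted(root.iterdir()):
            if child.is_dir() and not child.name.startswith("."):
                found[child.name] = (source, child)
    return found


def has_configs(name: str, source: str, path: Path) -> tuple[bool, bool]:
    """Check for build/run configs using the naming convention of each source."""
    if source == "built-in":
        build, run = path / "build.yaml", path / "run.yaml"
    else:
        build, run = path / f"{name}-build.yaml", path / f"{name}-run.yaml"
    return build.exists(), run.exists()
```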

## Test Plan
### Unit Tests
Added comprehensive unit tests in
`tests/unit/distribution/test_stack_list.py`:
```bash
uv run pytest tests/unit/distribution/test_stack_list.py -v
```
**Result**:  All 8 tests pass
- `test_builtin_distros_shown_without_running` - Verifies the core fix
for issue #3922
- `test_builtin_and_built_distros_shown_together` - Ensures both types
are shown
- `test_built_distribution_overrides_builtin` - Tests override behavior
- `test_empty_distributions` - Edge case handling
- `test_config_files_detection_builtin` - Config file detection for
built-in distros
- `test_config_files_detection_built` - Config file detection for built
distros
- `test_llamastack_prefix_stripped` - Name normalization
- `test_hidden_directories_ignored` - Filters hidden directories

### Manual Testing
**Before the fix** (simulated with empty `~/.llama/distributions`):
```bash
$ llama stack list
No stacks found in ~/.llama/distributions
```

**After the fix**:
```bash
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ci-tests          │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ dell              │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ meta-reference-g… │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ nvidia            │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ open-benchmark    │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ postgres-demo     │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter-gpu       │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ watsonx           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```

**After running a distribution**:
```bash
$ llama stack run starter  # Creates ~/.llama/distributions/starter
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
│ starter           │ built    │ ~/.llama/distri…  │ No           │ No         │
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```
Note how `starter` now shows as "built" and points to
`~/.llama/distributions`, overriding the built-in version.

## Breaking Changes
**No breaking changes** - This is a bug fix that improves user
experience with minimal risk:
- No programmatic parsing of output found in the codebase
- Table format is clearly for human consumption
- The new "Source" column helps users understand where distributions
come from
- The behavior change is exactly what users expect (seeing all available
distributions)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-05 10:16:28 -08:00
Ashwin Bharambe
4d3069bfa5
chore(ci): remove unused recordings (#4074)
Added a script to cleanup recordings. While doing this, moved the CI
matrix generation to a separate script so there is a single source of
truth for the matrix.

Ran the cleanup script as:
```
PYTHONPATH=. python scripts/cleanup_recordings.py
```

Also added this as part of the pre-commit workflow to ensure that the
recordings are always up to date and that no stale recordings are left
in the repo.
2025-11-05 09:21:58 -08:00
Sébastien Han
fd1603beef
chore: remove unused classes (#4077)
# What does this PR do?

These may once have been included in the webmethod?
The unit test was pointless too, since the request was never used
anywhere.

This shouldn't be in the API definition if we never consume it.

## Test Plan

CI with pre-commit on OpenAPI spec generation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-05 16:45:23 +01:00
Ashwin Bharambe
392e01dc79 chore: add stainless config
Name it to indicate that it is not yet the source of truth, to avoid confusion.
2025-11-04 15:44:07 -08:00
ehhuang
95b0493fae
chore: move src/llama_stack/ui to src/llama_stack_ui (#4068)
# What does this PR do?
This better separates UI from backend code, which was often a point of
confusion for our beloved AI friends.


## Test Plan
CI
2025-11-04 15:21:49 -08:00
Ashwin Bharambe
5850e3473f fix: remove straggler openapi HTML file 2025-11-04 14:54:33 -08:00
Ashwin Bharambe
0c49a53c97
chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067)
RAG aka file search is implemented via the Responses API by specifying
the file-search tool. The backend implementation remains unchanged. This
PR merely removes the directly exposed API surface which allowed users
to directly perform searches from the client.

This facility is now available via the `client.vector_stores.search()`
OpenAI-compatible API.
2025-11-04 14:50:54 -08:00
Ashwin Bharambe
a8a8aa56c0
chore!: remove the agents (sessions and turns) API (#4055)
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed

The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.

Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
2025-11-04 09:38:39 -08:00
Mustafa Elbehery
a6ddbae0ed
chore(test): migrate unit tests from unittest to pytest nvidia test eval (#3249)
# What does this PR do?
This PR migrates `unittest` to `pytest` in
`tests/unit/providers/nvidia/test_eval.py`.

Part of https://github.com/llamastack/llama-stack/issues/2680

Supersedes https://github.com/llamastack/llama-stack/pull/2791

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-11-04 10:29:07 +01:00
Ashwin Bharambe
053fc0ac39
chore!: remove all deprecated routes (including /openai/v1/ ones) (#4054)
This PR removes all routes which we had marked deprecated for the 0.3.0
release.

This includes:
- all the `/v1/openai/v1/` routes (the corresponding /v1 routes still
exist of course)
- the /agents API (which is superseded completely by Responses +
Conversations)
- several alpha routes which had a "v1" route to aid the transition to
"v1alpha"

This is the corresponding client-python change:
https://github.com/llamastack/llama-stack-client-python/pull/294
2025-11-03 19:00:59 -08:00
Nathan Weinberg
62b3ad349a
fix: return to hardcoded model IDs for Vertex AI (#4041)
# What does this PR do?
partial revert of b67aef2

Vertex AI doesn't offer an endpoint for listing models from Google's
Model Garden

Return to hardcoded values until such an endpoint is available

Closes #3988 

## Test Plan
Server side, set up your Vertex AI env vars (`VERTEX_AI_PROJECT`,
`VERTEX_AI_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`) and run the
starter distribution
```bash
$ llama stack list-deps starter | xargs -L1 uv pip install
$ llama stack run starter
```

Client side, formerly broken cURL requests now working
```bash
$ curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.provider_id == "vertexai"))'
[
  {
    "identifier": "vertexai/vertex_ai/gemini-2.0-flash",
    "provider_resource_id": "vertex_ai/gemini-2.0-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-flash",
    "provider_resource_id": "vertex_ai/gemini-2.5-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-pro",
    "provider_resource_id": "vertex_ai/gemini-2.5-pro",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  }
]
$ curl -fsS http://127.0.0.1:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"vertexai/vertex_ai/gemini-2.5-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 128, \"temperature\": 0.0}" | jq
{
  "id": "p8oIaYiQF8_PptQPo-GH8QQ",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello there! How can I help you today?",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
...
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-03 17:38:16 -08:00
Ashwin Bharambe
cb40da210f
fix: update tests for OpenAI-style models endpoint (#4053)
The llama-stack-client now uses `/v1/openai/v1/models` which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.

**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
2025-11-03 17:30:08 -08:00
Sébastien Han
4a5ef65286
chore!: remove SDG API (#4035)
# What does this PR do?

This API hasn't received any traction and close to zero interest from
the community. Let's revisit in the future if things change.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 16:12:06 -08:00
Ashwin Bharambe
44096512b5
feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051)
We need to remove `/v1/openai/v1` paths shortly. There is one problem:
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.

This is step 1: adding a `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by looking at
`__pydantic_extra__` or other similar fields.

This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modified `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
custom_metadata
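
As a rough picture of the mapping, the sketch below converts a native model record into an OpenAI-style object with a `custom_metadata` field. It uses plain dicts rather than the real `OpenAIModel` class; the helper name `to_openai_model`, the `owned_by` value, and the choice of which keys end up in `custom_metadata` are assumptions.

```python
def to_openai_model(native: dict, created: int = 0) -> dict:
    """Convert a native model record into an OpenAI-style model object."""
    # Everything except the identifier is carried along as extra metadata (assumed).
    extras = {key: value for key, value in native.items() if key != "identifier"}
    return {
        "id": native["identifier"],
        "object": "model",
        "created": created,
        "owned_by": "llama_stack",  # placeholder value, an assumption
        "custom_metadata": extras,
    }


native = {
    "identifier": "vertexai/vertex_ai/gemini-2.5-flash",
    "provider_resource_id": "vertex_ai/gemini-2.5-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm",
}
model = to_openai_model(native)
print(model["id"], model["custom_metadata"]["model_type"])
```

A consumer that previously read `identifier` and `model_type` from the native response would read `id` and `custom_metadata["model_type"]` instead.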

Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
2025-11-03 15:56:07 -08:00
Ashwin Bharambe
2381714904
fix: enable SQLite WAL mode to prevent database locking errors (#4048)
Fixes race condition causing "database is locked" errors during
concurrent writes to SQLite, particularly in streaming responses with
guardrails where multiple inference calls write simultaneously.

Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple
concurrent readers and one writer without blocking. Set busy_timeout to
5s so SQLite retries instead of failing immediately. Remove the logic
that disabled write queues for SQLite since WAL mode eliminates the
locking issues that prompted disabling them.
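
For reference, the two settings can be applied with Python's standard `sqlite3` module as in the sketch below. This illustrates the pragmas only; it is not the store code changed in this PR, and the helper name `connect` is hypothetical.

```python
import sqlite3


def connect(db_path: str) -> sqlite3.Connection:
    """Open a SQLite connection with WAL journaling and a 5 s busy timeout."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to 5000 ms for a lock instead of failing with "database is locked".
    conn.execute("PRAGMA busy_timeout=5000")
    return conn
```

WAL mode is a property of the database file and persists across connections, whereas `busy_timeout` applies per connection and has to be set each time one is opened.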

Fixes: test_output_safety_guardrails_safe_content[stream=True] flake
2025-11-03 15:27:41 -08:00
ehhuang
628e38b3d5
test: always start a new server in integration-tests.sh (#4050)
# What does this PR do?
This prevents interference from already running servers, and allows
multiple concurrent integration test runs. Unleash the AIs!
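
The port selection idea can be sketched in Python as below. The real logic lives in the shell script `integration-tests.sh`; this helper, its name `find_free_port`, and the number of attempts are assumptions for illustration.

```python
import socket


def find_free_port(start: int = 8321, attempts: int = 100) -> int:
    """Return the first port at or above `start` that can be bound on localhost."""
    end = min(start + attempts, 65536)
    for port in range(start, end):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{end - 1}")
```

The probe socket is closed before the port is returned, so another process could still grab the port in between; a test runner would normally start the server right away and retry on failure.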

## Test Plan
Start a Llama Stack server on port 8321.

Then observe that the test uses port 8322:

❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config
server:ci-tests --inference-mode replay --setup ollama --suite base
--pattern '(telemetry or safety)'
=== Llama Stack Integration Test Runner ===
Stack Config: server:ci-tests
Setup: ollama
Inference Mode: replay
Test Suite: base
Test Subdirs:
Test Pattern: (telemetry or safety)

Checking llama packages
llama-stack 0.4.0.dev0 /Users/erichuang/projects/new_test_server
llama-stack-client                       0.3.0
ollama                                   0.6.0
=== Applying Setup Environment Variables ===
Setting SQLITE_STORE_DIR:
/var/folders/cz/vyh7y1d11xg881lsxsshnc5c0000gn/T/tmp.bKLsaVAxyU
Setting stack config type: server
Setting up environment variables:
export OLLAMA_URL='http://0.0.0.0:11434'
export SAFETY_MODEL='ollama/llama-guard3:1b'

Will use port: 8322
=== Starting Llama Stack Server ===
Waiting for Llama Stack Server to start on port 8322...
 Llama Stack Server started successfully
2025-11-03 15:23:10 -08:00
Sébastien Han
da57b51fb6
ci: introduce Mergify bot to notify on PR conflicts (#4043)
This commit introduces Mergify, a powerful bot designed to assist with
automated merging and other CI-related tasks. As an initial step, we
enable a basic feature: automatically notifying users when a pull
request has merge conflicts.

When a conflict is detected, Mergify will add a label to the PR. This
label will be removed once the conflict is resolved.
This is a foundation PR to activate the bot and to start using it for
backports too.

In the future, we plan to expand Mergify’s role to include auto-merging,
as discussed in #1667, once the project is ready.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 12:21:19 -08:00
Derek Higgins
1562277cfd
ci: test adjustments for Qwen3-0.6B (#3978)
Without this hint Qwen3-0.6B tends to reply with the full name
and sometimes doesn't reply with the correct drafted year.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 12:19:35 -08:00
Matthew Farrellee
1263448de2
fix: allowed_models config did not filter models (#4030)
# What does this PR do?

closes #4022 

## Test Plan

ci w/ new tests

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:43:39 -08:00
Charlie Doern
30f8921240
fix: generate provider config when using --providers (#4044)
# What does this PR do?

Call the `sample_run_config` method for providers that have it when
generating a run config using `llama stack run --providers`. This
propagates API keys into the generated config.

resolves #4032
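
A minimal sketch of the idea, not the actual implementation: the two config classes and `build_provider_config` are hypothetical names, and the placeholder value for `api_key` is only illustrative.

```python
from typing import Any


class OpenAIConfigSketch:
    """Hypothetical stand-in for a provider config class."""

    @classmethod
    def sample_run_config(cls, **kwargs: Any) -> dict[str, Any]:
        # Illustrative values only; the real sample config may differ.
        return {
            "api_key": "${env.OPENAI_API_KEY:=}",
            "base_url": "https://api.openai.com/v1",
        }


class BareConfigSketch:
    """Hypothetical config class that defines no sample_run_config."""


def build_provider_config(config_class: type) -> dict[str, Any]:
    """Use sample_run_config when the class defines it, else an empty config."""
    sample = getattr(config_class, "sample_run_config", None)
    if callable(sample):
        return sample()
    return {}


with_sample = build_provider_config(OpenAIConfigSketch)
without_sample = build_provider_config(BareConfigSketch)
```

Providers without the method keep an empty config, so only providers that opt in contribute keys such as `api_key`.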


## Test Plan

new unit test checks the output of using `--providers` to ensure
`api_key` is in the config.

manual testing:

```
╰─ llama stack list-deps --providers=inference=remote::openai --format uv | sh
Using Python 3.12.11 environment at: venv
Audited 7 packages in 8ms

╰─ llama stack run --providers=inference=remote::openai
INFO     2025-11-03 14:33:02,094 llama_stack.cli.stack.run:161 cli: Writing generated config to:
         /Users/charliedoern/.llama/distributions/providers-run/run.yaml
INFO     2025-11-03 14:33:02,096 llama_stack.cli.stack.run:169 cli: Using run configuration:
         /Users/charliedoern/.llama/distributions/providers-run/run.yaml
INFO     2025-11-03 14:33:02,099 llama_stack.cli.stack.run:228 cli: HTTPS enabled with certificates:
           Key: None
           Cert: None
INFO     2025-11-03 14:33:02,099 llama_stack.cli.stack.run:230 cli: Listening on 0.0.0.0:8321
INFO     2025-11-03 14:33:02,145 llama_stack.core.server.server:513 core::server: Run configuration:
INFO     2025-11-03 14:33:02,146 llama_stack.core.server.server:516 core::server: apis:
         - inference
         image_name: providers-run
         providers:
           inference:
           - config:
               api_key: '********'
               base_url: https://api.openai.com/v1
             provider_id: openai
             provider_type: remote::openai
         registered_resources:
           benchmarks: []
           datasets: []
           models: []
           scoring_fns: []
           shields: []
           tool_groups: []
           vector_stores: []
         server:
           port: 8321
           workers: 1
         storage:
           backends:
             kv_default:
               db_path: /Users/charliedoern/.llama/distributions/providers-run/kvstore.db
               type: kv_sqlite
             sql_default:
               db_path: /Users/charliedoern/.llama/distributions/providers-run/sql_store.db
               type: sql_sqlite
           stores:
             conversations:
               backend: sql_default
               table_name: openai_conversations
             inference:
               backend: sql_default
               max_write_queue_size: 10000
               num_writers: 4
               table_name: inference_store
             metadata:
               backend: kv_default
               namespace: registry
             prompts:
               backend: kv_default
               namespace: prompts
         telemetry:
           enabled: false
         version: 2

INFO     2025-11-03 14:33:02,299 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue
         disabled for SQLite to avoid concurrency issues
INFO     2025-11-03 14:33:05,272 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils:
         OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models
INFO     2025-11-03 14:33:05,368 uvicorn.error:84 uncategorized: Started server process [69109]
INFO     2025-11-03 14:33:05,369 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-11-03 14:33:05,370 llama_stack.core.server.server:172 core::server: Starting up Llama Stack server
         (version: 0.3.0)
INFO     2025-11-03 14:33:05,370 llama_stack.core.stack:495 core: starting registry refresh task
INFO     2025-11-03 14:33:05,370 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-11-03 14:33:05,371 uvicorn.error:216 uncategorized: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C
         to quit)
INFO     2025-11-03 14:34:19,242 uvicorn.access:473 uncategorized: 127.0.0.1:63102 - "POST /v1/chat/completions
         HTTP/1.1" 200
```

client:

```
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
 "model": "openai/gpt-5",
 "messages": [
     {"role": "user", "content": "What is 1 + 2"}
 ]
}'
{"id":"...","choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"3","refusal":null,"role":"assistant","annotations":[],"audio":null,"function_call":null,"tool_calls":null}}],"created":1762198455,"model":"openai/gpt-5","object":"chat.completion","service_tier":"default","system_fingerprint":null,"usage":{"completion_tokens":10,"prompt_tokens":13,"total_tokens":23,"completion_tokens_details":{"accepted_prediction_tokens":0,"audio_tokens":0,"reasoning_tokens":0,"rejected_prediction_tokens":0},"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0}}}%
```

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 11:37:58 -08:00
Ashwin Bharambe
415fd9e36b
chore: bump version to 0.4.0.dev0 (#4018)
Automated version bump after releasing 0.3.1

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-03 09:36:04 -08:00
Sébastien Han
d4aa348b60
chore: remove HTML generation for openapi spec (#4039)
# What does this PR do?

This seems to be an ancient artifact from when we were using readthedocs.
Now Docusaurus reads the specs directly.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 18:03:40 +01:00
dependabot[bot]
7e294d33d9
chore(github-deps): bump astral-sh/setup-uv from 6.0.1 to 7.1.2 (#4023)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.0.1 to 7.1.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.2 🌈 Speed up extraction on Windows</h2>
<h2>Changes</h2>
<p><a href="https://github.com/lazka"><code>@​lazka</code></a> fixed a
bug that caused extracting uv to take up to 30s. Thank you!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Use tar for extracting the uv zip file on Windows too <a
href="https://github.com/lazka"><code>@​lazka</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/660">#660</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.5 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/663">#663</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/664">#664</a>)</li>
<li>Bump github/codeql-action from 4.30.8 to 4.30.9 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/652">#652</a>)</li>
</ul>
<h2>v7.1.1 🌈 Fix empty workdir detection and lowest resolution
strategy</h2>
<h2>Changes</h2>
<p>This release fixes a bug where the <code>working-directory</code>
input was not used to detect an empty work dir. It also fixes the
<code>lowest</code> resolution strategy resolving to latest when only a
lower bound was specified.</p>
<p>Special thanks to <a
href="https://github.com/tpgillam"><code>@​tpgillam</code></a> for the
first contribution!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Fix &quot;lowest&quot; resolution strategy with lower-bound only <a
href="https://github.com/tpgillam"><code>@​tpgillam</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li>
<li>Use working-directory to detect empty workdir <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.4 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li>
<li>chore: update known checksums for 0.9.3 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Change version in docs to v7 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump github/codeql-action from 4.30.7 to 4.30.8 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li>
<li>Bump actions/setup-node from 5.0.0 to 6.0.0 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/641">#641</a>)</li>
<li>Bump eifinger/actionlint-action from 1.9.1 to 1.9.2 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/634">#634</a>)</li>
<li>Update lockfile with latest npm <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/636">#636</a>)</li>
</ul>
<h2>v7.1.0 🌈 Support all the use cases</h2>
<h2>Changes</h2>
<p><strong>Support all the use cases!!!</strong></p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="85856786d1"><code>8585678</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/664">#664</a>)</li>
<li><a
href="22d500a65c"><code>22d500a</code></a>
Bump github/codeql-action from 4.30.8 to 4.30.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/652">#652</a>)</li>
<li><a
href="14d557131d"><code>14d5571</code></a>
chore: update known checksums for 0.9.5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/663">#663</a>)</li>
<li><a
href="29cd2350cd"><code>29cd235</code></a>
Use tar for extracting the uv zip file on Windows too (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/660">#660</a>)</li>
<li><a
href="2ddd2b9cb3"><code>2ddd2b9</code></a>
chore: update known checksums for 0.9.4 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li>
<li><a
href="b7bf78939d"><code>b7bf789</code></a>
Fix &quot;lowest&quot; resolution strategy with lower-bound only (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li>
<li><a
href="cb6c0a53d9"><code>cb6c0a5</code></a>
Change version in docs to v7 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li>
<li><a
href="dffc6292f2"><code>dffc629</code></a>
Use working-directory to detect empty workdir (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li>
<li><a
href="6e346e1653"><code>6e346e1</code></a>
chore: update known checksums for 0.9.3 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li>
<li><a
href="3ccd0fd498"><code>3ccd0fd</code></a>
Bump github/codeql-action from 4.30.7 to 4.30.8 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/setup-uv/compare/v6.0.1...85856786d1ce8acfbcc2f13a5f3fbd6b938f9f41">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-03 13:43:04 +01:00
Sébastien Han
3dbff6bf3f
fix: help mypy & fix precommit on main (#4037)
# What does this PR do?

Add a type annotation to help mypy figure out the type.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 05:39:50 -05:00
Ashwin Bharambe
d45137a399
fix(ci): export UV_INDEX_STRATEGY to current shell before running uv sync (#4020)
Fixes latent bug where UV_INDEX_STRATEGY was only exported to GITHUB_ENV
but not to the current shell.

While this bug doesn't currently affect main (since UV_EXTRA_INDEX_URL
is only set on release branches), it's a latent bug that could cause
issues if the logic changes in the future or if someone tests with
UV_EXTRA_INDEX_URL set.

The setup-runner action only exported UV_INDEX_STRATEGY to GITHUB_ENV
(for subsequent steps), not to the current shell environment. Since uv
sync runs in the same step, it would never see the variable if it were
set.

This fix adds `export UV_INDEX_STRATEGY=unsafe-best-match` to make the
variable available in the current shell before running uv commands.
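
The difference can be sketched in Python under a simplifying assumption: a temporary file stands in for the `GITHUB_ENV` file, and a child process stands in for `uv sync`. The helper `child_sees` is a hypothetical name.

```python
import os
import subprocess
import sys
import tempfile


def child_sees(var: str, env: dict[str, str]) -> str:
    """Run a child Python process and return the value it sees for var."""
    code = f"import os; print(os.environ.get({var!r}, '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Start from the current environment without the variable under test.
base_env = {k: v for k, v in os.environ.items() if k != "UV_INDEX_STRATEGY"}

with tempfile.NamedTemporaryFile("w+", suffix=".env") as github_env:
    # Writing to the GITHUB_ENV file only affects later workflow steps.
    github_env.write("UV_INDEX_STRATEGY=unsafe-best-match\n")
    github_env.flush()
    before = child_sees("UV_INDEX_STRATEGY", base_env)

# The fix: also set the variable in the environment of the current step.
fixed_env = {**base_env, "UV_INDEX_STRATEGY": "unsafe-best-match"}
after = child_sees("UV_INDEX_STRATEGY", fixed_env)
```

Here `before` reflects a command run in the same step as the file write, and `after` reflects the same command once the variable is exported directly.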

Related: #4019 (same fix for release-0.3.x where the bug is actively
triggered)
2025-11-01 12:57:24 -07:00
Charlie Doern
93401836b7
feat: llama stack run --providers (#3989)
# What does this PR do?

`llama stack run --providers` takes a list of providers in the format
`api1=provider1,api2=provider2`.

This allows users to run with a simple list of providers.

Given the architecture of `create_app`, this run config needs to be
written to disk. Use `~/.llama/distributions/providers-run/run.yaml` each
time for consistency.
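
As a rough illustration of the `api=provider` format (not the CLI's actual parser), a hypothetical `parse_providers` helper could look like this:

```python
def parse_providers(spec: str) -> dict[str, list[str]]:
    """Parse 'api1=provider1,api2=provider2' into {api: [providers]}."""
    providers: dict[str, list[str]] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate stray commas
        api, sep, provider = pair.partition("=")
        if not sep or not api.strip() or not provider.strip():
            raise ValueError(f"expected api=provider, got {pair!r}")
        # Allow the same API to appear more than once.
        providers.setdefault(api.strip(), []).append(provider.strip())
    return providers


single = parse_providers("inference=remote::openai")
multiple = parse_providers("inference=remote::openai,safety=inline::llama-guard")
```

Splitting on the first `=` only keeps provider types such as `remote::openai` intact. The `safety=inline::llama-guard` pair is just sample input.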

resolves #3956

## Test Plan

New unit tests to ensure `--providers` works as expected.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-31 16:21:32 -07:00
Ashwin Bharambe
b2a5428a14
fix(ci): unset empty UV index env vars to prevent uv errors (#4012)
Fixes container builds failing with UV index strategy errors when build
args are passed with empty values.

Docker ARGs declared with empty defaults (ARG UV_INDEX_STRATEGY="")
become environment variables with empty string values in RUN commands.
UV interprets these as if --index-strategy "" was passed on the command
line, causing build failures with "error: a value is required for
'--index-strategy <UV_INDEX_STRATEGY>'".

This is a footgun because empty string ≠ unset variable, and ARGs
silently propagate to all RUN commands, only failing when declared with
empty defaults.

The fix unsets UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY at the start of
RUN blocks, saves the values early, and only restores them for editable
installs with RC dependencies. All other install modes (PyPI, test-pypi,
client) now run with a clean environment.
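
The empty-versus-unset distinction can be shown with a small Python sketch. It is a simplified illustration, not the Containerfile logic: `clean_uv_env` is a hypothetical helper that only drops empty values and does not model the save-and-restore step for editable installs.

```python
UV_VARS = ("UV_EXTRA_INDEX_URL", "UV_INDEX_STRATEGY")


def clean_uv_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with empty-valued UV variables dropped."""
    return {k: v for k, v in env.items() if not (k in UV_VARS and v == "")}


# An ARG with an empty default shows up as a set-but-empty variable.
build_env = {"UV_INDEX_STRATEGY": "", "UV_EXTRA_INDEX_URL": "", "PATH": "/usr/bin"}

is_set_but_empty = "UV_INDEX_STRATEGY" in build_env and build_env["UV_INDEX_STRATEGY"] == ""
cleaned = clean_uv_env(build_env)
```

After cleaning, the UV variables are absent rather than empty, which is the state a tool reading them expects when no value was supplied.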
2025-10-31 13:29:14 -07:00
Ashwin Bharambe
f8fe3018af
fix(ci): use test.pypi as extra index for RC dependencies (#4009)
Backports UV index configuration fixes from `release-0.3.x` (PR #4002). 

The main issue: when we created the release branch infrastructure, we
configured UV to use `test.pypi` as the PRIMARY index to resolve RC
dependencies. This caused UV to look for ALL packages there first, which
led to problems - some packages don't have binary wheels on `test.pypi`,
so UV tried building from source and failed (like the `psycopg2-binary`
issue we hit).

The fix is simple: use PyPI as primary (default) and `test.pypi` as an
EXTRA index. UV will check PyPI first for everything, and only fall back
to `test.pypi` for packages not found there (like our RC client
versions).

This PR includes:
- Fixed `install-llama-stack-client` action to output
`UV_EXTRA_INDEX_URL` instead of `UV_INDEX_URL`
- New `uv-run-with-index.sh` wrapper that auto-detects release branches
and sets UV env vars
- Updated pre-commit hooks (`uv-lock`, codegen, etc.) to use the wrapper
- Pass UV env vars as Docker build args in all locations
- Scope UV env vars properly in Containerfile (inline for llama-stack
install, explicitly unset before distribution deps)
- Export UV env vars to `GITHUB_ENV` in setup-runner for cross-step
persistence

The wrapper detects release branches automatically in both CI and local
environments, so this "just works" without manual configuration. On main
(non-release branch), the wrapper becomes a no-op.
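
The wrapper itself is a shell script; the detection logic can be sketched in Python under stated assumptions. The function name, the exact regex, and the test.pypi URL are all assumptions made for illustration.

```python
import re

# Assumed pattern for release branches, e.g. release-0.3.x.
RELEASE_BRANCH = re.compile(r"^release-\d+\.\d+\.x$")


def uv_env_for_branch(branch: str) -> dict[str, str]:
    """Return UV env vars for release branches, and nothing elsewhere."""
    if RELEASE_BRANCH.match(branch):
        return {
            # Assumed URL; PyPI stays the default (primary) index.
            "UV_EXTRA_INDEX_URL": "https://test.pypi.org/simple/",
            "UV_INDEX_STRATEGY": "unsafe-best-match",
        }
    return {}  # no-op on main and other non-release branches


on_main = uv_env_for_branch("main")
on_release = uv_env_for_branch("release-0.3.x")
```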

Tested and validated on `release-0.3.x` where all CI checks pass.
2025-10-31 12:55:43 -07:00
raghotham
62603d25c2
chore(api)!: /v1/inspect only lists v1 apis by default (#3948)
# What does this PR do?
Allow filtering for v1alpha, v1beta, deprecated and v1. Backward
incompatible change since by default it only returns v1 apis now.
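
A hedged sketch of the filtering behavior described above. `RouteSketch`, `list_routes`, the `api_filter` parameter, and the sample paths are hypothetical and do not reflect the real endpoint's signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteSketch:
    path: str
    level: str  # "v1", "v1alpha", or "v1beta"
    deprecated: bool = False


def list_routes(routes: list[RouteSketch], api_filter: Optional[str] = None) -> list[RouteSketch]:
    """Default to non-deprecated v1 routes; otherwise apply the filter."""
    if api_filter is None:
        return [r for r in routes if r.level == "v1" and not r.deprecated]
    if api_filter == "deprecated":
        return [r for r in routes if r.deprecated]
    return [r for r in routes if r.level == api_filter and not r.deprecated]


routes = [
    RouteSketch("/v1/models", "v1"),
    RouteSketch("/v1alpha/example", "v1alpha"),
    RouteSketch("/v1/old-example", "v1", deprecated=True),
]
default_paths = [r.path for r in list_routes(routes)]
alpha_paths = [r.path for r in list_routes(routes, "v1alpha")]
deprecated_paths = [r.path for r in list_routes(routes, "deprecated")]
```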

## Test Plan
added unit test
2025-10-31 11:55:46 -07:00
Ashwin Bharambe
61aab1889b
fix(ci): remove precommit trigger workflow (#4008)
Not safe!
2025-10-31 11:41:26 -07:00
Francisco Arceo
7b79cd05d5
feat: Adding Prompts to admin UI (#3987)
# What does this PR do?

1. Updates the Llama Stack TypeScript client to include the `prompts` API in
the playground client.
2. Updates the UI to display prompts and execute basic CRUD operations
for prompts.

(2) adds an explicit "Preview" section when creating the prompt to show
users how the Prompts API behaves as you dynamically edit the prompt
content. See example here:

<p align="center"><img width="468.5" height="333" alt="Screenshot
2025-10-31 at 12 22 34 PM"
src="https://github.com/user-attachments/assets/3542ce7f-56fe-4fb4-b0a3-5cfba5917f6d"
/></p>

Some screenshots:

<details><Summary>Click me to expand!</Summary>

### Prompts List with Prompts
<img width="1906" height="1108" alt="Screenshot 2025-10-31 at 12 20
05 PM"
src="https://github.com/user-attachments/assets/494a4748-ea6a-4527-8cfe-8959cb741c0f"
/>

### Empty Prompts List
<img width="1889" height="1123" alt="Screenshot 2025-10-31 at 12 08
44 PM"
src="https://github.com/user-attachments/assets/ac95b807-d311-4725-86da-0258b3cce81a"
/>

### Create Prompt
<img width="1918" height="1167" alt="Screenshot 2025-10-31 at 11 03
29 AM"
src="https://github.com/user-attachments/assets/b3100a78-f4f3-410f-af89-f7e7fe4a89e7"
/>

### Submit Prompt with error
<img width="1901" height="1213" alt="Screenshot 2025-10-31 at 12 09
28 PM"
src="https://github.com/user-attachments/assets/dca71354-a602-449d-a0d8-0ed3d009a275"
/>
</details>

## Closes https://github.com/llamastack/llama-stack/issues/3322

## Test Plan
Added tests and manual testing.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-31 11:37:25 -07:00
Ashwin Bharambe
c2fd17474e fix: stop printing server log, it is confusing
2025-10-31 11:22:08 -07:00
Ashwin Bharambe
5f95c1f8cc
fix(ci): install client from release branch before uv sync (#4001)
Fixes CI failures on release branches where uv sync can't resolve RC
dependencies.

The problem: on release branches like `release-0.3.x`, pyproject.toml
requires `llama-stack-client>=0.3.1rc1`. But RC versions only exist on
test.pypi, not PyPI. So uv sync fails before we even get a chance to
install the client from git.

The fix is simple - on release branches, pre-install the client from the
matching git branch first, then run uv sync. This satisfies the RC
requirement and lets dependency resolution succeed.

Modified setup-runner and pre-commit workflows to do this. Also cleaned
up some duplicate logic in setup-test-environment that's now handled
centrally.

Example failure:
5415478835
2025-10-31 06:16:20 -07:00
Ashwin Bharambe
6d80ca4bf7
fix(ci): replace unused LLAMA_STACK_CLIENT_DIR with direct install (#4000)
Replace unused `LLAMA_STACK_CLIENT_DIR` env var (from old `llama stack
build`) with direct `uv pip install` for release branch client
installation.

cc @ehhuang
2025-10-30 22:09:25 -07:00
Jiayi Ni
fa7699d2c3
feat: Add rerank API for NVIDIA Inference Provider (#3329)
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.

Closes #3278 

## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```

Integration test: 
```
pytest -s -v tests/integration/inference/test_rerank.py   --stack-config="inference=nvidia"   --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3   --env NVIDIA_API_KEY=""   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-10-30 21:42:09 -07:00
Ashwin Bharambe
c396de57a4
ci: standardize release branch pattern to release-X.Y.x (#3999)
Standardize CI workflows to use `release-X.Y.x` branch pattern instead
of multiple numeric variants.

That's the pattern we are settling on. See
https://github.com/llamastack/llama-stack-ops/pull/20 for reference.
2025-10-30 21:33:32 -07:00
Doug Edgar
e8cd8508b5
fix: handle missing external_providers_dir (#3974)
# What does this PR do?
This PR fixes the handling of the external_providers_dir configuration
field to align with its ongoing deprecation, in favor of the provider
`module` specification approach.

It addresses the issue in #3950, where using the default provided
run.yaml config resulted in the `external_providers_dir` parameter being
set to the literal string `None`, and crashing the llama-stack server
when starting.
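
One way to guard against this kind of value, sketched with a hypothetical helper `resolve_external_providers_dir` (the actual fix in this PR may be structured differently):

```python
from pathlib import Path
from typing import Optional


def resolve_external_providers_dir(raw: Optional[str]) -> Optional[Path]:
    """Treat a missing, empty, or literal 'None' value as not configured."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text == "" or text.lower() == "none":
        return None
    return Path(text).expanduser()


unset_value = resolve_external_providers_dir(None)
literal_none = resolve_external_providers_dir("None")
real_dir = resolve_external_providers_dir("/tmp/providers")
```

Callers can then skip external provider loading whenever the result is `None` instead of trying to open a directory named `None`.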

Closes #3950 

## Test Plan

- Built a new container image from `podman build . -f
containers/Containerfile --build-arg DISTRO_NAME=starter --tag
llama-stack:starter`
- Tested it locally with `podman run -it localhost/llama-stack:starter`
- Tested it on an OpenShift 4.19 cluster, deployed via the
llama-stack-k8s-operator.

Signed-off-by: Doug Edgar <dedgar@redhat.com>
2025-10-30 17:01:31 -07:00
Derek Higgins
ff2b270e2f
fix: relax structured output test assertions to handle whitespace and… (#3997)
… case variations

The ollama/llama3.2:3b-instruct-fp16 model returns string values with
trailing whitespace in structured JSON output. Updated test assertions
to use case-insensitive substring matching instead of exact equality.

- Use `.lower()` for case-insensitive comparison
- Check if expected value is contained in actual value (handles
  whitespace)
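
A minimal sketch of such a relaxed assertion. The helper name `assert_contains_ci` and the sample strings are illustrative and not taken from the test suite.

```python
def assert_contains_ci(actual: str, expected: str) -> None:
    """Pass when expected appears in actual, ignoring case and padding."""
    assert expected.lower() in actual.lower(), f"{expected!r} not found in {actual!r}"


# Trailing whitespace and different casing no longer fail the check.
assert_contains_ci("Example Name ", "example name")

try:
    assert_contains_ci("Another Name", "example name")
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```

The trade-off is that substring matching is more permissive than equality, so it accepts any actual value that merely contains the expected text.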

Closes: #3996

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-30 16:55:23 -07:00
ehhuang
0e384a55a1
feat: support workers in run config (#3992)
# What does this PR do?


## Test Plan
Set workers: 4 in run.yaml. Start server and observe logs multiple
times.
2025-10-30 16:34:12 -07:00
Ashwin Bharambe
6f90a7af4b
ci: target release-X.Y.x branches instead of release-X.Y.x-maint (#3995)
We will be updating our release procedure to be more "normal" or "sane".
We will
- create release branches like normal people
- land cherry-picks onto those branches
- run releases off of those branches
- no more "rc" branch pollution either

Given that, this PR cleans things up a bit
- Remove `-maint` suffix from release branch patterns in CI workflows
- Update branch matching to `release-X.Y.x` format
2025-10-30 16:27:13 -07:00
Ashwin Bharambe
90234d6973
ci: support release branches and match client branch (#3990)
- Update workflows to trigger on release-X.Y.x-maint branches
- When PR targets release branch, fetch matching branch from
llama-stack-client-python
- Falls back to main if matching client branch doesn't exist
- Updated workflows:
  - integration-tests.yml
  - integration-auth-tests.yml
  - integration-sql-store-tests.yml
  - integration-vector-io-tests.yml
  - unit-tests.yml
  - backward-compat.yml
  - pre-commit.yml
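
A minimal sketch of the branch-matching rule described above, assuming `X` and `Y` are numeric. The function name `pick_client_branch` and the regex are hypothetical; the real logic lives in the workflow files.

```python
import re

# Assumed shape of a release branch name: release-<X>.<Y>.x-maint
RELEASE_BRANCH = re.compile(r"^release-\d+\.\d+\.x-maint$")


def pick_client_branch(target_branch: str, client_branches: set[str]) -> str:
    """Return the client branch to fetch for a PR targeting `target_branch`."""
    if RELEASE_BRANCH.match(target_branch) and target_branch in client_branches:
        # The client repo has a branch with the same name: use it.
        return target_branch
    # Not a release branch, or no matching client branch: fall back to main.
    return "main"
```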
2025-10-30 15:20:34 -07:00
Ashwin Bharambe
c2ae42b343
fix(ci): show pre-commit output easily on failure (#3985)
Right now, the failed step that GitHub opens by default just tells me to
go up, click, and scroll through earlier output for no reason.
2025-10-30 11:48:20 -07:00
Ashwin Bharambe
77c8bc6fa7
fix(ci): add back server:ci-tests to replay tests (#3976)
It is useful for local debugging. If both the server and Docker runs are
failing, you can run the server locally to debug, which is much easier.
2025-10-30 11:02:59 -07:00
ehhuang
5e20938832
fix: remove LLAMA_STACK_TEST_FORCE_SERVER_RESTART setting in fixture (#3982)
# What does this PR do?
`LLAMA_STACK_TEST_FORCE_SERVER_RESTART` is meant to be a manual flag, so
the fixture should not set it.

## Test Plan
CI
2025-10-30 09:13:04 -07:00
Sébastien Han
b4ea05ada9
chore: add batches to openapi schema (#3980)
# What does this PR do?

While working on https://github.com/llamastack/llama-stack/pull/3944 I
realized that the batches API wasn't generated.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-30 07:08:35 -07:00
Derek Higgins
19d85003de
test: Updated test skips that were marked with "inline::vllm" (#3979)
The skip marker should be "remote::vllm". This change causes some log
probs tests to be skipped with remote vLLM (they fail if run).

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-30 14:48:21 +01:00
Ashwin Bharambe
174ef162b3
fix(mypy): add fast and full mypy modes (#3975)
`mypy` became very slow for the common path, which can make local
pre-commit runs very slow. Let's restore the fast path.

- restore fast mirrors-mypy hook for local runs
- add optional mypy-full hook and docs so devs can match CI
- run full mypy in CI with a hint when failures occur

### Test Plan
- uv run pre-commit run mypy --all-files
- uv run pre-commit run mypy-full --hook-stage manual --all-files
- uv run --group dev --group type_checking mypy
2025-10-29 19:02:32 -07:00
Charlie Doern
e8ecc99524
fix!: remove chunk_id property from Chunk class (#3954)
# What does this PR do?

chunk_id in the Chunk class executes actual logic to compute a chunk ID.
This sort of logic should not live in the API spec.

Instead, the providers should be in charge of calling `generate_chunk_id`
and passing the result to `Chunk`.

This removes the incorrect dependency between the provider implementation
and the API implementation.
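
The split of responsibilities could look roughly like this sketch. The `Chunk` dataclass and the body of `generate_chunk_id` here are simplified, hypothetical stand-ins; only the idea that the provider computes the ID and passes it in comes from this PR.

```python
import hashlib
import uuid
from dataclasses import dataclass


def generate_chunk_id(document_id: str, content: str) -> str:
    """Hypothetical stand-in: derive a deterministic ID from document and content."""
    digest = hashlib.sha256(f"{document_id}:{content}".encode()).hexdigest()
    return str(uuid.UUID(digest[:32]))


@dataclass
class Chunk:
    """Plain data holder: the ID is supplied by the caller, not computed here."""

    content: str
    chunk_id: str


def make_chunk(document_id: str, content: str) -> Chunk:
    # Provider-side code computes the ID and hands it to the API type.
    return Chunk(content=content, chunk_id=generate_chunk_id(document_id, content))
```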

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-29 18:59:59 -07:00
Charlie Doern
0ef9166c7e
fix: make integration-tests.sh Mac friendly (#3971)
# What does this PR do?

Running `./scripts/integration-tests.sh --network host` on a Mac fails
regularly because of how Docker runs on macOS.

On a Mac, keep bridge networking mode instead.

before:

=== Starting Docker Container ===
Using image: localhost/distribution-ci-tests:dev
WARNING: Published ports are discarded when using host network mode
Waiting for Docker container to start...
 Docker container failed to start
Container logs:
INFO 2025-10-29 18:38:32,180 llama_stack.cli.stack.run:100 cli: Using
run configuration:
         /workspace/src/llama_stack/distributions/ci-tests/run.yaml
... (stack starts but is not reachable on network)

after:

=== Starting Docker Container ===
Using image: localhost/distribution-ci-tests:dev
Using bridge networking with port mapping (non-Linux)
Waiting for Docker container to start...
 Docker container started successfully

=== Running Integration Tests ===

## Test Plan

integration tests pass!

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-29 14:12:09 -07:00
Ashwin Bharambe
da8f014b96
feat(models): list models available via provider_data header (#3968)
## Summary

When users provide API keys via `X-LlamaStack-Provider-Data` header,
`models.list()` now returns models they can access from those providers,
not just pre-registered models from the registry.

This complements the routing fix from f88416ef8 which enabled inference
calls with `provider_id/model_id` format for unregistered models. Users
can now discover which models are available to them before making
inference requests.

The implementation reuses
`NeedsRequestProviderData.get_request_provider_data()` to validate
credentials, then dynamically fetches models from providers without
caching them since they're user-specific. Registry models take
precedence to respect any pre-configured aliases.
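
The precedence rule can be sketched as below. The dict-based model records and the name `merge_models` are assumptions for illustration, not the router's real types.

```python
def merge_models(registry: list[dict], dynamic: list[dict]) -> list[dict]:
    """Combine registered and per-request models; registry entries win on ID clashes."""
    # Start from the dynamically fetched, user-specific models (never cached).
    merged = {model["id"]: model for model in dynamic}
    # Overlay registry models so pre-configured aliases are respected.
    merged.update({model["id"]: model for model in registry})
    return list(merged.values())
```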

## Test Script

```python
#!/usr/bin/env python3
import json
import os
from openai import OpenAI

# Test 1: Without provider_data header
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="dummy")
models = client.models.list()
anthropic_without = [m.id for m in models.data if m.id and "anthropic" in m.id]
print(f"Without header: {len(models.data)} models, {len(anthropic_without)} anthropic")

# Test 2: With provider_data header containing Anthropic API key
anthropic_api_key = os.environ["ANTHROPIC_API_KEY"]
client_with_key = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="dummy",
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps({"anthropic_api_key": anthropic_api_key})
    }
)
models_with_key = client_with_key.models.list()
anthropic_with = [m.id for m in models_with_key.data if m.id and "anthropic" in m.id]
print(f"With header: {len(models_with_key.data)} models, {len(anthropic_with)} anthropic")
print(f"Anthropic models: {anthropic_with}")

assert len(anthropic_with) > len(anthropic_without), "Should have more anthropic models with API key"
print("\n✓ Test passed!")
```

Run with a stack that has Anthropic provider configured (but without API
key in config):
```bash
ANTHROPIC_API_KEY=sk-ant-... python test_provider_data_models.py
```
2025-10-29 14:03:03 -07:00
Ashwin Bharambe
c9d4b6c54f
chore(mypy): part-04 resolve mypy errors in meta_reference agents (#3969)
## Summary
Fixes all mypy type errors in `providers/inline/agents/meta_reference/`
and removes exclusions from pyproject.toml.

## Changes
- Fix type annotations for Safety API message parameters
(OpenAIMessageParam)
- Add Action enum usage in access control checks
- Correct method signatures to match API supertype (parameter ordering)
- Handle optional return types with proper None checks
- Remove 3 meta_reference exclusions from mypy config

**Files fixed:** 25 errors across 3 files (safety.py, persistence.py,
agents.py)
2025-10-29 13:37:28 -07:00
Omar Abdelwahab
e6b27db30a
docs: A getting started notebook featuring simple agent examples. (#3955)
# What does this PR do?
Getting started notebook featuring simple agent examples.

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
2025-10-29 14:13:34 -04:00
Ashwin Bharambe
7dc48a75e5
chore: delete openapi.stainless.yaml for now. not source of truth. (#3967)
This is really not the source of truth yet and is causing more confusion
right now.
2025-10-29 10:45:38 -07:00
Nathan Weinberg
b90c6a2c8b
fix(docs): remove leftover telemetry sidebar section (#3961)
Leftover telemetry section was preventing `npm run build` from
completing successfully

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-10-29 11:20:13 -04:00
Nathan Weinberg
10977caff3
fix: typo in .gitignore (#3960)
typo in https://github.com/llamastack/llama-stack/pull/3959 (whoops)

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-10-29 11:08:47 -04:00
Ashwin Bharambe
a4f97559d1
fix(mypy): part-03 completely resolve meta reference responses impl typing issues (#3951)
## Summary
Resolves all mypy errors in meta reference agent OpenAI responses
implementation by adding proper type narrowing, None checks, and
Sequence type support.

## Changes
- Fixed streaming.py, openai_responses.py, utils.py, tool_executor.py,
agent_instance.py
- Added Sequence type support to schema generator (ensures correct JSON
schema generation)
- Applied union type narrowing and None checks throughout

## Test plan
- All modified files pass mypy type checking (0 errors)
- Schema generator produces correct `type: array` for Sequence types

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 08:07:15 -07:00
Ashwin Bharambe
e5c27dbcbf
fix(mypy): part-02 resolve OpenAI compatibility layer type issues (#3947)
## Summary

Fixes 111 mypy type errors in OpenAI compatibility layer (PR3 in mypy
remediation series).

**Changes:**
- `litellm_openai_mixin.py`: Added type annotations, None checks for
tool_config/model_store access
- `openai_compat.py`: Added None checks throughout, fixed TypedDict
expansions, proper type conversions for messages/tool_calls

**Result:** 23 → 1 errors in litellm file, 88 → 0 errors in
openai_compat file

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 08:06:40 -07:00
Ashwin Bharambe
ce31aa1704
fix(mypy-cleanup): part-01 resolve meta reference agent type issues (126 errors) (#3945)
Error fixes in Agents implementation (`meta-reference` provider) --
adding proper type annotations and using type narrowing for optional
attributes. Essentially a bunch of
`if x and (x_foo := getattr(x, "foo"))` instead of `x.foo` directly
(the parentheses around the walrus expression are required).

Part of ongoing mypy remediation effort.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 07:54:30 -07:00
Nathan Weinberg
22bf0d0471
chore: ignore API docs generation (#3959)
See
1432743473

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-10-29 10:27:53 -04:00
Nathan Weinberg
b6bb8fbf64
ci: add pre-commit check ensuring FIPS compliance (#3899)
# What does this PR do?
this commit adds a new pre-commit hook to scan for non-FIPS compliant
function usage within llama-stack

Closes #3427

## Test Plan
Ran locally

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-10-29 10:21:35 -04:00
Ashwin Bharambe
e809d21357
feat: add backward compatibility tests for run.yaml (#3952)
This adds automated backward compatibility testing for `run.yaml` files.
As we evolve `StackRunConfig`, changes can inadvertently break existing
user configurations. This workflow catches those breaks before merge.

We test old run.yaml files (from main and the latest release) against
the PR's new code. If configs that worked before now fail, the PR is
blocked unless explicitly acknowledged as a breaking change.

**Two test layers:**
- Schema validation: Quick pytest checks that configs parse without
errors
- Integration tests: Full test suite execution to catch runtime semantic
issues (cross-field validations, provider initialization, etc.)

**What we test against:**
- main branch: Breaking changes here block the PR (this is the gate)
- Latest release: Informational only - shows if we've drifted from what
users have

If tests fail, the PR author must acknowledge the breaking change by
adding `!:` to the PR title (e.g., `feat!: change xyz`) or including
`BREAKING CHANGE:` in a commit message. Once acknowledged, the check
passes with a warning.
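
The acknowledgment rule could be checked along these lines. This is a sketch only: the function name is hypothetical and the workflow's real implementation may differ.

```python
def is_breaking_acknowledged(pr_title: str, commit_messages: list[str]) -> bool:
    """Return True if the author has flagged the PR as a breaking change."""
    # `!:` in the title, e.g. `feat!: change xyz`.
    if "!:" in pr_title:
        return True
    # Otherwise look for the marker in any commit message.
    return any("BREAKING CHANGE:" in message for message in commit_messages)
```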

These jobs are run:
1. `check-main-compatibility` - Schema validation of all distribution
run.yaml files from main
2. `test-integration-main` - Full integration test suite using main's
ci-tests run.yaml
3. `test-integration-release` - Integration tests with latest release
config (informational)
4. `check-schema-release-compatibility` - Schema checks against release
(informational)

The integration tests catch issues that schema validation alone would
miss, like assertion failures in
`StackRunConfig.validate_server_stores()` or provider-specific runtime
logic.

Resolves #3311
Related to #3237
2025-10-28 21:51:56 -07:00
Derek Higgins
c678682cdd
chore: remove unused methods from InferenceRouter (#3953)
Remove unused methods that became obsolete after d266c59c:
- `_compute_and_log_token_usage`
- `_count_tokens`
- `stream_tokens_and_compute_metrics`
- `count_tokens_and_compute_metrics`

These methods are no longer referenced anywhere in the codebase
following the removal of deprecated inference.chat_completion
implementations.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-28 17:12:41 -07:00
ehhuang
1aa8979050
test: enable telemetry tests in server mode (#3927)
# What does this PR do?
- Added a server-based test OTLP collector

## Test Plan
CI
2025-10-28 16:33:48 -07:00
ehhuang
1f9d48cd54
feat: openai files provider (#3946)
# What does this PR do?
- Adds OpenAI files provider 
- Note that file content retrieval is pretty limited by `purpose`
https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com

## Test Plan
Modify run yaml to use openai files provider:
```
  files:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      metadata_store:
        backend: sql_default
        table_name: openai_files_metadata

# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
2025-10-28 16:25:03 -07:00
raghotham
feabcdd67b
docs: add documentation on how to use custom run yaml in docker (#3949)
as title

test plan:

```yaml
# custom-ollama-run.yaml
version: 2
image_name: starter
external_providers_dir: /.llama/providers.d
apis:
- inference
- vector_io
- files
- safety
- tool_runtime
- agents

providers:
  inference:
  # Single Ollama provider for all models
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: ${env.OLLAMA_URL:=http://localhost:11434}

  vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      persistence:
        namespace: vector_io::faiss
        backend: kv_default

  files:
  - provider_id: meta-reference-files
    provider_type: inline::localfs
    config:
      storage_dir: /.llama/files
      metadata_store:
        table_name: files_metadata
        backend: sql_default

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence:
        agent_state:
          namespace: agents
          backend: kv_default
        responses:
          table_name: responses
          backend: sql_default
          max_write_queue_size: 10000
          num_writers: 4

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: /.llama/kvstore.db
    sql_default:
      type: sql_sqlite
      db_path: /.llama/sql_store.db
  stores:
    metadata:
      namespace: registry
      backend: kv_default
    inference:
      table_name: inference_store
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      table_name: openai_conversations
      backend: sql_default

registered_resources:
  models:
  # All models use the same 'ollama' provider
  - model_id: llama3.2-vision:latest
    provider_id: ollama
    provider_model_id: llama3.2-vision:latest
    model_type: llm
  - model_id: llama3.2:3b
    provider_id: ollama
    provider_model_id: llama3.2:3b
    model_type: llm
  # Embedding models
  - model_id: nomic-embed-text-v2-moe
    provider_id: ollama
    provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
    model_type: embedding
    metadata:
      embedding_dimension: 768
  shields: []
  vector_dbs: []
  datasets: []
  scoring_fns: []
  benchmarks: []
  tool_groups: []

server:
  port: 8321

telemetry:
  enabled: true

vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: ollama
    model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
```

```bash
docker run \
     -it \
     --pull always \
     -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
     -v ~/.llama:/root/.llama \
     -v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml \
     -e RUN_CONFIG_PATH=/app/custom-run.yaml \
     -e OLLAMA_URL=http://host.docker.internal:11434/ \
     llamastack/distribution-starter:0.3.0 \
```
2025-10-28 16:05:44 -07:00
Ashwin Bharambe
f88416ef87
fix(inference): enable routing of models with provider_data alone (#3928)
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.

Here's the situation: assume a remote inference provider which works
only when users provide their own API keys via the
`X-LlamaStack-Provider-Data` header. By definition, we cannot list its
models and hence cannot update our routing registry. But because we
_require_ a provider ID in the models now, we can identify which provider
to route to and let that provider decide.

Note that we still try to look up our registry first since it may have a
pre-registered alias. We just don't fail outright when the lookup finds
nothing.

Also, updated inference router so that the responses have the _exact_
model that the request had.
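
The lookup order can be sketched as follows. The registry shape and the name `resolve_route` are assumptions for illustration, not the router's actual interface.

```python
def resolve_route(model: str, registry: dict[str, tuple[str, str]]) -> tuple[str, str]:
    """Return (provider_id, provider_model_id) for a request's model string."""
    # Prefer the registry: it may hold a pre-registered alias.
    if model in registry:
        return registry[model]
    # Otherwise fall back to the fully qualified `provider_id/model_id` form
    # and let the provider decide whether the model exists.
    provider_id, sep, model_id = model.partition("/")
    if not sep or not provider_id or not model_id:
        raise ValueError(f"Model {model!r} is not registered and has no provider prefix")
    return provider_id, model_id
```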

## Test Plan

Added an integration test

Closes #3929

---------

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-10-28 11:16:37 -07:00
Ashwin Bharambe
94b0592240
fix(mypy): add type stubs and fix typing issues (#3938)
Adds type stubs and fixes mypy errors for better type coverage.

Changes:
- Added type_checking dependency group with type stubs (torchtune, trl,
etc.)
- Added lm-format-enforcer to pre-commit hook
- Created HFAutoModel Protocol for type-safe HuggingFace model handling
- Added mypy.overrides for untyped libraries (torchtune, fairscale,
etc.)
- Fixed type issues in post-training providers, databricks, and
api_recorder

Note: ~1,200 errors remain in excluded files (see pyproject.toml exclude
list).

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 11:00:09 -07:00
Ashwin Bharambe
1d385b5b75
fix(mypy): resolve OpenAI SDK and provider type issues (#3936)
## Summary
- Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls
- Fix incorrect OpenAIChatCompletionChunk import in vllm provider
- Refactor to avoid type:ignore comments by using conditional kwargs

## Changes
**openai_mixin.py (9 errors fixed):**
- Build kwargs conditionally for embeddings.create() to avoid
NotGiven/Omit mismatch
- Only include parameters when they have actual values (not None)

**gemini.py (9 errors fixed):**
- Apply same conditional kwargs pattern
- Add missing Any import

**vllm.py (2 errors fixed):**
- Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference
- Remove incorrect alias from openai package

## Technical Notes
The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type
`NotGiven` but parameter signatures expect `Omit`. By only passing
parameters with actual values, we avoid this mismatch entirely without
needing `# type: ignore` comments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:54:29 -07:00
Ashwin Bharambe
d009dc29f7
fix(mypy): resolve provider utility and testing type issues (#3935)
Fixes mypy type errors in provider utilities and testing infrastructure:
- `mcp.py`: Cast incompatible client types, wrap image data properly
- `batches.py`: Rename walrus variable to avoid shadowing
- `api_recorder.py`: Use cast for Pydantic field annotation

No functional changes.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:37:27 -07:00
Ashwin Bharambe
fcf07790c8
fix(mypy): resolve model implementation typing issues (#3934)
## Summary

Fixes mypy type errors across 4 model implementation files (Phase 2d of
mypy suppression removal plan):
- `src/llama_stack/models/llama/llama3/multimodal/image_transform.py`
(10 errors fixed)
- `src/llama_stack/models/llama/checkpoint.py` (2 errors fixed)
- `src/llama_stack/models/llama/hadamard_utils.py` (1 error fixed)
- `src/llama_stack/models/llama/llama3/multimodal/encoder_utils.py` (1
error fixed)

## Changes

### image_transform.py
- Fixed return type annotation for `find_supported_resolutions` from
`Tensor` to `list[tuple[int, int]]`
- Fixed parameter and return type annotations for
`resize_without_distortion` from `Tensor` to `Image.Image`
- Resolved variable shadowing by using separate names:
`possible_resolutions_list` for the list and
`possible_resolutions_tensor` for the tensor

### checkpoint.py
- Replaced deprecated `torch.BFloat16Tensor` and
`torch.cuda.BFloat16Tensor` with
`torch.set_default_dtype(torch.bfloat16)`
- Fixed variable shadowing by renaming numpy array to `ckpt_paths_array`
to distinguish from the parameter `ckpt_paths: list[Path]`

### hadamard_utils.py
- Added `isinstance` assertion to narrow type from `nn.Module` to
`nn.Linear` before accessing `in_features` attribute

### encoder_utils.py
- Fixed variable shadowing by using `masks_list` for list accumulation
and `masks` for the final Tensor result

## Test plan

- Verified all files pass mypy type checking (only optional dependency
import warnings remain)
- No functional changes - only type annotations and variable naming
improvements

Stacks on PR #3933

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:28:29 -07:00
Ashwin Bharambe
6ce59b5df8
fix(mypy): resolve type issues in MongoDB, batches, and auth providers (#3933)
Fixes mypy type errors in provider utilities:
- MongoDB: Fix AsyncMongoClient parameters, use async iteration for
cursor
- Batches: Handle memoryview|bytes union for file decoding
- Auth: Add missing imports, validate JWKS URI, conditionally pass
parameters

Fixes 11 type errors. No functional changes.
2025-10-28 10:23:39 -07:00
Ashwin Bharambe
4a2ea278c5
fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3943)
Fixes mypy type errors in OpenTelemetry integration:
- Add type aliases for AttributeValue and Attributes
- Add helper to filter None values from attributes (OpenTelemetry
doesn't accept None)
- Cast metric and tracer objects to proper types
- Update imports after refactoring

No functional changes.
2025-10-28 10:10:18 -07:00
Ashwin Bharambe
85887d724f
Revert "fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931)"
This reverts commit 9afc52a36a.
2025-10-28 09:48:46 -07:00
Ashwin Bharambe
9afc52a36a
fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931)
## Summary

Fix all 11 mypy type checking errors in `telemetry.py` without using any
type suppressions.

**Changes:**
- Add type aliases for OpenTelemetry attribute types (`AttributeValue`,
`Attributes`)
- Create `_clean_attributes()` helper to filter None values from
attribute dicts
- Use `cast()` for TracerProvider methods (`add_span_processor`,
`force_flush`)
- Use `cast()` for metric creation methods returning from global storage
- Fix variable reuse by renaming `span` to `end_span` in SpanEndPayload
branch
- Add None check for `parent_span` before `set_span_in_context`

**Errors Fixed:**
- TracerProvider attribute access: 2 errors
- Counter/UpDownCounter/ObservableGauge return types: 3 errors
- Attribute dict type mismatches: 4 errors
- Span assignment type conflicts: 2 errors

**Testing:**
```bash
uv run mypy src/llama_stack/core/telemetry/telemetry.py
# Success: no issues found
```

**Part of:** Mypy suppression removal plan (Phase 2a/4)

**Stack:**
- [Phase 1] Add type stubs (#3930)
- [Phase 2a] Fix OpenTelemetry types (this PR)
- [Phase 2b+] Fix remaining errors (upcoming)
- [Phase 3] Remove inline suppressions (upcoming)
- [Phase 4] Un-exclude files from mypy (upcoming)
2025-10-28 09:47:20 -07:00
Ian Miller
5598f61e12
feat(responses)!: introduce OpenAI compatible prompts to Responses API (#3942)
# What does this PR do?
This PR changes the Responses API schema to introduce OpenAI-compatible
prompts. It is an API-only change, so there is no implementation yet; a
follow-up PR with the actual implementation will be submitted after this
PR lands.

The need for this functionality was raised in #3514.

> Note: #3514 is split into three separate PRs. This PR is the second
of three.

## Test Plan
CI
2025-10-28 09:31:27 -07:00
Ashwin Bharambe
e5ca7e6450
chore(mypy): add mypy and type stub packages to dev deps (#3930)
## Summary

This PR adds mypy and essential type stub packages to dev dependencies
as Phase 1 of the mypy suppression removal plan.

**Changes:**
- Add `mypy` to dev dependencies
- Add type stubs: `types-jsonschema`, `pandas-stubs`, `types-psutil`,
`types-tqdm`, `boto3-stubs`

**Impact:**
- Enables static type checking across the codebase
- Eliminates ~30 type checking errors related to missing type
information for third-party packages
- Provides foundation for subsequent PRs to remove type suppressions

**Part of:** Mypy suppression removal plan (Phase 1/4)

**Testing:**
```bash
uv sync --group dev
uv run mypy
```
2025-10-28 06:02:38 -07:00
Sébastien Han
d10bfb5121
chore: remove leftover llama_stack directory (#3940)
# What does this PR do?

Followup on https://github.com/llamastack/llama-stack/pull/3920 where
the llama_stack directory was moved under src.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-28 05:09:08 -07:00
Sébastien Han
b47afac7c2
chore: bump openai package version (#3918)
# What does this PR do?

This matches https://github.com/llamastack/llama-stack/pull/3847. We must
not update the lock manually; the update must always be reflected in
pyproject.toml. The lock is a state at build time.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-28 09:18:48 +01:00
Ashwin Bharambe
4e6c769cc4
fix(context): prevent provider data leak between streaming requests (#3924)
## Summary

- `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR` (and
other context vars) populated after a streaming generator completed on
HEAD~1, so the asyncio context for request N+1 started with request N's
provider payload.
- FastAPI dependencies and middleware execute before
`request_provider_data_context` rebinds the header data, meaning
auth/logging hooks could observe a prior tenant's credentials or treat
them as authenticated. Traces and any background work that inspects the
context outside the `with` block leak as well—this is a real security
regression, not just a CLI artifact.
- The wrapper now restores each tracked `ContextVar` to the value it
held before the iteration (falling back to clearing when necessary)
after every yield and when the generator terminates, so provider data is
wiped while callers that set their own defaults keep them.
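
The restore-after-every-yield behaviour described above can be sketched as follows. This is a simplified illustration, not the actual `preserve_contexts_async_generator`: the names `PROVIDER_DATA`, `preserve_contexts`, `stream`, and `main` are hypothetical, and the tracked variables are assumed to have defaults.

```python
import asyncio
from collections.abc import AsyncGenerator
from contextvars import ContextVar

# Hypothetical stand-in for PROVIDER_DATA_VAR.
PROVIDER_DATA: ContextVar = ContextVar("provider_data", default=None)


def preserve_contexts(gen: AsyncGenerator, tracked: list) -> AsyncGenerator:
    # Snapshot the values the request set up before streaming starts.
    captured = {var: var.get() for var in tracked}

    async def wrapper():
        try:
            while True:
                # Bind the captured values only while the inner generator runs.
                tokens = [(var, var.set(value)) for var, value in captured.items()]
                try:
                    item = await gen.__anext__()
                except StopAsyncIteration:
                    return
                finally:
                    # Restore the previous values on every step and at the end.
                    for var, token in reversed(tokens):
                        var.reset(token)
                yield item
        finally:
            await gen.aclose()

    return wrapper()


async def stream():
    for i in range(2):
        yield i, PROVIDER_DATA.get()


async def main():
    token = PROVIDER_DATA.set({"tenant": "A"})
    wrapped = preserve_contexts(stream(), [PROVIDER_DATA])
    PROVIDER_DATA.reset(token)  # the request scope ends before streaming does
    seen = [(item, PROVIDER_DATA.get()) async for item in wrapped]
    return seen, PROVIDER_DATA.get()
```

In this sketch the inner generator still sees the request's provider data, while the code consuming the stream sees the restored value between yields and after completion.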

## Test Plan

- `uv run pytest tests/unit/core/test_provider_data_context.py -q`
- `uv run pytest tests/unit/distribution/test_context.py -q`

Both suites fail on HEAD~1 and pass with this change.
2025-10-27 23:01:12 -07:00
ehhuang
c077d01ddf
chore(telemetry): more cleanup: remove apis.telemetry (#3919)
# What does this PR do?


## Test Plan
CI
2025-10-27 22:20:15 -07:00
ehhuang
1c9a31d8bd
chore(telemetry): add grafana dashboards (#3921)
# What does this PR do?
- add a dashboard in grafana (vibe-coded)

## Test Plan
<img width="2416" height="1114" alt="image"
src="https://github.com/user-attachments/assets/8927aad2-cc14-4a1d-847e-350522cac02f"
/>
2025-10-27 14:58:27 -07:00
ehhuang
b7dd3f5c56
chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923)
# What does this PR do?


## Test Plan
CI
vector_io tests will fail until the next client sync.

Tests passed with
https://github.com/llamastack/llama-stack-client-python/pull/286 checked
out locally.
2025-10-27 14:26:06 -07:00
Nathan Weinberg
b6954c9882
fix: add missing shutdown methods to PromptServiceImpl and ConversationServiceImpl (#3925)
The change is visible in server shutdown logs: the corresponding
`WARNING` log lines become `INFO`.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-10-27 13:41:38 -07:00
Matthew Farrellee
a9b00db421
feat: add provider data keys for Cerebras, Databricks, NVIDIA, and RunPod (#3734)
# What does this PR do?

Add provider-data key passing support to Cerebras, Databricks, NVIDIA,
and RunPod.

Also add the missing tests for Fireworks, Anthropic, Gemini, SambaNova,
and vLLM.

Addresses #3517
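
For context, a minimal sketch of how a client might pass a per-request provider key. It assumes that provider data travels as JSON in the `X-LlamaStack-Provider-Data` request header and that the Cerebras key is named `cerebras_api_key`; neither detail is stated in this message, and `provider_data_headers` is a hypothetical helper.

```python
import json


def provider_data_headers(provider_data: dict[str, str]) -> dict[str, str]:
    # Serialize the per-request provider data into a single header value.
    return {"X-LlamaStack-Provider-Data": json.dumps(provider_data)}


# "cerebras_api_key" is an assumed key name, used for illustration only.
headers = provider_data_headers({"cerebras_api_key": "sk-example"})
```

The resulting `headers` dict can be merged into the headers of any HTTP client request.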

## Test Plan

ci w/ new tests

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-27 13:09:35 -07:00
Ashwin Bharambe
471b1b248b
chore(package): migrate to src/ layout (#3920)
Migrates package structure to src/ layout following Python packaging
best practices.

All code moved from `llama_stack/` to `src/llama_stack/`. Public API
unchanged - imports remain `import llama_stack.*`.

Updated build configs, pre-commit hooks, scripts, and GitHub workflows
accordingly. All hooks pass, package builds cleanly.

**Developer note**: Reinstall after pulling: `pip install -e .`
2025-10-27 12:02:21 -07:00
IAN MILLER
98a5047f9d
feat(prompts): attach prompts to storage stores in run configs (#3893)
# What does this PR do?
This PR attaches prompts to storage stores in run configs, so that
prompts can be specified as stores in different distributions. The need
for this functionality was raised in #3514.

> Note: #3514 is split into three separate PRs. This PR is the first
of three.

## Test Plan
Manual testing and updated CI unit tests

Prerequisites:

1. `uv run --with llama-stack llama stack list-deps starter | xargs -L1
uv pip install`

2. `llama stack run starter`

```
INFO     2025-10-23 15:36:17,387 llama_stack.cli.stack.run:100 cli: Using run configuration:                            
         /Users/ianmiller/llama-stack/llama_stack/distributions/starter/run.yaml                                        
INFO     2025-10-23 15:36:17,423 llama_stack.cli.stack.run:157 cli: HTTPS enabled with certificates:                    
           Key: None                                                                                                    
           Cert: None                                                                                                   
INFO     2025-10-23 15:36:17,424 llama_stack.cli.stack.run:159 cli: Listening on ['::', '0.0.0.0']:8321                 
INFO     2025-10-23 15:36:17,749 llama_stack.core.server.server:521 core::server: Run configuration:                    
INFO     2025-10-23 15:36:17,756 llama_stack.core.server.server:524 core::server: apis:                                 
         - agents                                                                                                       
         - batches                                                                                                      
         - datasetio                                                                                                    
         - eval                                                                                                         
         - files                                                                                                        
         - inference                                                                                                    
         - post_training                                                                                                
         - safety                                                                                                       
         - scoring                                                                                                      
         - tool_runtime                                                                                                 
         - vector_io                                                                                                    
         image_name: starter                                                                                            
         providers:                                                                                                     
           agents:                                                                                                      
           - config:                                                                                                    
               persistence:                                                                                             
                 agent_state:                                                                                           
                   backend: kv_default                                                                                  
                   namespace: agents                                                                                    
                 responses:                                                                                             
                   backend: sql_default                                                                                 
                   max_write_queue_size: 10000                                                                          
                   num_writers: 4                                                                                       
                   table_name: responses                                                                                
             provider_id: meta-reference                                                                                
             provider_type: inline::meta-reference                                                                      
           batches:                                                                                                     
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: batches                                                                                     
             provider_id: reference                                                                                     
             provider_type: inline::reference                                                                           
           datasetio:                                                                                                   
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: datasetio::huggingface                                                                      
             provider_id: huggingface                                                                                   
             provider_type: remote::huggingface                                                                         
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: datasetio::localfs                                                                          
             provider_id: localfs                                                                                       
             provider_type: inline::localfs                                                                             
           eval:                                                                                                        
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: eval                                                                                        
             provider_id: meta-reference                                                                                
             provider_type: inline::meta-reference                                                                      
           files:                                                                                                       
           - config:                                                                                                    
               metadata_store:                                                                                          
                 backend: sql_default                                                                                   
                 table_name: files_metadata                                                                             
               storage_dir: /Users/ianmiller/.llama/distributions/starter/files                                         
             provider_id: meta-reference-files                                                                          
             provider_type: inline::localfs                                                                             
           inference:                                                                                                   
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.fireworks.ai/inference/v1                                                               
             provider_id: fireworks                                                                                     
             provider_type: remote::fireworks                                                                           
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.together.xyz/v1                                                                         
             provider_id: together                                                                                      
             provider_type: remote::together                                                                            
           - config: {}                                                                                                 
             provider_id: bedrock                                                                                       
             provider_type: remote::bedrock                                                                             
           - config:                                                                                                    
               api_key: '********'                                                                                      
               base_url: https://api.openai.com/v1                                                                      
             provider_id: openai                                                                                        
             provider_type: remote::openai                                                                              
           - config:                                                                                                    
               api_key: '********'                                                                                      
             provider_id: anthropic                                                                                     
             provider_type: remote::anthropic                                                                           
           - config:                                                                                                    
               api_key: '********'                                                                                      
             provider_id: gemini                                                                                        
             provider_type: remote::gemini                                                                              
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.groq.com                                                                                
             provider_id: groq                                                                                          
             provider_type: remote::groq                                                                                
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.sambanova.ai/v1                                                                         
             provider_id: sambanova                                                                                     
             provider_type: remote::sambanova                                                                           
           - config: {}                                                                                                 
             provider_id: sentence-transformers                                                                         
             provider_type: inline::sentence-transformers                                                               
           post_training:                                                                                               
           - config:                                                                                                    
               checkpoint_format: meta                                                                                  
             provider_id: torchtune-cpu                                                                                 
             provider_type: inline::torchtune-cpu                                                                       
           safety:                                                                                                      
           - config:                                                                                                    
               excluded_categories: []                                                                                  
             provider_id: llama-guard                                                                                   
             provider_type: inline::llama-guard                                                                         
           - config: {}                                                                                                 
             provider_id: code-scanner                                                                                  
             provider_type: inline::code-scanner                                                                        
           scoring:                                                                                                     
           - config: {}                                                                                                 
             provider_id: basic                                                                                         
             provider_type: inline::basic                                                                               
           - config: {}                                                                                                 
             provider_id: llm-as-judge                                                                                  
             provider_type: inline::llm-as-judge                                                                        
           - config:                                                                                                    
               openai_api_key: '********'                                                                               
             provider_id: braintrust                                                                                    
             provider_type: inline::braintrust                                                                          
           tool_runtime:                                                                                                
           - config:                                                                                                    
               api_key: '********'                                                                                      
               max_results: 3                                                                                           
             provider_id: brave-search                                                                                  
             provider_type: remote::brave-search                                                                        
           - config:                                                                                                    
               api_key: '********'                                                                                      
               max_results: 3                                                                                           
             provider_id: tavily-search                                                                                 
             provider_type: remote::tavily-search                                                                       
           - config: {}                                                                                                 
             provider_id: rag-runtime                                                                                   
             provider_type: inline::rag-runtime                                                                         
           - config: {}                                                                                                 
             provider_id: model-context-protocol                                                                        
             provider_type: remote::model-context-protocol                                                              
           vector_io:                                                                                                   
           - config:                                                                                                    
               persistence:                                                                                             
                 backend: kv_default                                                                                    
                 namespace: vector_io::faiss                                                                            
             provider_id: faiss                                                                                         
             provider_type: inline::faiss                                                                               
           - config:                                                                                                    
               db_path: /Users/ianmiller/.llama/distributions/starter/sqlite_vec.db                                     
               persistence:                                                                                             
                 backend: kv_default                                                                                    
                 namespace: vector_io::sqlite_vec                                                                       
             provider_id: sqlite-vec                                                                                    
             provider_type: inline::sqlite-vec                                                                          
         registered_resources:                                                                                          
           benchmarks: []                                                                                               
           datasets: []                                                                                                 
           models: []                                                                                                   
           scoring_fns: []                                                                                              
           shields: []                                                                                                  
           tool_groups:                                                                                                 
           - provider_id: tavily-search                                                                                 
             toolgroup_id: builtin::websearch                                                                           
           - provider_id: rag-runtime                                                                                   
             toolgroup_id: builtin::rag                                                                                 
           vector_stores: []                                                                                            
         server:                                                                                                        
           port: 8321                                                                                                   
         storage:                                                                                                       
           backends:                                                                                                    
             kv_default:                                                                                                
               db_path: /Users/ianmiller/.llama/distributions/starter/kvstore.db                                        
               type: kv_sqlite                                                                                          
             sql_default:                                                                                               
               db_path: /Users/ianmiller/.llama/distributions/starter/sql_store.db                                      
               type: sql_sqlite                                                                                         
           stores:                                                                                                      
             conversations:                                                                                             
               backend: sql_default                                                                                     
               table_name: openai_conversations                                                                         
             inference:                                                                                                 
               backend: sql_default                                                                                     
               max_write_queue_size: 10000                                                                              
               num_writers: 4                                                                                           
               table_name: inference_store                                                                              
             metadata:                                                                                                  
               backend: kv_default                                                                                      
               namespace: registry                                                                                      
             prompts:                                                                                                   
               backend: kv_default                                                                                      
               namespace: prompts                                                                                       
         telemetry:                                                                                                     
           enabled: true                                                                                                
         vector_stores:                                                                                                 
           default_embedding_model:                                                                                     
             model_id: nomic-ai/nomic-embed-text-v1.5                                                                   
             provider_id: sentence-transformers                                                                         
           default_provider_id: faiss                                                                                   
         version: 2                                                                                                     
                                                                                                                        
INFO     2025-10-23 15:36:20,032 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue        
         disabled for SQLite to avoid concurrency issues                                                                
WARNING  2025-10-23 15:36:20,422 llama_stack.providers.inline.telemetry.meta_reference.telemetry:84 telemetry:          
         OTEL_EXPORTER_OTLP_ENDPOINT is not set, skipping telemetry                                                     
INFO     2025-10-23 15:36:22,379 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils:               
         OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models                                           
INFO     2025-10-23 15:36:22,703 uvicorn.error:84 uncategorized: Started server process [17328]                         
INFO     2025-10-23 15:36:22,704 uvicorn.error:48 uncategorized: Waiting for application startup.                       
INFO     2025-10-23 15:36:22,706 llama_stack.core.server.server:179 core::server: Starting up Llama Stack server        
         (version: 0.3.0)                                                                                               
INFO     2025-10-23 15:36:22,707 llama_stack.core.stack:470 core: starting registry refresh task                        
INFO     2025-10-23 15:36:22,708 uvicorn.error:62 uncategorized: Application startup complete.                          
INFO     2025-10-23 15:36:22,708 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321      
         (Press CTRL+C to quit)   
```
As you can see, prompts are attached to `stores` in the config.

Testing:

1. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}`

2. Get prompt:

`curl -X GET http://localhost:8321/v1/prompts/pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f`

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}`

3. Query the SQLite KV store to check the created prompt:

```
sqlite> .mode column
sqlite> .headers on
sqlite> SELECT * FROM kvstore WHERE key LIKE 'prompts:v1:%';
key                                                           value                                                         expiration
------------------------------------------------------------  ------------------------------------------------------------  ----------
prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e  {"prompt_id": "pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab            
163f:1                                                        5f6e163f", "prompt": "Hello {{name}}! You are working at {{c            
                                                              ompany}}. Your role is {{role}} at {{company}}. Remember, {{            
                                                              name}}, to be {{tone}}.", "version": 1, "variables": ["name"            
                                                              , "company", "role", "tone"], "is_default": false}                      

prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e  1                                                                       
163f:default                                                                                                                          
sqlite> 
```
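
The key layout visible in the query above can be reproduced with a minimal sketch. It assumes a `kvstore` table with the three columns shown in the output (the column types are a guess), and it reads the `:default` row as a pointer to the default version number, which is an interpretation of the output rather than something confirmed by the code. The `prompt_id` value and the `load_default_prompt` helper are hypothetical.

```python
import json
import sqlite3

# In-memory stand-in for kvstore.db; the column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kvstore (key TEXT PRIMARY KEY, value TEXT, expiration TIMESTAMP)"
)

prompt_id = "pmpt_example"  # hypothetical ID, not a real prompt
record = {
    "prompt_id": prompt_id,
    "prompt": "Hello {{name}}!",
    "version": 1,
    "variables": ["name"],
    "is_default": False,
}

# One row per version, plus a ":default" row holding the default version number.
conn.execute(
    "INSERT INTO kvstore (key, value) VALUES (?, ?)",
    (f"prompts:v1:{prompt_id}:1", json.dumps(record)),
)
conn.execute(
    "INSERT INTO kvstore (key, value) VALUES (?, ?)",
    (f"prompts:v1:{prompt_id}:default", "1"),
)


def load_default_prompt(conn, prompt_id):
    """Follow the ':default' pointer and return the matching version record."""
    row = conn.execute(
        "SELECT value FROM kvstore WHERE key = ?",
        (f"prompts:v1:{prompt_id}:default",),
    ).fetchone()
    if row is None:
        return None
    version = row[0]
    row = conn.execute(
        "SELECT value FROM kvstore WHERE key = ?",
        (f"prompts:v1:{prompt_id}:{version}",),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Same filter as the sqlite3 shell session above.
keys = [
    k
    for (k,) in conn.execute(
        "SELECT key FROM kvstore WHERE key LIKE 'prompts:v1:%' ORDER BY key"
    )
]
default_prompt = load_default_prompt(conn, prompt_id)
```
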
2025-10-27 11:12:12 -07:00
Luis Tomas Bolivar
63422e5b36
fix!: Enhance response API support to not fail with tool calling (#3385)
# What does this PR do?
Introduces two main fixes that improve the stability of the Responses API when
dealing with tool-calling responses and structured outputs.

### Changes Made

1. Adds OpenAIResponseOutputMessageMCPCall and ListTools to
OpenAIResponseInput. https://github.com/llamastack/llama-stack/pull/3810
was merged in the meantime and did the same thing in a different way, but
this PR does it in a way that keeps OpenAIResponseOutput and the objects
allowed in OpenAIResponseInput in sync.

2. Adds protection in case self.ctx.response_format does not have a type
attribute.

BREAKING CHANGE: OpenAIResponseInput now uses the OpenAIResponseOutput union
type.
This is semantically equivalent - all previously accepted types are
still supported
via the OpenAIResponseOutput union. This improves type consistency and
maintainability.
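
As a rough illustration of the two changes (not the actual implementation), the sketch below uses hypothetical stand-in classes instead of the real Responses API types. It shows an input union built on top of the output union, so the two cannot drift apart, and a guarded read of a `type` attribute that may be missing. The helper name `response_format_type` is an assumption.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional, Union, get_args


# Hypothetical stand-ins for the real message classes.
@dataclass
class OutputMessage:
    text: str


@dataclass
class MCPCall:
    name: str


@dataclass
class MCPListTools:
    server_label: str


@dataclass
class FunctionCallOutput:
    call_id: str


# Every output type lives in one union ...
ResponseOutput = Union[OutputMessage, MCPCall, MCPListTools]
# ... and the input union reuses it, so a new output type is accepted as input too.
ResponseInput = Union[ResponseOutput, FunctionCallOutput]


def response_format_type(response_format: object) -> Optional[str]:
    """Return response_format.type, or None when the object or attribute is missing."""
    return getattr(response_format, "type", None)


# Usage: neither call raises, even when there is no `type` attribute.
with_type = response_format_type(SimpleNamespace(type="json_schema"))
without_type = response_format_type(None)
```
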
2025-10-27 09:33:02 -07:00
Luis Tomas Bolivar
f18b5eb537
fix: Avoid BadRequestError due to invalid max_tokens (#3667)
This patch ensures that if max_tokens is not defined, it is set to None
instead of 0 when calling openai_chat_completion. This way, providers
(like Gemini) that cannot handle `max_tokens = 0` will not fail.

Issue: #3666
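
A minimal sketch of the idea, assuming the unset value arrives as either `None` or `0`. The helper names `normalize_max_tokens` and `build_chat_params` are hypothetical and are not taken from the patch.

```python
from typing import Any, Dict, List, Optional


def normalize_max_tokens(max_tokens: Optional[int]) -> Optional[int]:
    """Map an unset limit (None or 0) to None so the provider applies its own default."""
    return max_tokens if max_tokens else None


def build_chat_params(
    model: str, messages: List[Dict[str, Any]], max_tokens: Optional[int] = None
) -> Dict[str, Any]:
    """Build keyword arguments for a chat completion call (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": normalize_max_tokens(max_tokens),
    }


# Usage: an unset limit is forwarded as None, never as 0.
params = build_chat_params("some-model", [{"role": "user", "content": "hi"}], max_tokens=0)
```
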
2025-10-27 09:27:21 -07:00
Derek Higgins
00d8414597
fix(tests): limit vector store providers for record mode in CI tests (#3898)
The vector_provider_wrapper was only limiting providers to
faiss/sqlite-vec for replay mode, but CI tests also run in record mode
with the same limited set of providers. This caused test failures when
trying to test against milvus, chromadb, pgvector, weaviate, and qdrant
which aren't configured in the record job.
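
The selection logic described above might look roughly like the sketch below. The function name `providers_for_mode`, the mode strings, and the two list constants are assumptions for illustration; the real `vector_provider_wrapper` may be structured differently.

```python
from typing import List

# Providers named in the commit message; the split into two lists is illustrative.
ALL_PROVIDERS = ["faiss", "sqlite-vec", "milvus", "chromadb", "pgvector", "weaviate", "qdrant"]
CI_PROVIDERS = ["faiss", "sqlite-vec"]


def providers_for_mode(mode: str) -> List[str]:
    """Return the vector store providers to test for a given inference mode."""
    # Replay and record both run in CI, where only the inline providers are configured.
    if mode in ("replay", "record"):
        return list(CI_PROVIDERS)
    return list(ALL_PROVIDERS)


record_providers = providers_for_mode("record")
```
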
2025-10-27 09:22:49 -07:00
Sébastien Han
7c0e43424d
chore: remove duplicate provider definition (#3917)
# What does this PR do?

The file was present twice.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-27 09:19:04 -07:00
dependabot[bot]
9c223d8593
chore(github-deps): bump actions/upload-artifact from 4.6.2 to 5.0.0 (#3905)
Bumps
[actions/upload-artifact](https://github.com/actions/upload-artifact)
from 4.6.2 to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<p><strong>BREAKING CHANGE:</strong> this update supports Node
<code>v24.x</code>. This is not a breaking change per-se but we're
treating it as such.</p>
<ul>
<li>Update README.md by <a
href="https://github.com/GhadimiR"><code>@​GhadimiR</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/681">actions/upload-artifact#681</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/712">actions/upload-artifact#712</a></li>
<li>Readme: spell out the first use of GHES by <a
href="https://github.com/danwkennedy"><code>@​danwkennedy</code></a> in
<a
href="https://redirect.github.com/actions/upload-artifact/pull/727">actions/upload-artifact#727</a></li>
<li>Update GHES guidance to include reference to Node 20 version by <a
href="https://github.com/patrikpolyak"><code>@​patrikpolyak</code></a>
in <a
href="https://redirect.github.com/actions/upload-artifact/pull/725">actions/upload-artifact#725</a></li>
<li>Bump <code>@actions/artifact</code> to <code>v4.0.0</code></li>
<li>Prepare <code>v5.0.0</code> by <a
href="https://github.com/danwkennedy"><code>@​danwkennedy</code></a> in
<a
href="https://redirect.github.com/actions/upload-artifact/pull/734">actions/upload-artifact#734</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/GhadimiR"><code>@​GhadimiR</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/upload-artifact/pull/681">actions/upload-artifact#681</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/upload-artifact/pull/712">actions/upload-artifact#712</a></li>
<li><a
href="https://github.com/danwkennedy"><code>@​danwkennedy</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/upload-artifact/pull/727">actions/upload-artifact#727</a></li>
<li><a
href="https://github.com/patrikpolyak"><code>@​patrikpolyak</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/upload-artifact/pull/725">actions/upload-artifact#725</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/upload-artifact/compare/v4...v5.0.0">https://github.com/actions/upload-artifact/compare/v4...v5.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="330a01c490"><code>330a01c</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/734">#734</a>
from actions/danwkennedy/prepare-5.0.0</li>
<li><a
href="03f2824452"><code>03f2824</code></a>
Update <code>github.dep.yml</code></li>
<li><a
href="905a1ecb59"><code>905a1ec</code></a>
Prepare <code>v5.0.0</code></li>
<li><a
href="2d9f9cdfa9"><code>2d9f9cd</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/725">#725</a>
from patrikpolyak/patch-1</li>
<li><a
href="9687587dec"><code>9687587</code></a>
Merge branch 'main' into patch-1</li>
<li><a
href="2848b2cda0"><code>2848b2c</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/727">#727</a>
from danwkennedy/patch-1</li>
<li><a
href="9b511775fd"><code>9b51177</code></a>
Spell out the first use of GHES</li>
<li><a
href="cd231ca1ed"><code>cd231ca</code></a>
Update GHES guidance to include reference to Node 20 version</li>
<li><a
href="de65e23aa2"><code>de65e23</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/712">#712</a>
from actions/nebuk89-patch-1</li>
<li><a
href="8747d8cd76"><code>8747d8c</code></a>
Update README.md</li>
<li>Additional commits viewable in <a
href="ea165f8d65...330a01c490">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/upload-artifact&package-manager=github_actions&previous-version=4.6.2&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-27 14:42:23 +01:00
dependabot[bot]
8ad9dd7d60
chore(github-deps): bump astral-sh/setup-uv from 7.1.0 to 7.1.1 (#3906)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
7.1.0 to 7.1.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.1 🌈 Fix empty workdir detection and lowest resolution
strategy</h2>
<h2>Changes</h2>
<p>This release fixes a bug where the <code>working-directory</code>
input was not used to detect an empty work dir. It also fixes the
<code>lowest</code> resolution strategy resolving to latest when only a
lower bound was specified.</p>
<p>Special thanks to <a
href="https://github.com/tpgillam"><code>@​tpgillam</code></a> for the
first contribution!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Fix &quot;lowest&quot; resolution strategy with lower-bound only <a
href="https://github.com/tpgillam"><code>@​tpgillam</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li>
<li>Use working-directory to detect empty workdir <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known checksums for 0.9.4 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li>
<li>chore: update known checksums for 0.9.3 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Change version in docs to v7 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump github/codeql-action from 4.30.7 to 4.30.8 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li>
<li>Bump actions/setup-node from 5.0.0 to 6.0.0 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/641">#641</a>)</li>
<li>Bump eifinger/actionlint-action from 1.9.1 to 1.9.2 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/634">#634</a>)</li>
<li>Update lockfile with latest npm <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/636">#636</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2ddd2b9cb3"><code>2ddd2b9</code></a>
chore: update known checksums for 0.9.4 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li>
<li><a
href="b7bf78939d"><code>b7bf789</code></a>
Fix &quot;lowest&quot; resolution strategy with lower-bound only (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li>
<li><a
href="cb6c0a53d9"><code>cb6c0a5</code></a>
Change version in docs to v7 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li>
<li><a
href="dffc6292f2"><code>dffc629</code></a>
Use working-directory to detect empty workdir (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li>
<li><a
href="6e346e1653"><code>6e346e1</code></a>
chore: update known checksums for 0.9.3 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li>
<li><a
href="3ccd0fd498"><code>3ccd0fd</code></a>
Bump github/codeql-action from 4.30.7 to 4.30.8 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li>
<li><a
href="ce6dbd84e1"><code>ce6dbd8</code></a>
Bump actions/setup-node from 5.0.0 to 6.0.0 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/641">#641</a>)</li>
<li><a
href="2382069a66"><code>2382069</code></a>
Bump eifinger/actionlint-action from 1.9.1 to 1.9.2 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/634">#634</a>)</li>
<li><a
href="b1daf91f4e"><code>b1daf91</code></a>
Update lockfile with latest npm (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/636">#636</a>)</li>
<li>See full diff in <a
href="3259c6206f...2ddd2b9cb3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=7.1.0&new-version=7.1.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-27 14:42:08 +01:00
dependabot[bot]
9d6e589120
chore(ui-deps): bump @types/node from 24.8.1 to 24.9.1 in /llama_stack/ui (#3912)
Bumps
[@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node)
from 24.8.1 to 24.9.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/node&package-manager=npm_and_yarn&previous-version=24.8.1&new-version=24.9.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-26 23:48:00 -04:00
dependabot[bot]
948951cc5c
chore(ui-deps): bump @tailwindcss/postcss from 4.1.14 to 4.1.16 in /llama_stack/ui (#3913)
Bumps
[@tailwindcss/postcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss)
from 4.1.14 to 4.1.16.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/releases"><code>@​tailwindcss/postcss</code>'s
releases</a>.</em></p>
<blockquote>
<h2>v4.1.16</h2>
<h3>Fixed</h3>
<ul>
<li>Discard candidates with an empty data type (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19172">#19172</a>)</li>
<li>Fix canonicalization of arbitrary variants with attribute selectors
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19176">#19176</a>)</li>
<li>Fix invalid colors due to nested <code>&amp;</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19184">#19184</a>)</li>
<li>Improve canonicalization for <code>&amp; &gt; :pseudo</code> and
<code>&amp; :pseudo</code> arbitrary variants (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19178">#19178</a>)</li>
</ul>
<h2>v4.1.15</h2>
<h3>Fixed</h3>
<ul>
<li>Fix Safari devtools rendering issue due to <code>color-mix</code>
fallback (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19069">#19069</a>)</li>
<li>Suppress Lightning CSS warnings about <code>:deep</code>,
<code>:slotted</code>, and <code>:global</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19094">#19094</a>)</li>
<li>Fix resolving theme keys when starting with the name of another
theme key in JS configs and plugins (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19097">#19097</a>)</li>
<li>Allow named groups in combination with <code>not-*</code>,
<code>has-*</code>, and <code>in-*</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19100">#19100</a>)</li>
<li>Prevent important utilities from affecting other utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19110">#19110</a>)</li>
<li>Don’t index into strings with the <code>theme(…)</code> function (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19111">#19111</a>)</li>
<li>Fix parsing issue when <code>\t</code> is used in at-rules (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19130">#19130</a>)</li>
<li>Upgrade: Canonicalize utilities containing <code>0</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19095">#19095</a>)</li>
<li>Upgrade: Migrate deprecated <code>break-words</code> to
<code>wrap-break-word</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19157">#19157</a>)</li>
</ul>
<h3>Changed</h3>
<ul>
<li>Remove the <code>postinstall</code> script from oxide (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19149">#19149</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md"><code>@​tailwindcss/postcss</code>'s
changelog</a>.</em></p>
<blockquote>
<h2>[4.1.16] - 2025-10-23</h2>
<h3>Fixed</h3>
<ul>
<li>Discard candidates with an empty data type (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19172">#19172</a>)</li>
<li>Fix canonicalization of arbitrary variants with attribute selectors
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19176">#19176</a>)</li>
<li>Fix invalid colors due to nested <code>&amp;</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19184">#19184</a>)</li>
<li>Improve canonicalization for <code>&amp; &gt; :pseudo</code> and
<code>&amp; :pseudo</code> arbitrary variants (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19178">#19178</a>)</li>
</ul>
<h2>[4.1.15] - 2025-10-20</h2>
<h3>Fixed</h3>
<ul>
<li>Fix Safari devtools rendering issue due to <code>color-mix</code>
fallback (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19069">#19069</a>)</li>
<li>Suppress Lightning CSS warnings about <code>:deep</code>,
<code>:slotted</code>, and <code>:global</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19094">#19094</a>)</li>
<li>Fix resolving theme keys when starting with the name of another
theme key in JS configs and plugins (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19097">#19097</a>)</li>
<li>Allow named groups in combination with <code>not-*</code>,
<code>has-*</code>, and <code>in-*</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19100">#19100</a>)</li>
<li>Prevent important utilities from affecting other utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19110">#19110</a>)</li>
<li>Don’t index into strings with the <code>theme(…)</code> function (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19111">#19111</a>)</li>
<li>Fix parsing issue when <code>\t</code> is used in at-rules (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19130">#19130</a>)</li>
<li>Upgrade: Canonicalize utilities containing <code>0</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19095">#19095</a>)</li>
<li>Upgrade: Migrate deprecated <code>break-words</code> to
<code>wrap-break-word</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19157">#19157</a>)</li>
</ul>
<h3>Changed</h3>
<ul>
<li>Remove the <code>postinstall</code> script from oxide (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19149">#19149</a>)(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19149">tailwindlabs/tailwindcss#19149</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="cbbbe84475"><code>cbbbe84</code></a>
Release 4.1.16 (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19185">#19185</a>)</li>
<li><a
href="b2e2435ccb"><code>b2e2435</code></a>
Release 4.1.15 (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19159">#19159</a>)</li>
<li>See full diff in <a
href="https://github.com/tailwindlabs/tailwindcss/commits/v4.1.16/packages/@tailwindcss-postcss">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@tailwindcss/postcss&package-manager=npm_and_yarn&previous-version=4.1.14&new-version=4.1.16)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-26 23:47:36 -04:00
dependabot[bot]
00bfda4eff
chore(ui-deps): bump @types/react-dom from 19.2.1 to 19.2.2 in /llama_stack/ui (#3915)
Bumps
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom)
from 19.2.1 to 19.2.2.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/react-dom&package-manager=npm_and_yarn&previous-version=19.2.1&new-version=19.2.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-26 23:47:16 -04:00
dependabot[bot]
68e5a66ca9
chore(ui-deps): bump @testing-library/jest-dom from 6.8.0 to 6.9.1 in /llama_stack/ui (#3914)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m33s
Bumps
[@testing-library/jest-dom](https://github.com/testing-library/jest-dom)
from 6.8.0 to 6.9.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/testing-library/jest-dom/releases"><code>@​testing-library/jest-dom</code>'s
releases</a>.</em></p>
<blockquote>
<h2>v6.9.1</h2>
<h2><a
href="https://github.com/testing-library/jest-dom/compare/v6.9.0...v6.9.1">6.9.1</a>
(2025-10-01)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Fix undefined <code>Node</code> error (nodejs) (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/707">#707</a>)
(<a
href="0ff8904ff4">0ff8904</a>)</li>
</ul>
<h2>v6.9.0</h2>
<h1><a
href="https://github.com/testing-library/jest-dom/compare/v6.8.0...v6.9.0">6.9.0</a>
(2025-09-30)</h1>
<h3>Features</h3>
<ul>
<li>Add .toAppearBefore/.toAppearAfter matcher (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/702">#702</a>)
(<a
href="95f870acb2">95f870a</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0ff8904ff4"><code>0ff8904</code></a>
fix: Fix undefined <code>Node</code> error (nodejs) (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/707">#707</a>)</li>
<li><a
href="95f870acb2"><code>95f870a</code></a>
feat: Add .toAppearBefore/.toAppearAfter matcher (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/702">#702</a>)</li>
<li><a
href="d6663f5f97"><code>d6663f5</code></a>
docs: add nossbigg as a contributor for code, and test (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/703">#703</a>)</li>
<li>See full diff in <a
href="https://github.com/testing-library/jest-dom/compare/v6.8.0...v6.9.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@testing-library/jest-dom&package-manager=npm_and_yarn&previous-version=6.8.0&new-version=6.9.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-25 19:55:14 -04:00
ehhuang
509676641a
chore: update run configs (#3902)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m34s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
# What does this PR do?
telemetry was deprecated


## Test Plan
2025-10-24 15:03:06 -07:00
ehhuang
2a1a813308
chore: update docs for telemetry api removal (#3900)
# What does this PR do?
Telemetry is no longer an API/provider.

## Test Plan
2025-10-24 13:57:28 -07:00
Francisco Arceo
4566eebe05
feat: Add static file import system for docs (#3882)
# What does this PR do?

Add static file import system for docs

- Use `remark-code-import` plugin to embed code at build time
- Support importing Python code with syntax highlighting using
`raw-loader` + `ReactMarkdown`

One caveat: syntax highlighting currently does not work for code inside
embedded markdown. I'll investigate that in a follow-up.

## Test Plan

Python Example:
<img width="1372" height="995" alt="Screenshot 2025-10-23 at 9 22 18 PM"
src="https://github.com/user-attachments/assets/656d2c78-4d9b-45a4-bd5e-3f8490352b85"
/>

Markdown example:
<img width="1496" height="1070" alt="Screenshot 2025-10-23 at 9 22
38 PM"
src="https://github.com/user-attachments/assets/6c0a07ec-ff7c-45aa-b05f-8c46acd4445c"
/>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-24 14:01:33 -04:00
ehhuang
8265d4efc8
chore(telemetry): code cleanup (#3897)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 14s
UI Tests / ui-tests (22) (push) Successful in 43s
Pre-commit / pre-commit (push) Successful in 1m35s
# What does this PR do?
Clean up telemetry code since the telemetry API has been removed.
- moved telemetry files out of providers to core
- removed from Api

## Test Plan

❯ OTEL_SERVICE_NAME=llama_stack
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 uv run llama stack run
starter
❯ curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

-> verify traces in Grafana

CI
2025-10-23 23:13:02 -07:00
ehhuang
9916cb3b17
chore: support default model in moderations API (#3890)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m33s
# What does this PR do?
https://platform.openai.com/docs/api-reference/moderations supports
optional model parameter.

This PR adds support for using moderations API with model=None if a
default shield id is provided via safety config.
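
The fallback described above could be sketched roughly as follows. The helper name `resolve_shield_id` and its parameters are hypothetical and not taken from the Llama Stack code base; the sketch only illustrates preferring an explicit `model` and falling back to a configured default shield.

```python
from typing import Optional


def resolve_shield_id(model: Optional[str], default_shield_id: Optional[str]) -> str:
    """Pick the shield for a moderation request (hypothetical helper)."""
    if model is not None:
        # An explicit model in the request always wins.
        return model
    if default_shield_id is not None:
        # model=None: fall back to the default from the safety config.
        return default_shield_id
    raise ValueError("No model given and no default shield configured")
```

With a default configured, a request that omits `model` resolves to that default; without one, the caller gets an explicit error instead of a silent failure.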

## Test Plan
added tests

manual test:
```
> SAFETY_MODEL='together/meta-llama/Llama-Guard-4-12B'   uv run llama stack run starter
> curl http://localhost:8321/v1/moderations \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
        "hello"
    ]
  }'
```
2025-10-23 16:03:53 -07:00
ehhuang
d12e5f0999
chore(telemetry): add an argument to select container runtime explicitly (#3896)
# What does this PR do?


## Test Plan
❯ ./scripts/telemetry/setup_telemetry.sh --container docker
2025-10-23 12:36:34 -07:00
Ashwin Bharambe
658fb2c777 refactor(k8s): update run configs to v2 storage and registered_resources structure
Some checks failed
Python Package Build Test / build (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m30s
Migrates k8s run configs to match the updated run configs

- Replace storage.references with storage.stores
- Wrap resources under registered_resources section
- Update provider configs to use persistence with namespace/backend
- Add telemetry and vector_stores top-level sections
- Simplify agent/files metadata store configuration
2025-10-22 15:33:50 -07:00
Ashwin Bharambe
0e57233a0a
chore(misc): update datasets, benchmarks to use alpha, beta prefixes (#3891)
This will be landed together with
https://github.com/llamastack/llama-stack-client-python/pull/282 (hence
CI will be red on this one.)

I have verified locally that tests pass with the updated version of the
client-sdk.
2025-10-22 15:26:35 -07:00
Ashwin Bharambe
7918188f1e
fix(ci): enable responses tests in CI; suppress expected MCP auth error logs (#3889)
Let us enable responses suite in CI now.

Also a minor fix: MCP tool tests intentionally trigger authentication
failures to verify error handling, but the resulting error logs clutter
test output.
2025-10-22 14:59:42 -07:00
Ashwin Bharambe
7b90e0e9c8
test: suppress expected error logs in SSE test (#3886)
Our unit test outputs are filled with all kinds of obscene logs. This
makes it really hard to spot real issues quickly. The problem is that
these logs are necessary to output at the given logging level when the
server is operating normally. It's just that we don't want to see some
of them (especially the noisy ones) during tests.

This PR begins the cleanup. We use pytest's caplog fixture for
suppression.
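
A minimal sketch of this kind of suppression, assuming a hypothetical logger name `demo.sse` and a stand-in function `handle_unauthorized` (neither is from the actual test suite). `caplog.at_level` temporarily raises the logger's threshold, so the expected error record is never emitted:

```python
import logging

logger = logging.getLogger("demo.sse")  # hypothetical logger name


def handle_unauthorized() -> int:
    """Simulate a code path that logs an expected error and returns a status."""
    logger.error("authentication failed")
    return 401


def test_unauthorized_is_quiet(caplog):
    # Raise the logger's threshold for the duration of the block so the
    # expected ERROR record is dropped instead of cluttering test output.
    with caplog.at_level(logging.CRITICAL, logger="demo.sse"):
        assert handle_unauthorized() == 401
    assert caplog.records == []
```

The level is restored when the `with` block exits, so normal server operation still logs at the usual level.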
2025-10-22 14:34:32 -07:00
ehhuang
f8eaa40580
chore: better error messages for moderations API (#3887)
# What does this PR do?


## Test Plan
```
~/projects/lst3 remotes/origin/HEAD*
.venv ❯ curl http://localhost:8321/v1/moderations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "input": [
        "hello"
    ]
  }'
{"detail":"Invalid value: No shield associated with provider_resource id gpt-4o-mini: choose from ['together/meta-llama/Llama-Guard-4-12B']"}
```
2025-10-22 14:33:13 -07:00
Ashwin Bharambe
30ba8c8655
fix(responses): sync conversation before yielding terminal events in streaming (#3888)
Move conversation sync logic before yield to ensure it executes even
when streaming consumers break early after receiving the
response.completed event.
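
Why the ordering matters can be seen in a small synchronous sketch. The real code uses async streaming; the generator and helper names here are illustrative assumptions only. Code placed after a `yield` never runs if the consumer stops at that `yield`:

```python
def sync_after_yield(log):
    yield "response.completed"
    log.append("synced")  # skipped if the consumer stops at the yield above


def sync_before_yield(log):
    log.append("synced")  # runs before the terminal event is handed out
    yield "response.completed"


def take_first(gen):
    """Consume one event, then stop early like a consumer that breaks out."""
    event = next(gen)
    gen.close()
    return event


after_log, before_log = [], []
take_first(sync_after_yield(after_log))
take_first(sync_before_yield(before_log))
print(after_log, before_log)  # prints: [] ['synced']
```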

## Test Plan

```
OLLAMA_URL=http://localhost:11434 \
  pytest -sv tests/integration/responses/ \
  --stack-config server:ci-tests \
  --text-model ollama/llama3.2:3b-instruct-fp16 \
  --inference-mode live \
  -k conversation_multi
```

This test now passes.
2025-10-22 14:31:12 -07:00
Ashwin Bharambe
cb2185b936
fix(logging): ensure logs go to stderr, loggers obey levels (#3885)
Important fix to the logging system
2025-10-22 13:06:54 -07:00
dependabot[bot]
8885cea8d7
fix(conversations)!: update Conversations API definitions (was: bump openai from 1.107.0 to 2.5.0) (#3847)
Bumps [openai](https://github.com/openai/openai-python) from 1.107.0 to
2.5.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/releases">openai's
releases</a>.</em></p>
<blockquote>
<h2>v2.5.0</h2>
<h2>2.5.0 (2025-10-17)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.4.0...v2.5.0">v2.4.0...v2.5.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> api update (<a
href="8b280d57d6">8b280d5</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>bump <code>httpx-aiohttp</code> version to 0.1.9 (<a
href="67f2f0afe5">67f2f0a</a>)</li>
</ul>
<h2>v2.4.0</h2>
<h2>2.4.0 (2025-10-16)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.3.0...v2.4.0">v2.3.0...v2.4.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> Add support for gpt-4o-transcribe-diarize on
audio/transcriptions endpoint (<a
href="bdbe9b8f44">bdbe9b8</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>fix dangling comment (<a
href="da14e99606">da14e99</a>)</li>
<li><strong>internal:</strong> detect missing future annotations with
ruff (<a
href="2672b8f072">2672b8f</a>)</li>
</ul>
<h2>v2.3.0</h2>
<h2>2.3.0 (2025-10-10)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.2.0...v2.3.0">v2.2.0...v2.3.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> comparison filter in/not in (<a
href="aa49f626a6">aa49f62</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li><strong>package:</strong> bump jiter to &gt;=0.10.0 to support
Python 3.14 (<a
href="https://redirect.github.com/openai/openai-python/issues/2618">#2618</a>)
(<a
href="aa445cab5c">aa445ca</a>)</li>
</ul>
<h2>v2.2.0</h2>
<h2>2.2.0 (2025-10-06)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.1.0...v2.2.0">v2.1.0...v2.2.0</a></p>
<h3>Features</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/blob/main/CHANGELOG.md">openai's
changelog</a>.</em></p>
<blockquote>
<h2>2.5.0 (2025-10-17)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.4.0...v2.5.0">v2.4.0...v2.5.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> api update (<a
href="8b280d57d6">8b280d5</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>bump <code>httpx-aiohttp</code> version to 0.1.9 (<a
href="67f2f0afe5">67f2f0a</a>)</li>
</ul>
<h2>2.4.0 (2025-10-16)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.3.0...v2.4.0">v2.3.0...v2.4.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> Add support for gpt-4o-transcribe-diarize on
audio/transcriptions endpoint (<a
href="bdbe9b8f44">bdbe9b8</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>fix dangling comment (<a
href="da14e99606">da14e99</a>)</li>
<li><strong>internal:</strong> detect missing future annotations with
ruff (<a
href="2672b8f072">2672b8f</a>)</li>
</ul>
<h2>2.3.0 (2025-10-10)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.2.0...v2.3.0">v2.2.0...v2.3.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> comparison filter in/not in (<a
href="aa49f626a6">aa49f62</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li><strong>package:</strong> bump jiter to &gt;=0.10.0 to support
Python 3.14 (<a
href="https://redirect.github.com/openai/openai-python/issues/2618">#2618</a>)
(<a
href="aa445cab5c">aa445ca</a>)</li>
</ul>
<h2>2.2.0 (2025-10-06)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.1.0...v2.2.0">v2.1.0...v2.2.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> dev day 2025 launches (<a
href="38ac0093eb">38ac009</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="513ae76253"><code>513ae76</code></a>
release: 2.5.0 (<a
href="https://redirect.github.com/openai/openai-python/issues/2694">#2694</a>)</li>
<li><a
href="ebf32212f7"><code>ebf3221</code></a>
release: 2.4.0</li>
<li><a
href="e043d7b164"><code>e043d7b</code></a>
chore: fix dangling comment</li>
<li><a
href="25cbb74f83"><code>25cbb74</code></a>
feat(api): Add support for gpt-4o-transcribe-diarize on
audio/transcriptions ...</li>
<li><a
href="8cdfd0650e"><code>8cdfd06</code></a>
codegen metadata</li>
<li><a
href="d5c64434b7"><code>d5c6443</code></a>
codegen metadata</li>
<li><a
href="b20a9e7b81"><code>b20a9e7</code></a>
chore(internal): detect missing future annotations with ruff</li>
<li><a
href="e5f93f5dae"><code>e5f93f5</code></a>
release: 2.3.0</li>
<li><a
href="044878859c"><code>0448788</code></a>
feat(api): comparison filter in/not in</li>
<li><a
href="85a91ade61"><code>85a91ad</code></a>
chore(package): bump jiter to &gt;=0.10.0 to support Python 3.14 (<a
href="https://redirect.github.com/openai/openai-python/issues/2618">#2618</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/openai/openai-python/compare/v1.107.0...v2.5.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=openai&package-manager=uv&previous-version=1.107.0&new-version=2.5.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-22 12:32:48 -07:00
Jiayi Ni
bb1ebb3c6b
feat: Add rerank models and rerank API change (#3831)
# What does this PR do?
- Extend the model type to include rerank models.
- Implement `rerank()` method in inference router.
- Add `rerank_model_list` to `OpenAIMixin` to enable providers to
register and identify rerank models
- Update documentation.

## Test Plan
```
pytest tests/unit/providers/utils/inference/test_openai_mixin.py
```
2025-10-22 12:02:28 -07:00
ehhuang
f2598d30e6
chore: use --no-cache in Containerfile (#3884)
# What does this PR do?
debugging
5332970065

--no-cache was what build_container.sh used

## Test Plan
2025-10-22 11:39:00 -07:00
Ashwin Bharambe
c582654d70 fix(ci): don't need server: anymore, docker: is sufficient
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 2m14s
2025-10-22 09:13:39 -07:00
Francisco Arceo
53c20f6113
feat: Adding Demo script (#3870)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 15s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 15s
API Conformance Tests / check-schema-compatibility (push) Successful in 24s
UI Tests / ui-tests (22) (push) Successful in 50s
Pre-commit / pre-commit (push) Successful in 1m26s
# What does this PR do?
Updated quickstart `demo_script.py` to use OpenAI APIs, which is simply:

```python
import io

import requests
from openai import OpenAI

url = "https://www.paulgraham.com/greatwork.html"
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")

# Create a vector store and upload the downloaded page into it.
vs = client.vector_stores.create()
response = requests.get(url)
# response.content is already bytes, so wrap it directly instead of
# round-tripping through str(), which would store the b'...' repr.
pseudo_file = io.BytesIO(response.content)
uploaded_file = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants")
client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)

# Ask a question and let the model search the vector store.
resp = client.responses.create(
    model="openai/gpt-4o",
    input="How do you do great work? Use the existing knowledge_search tool.",
    tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
    include=["file_search_call.results"],
)

print(resp)
```
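
Printing the whole response is noisy, so a small helper to pull out just the retrieved chunks may be useful. The sketch below assumes the response follows the OpenAI Responses shape, where `output` items of type `file_search_call` carry a `results` list with `filename`, `score`, and `text`. `collect_search_results` and `fake_resp` are illustrative names, and the stand-in object only mimics a real response.

```python
from types import SimpleNamespace


def collect_search_results(resp):
    """Return (filename, score, text) tuples from file_search_call output items."""
    hits = []
    for item in getattr(resp, "output", None) or []:
        if getattr(item, "type", None) != "file_search_call":
            continue
        # results is only populated when include=["file_search_call.results"] is set.
        for result in getattr(item, "results", None) or []:
            hits.append((result.filename, result.score, result.text))
    return hits


# Stand-in response for illustration; a real one comes from client.responses.create(...).
fake_resp = SimpleNamespace(
    output=[
        SimpleNamespace(
            type="file_search_call",
            results=[SimpleNamespace(filename="greatwork.html", score=0.9, text="Work on what excites you.")],
        ),
        SimpleNamespace(type="message"),
    ]
)

for filename, score, text in collect_search_results(fake_resp):
    print(f"{filename} ({score:.2f}): {text}")
```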




## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-21 21:31:21 -04:00
github-actions[bot]
bf2d16997d build: Bump version to 0.3.0
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 25s
Test llama stack list-deps / list-deps (push) Failing after 24s
UI Tests / ui-tests (22) (push) Successful in 52s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 30s
Pre-commit / pre-commit (push) Successful in 1m59s
2025-10-21 23:59:09 +00:00
Ashwin Bharambe
c0c0e337d9 misc(tests): add recordings for responses tests 2025-10-21 16:39:08 -07:00
Ashwin Bharambe
557b1b8c2d fix(logs): restore uvicorn and llama_stack logger settings 2025-10-21 15:47:55 -07:00
slekkala1
eb2b240594
fix: remove consistency checks (#3881)
# What does this PR do?
The embedding model in the metadata conflicts with the default embedding
model set on the server side via extra body. This PR removes the
consistency check and lets metadata take precedence over extra body.
The check previously failed with:

`ValueError: Embedding model inconsistent between metadata
('text-embedding-3-small') and extra_body
     ('sentence-transformers/nomic-ai/nomic-embed-text-v1.5')`
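
A minimal sketch of the precedence rule described above, assuming both sources are plain dicts and that the model is stored under an `embedding_model` key. The function name and key are illustrative, not the actual implementation.

```python
from typing import Optional


def resolve_embedding_model(metadata: dict, extra_body: dict) -> Optional[str]:
    """Pick the embedding model, letting metadata win over extra_body.

    No consistency check is performed: if both are set and differ,
    the metadata value is used silently.
    """
    return metadata.get("embedding_model") or extra_body.get("embedding_model")
```

With the values from the error above, `resolve_embedding_model({"embedding_model": "text-embedding-3-small"}, {"embedding_model": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5"})` returns the metadata value instead of raising.
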
## Test Plan
CI
2025-10-21 14:40:14 -07:00
Alexey Rybak
4c718523fa
docs: fix the building distro file (#3880)
# What does this PR do?
* Fixes the doc server build (which expects a blank line after imports)

## Test Plan
* `cd docs && npm run build`
2025-10-21 14:26:35 -07:00
slekkala1
cb6a5e2687
fix: fix segfault in load model (#3879)
# What does this PR do?
Fixes a segfault when loading a model.
The cc-vec integration failed with a segfault when used with the default
embedding model on macOS (`model_id: nomic-ai/nomic-embed-text-v1.5`,
`provider_id: sentence-transformers`).
The crash report shows this is due to torch OpenMP settings.
Constraining to 1 thread works without crashes.
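
The description does not show the exact change, so the following is only a sketch of one way to constrain threading before loading a model. `limit_threads` is a hypothetical helper; `OMP_NUM_THREADS` and `torch.set_num_threads` are the standard OpenMP and PyTorch knobs, and torch is treated as optional here.

```python
import os


def limit_threads(num_threads: int = 1) -> None:
    """Constrain OpenMP/torch intra-op threading (hypothetical helper)."""
    # Only affects libraries that read the variable after this point.
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    try:
        import torch
    except ImportError:
        # torch is not installed; the environment variable is all we can set.
        return
    # Covers the case where torch was already imported and ignores the env var.
    torch.set_num_threads(num_threads)


limit_threads(1)  # call before loading the embedding model
```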


## Test Plan
Tested with the cc-vec integration:
1. Start the server: `llama stack run starter`
2. Follow the setup in https://github.com/raghotham/cc-vec to set the
environment variables, then run
`uv run cc-vec index --url-patterns "%.github.io" --vector-store-name
"ml-research" --limit 50 --chunk-size 800 --overlap 400`
2025-10-21 12:21:06 -07:00
ehhuang
1ec7216c3f
chore: update quick_start (#3878)
# What does this PR do?


## Test Plan
2025-10-21 11:33:23 -07:00
Ashwin Bharambe
bd3c473208
revert: "chore(cleanup)!: remove tool_runtime.rag_tool" (#3877)
Reverts llamastack/llama-stack#3871

This PR broke RAG (even from Responses -- there _is_ a dependency)
2025-10-21 11:22:06 -07:00
ehhuang
eb3e9b85f9
chore: update getting_started (#3875)
# What does this PR do?


## Test Plan
2025-10-21 11:09:45 -07:00
Ashwin Bharambe
71ead88bce
fix(logging): move module-level initialization to explicit setup calls (#3874)
- Moved environment variable parsing and `setup_logging()` call from
module level to proper initialization points
- Added explicit `setup_logging()` calls in `server.py::create_app()`
and `library_client.py::AsyncLlamaStackAsLibraryClient.__init__()`

Module-level side effects are bad practice and can cause issues with
import order, testing, and circular dependencies. The previous
implementation ran logging setup on every import of the log module,
which is unpredictable and difficult to control.
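
The pattern can be sketched as follows. This is an illustration, not the actual llama_stack code: the `DEMO_LOG_LEVEL` variable, the `demo_app` logger name, and the dict returned by `create_app` are assumptions made for the example.

```python
import logging
import os

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}
_configured = False


def setup_logging(env_var: str = "DEMO_LOG_LEVEL") -> None:
    """Parse the environment and configure logging; safe to call repeatedly."""
    global _configured
    if _configured:
        return
    level_name = os.environ.get(env_var, "INFO").upper()
    logging.getLogger("demo_app").setLevel(_LEVELS.get(level_name, logging.INFO))
    _configured = True


def create_app() -> dict:
    """Entry point: logging is configured here, not when the module is imported."""
    setup_logging()
    return {"name": "demo_app"}
```

Importing this module has no side effects; the environment is read only when an entry point calls `setup_logging()`.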

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-21 11:08:25 -07:00
Ashwin Bharambe
9191005ca1
fix(ci): dump server/container logs when tests fail (#3873)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 14s
API Conformance Tests / check-schema-compatibility (push) Successful in 14s
Python Package Build Test / build (3.12) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 17s
Test Llama Stack Build / generate-matrix (push) Successful in 20s
Unit Tests / unit-tests (3.13) (push) Failing after 18s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 25s
Unit Tests / unit-tests (3.12) (push) Failing after 36s
Test Llama Stack Build / build (push) Failing after 12s
UI Tests / ui-tests (22) (push) Successful in 1m1s
Pre-commit / pre-commit (push) Successful in 2m5s
Output last 100 lines of server.log or docker container logs when
integration tests fail to aid debugging.
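
The CI change itself lives in the workflow and shell scripts; as a language-neutral illustration, the "last 100 lines" behaviour can be sketched in Python. `tail_lines` and the log file name are hypothetical.

```python
from collections import deque
from pathlib import Path


def tail_lines(path, n: int = 100) -> list:
    """Return the last n lines of a log file, or [] if the file is missing."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open("r", errors="replace") as f:
        # deque with maxlen keeps only the most recent n lines in memory.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```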
2025-10-20 22:28:55 -07:00
Ashwin Bharambe
0e96279bee
chore(cleanup)!: remove tool_runtime.rag_tool (#3871)
Kill the `builtin::rag` tool group completely since it is no longer
targeted. We use the Responses implementation for knowledge_search which
uses the `openai_vector_stores` pathway.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-20 22:26:21 -07:00
Ashwin Bharambe
5aaf1a8bca
fix(ci): improve workflow logging and bot notifications (#3872)
## Summary
- Link pre-commit bot comment to workflow run instead of PR for better
debugging
- Dump docker container logs before removal to ensure logs are actually
captured

## Changes
1. **Pre-commit bot**: Changed the initial bot comment to link
"pre-commit hooks" text to the actual workflow run URL instead of just
having the PR number auto-link
2. **Docker logs**: Moved docker container log dumping from GitHub
Actions to the integration-tests.sh script's stop_container() function,
ensuring logs are captured before container removal
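
The ordering in item 2 is the important part: logs must be read while the container still exists. A minimal Python sketch of that ordering follows. The real change is in the `integration-tests.sh` shell function, so `stop_container` here is only an analogue, and the injectable `run` parameter is an assumption added to keep the example testable without Docker.

```python
import subprocess


def stop_container(name: str, run=subprocess.run) -> None:
    """Dump a container's logs, then stop and remove it."""
    # Dump logs first: once the container is removed they are gone.
    run(["docker", "logs", name], check=False)
    run(["docker", "stop", name], check=False)
    run(["docker", "rm", name], check=False)
```
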

## Test plan
- Pre-commit bot comment will now have a clickable link to the workflow
run
- Docker container logs will be successfully captured in CI runs
2025-10-20 22:08:15 -07:00
Ashwin Bharambe
122de785c4
chore(cleanup)!: kill vector_db references as far as possible (#3864)
There should not be "vector db" anywhere.
2025-10-20 20:06:16 -07:00
ehhuang
444f6c88f3
chore: remove build.py (#3869)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 3s
Test llama stack list-deps / list-deps-from-config (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 20s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 23s
Test llama stack list-deps / list-deps (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 57s
Pre-commit / pre-commit (push) Successful in 1m52s
# What does this PR do?


## Test Plan
CI
2025-10-20 16:28:15 -07:00
Charlie Doern
6a13a99e77
chore: add beta group to stainless (#3866)
# What does this PR do?

Similarly to `alpha:`, move the `v1beta` routes under a `beta` group so the
client will have `client.beta`.

From what I can tell, the openapi.stainless.yml file is hand-written,
while the openapi.yml file is generated and copied using the shell
script, so I did this by hand.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-20 16:26:06 -07:00
ehhuang
407bade359
chore: migrate stack build (#3867)
# What does this PR do?
Just use an editable install here. It is unclear whether the
USE_COPY_NOT_MOUNT option used in the original scripts is still needed.

## Test Plan
<img width="1008" height="587" alt="image"
src="https://github.com/user-attachments/assets/7ddf8e31-2635-45d3-b79c-1b898eefbf07"
/>

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3867).
* #3869
* __->__ #3867
2025-10-20 16:22:48 -07:00
ehhuang
ffeb86385c
chore: fix main (#3868)
# What does this PR do?
dup entry was added for some reason

## Test Plan
2025-10-20 16:01:03 -07:00
ehhuang
b215eb5944
chore: skip shutdown if otel_endpoint is not set (#3865)
# What does this PR do?
Gets rid of the following error when the server is stopped with Ctrl+C:

    /Users/erichuang/projects/lst3/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py:92 in shutdown

       89         pass
       90
       91     async def shutdown(self) -> None:
     ❱ 92         trace.get_tracer_provider().force_flush()
       93
       94     async def log_event(self, event: Event, ttl_seconds: int = 604800) -> None:
       95         if isinstance(event, UnstructuredLogEvent):

    AttributeError: 'ProxyTracerProvider' object has no attribute 'force_flush'
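
A minimal sketch of the guard suggested by the PR title, assuming the adapter stores the configured endpoint. `TelemetrySketch` and the two provider classes are stand-ins so the example runs without OpenTelemetry installed; they are not the actual adapter code.

```python
import asyncio


class NoFlushProvider:
    """Mimics the default proxy provider, which has no force_flush()."""


class FlushingProvider:
    """Mimics a configured SDK provider that can flush pending spans."""

    def __init__(self) -> None:
        self.flushed = False

    def force_flush(self) -> None:
        self.flushed = True


class TelemetrySketch:
    def __init__(self, otel_endpoint, tracer_provider) -> None:
        self.otel_endpoint = otel_endpoint
        self.tracer_provider = tracer_provider

    async def shutdown(self) -> None:
        # Without an endpoint no real provider was installed, so skip flushing.
        if not self.otel_endpoint:
            return
        self.tracer_provider.force_flush()


# No endpoint configured: shutdown returns without touching the provider.
asyncio.run(TelemetrySketch(None, NoFlushProvider()).shutdown())
```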

## Test Plan
2025-10-20 15:48:37 -07:00
dependabot[bot]
d9274d199e
chore(ui-deps): bump @types/node from 24.3.0 to 24.8.1 in /llama_stack/ui (#3851)
Bumps
[@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node)
from 24.3.0 to 24.8.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/node&package-manager=npm_and_yarn&previous-version=24.3.0&new-version=24.8.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:11:36 -07:00
dependabot[bot]
ec364499f5
chore(ui-deps): bump @tailwindcss/postcss from 4.1.6 to 4.1.14 in /llama_stack/ui (#3850)
Bumps
[@tailwindcss/postcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss)
from 4.1.6 to 4.1.14.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/releases"><code>@​tailwindcss/postcss</code>'s
releases</a>.</em></p>
<blockquote>
<h2>v4.1.14</h2>
<h3>Fixed</h3>
<ul>
<li>Handle <code>'</code> syntax in ClojureScript when extracting
classes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18888">#18888</a>)</li>
<li>Handle <code>@variant</code> inside <code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18885">#18885</a>)</li>
<li>Merge suggestions when using <code>@utility</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18900">#18900</a>)</li>
<li>Ensure that file system watchers created when using the CLI are
always cleaned up (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18905">#18905</a>)</li>
<li>Do not generate <code>grid-column</code> utilities when configuring
<code>grid-column-start</code> or <code>grid-column-end</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18907">#18907</a>)</li>
<li>Do not generate <code>grid-row</code> utilities when configuring
<code>grid-row-start</code> or <code>grid-row-end</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18907">#18907</a>)</li>
<li>Prevent duplicate CSS when overwriting a static utility with a theme
key (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18056">#18056</a>)</li>
<li>Show Lightning CSS warnings (if any) when optimizing/minifying (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18918">#18918</a>)</li>
<li>Use <code>default</code> export condition for
<code>@tailwindcss/vite</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18948">#18948</a>)</li>
<li>Re-throw errors from PostCSS nodes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18373">#18373</a>)</li>
<li>Detect classes in markdown inline directives (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18967">#18967</a>)</li>
<li>Ensure files with only <code>@theme</code> produce no output when
built (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18979">#18979</a>)</li>
<li>Support Maud templates when extracting classes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18988">#18988</a>)</li>
<li>Upgrade: Do not migrate <code>variant = 'outline'</code> during
upgrades (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18922">#18922</a>)</li>
<li>Upgrade: Show version mismatch (if any) when running upgrade tool
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19028">#19028</a>)</li>
<li>Upgrade: Ensure first class inside <code>className</code> is
migrated (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19031">#19031</a>)</li>
<li>Upgrade: Migrate classes inside <code>*ClassName</code> and
<code>*Class</code> attributes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19031">#19031</a>)</li>
</ul>
<h2>v4.1.13</h2>
<h3>Changed</h3>
<ul>
<li>Drop warning from browser build (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18731">#18731</a>)</li>
<li>Drop exact duplicate declarations when emitting CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18809">#18809</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Don't transition <code>visibility</code> when using
<code>transition</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18795">#18795</a>)</li>
<li>Discard matched variants with unknown named values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Discard matched variants with non-string values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Show suggestions for known <code>matchVariant</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18798">#18798</a>)</li>
<li>Replace deprecated <code>clip</code> with <code>clip-path</code> in
<code>sr-only</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18769">#18769</a>)</li>
<li>Hide internal fields from completions in <code>matchUtilities</code>
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18820">#18820</a>)</li>
<li>Ignore <code>.vercel</code> folders by default (can be overridden by
<code>@source …</code> rules) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18855">#18855</a>)</li>
<li>Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18869">#18869</a>)</li>
<li>Do not allow custom variants to start or end with a <code>-</code>
or <code>_</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18867">#18867</a>,
<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18872">#18872</a>)</li>
<li>Upgrade: Migrate <code>aria</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18815">#18815</a>)</li>
<li>Upgrade: Migrate <code>data</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18816">#18816</a>)</li>
<li>Upgrade: Migrate <code>supports</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18817">#18817</a>)</li>
</ul>
<h2>v4.1.12</h2>
<h3>Fixed</h3>
<ul>
<li>Don't consider the global important state in <code>@apply</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18404">#18404</a>)</li>
<li>Add missing suggestions for <code>flex-&lt;number&gt;</code>
utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18642">#18642</a>)</li>
<li>Fix trailing <code>)</code> from interfering with extraction in
Clojure keywords (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Detect classes inside Elixir charlist, word list, and string sigils
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18432">#18432</a>)</li>
<li>Track source locations through <code>@plugin</code> and
<code>@config</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md"><code>@​tailwindcss/postcss</code>'s
changelog</a>.</em></p>
<blockquote>
<h2>[4.1.14] - 2025-10-01</h2>
<h3>Fixed</h3>
<ul>
<li>Handle <code>'</code> syntax in ClojureScript when extracting
classes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18888">#18888</a>)</li>
<li>Handle <code>@variant</code> inside <code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18885">#18885</a>)</li>
<li>Merge suggestions when using <code>@utility</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18900">#18900</a>)</li>
<li>Ensure that file system watchers created when using the CLI are
always cleaned up (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18905">#18905</a>)</li>
<li>Do not generate <code>grid-column</code> utilities when configuring
<code>grid-column-start</code> or <code>grid-column-end</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18907">#18907</a>)</li>
<li>Do not generate <code>grid-row</code> utilities when configuring
<code>grid-row-start</code> or <code>grid-row-end</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18907">#18907</a>)</li>
<li>Prevent duplicate CSS when overwriting a static utility with a theme
key (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18056">#18056</a>)</li>
<li>Show Lightning CSS warnings (if any) when optimizing/minifying (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18918">#18918</a>)</li>
<li>Use <code>default</code> export condition for
<code>@tailwindcss/vite</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18948">#18948</a>)</li>
<li>Re-throw errors from PostCSS nodes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18373">#18373</a>)</li>
<li>Detect classes in markdown inline directives (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18967">#18967</a>)</li>
<li>Ensure files with only <code>@theme</code> produce no output when
built (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18979">#18979</a>)</li>
<li>Support Maud templates when extracting classes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18988">#18988</a>)</li>
<li>Upgrade: Do not migrate <code>variant = 'outline'</code> during
upgrades (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18922">#18922</a>)</li>
<li>Upgrade: Show version mismatch (if any) when running upgrade tool
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19028">#19028</a>)</li>
<li>Upgrade: Ensure first class inside <code>className</code> is
migrated (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19031">#19031</a>)</li>
<li>Upgrade: Migrate classes inside <code>*ClassName</code> and
<code>*Class</code> attributes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19031">#19031</a>)</li>
</ul>
<h2>[4.1.13] - 2025-09-03</h2>
<h3>Changed</h3>
<ul>
<li>Drop warning from browser build (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18731">#18731</a>)</li>
<li>Drop exact duplicate declarations when emitting CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18809">#18809</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Don't transition <code>visibility</code> when using
<code>transition</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18795">#18795</a>)</li>
<li>Discard matched variants with unknown named values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Discard matched variants with non-string values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Show suggestions for known <code>matchVariant</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18798">#18798</a>)</li>
<li>Replace deprecated <code>clip</code> with <code>clip-path</code> in
<code>sr-only</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18769">#18769</a>)</li>
<li>Hide internal fields from completions in <code>matchUtilities</code>
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18820">#18820</a>)</li>
<li>Ignore <code>.vercel</code> folders by default (can be overridden by
<code>@source …</code> rules) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18855">#18855</a>)</li>
<li>Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18869">#18869</a>)</li>
<li>Do not allow custom variants to start or end with a <code>-</code>
or <code>_</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18867">#18867</a>,
<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18872">#18872</a>)</li>
<li>Upgrade: Migrate <code>aria</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18815">#18815</a>)</li>
<li>Upgrade: Migrate <code>data</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18816">#18816</a>)</li>
<li>Upgrade: Migrate <code>supports</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18817">#18817</a>)</li>
</ul>
<h2>[4.1.12] - 2025-08-13</h2>
<h3>Fixed</h3>
<ul>
<li>Don't consider the global important state in <code>@apply</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18404">#18404</a>)</li>
<li>Add missing suggestions for <code>flex-&lt;number&gt;</code>
utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18642">#18642</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b67cbcf6cc"><code>b67cbcf</code></a>
Prepare v4.1.14 release (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19037">#19037</a>)</li>
<li><a
href="b497e1eaf3"><code>b497e1e</code></a>
Add <code>Upgrading from Tailwind CSS v…</code> when running upgrade
tool (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19026">#19026</a>)</li>
<li><a
href="210575a6a5"><code>210575a</code></a>
Update dedent 1.6.0 → 1.7.0 (minor) (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19010">#19010</a>)</li>
<li><a
href="d0f7f82787"><code>d0f7f82</code></a>
Add plugin option documentation to the postcss plugin readme (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18940">#18940</a>)</li>
<li><a
href="5b8136e838"><code>5b8136e</code></a>
Re-throw errors from PostCSS nodes (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18373">#18373</a>)</li>
<li><a
href="1334c99db8"><code>1334c99</code></a>
Prepare v4.1.13 release (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18868">#18868</a>)</li>
<li><a
href="6791e8133c"><code>6791e81</code></a>
Prepare v4.1.12 release (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18728">#18728</a>)</li>
<li><a
href="492304212f"><code>4923042</code></a>
Allow users to disable url rewriting in the PostCSS plugin (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18321">#18321</a>)</li>
<li><a
href="88b9f15b65"><code>88b9f15</code></a>
Center the dropdown icon added to an input with a paired datalist in
Chrome (...</li>
<li><a
href="9169d73aad"><code>9169d73</code></a>
update READMEs</li>
<li>Additional commits viewable in <a
href="https://github.com/tailwindlabs/tailwindcss/commits/v4.1.14/packages/@tailwindcss-postcss">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@tailwindcss/postcss&package-manager=npm_and_yarn&previous-version=4.1.6&new-version=4.1.14)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>
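
The `ignore this major version` and `ignore this minor version` commands above depend on which component of a version number a bump changes. A minimal Python sketch of that classification, assuming plain `MAJOR.MINOR.PATCH` strings. `parse_version` and `bump_kind` are hypothetical helper names, and this is not how Dependabot itself is implemented; the version pairs are taken from bumps in this log.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optional leading 'v') into a tuple of ints."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def bump_kind(previous: str, new: str) -> str:
    """Return which component changes first: 'major', 'minor', 'patch', or 'none'."""
    old, cur = parse_version(previous), parse_version(new)
    for label, a, b in zip(("major", "minor", "patch"), old, cur):
        if a != b:
            return label
    return "none"


# Hypothetical usage with bumps that appear in this log.
print(bump_kind("4.1.6", "4.1.14"))     # patch
print(bump_kind("0.116.1", "0.119.0"))  # minor
print(bump_kind("5.0.0", "6.0.0"))      # major
```

Comparing integer tuples rather than raw strings matters here: as strings, `"4.1.6"` would sort after `"4.1.14"`.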

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:11:24 -07:00
dependabot[bot]
6a74894e22
chore(python-deps): bump fastapi from 0.116.1 to 0.119.0 (#3845)
Bumps [fastapi](https://github.com/fastapi/fastapi) from 0.116.1 to
0.119.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/fastapi/fastapi/releases">fastapi's
releases</a>.</em></p>
<blockquote>
<h2>0.119.0</h2>
<p>FastAPI now (temporarily) supports both Pydantic v2 models and
<code>pydantic.v1</code> models at the same time in the same app, to
make it easier for any FastAPI apps still using Pydantic v1 to gradually
but quickly <strong>migrate to Pydantic v2</strong>.</p>
<pre lang="Python"><code>from fastapi import FastAPI
from pydantic import BaseModel as BaseModelV2
from pydantic.v1 import BaseModel

class Item(BaseModel):
    name: str
    description: str | None = None

class ItemV2(BaseModelV2):
    title: str
    summary: str | None = None

app = FastAPI()

@app.post(&quot;/items/&quot;, response_model=ItemV2)
def create_item(item: Item):
    return {&quot;title&quot;: item.name, &quot;summary&quot;: item.description}
</code></pre>
<p>Adding this feature was a big effort with the main objective of
making it easier for the few applications still stuck in Pydantic v1 to
migrate to Pydantic v2.</p>
<p>And with this, support for <strong>Pydantic v1 is now
deprecated</strong> and will be <strong>removed</strong> from FastAPI in
a future version soon.</p>
<p><strong>Note</strong>: keep in mind that the Pydantic team already
stopped supporting Pydantic v1 for recent versions of Python, starting
with Python 3.14.</p>
<p>You can read in the docs more about how to <a
href="https://fastapi.tiangolo.com/how-to/migrate-from-pydantic-v1-to-pydantic-v2/">Migrate
from Pydantic v1 to Pydantic v2</a>.</p>
<h3>Features</h3>
<ul>
<li>Add support for <code>from pydantic.v1 import BaseModel</code>,
mixed Pydantic v1 and v2 models in the same app. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14168">#14168</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h2>0.118.3</h2>
<h3>Upgrades</h3>
<ul>
<li>⬆️ Add support for Python 3.14. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14165">#14165</a>
by <a
href="https://github.com/svlandeg"><code>@​svlandeg</code></a>.</li>
</ul>
<h2>0.118.2</h2>
<h3>Fixes</h3>
<ul>
<li>🐛 Fix tagged discriminated union not recognized as body field. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/12942">#12942</a>
by <a
href="https://github.com/frankie567"><code>@​frankie567</code></a>.</li>
</ul>
<h3>Internal</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2e721e1b02"><code>2e721e1</code></a>
🔖 Release version 0.119.0</li>
<li><a
href="fc7a0686af"><code>fc7a068</code></a>
📝 Update release notes</li>
<li><a
href="3a3879b2c3"><code>3a3879b</code></a>
📝 Update release notes</li>
<li><a
href="d34918abf0"><code>d34918a</code></a>
Add support for <code>from pydantic.v1 import BaseModel</code>, mixed
Pydantic v1 and ...</li>
<li><a
href="352dbefc63"><code>352dbef</code></a>
🔖 Release version 0.118.3</li>
<li><a
href="96e7d6eaa4"><code>96e7d6e</code></a>
📝 Update release notes</li>
<li><a
href="3611c3fc5b"><code>3611c3f</code></a>
⬆️ Add support for Python 3.14 (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14165">#14165</a>)</li>
<li><a
href="942fce394b"><code>942fce3</code></a>
🔖 Release version 0.118.2</li>
<li><a
href="13b067c9b6"><code>13b067c</code></a>
📝 Update release notes</li>
<li><a
href="185cecd891"><code>185cecd</code></a>
🐛 Fix tagged discriminated union not recognized as body field (<a
href="https://redirect.github.com/fastapi/fastapi/issues/12942">#12942</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/fastapi/fastapi/compare/0.116.1...0.119.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=fastapi&package-manager=uv&previous-version=0.116.1&new-version=0.119.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:11:11 -07:00
dependabot[bot]
5aafce4ff3
chore(python-deps): bump weaviate-client from 4.16.9 to 4.17.0 (#3844)
Bumps
[weaviate-client](https://github.com/weaviate/weaviate-python-client)
from 4.16.9 to 4.17.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/releases">weaviate-client's
releases</a>.</em></p>
<blockquote>
<h2>v4.16.10</h2>
<h2>What's Changed</h2>
<ul>
<li>Add uncompressed quantizer factory by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1800">weaviate/weaviate-python-client#1800</a></li>
<li>Add support for groups by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1778">weaviate/weaviate-python-client#1778</a></li>
<li>feat: add overwrite_alias to backup restore by <a
href="https://github.com/bevzzz"><code>@​bevzzz</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1808">weaviate/weaviate-python-client#1808</a></li>
<li>Add Multi2vec-aws and text2vec-morph by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1820">weaviate/weaviate-python-client#1820</a></li>
<li>Add support for exists on aliases. by <a
href="https://github.com/jfrancoa"><code>@​jfrancoa</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1813">weaviate/weaviate-python-client#1813</a></li>
<li>Add note re GPT4All deprecation by <a
href="https://github.com/databyjp"><code>@​databyjp</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1825">weaviate/weaviate-python-client#1825</a></li>
<li>Update setup.cfg with min weaviate agents version by <a
href="https://github.com/cdpierse"><code>@​cdpierse</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1826">weaviate/weaviate-python-client#1826</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.9...v4.16.10">https://github.com/weaviate/weaviate-python-client/compare/v4.16.9...v4.16.10</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/blob/main/docs/changelog.rst">weaviate-client's
changelog</a>.</em></p>
<blockquote>
<h2>Version 4.17.0</h2>
<p>This minor version includes:</p>
<ul>
<li>Remove support for Weaviate versions &lt; 1.27. Please update your
Weaviate instances</li>
<li>Support for new 1.33 features:
<ul>
<li>OIDC group support in RBAC</li>
<li>Uncompressed quantizer</li>
<li>ContainsNone and Not filter operators</li>
</ul>
</li>
<li>Add support for <code>verbosity</code> and <code>reasoning
effort</code> for generative-openai module</li>
<li>Add alias.exists method</li>
<li>Add multi2vec-aws and text2vec-morph modules</li>
<li>Add support for max_tokens for generative-aws module</li>
<li>Fix weaviate client installation with other packages depending on
grpc-health-checking</li>
</ul>
<h2>Version 4.16.10</h2>
<p>This patch version includes:</p>
<ul>
<li>Addition of helper to create an uncompressed quantizer for use when
not using default compression</li>
<li>Support for <code>overwrite_alias</code> option to backup
create/restore</li>
<li>Support for OIDC groups</li>
<li>Addition of <code>multi2vec-aws</code> and <code>text2vec-morph</code>
modules</li>
<li>Support for <code>alias.exists</code> method</li>
<li>Update to <code>weaviate-agents-client</code> dependency for GA
release of agents</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7acf5c096a"><code>7acf5c0</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1838">#1838</a>
from weaviate/fix_tests</li>
<li><a
href="960559d788"><code>960559d</code></a>
Remove unneeded version checks</li>
<li><a
href="7cc1861b6c"><code>7cc1861</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1837">#1837</a>
from weaviate/changelog_417</li>
<li><a
href="3e124e9dfc"><code>3e124e9</code></a>
Small cleanup in version checking</li>
<li><a
href="e1859f17a7"><code>e1859f1</code></a>
Add changelog for 4.17.0</li>
<li><a
href="1e71c7832e"><code>1e71c78</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1827">#1827</a>
from weaviate/gen_openai_params</li>
<li><a
href="9a4bedfc7b"><code>9a4bedf</code></a>
Fix enum selection</li>
<li><a
href="033542fa8c"><code>033542f</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1824">#1824</a>
from weaviate/dependabot/pip/pydoclint-0.7.3</li>
<li><a
href="158889e6d4"><code>158889e</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1823">#1823</a>
from weaviate/dependabot/pip/polars-gte-0.20.26-and-...</li>
<li><a
href="65191bb1e4"><code>65191bb</code></a>
Merge branch 'dev/1.33'</li>
<li>Additional commits viewable in <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.9...v4.17.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=weaviate-client&package-manager=uv&previous-version=4.16.9&new-version=4.17.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:10:31 -07:00
ehhuang
5678c25b9d
chore: remove dead code (#3863)
2025-10-20 15:04:57 -07:00
dependabot[bot]
7294385df3
chore(github-deps): bump actions/setup-node from 5.0.0 to 6.0.0 (#3843)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from
5.0.0 to 6.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-node/releases">actions/setup-node's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<p><strong>Breaking Changes</strong></p>
<ul>
<li>Limit automatic caching to npm, update workflows and documentation
by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1374">actions/setup-node#1374</a></li>
</ul>
<p><strong>Dependency Upgrades</strong></p>
<ul>
<li>Upgrade ts-jest from 29.1.2 to 29.4.1 and document breaking changes
in v5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1336">#1336</a></li>
<li>Upgrade prettier from 2.8.8 to 3.6.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1334">#1334</a></li>
<li>Upgrade actions/publish-action from 0.3.0 to 0.4.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1362">#1362</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v5...v6.0.0">https://github.com/actions/setup-node/compare/v5...v6.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2028fbc5c2"><code>2028fbc</code></a>
Limit automatic caching to npm, update workflows and documentation (<a
href="https://redirect.github.com/actions/setup-node/issues/1374">#1374</a>)</li>
<li><a
href="13427813f7"><code>1342781</code></a>
Bump actions/publish-action from 0.3.0 to 0.4.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1362">#1362</a>)</li>
<li><a
href="89d709d423"><code>89d709d</code></a>
Bump prettier from 2.8.8 to 3.6.2 (<a
href="https://redirect.github.com/actions/setup-node/issues/1334">#1334</a>)</li>
<li><a
href="cd2651c462"><code>cd2651c</code></a>
Bump ts-jest from 29.1.2 to 29.4.1 (<a
href="https://redirect.github.com/actions/setup-node/issues/1336">#1336</a>)</li>
<li>See full diff in <a
href="a0853c2454...2028fbc5c2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-node&package-manager=github_actions&previous-version=5.0.0&new-version=6.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 14:59:39 -07:00
dependabot[bot]
8943335e0b
chore(github-deps): bump astral-sh/setup-uv from 7.0.0 to 7.1.0 (#3842)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
7.0.0 to 7.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.0 🌈 Support all the use cases</h2>
<h2>Changes</h2>
<p><strong>Support all the use cases!!!</strong>
... well, that we know of.</p>
<p>This release adds support for some use cases that most users don't
encounter but are useful for e.g. people running Gitea.</p>
<p>The input <code>resolution-strategy</code> lets you use the lowest
possible version of uv from a version range. Useful if you want to test
your tool with different versions of uv.</p>
<p>If you use <code>activate-environment</code> the path to the
activated venv is now also exposed under the output
<code>venv</code>.</p>
<p>Downloaded python installations can now also be uploaded to the
GitHub Actions cache backend. Useful if you are running in
<code>act</code> and have configured your own backend and don't want to
download python again, and again over a slow internet connection.</p>
<p>Finally the path to installed python interpreters is now added to the
<code>PATH</code> on Windows.</p>
<h2>🚀 Enhancements</h2>
<ul>
<li>Add resolution-strategy input to support oldest compatible version
selection @<a
href="https://github.com/apps/copilot-swe-agent">copilot-swe-agent[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/631">#631</a>)</li>
<li>Add value of UV_PYTHON_INSTALL_DIR to path <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/628">#628</a>)</li>
<li>Set output venv when activate-environment is used <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/627">#627</a>)</li>
<li>Cache python installs <a
href="https://github.com/merlinz01"><code>@​merlinz01</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/621">#621</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>Add copilot-instructions.md <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/630">#630</a>)</li>
<li>chore: update known checksums for 0.9.2 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/626">#626</a>)</li>
<li>chore: update known checksums for 0.9.1 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/625">#625</a>)</li>
<li>Fall back to PR for updating known versions <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/623">#623</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Split up documentation <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/632">#632</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump deps <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/633">#633</a>)</li>
<li>Bump github/codeql-action from 3.30.6 to 4.30.7 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/614">#614</a>)</li>
</ul>
</blockquote>
</details>
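
The `resolution-strategy` input described in the release notes above picks the lowest version of uv from a version range. The idea can be sketched in Python as follows. This is an illustration only, not setup-uv's implementation: the function names, the `"lowest"`/`"highest"` strategy values, the comma-separated range syntax, and the candidate list are all assumptions made for the example.

```python
import operator

# Two-character operators are listed first so ">=" is matched before ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.9.2' into (0, 9, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated range such as '>=0.9.0,<0.10.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not compare(parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def resolve(candidates: list[str], spec: str, strategy: str = "highest") -> str:
    """Pick the lowest or highest candidate that falls inside the range."""
    matching = sorted(
        (c for c in candidates if satisfies(c, spec)), key=parse_version
    )
    if not matching:
        raise LookupError(f"no candidate satisfies {spec!r}")
    return matching[0] if strategy == "lowest" else matching[-1]


# Hypothetical candidate list; only 0.9.1 and 0.9.2 are mentioned in this log.
available = ["0.8.22", "0.9.0", "0.9.1", "0.9.2"]
print(resolve(available, ">=0.9.0,<0.10.0", strategy="lowest"))   # 0.9.0
print(resolve(available, ">=0.9.0,<0.10.0", strategy="highest"))  # 0.9.2
```

Resolving to the lowest match is what makes it possible to test a tool against the oldest uv release a project claims to support.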
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3259c6206f"><code>3259c62</code></a>
Bump deps (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/633">#633</a>)</li>
<li><a
href="bf8e8ed895"><code>bf8e8ed</code></a>
Split up documentation (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/632">#632</a>)</li>
<li><a
href="9c6b5e9fb5"><code>9c6b5e9</code></a>
Add resolution-strategy input to support oldest compatible version
selection ...</li>
<li><a
href="a5129e99f4"><code>a5129e9</code></a>
Add copilot-instructions.md (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/630">#630</a>)</li>
<li><a
href="d18bcc753a"><code>d18bcc7</code></a>
Add value of UV_PYTHON_INSTALL_DIR to path (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/628">#628</a>)</li>
<li><a
href="bd1f875aba"><code>bd1f875</code></a>
Set output venv when activate-environment is used (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/627">#627</a>)</li>
<li><a
href="1a91c3851d"><code>1a91c38</code></a>
chore: update known checksums for 0.9.2 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/626">#626</a>)</li>
<li><a
href="c79f606987"><code>c79f606</code></a>
chore: update known checksums for 0.9.1 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/625">#625</a>)</li>
<li><a
href="e0249f1599"><code>e0249f1</code></a>
Fall back to PR for updating known versions (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/623">#623</a>)</li>
<li><a
href="6d2eb15b49"><code>6d2eb15</code></a>
Cache python installs (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/621">#621</a>)</li>
<li>Additional commits viewable in <a
href="eb1897b8dc...3259c6206f">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=7.0.0&new-version=7.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 14:59:35 -07:00
dependabot[bot]
e7f4ddcc86
chore(github-deps): bump actions/checkout from 4.2.2 to 5.0.0 (#3841)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2
to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2238">actions/checkout#2238</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v5.0.0">https://github.com/actions/checkout/compare/v4...v5.0.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
<li>Prepare release v4.3.0 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2237">actions/checkout#2237</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/motss"><code>@​motss</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li><a href="https://github.com/mouismail"><code>@​mouismail</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v4.3.0">https://github.com/actions/checkout/compare/v4...v4.3.0</a></p>
</blockquote>
</details>
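
The v5.0.0 release notes above set a minimum compatible runner version of v2.327.1. A minimal Python sketch of such a check, assuming the runner version is available as a dotted string; `MIN_RUNNER`, `version_tuple`, and `runner_is_compatible` are hypothetical names and are not part of actions/checkout or the runner:

```python
MIN_RUNNER = "v2.327.1"  # minimum stated in the v5.0.0 release notes


def version_tuple(tag: str) -> tuple[int, ...]:
    """Convert 'v2.327.1' or '2.327.1' into (2, 327, 1)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def runner_is_compatible(runner_version: str, minimum: str = MIN_RUNNER) -> bool:
    """True when the runner is at or above the minimum version."""
    return version_tuple(runner_version) >= version_tuple(minimum)


# Hypothetical runner versions, used only to exercise the comparison.
print(runner_is_compatible("2.327.1"))   # True
print(runner_is_compatible("v2.328.0"))  # True
print(runner_is_compatible("2.40.0"))    # False
```

The last case shows why the comparison uses integer tuples: as plain strings, `"2.40.0"` would compare greater than `"2.327.1"`.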
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
<li>README: Suggest <code>user.email</code> to be
<code>41898282+github-actions[bot]@users.noreply.github.com</code> by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1707">actions/checkout#1707</a></li>
</ul>
<h2>v4.1.4</h2>
<ul>
<li>Disable <code>extensions.worktreeConfig</code> when disabling
<code>sparse-checkout</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1692">actions/checkout#1692</a></li>
<li>Add dependabot config by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1688">actions/checkout#1688</a></li>
<li>Bump the minor-actions-dependencies group with 2 updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1693">actions/checkout#1693</a></li>
<li>Bump word-wrap from 1.2.3 to 1.2.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1643">actions/checkout#1643</a></li>
</ul>
<h2>v4.1.3</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="08c6903cd8"><code>08c6903</code></a>
Prepare v5.0.0 release (<a
href="https://redirect.github.com/actions/checkout/issues/2238">#2238</a>)</li>
<li><a
href="9f265659d3"><code>9f26565</code></a>
Update actions checkout to use node 24 (<a
href="https://redirect.github.com/actions/checkout/issues/2226">#2226</a>)</li>
<li><a
href="08eba0b27e"><code>08eba0b</code></a>
Prepare release v4.3.0 (<a
href="https://redirect.github.com/actions/checkout/issues/2237">#2237</a>)</li>
<li><a
href="631c7dc4f8"><code>631c7dc</code></a>
Update package dependencies (<a
href="https://redirect.github.com/actions/checkout/issues/2236">#2236</a>)</li>
<li><a
href="8edcb1bdb4"><code>8edcb1b</code></a>
Update CODEOWNERS for actions (<a
href="https://redirect.github.com/actions/checkout/issues/2224">#2224</a>)</li>
<li><a
href="09d2acae67"><code>09d2aca</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/2194">#2194</a>)</li>
<li><a
href="85e6279cec"><code>85e6279</code></a>
Adjust positioning of user email note and permissions heading (<a
href="https://redirect.github.com/actions/checkout/issues/2044">#2044</a>)</li>
<li><a
href="009b9ae9e4"><code>009b9ae</code></a>
Documentation update - add recommended permissions to Readme (<a
href="https://redirect.github.com/actions/checkout/issues/2043">#2043</a>)</li>
<li><a
href="cbb722410c"><code>cbb7224</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1977">#1977</a>)</li>
<li><a
href="3b9b8c884f"><code>3b9b8c8</code></a>
docs: update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1971">#1971</a>)</li>
<li>See full diff in <a
href="https://github.com/actions/checkout/compare/v4.2.2...08c6903cd8c0fde910a37f88322edcfb5dd907a8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=4.2.2&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 14:59:28 -07:00
ehhuang
ab2d5febb4
chore: install client first (#3862)
# What does this PR do?
Mirrors `build_container.sh` by installing the client first.

This is an attempt to resolve the following failure:

0.105 + [ editable = editable ]
0.105 + [ ! -d /workspace/llama-stack ]
0.105 + uv pip install --no-cache-dir -e /workspace/llama-stack
0.261 Using Python 3.12.12 environment at: /usr/local
0.479   × No solution found when resolving dependencies:
0.479   ╰─▶ Because only llama-stack-client<=0.2.23 is available and
0.479 llama-stack==0.3.0rc4 depends on llama-stack-client>=0.3.0rc4, we can
0.479       conclude that llama-stack==0.3.0rc4 cannot be used.
0.479 And because only llama-stack==0.3.0rc4 is available and you require
0.479 llama-stack, we can conclude that your requirements are unsatisfiable.
------

## Test Plan
2025-10-20 14:56:45 -07:00
Ashwin Bharambe
94faec7bc5
chore(yaml)!: move registered resources to a sub-key (#3861)
**NOTE: this is a backwards incompatible change to the run-configs.**

A small quality-of-life update that will prove useful when I rename
"vector_dbs" to "vector_stores" next.

Moves all the `models, shields, ...` keys in run-config under a
`registered_resources` sub-key.
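
For anyone migrating an existing run-config by hand, the move can be sketched roughly as below. This is an illustration only, not the stack's own migration code: the function name `nest_registered_resources` and the `EXAMPLE_RESOURCE_KEYS` tuple are hypothetical, and the tuple lists only the two keys this message names explicitly, so extend it to match your config.

```python
from typing import Any

# Hypothetical key list for illustration; only `models` and `shields` are
# named explicitly in the commit message, so the rest is left to the caller.
EXAMPLE_RESOURCE_KEYS = ("models", "shields")


def nest_registered_resources(
    run_config: dict[str, Any],
    resource_keys: tuple[str, ...] = EXAMPLE_RESOURCE_KEYS,
) -> dict[str, Any]:
    """Return a copy of run_config with resource keys moved under `registered_resources`."""
    # Keep every top-level key that is not a registered resource.
    migrated = {k: v for k, v in run_config.items() if k not in resource_keys}
    # Preserve anything already nested, then move the old top-level keys in.
    nested = dict(migrated.get("registered_resources", {}))
    for key in resource_keys:
        if key in run_config:
            nested[key] = run_config[key]
    if nested:
        migrated["registered_resources"] = nested
    return migrated


# Illustrative input; the values are made up.
old = {"image_name": "starter", "models": [{"model_id": "m1"}], "shields": []}
new = nest_registered_resources(old)
```

The input dict is left untouched, so the old and new layouts can be compared side by side before the file is rewritten.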
2025-10-20 14:52:48 -07:00
Ashwin Bharambe
483d53cc37
feat(stainless): add stainless source of truth config (#3860)
Source of truth for Stainless should be in this repository.

This was long overdue.
2025-10-20 14:32:20 -07:00
Francisco Arceo
48581bf651
chore: Updating how default embedding model is set in stack (#3818)
# What does this PR do?

Refactors how the default vector store provider and embedding model are
set: both now come from an optional `vector_stores` config in
`StackRunConfig`. The cleanup required adding back some pieces of
VectorDB. Also adds remote Qdrant and Weaviate to the starter distro
(following the other PR where inference providers were added for UX).

The new config (default for the starter distro) is simply:

```yaml
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```

## Test Plan
CI and Unit tests.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-20 14:22:45 -07:00
Ashwin Bharambe
2c43285e22
feat(stores)!: use backend storage references instead of configs (#3697)
**This PR changes configurations in a backward incompatible way.**

Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.

## Key Changes

- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.

## Migration

Before:
```yaml
metadata_store:
  type: sqlite
  db_path: ~/.llama/distributions/foo/registry.db
inference_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
conversations_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
```

After:
```yaml
storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/distributions/foo/kvstore.db
    sql_default:
      type: sql_postgres
      host: ${env.POSTGRES_HOST}
      port: ${env.POSTGRES_PORT}
      db: ${env.POSTGRES_DB}
      user: ${env.POSTGRES_USER}
      password: ${env.POSTGRES_PASSWORD}
  stores:
    metadata:
      backend: kv_default
      namespace: registry
    inference:
      backend: sql_default
      table_name: inference_store
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      backend: sql_default
      table_name: openai_conversations
```

Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:

```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      kvstore:
        type: sqlite
        db_path: ~/.llama/distributions/foo/chroma.db
```

to:

```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      persistence:
        backend: kv_default
        namespace: vector_io::chroma_remote
```

Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
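
To make the "references point to the right family" check concrete, here is a minimal sketch over plain dicts shaped like the YAML above. It is not the validation code from this PR: the helper names `backend_family` and `validate_reference` are hypothetical, and it assumes the family can be read from the `kv_`/`sql_` prefix of the backend `type` and that KV references carry `namespace` while SQL references carry `table_name`.

```python
from typing import Any


def backend_family(backend_type: str) -> str:
    """Derive the family ("kv" or "sql") from a type such as `kv_sqlite` or `sql_postgres`."""
    family, _, _ = backend_type.partition("_")
    if family not in ("kv", "sql"):
        raise ValueError(f"unknown backend type: {backend_type!r}")
    return family


def validate_reference(storage: dict[str, Any], store_name: str) -> str:
    """Check one `storage.stores` entry against `storage.backends` and return its family."""
    ref = storage["stores"][store_name]
    backend_name = ref["backend"]
    backends = storage["backends"]
    if backend_name not in backends:
        raise ValueError(
            f"store {store_name!r} references undeclared backend {backend_name!r}"
        )
    family = backend_family(backends[backend_name]["type"])
    # Assumption: KV references carry `namespace`, SQL references carry `table_name`.
    expected_field = "namespace" if family == "kv" else "table_name"
    if expected_field not in ref:
        raise ValueError(
            f"store {store_name!r} uses a {family} backend but has no {expected_field!r}"
        )
    return family


# Illustrative config; the host value is a placeholder.
storage = {
    "backends": {
        "kv_default": {"type": "kv_sqlite", "db_path": "~/.llama/distributions/foo/kvstore.db"},
        "sql_default": {"type": "sql_postgres", "host": "localhost"},
    },
    "stores": {
        "metadata": {"backend": "kv_default", "namespace": "registry"},
        "inference": {"backend": "sql_default", "table_name": "inference_store"},
    },
}
families = {name: validate_reference(storage, name) for name in storage["stores"]}
```

A reference that names a SQL backend but supplies only a `namespace` raises `ValueError` here, which is the kind of mismatch the startup validation is meant to catch.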
2025-10-20 13:20:09 -07:00
Shabana Baig
add64e8e2a
feat: Add instructions parameter in response object (#3741)
# Problem
The current inline provider appends the user-provided instructions to
messages as a system prompt, but the returned response object does not
contain the `instructions` field (as specified in the OpenAI responses
spec).

# What does this PR do?
This pull request adds the `instructions` field to the response object
definition and updates the inline provider. It also ensures that
instructions from the previous response are not carried over to the next
response (as specified in the OpenAI spec).

Closes #[3566](https://github.com/llamastack/llama-stack/issues/3566)

## Test Plan

- Tested manually for change in model response w.r.t supplied
instructions field.
- Added a unit test to check that the instructions from the previous response
are not carried over to the next response.
- Added integration tests to check instructions parameter in the
returned response object.
- Added new recordings for the integration tests.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-20 13:10:37 -07:00
Derek Higgins
1f38359d95
fix: nested claims mapping in OAuth2 token validation (#3814)
The get_attributes_from_claims function was only checking for top-level
claim keys, causing token validation to fail when using nested claims
like "resource_access.llamastack.roles" (common in Keycloak JWT tokens).
    
Updated the function to support dot notation for traversing nested claim
structures. Give precedence to dot notation over literal keys with dots
in claims mapping.
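
The lookup order described above can be sketched as follows. This is an illustration under assumptions, not the body of `get_attributes_from_claims`: the helper name `lookup_claim` and the sample claims are hypothetical, and only the precedence rule (dotted path first, literal key second) is taken from this message.

```python
from typing import Any

_MISSING = object()  # sentinel, so a stored None is still treated as "found"


def lookup_claim(claims: dict[str, Any], key: str) -> Any:
    """Resolve key against claims, trying the dotted path before the literal key."""
    node: Any = claims
    for part in key.split("."):
        if isinstance(node, dict) and part in node:
            node = node[part]
        else:
            node = _MISSING
            break
    if node is not _MISSING:
        return node
    # Fall back to a literal key that happens to contain dots.
    return claims.get(key)


# Illustrative Keycloak-style token claims; the values are made up.
claims = {
    "resource_access": {"llamastack": {"roles": ["admin"]}},
    "resource_access.llamastack.roles": ["literal"],
    "a.b": "flat",
}
nested = lookup_claim(claims, "resource_access.llamastack.roles")
literal = lookup_claim(claims, "a.b")
absent = lookup_claim(claims, "x.y")
```

Here `nested` comes from the nested structure even though a literal key with the same name exists, `literal` falls back to the flat `"a.b"` key, and `absent` is `None`.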
    
Added test coverage.
    
Closes: #3812

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-20 12:34:55 -07:00
dependabot[bot]
08cbb69ef7
chore(python-deps): bump sqlalchemy from 2.0.41 to 2.0.44 (#3848)
Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 2.0.41
to 2.0.44.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/sqlalchemy/sqlalchemy/releases">sqlalchemy's
releases</a>.</em></p>
<blockquote>
<h1>2.0.44</h1>
<p>Released: October 10, 2025</p>
<h2>platform</h2>
<ul>
<li><strong>[platform] [bug]</strong> Unblocked automatic greenlet
installation for Python 3.14 now that
there are greenlet wheels on pypi for python 3.14.</li>
</ul>
<h2>orm</h2>
<ul>
<li>
<p><strong>[orm] [usecase]</strong> The way ORM Annotated Declarative
interprets Python <a href="https://peps.python.org/pep-0695">PEP 695</a>
type aliases
in <code>Mapped[]</code> annotations has been refined to expand the
lookup scheme. A
<a href="https://peps.python.org/pep-0695">PEP 695</a> type can now be
resolved based on either its direct presence in
<code>_orm.registry.type_annotation_map</code> or its immediate resolved
value, as long as a recursive lookup across multiple <a
href="https://peps.python.org/pep-0695">PEP 695</a> types is
not required for it to resolve. This change reverses part of the
restrictions introduced in 2.0.37 as part of <a
href="https://www.sqlalchemy.org/trac/ticket/11955">#11955</a>, which
deprecated (and disallowed in 2.1) the ability to resolve any <a
href="https://peps.python.org/pep-0695">PEP 695</a>
type that was not explicitly present in
<code>_orm.registry.type_annotation_map</code>. Recursive lookups of
<a href="https://peps.python.org/pep-0695">PEP 695</a> types remains
deprecated in 2.0 and disallowed in version 2.1,
as do implicit lookups of <code>NewType</code> types without an entry in
<code>_orm.registry.type_annotation_map</code>.</p>
<p>Additionally, new support has been added for generic <a
href="https://peps.python.org/pep-0695">PEP 695</a> aliases that
refer to <a href="https://peps.python.org/pep-0593">PEP 593</a>
<code>Annotated</code> constructs containing
<code>_orm.mapped_column()</code> configurations. See the sections below
for
examples.</p>
<p>References: <a
href="https://www.sqlalchemy.org/trac/ticket/12829">#12829</a></p>
</li>
<li>
<p><strong>[orm] [bug]</strong> Fixed a caching issue where
<code>_orm.with_loader_criteria()</code> would
incorrectly reuse cached bound parameter values when used with
<code>_sql.CompoundSelect</code> constructs such as
<code>_sql.union()</code>. The
issue was caused by the cache key for compound selects not including the
execution options that are part of the <code>_sql.Executable</code> base
class,
which <code>_orm.with_loader_criteria()</code> uses to apply its
criteria
dynamically. The fix ensures that compound selects and other executable
constructs properly include execution options in their cache key
traversal.</p>
<p>References: <a
href="https://www.sqlalchemy.org/trac/ticket/12905">#12905</a></p>
</li>
</ul>
<h2>engine</h2>
<ul>
<li><strong>[engine] [bug]</strong> Implemented initial support for
free-threaded Python by adding new tests
and reworking the test harness to include Python 3.13t and Python 3.14t
in</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/sqlalchemy/sqlalchemy/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=sqlalchemy&package-manager=uv&previous-version=2.0.41&new-version=2.0.44)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 12:34:11 -07:00
dependabot[bot]
112a974005
chore(python-deps): bump ruff from 0.9.10 to 0.14.1 (#3846)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.9.10 to 0.14.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.14.1</h2>
<h2>Release Notes</h2>
<p>Released on 2025-10-16.</p>
<h3>Preview features</h3>
<ul>
<li>[formatter] Remove parentheses around multiple exception types on
Python 3.14+ (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20768">#20768</a>)</li>
<li>[<code>flake8-bugbear</code>] Omit annotation in preview fix for
<code>B006</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20877">#20877</a>)</li>
<li>[<code>flake8-logging-format</code>] Avoid dropping implicitly
concatenated pieces in the <code>G004</code> fix (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20793">#20793</a>)</li>
<li>[<code>pydoclint</code>] Implement
<code>docstring-extraneous-parameter</code> (<code>DOC102</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20376">#20376</a>)</li>
<li>[<code>pyupgrade</code>] Extend <code>UP019</code> to detect
<code>typing_extensions.Text</code> (<code>UP019</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20825">#20825</a>)</li>
<li>[<code>pyupgrade</code>] Fix false negative for <code>TypeVar</code>
with default argument in <code>non-pep695-generic-class</code>
(<code>UP046</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20660">#20660</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>Fix false negatives in <code>Truthiness::from_expr</code> for
lambdas, generators, and f-strings (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20704">#20704</a>)</li>
<li>Fix syntax error false positives for escapes and quotes in f-strings
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20867">#20867</a>)</li>
<li>Fix syntax error false positives on parenthesized context managers
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20846">#20846</a>)</li>
<li>[<code>fastapi</code>] Fix false positives for path parameters that
FastAPI doesn't recognize (<code>FAST003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20687">#20687</a>)</li>
<li>[<code>flake8-pyi</code>] Fix operator precedence by adding
parentheses when needed (<code>PYI061</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20508">#20508</a>)</li>
<li>[<code>ruff</code>] Suppress diagnostic for f-string interpolations
with debug text (<code>RUF010</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20525">#20525</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>airflow</code>] Add warning to
<code>airflow.datasets.DatasetEvent</code> usage (<code>AIR301</code>)
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20551">#20551</a>)</li>
<li>[<code>flake8-bugbear</code>] Mark <code>B905</code> and
<code>B912</code> fixes as unsafe (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20695">#20695</a>)</li>
<li>Use <code>DiagnosticTag</code> for more rules - changes display in
editors (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20758">#20758</a>,<a
href="https://redirect.github.com/astral-sh/ruff/pull/20734">#20734</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Update Python compatibility from 3.13 to 3.14 in README.md (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20852">#20852</a>)</li>
<li>Update <code>lint.flake8-type-checking.quoted-annotations</code>
docs (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20765">#20765</a>)</li>
<li>Update setup instructions for Zed 0.208.0+ (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20902">#20902</a>)</li>
<li>[<code>flake8-datetimez</code>] Clarify docs for several rules (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20778">#20778</a>)</li>
<li>Fix typo in <code>RUF015</code> description (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20873">#20873</a>)</li>
</ul>
<h3>Other changes</h3>
<ul>
<li>Reduce binary size (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20863">#20863</a>)</li>
<li>Improved error recovery for unclosed strings (including f- and
t-strings) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20848">#20848</a>)</li>
</ul>
<h3>Contributors</h3>
<ul>
<li><a href="https://github.com/ntBre"><code>@​ntBre</code></a></li>
<li><a
href="https://github.com/Paillat-dev"><code>@​Paillat-dev</code></a></li>
<li><a href="https://github.com/terror"><code>@​terror</code></a></li>
<li><a
href="https://github.com/pieterh-oai"><code>@​pieterh-oai</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a href="https://github.com/TaKO8Ki"><code>@​TaKO8Ki</code></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.14.1</h2>
<p>Released on 2025-10-16.</p>
<h3>Preview features</h3>
<ul>
<li>[formatter] Remove parentheses around multiple exception types on
Python 3.14+ (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20768">#20768</a>)</li>
<li>[<code>flake8-bugbear</code>] Omit annotation in preview fix for
<code>B006</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20877">#20877</a>)</li>
<li>[<code>flake8-logging-format</code>] Avoid dropping implicitly
concatenated pieces in the <code>G004</code> fix (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20793">#20793</a>)</li>
<li>[<code>pydoclint</code>] Implement
<code>docstring-extraneous-parameter</code> (<code>DOC102</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20376">#20376</a>)</li>
<li>[<code>pyupgrade</code>] Extend <code>UP019</code> to detect
<code>typing_extensions.Text</code> (<code>UP019</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20825">#20825</a>)</li>
<li>[<code>pyupgrade</code>] Fix false negative for <code>TypeVar</code>
with default argument in <code>non-pep695-generic-class</code>
(<code>UP046</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20660">#20660</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>Fix false negatives in <code>Truthiness::from_expr</code> for
lambdas, generators, and f-strings (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20704">#20704</a>)</li>
<li>Fix syntax error false positives for escapes and quotes in f-strings
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20867">#20867</a>)</li>
<li>Fix syntax error false positives on parenthesized context managers
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20846">#20846</a>)</li>
<li>[<code>fastapi</code>] Fix false positives for path parameters that
FastAPI doesn't recognize (<code>FAST003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20687">#20687</a>)</li>
<li>[<code>flake8-pyi</code>] Fix operator precedence by adding
parentheses when needed (<code>PYI061</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20508">#20508</a>)</li>
<li>[<code>ruff</code>] Suppress diagnostic for f-string interpolations
with debug text (<code>RUF010</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20525">#20525</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>airflow</code>] Add warning to
<code>airflow.datasets.DatasetEvent</code> usage (<code>AIR301</code>)
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20551">#20551</a>)</li>
<li>[<code>flake8-bugbear</code>] Mark <code>B905</code> and
<code>B912</code> fixes as unsafe (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20695">#20695</a>)</li>
<li>Use <code>DiagnosticTag</code> for more rules - changes display in
editors (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20758">#20758</a>,<a
href="https://redirect.github.com/astral-sh/ruff/pull/20734">#20734</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Update Python compatibility from 3.13 to 3.14 in README.md (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20852">#20852</a>)</li>
<li>Update <code>lint.flake8-type-checking.quoted-annotations</code>
docs (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20765">#20765</a>)</li>
<li>Update setup instructions for Zed 0.208.0+ (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20902">#20902</a>)</li>
<li>[<code>flake8-datetimez</code>] Clarify docs for several rules (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20778">#20778</a>)</li>
<li>Fix typo in <code>RUF015</code> description (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20873">#20873</a>)</li>
</ul>
<h3>Other changes</h3>
<ul>
<li>Reduce binary size (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20863">#20863</a>)</li>
<li>Improved error recovery for unclosed strings (including f- and
t-strings) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20848">#20848</a>)</li>
</ul>
<h3>Contributors</h3>
<ul>
<li><a href="https://github.com/ntBre"><code>@​ntBre</code></a></li>
<li><a
href="https://github.com/Paillat-dev"><code>@​Paillat-dev</code></a></li>
<li><a href="https://github.com/terror"><code>@​terror</code></a></li>
<li><a
href="https://github.com/pieterh-oai"><code>@​pieterh-oai</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a href="https://github.com/TaKO8Ki"><code>@​TaKO8Ki</code></a></li>
<li><a
href="https://github.com/ageorgou"><code>@​ageorgou</code></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2bffef5966"><code>2bffef5</code></a>
Bump 0.14.1 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20925">#20925</a>)</li>
<li><a
href="e64d772788"><code>e64d772</code></a>
Standardize syntax error construction (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20903">#20903</a>)</li>
<li><a
href="03696687ea"><code>0369668</code></a>
[<code>pydoclint</code>] Implement
<code>docstring-extraneous-parameter</code> (<code>DOC102</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20376">#20376</a>)</li>
<li><a
href="058fc37542"><code>058fc37</code></a>
[ty] Fix panic 'missing root' when handling completion request (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20917">#20917</a>)</li>
<li><a
href="ec9faa34be"><code>ec9faa3</code></a>
[ty] Run file watching tests serial when using nextest (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20918">#20918</a>)</li>
<li><a
href="7155a62e5c"><code>7155a62</code></a>
[ty] Add version hint for failed stdlib attribute accesses (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20909">#20909</a>)</li>
<li><a
href="a67e0690f2"><code>a67e069</code></a>
More CI improvements (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20920">#20920</a>)</li>
<li><a
href="6a1e91ce97"><code>6a1e91c</code></a>
[ty] Check typeshed VERSIONS for parent modules when reporting failed
stdlib ...</li>
<li><a
href="3db5d5906e"><code>3db5d59</code></a>
Don't use codspeed or depot runners in CI jobs on forks (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20894">#20894</a>)</li>
<li><a
href="d23826ce46"><code>d23826c</code></a>
[ty] cache Type::is_redundant_with (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20477">#20477</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.9.10...0.14.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=uv&previous-version=0.9.10&new-version=0.14.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 12:33:44 -07:00
ehhuang
9936f33f7e
chore: disable telemetry if otel endpoint isn't set (#3859)
# What does this PR do?

Removes the following error from the server logs:
ConnectionError: HTTPConnectionPool(host='localhost', port=4318): Max
retries exceeded with url: /v1/traces
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object
at 0x10fd98e60>: Failed to establish a
         new connection: [Errno 61] Connection refused'))


## Test Plan
uv run llama stack run starter
curl http://localhost:8321/v1/models
observe no error in server logs
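
A minimal sketch of the idea, not the actual change in this PR: only set up trace export when an OTLP endpoint is configured. `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OpenTelemetry variable; whether llama-stack checks exactly this variable is an assumption, and `setup_telemetry` is a hypothetical helper.

```python
import os

# Standard OpenTelemetry variable for the OTLP endpoint. That llama-stack keys
# off exactly this variable is an assumption made for illustration.
OTEL_ENDPOINT_VAR = "OTEL_EXPORTER_OTLP_ENDPOINT"


def setup_telemetry(env=None):
    """Hypothetical hook: skip exporter setup when no endpoint is configured."""
    env = os.environ if env is None else env
    endpoint = env.get(OTEL_ENDPOINT_VAR, "").strip()
    if not endpoint:
        # No collector configured, so nothing tries to connect to localhost:4318.
        return "telemetry disabled"
    return f"exporting traces to {endpoint}"


print(setup_telemetry({}))
print(setup_telemetry({OTEL_ENDPOINT_VAR: "http://localhost:4318"}))
```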
2025-10-20 11:42:57 -07:00
ehhuang
359df3a37c
chore: update doc (#3857)
# What does this PR do?
follows https://github.com/llamastack/llama-stack/pull/3839

## Test Plan
2025-10-20 10:33:21 -07:00
ehhuang
21772de5d3
chore: use dockerfile for building containers (#3839)
# What does this PR do?

relates to #2878 

We introduce a Containerfile that replaces the `llama stack
build` command (its removal comes in a separate PR).

```
llama stack build --distro starter --image-type venv --run
```
is replaced by
```
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```
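
For readers unfamiliar with `xargs -L1`: it runs the command once per input line, so each line printed by `list-deps` becomes its own `uv pip install` call. A rough Python sketch of that behaviour follows. The sample lines are hypothetical (the real dependency list is not shown here), `install_commands` is an illustrative helper, and `shlex` quoting only approximates how `xargs` splits arguments.

```python
import shlex


def install_commands(list_deps_output):
    """Build one `uv pip install` argv per non-empty line, roughly like `xargs -L1`."""
    commands = []
    for line in list_deps_output.splitlines():
        args = shlex.split(line)
        if args:  # skip blank lines, which produce no command
            commands.append(["uv", "pip", "install", *args])
    return commands


# Hypothetical sample output, for illustration only.
sample = "fastapi uvicorn\n--index-url https://example.invalid/simple torch\n"
for cmd in install_commands(sample):
    print(shlex.join(cmd))
```

Running one install per line keeps line-specific options (such as an index URL) attached to the packages on that line.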


- See the updated workflow files for e2e workflow.

## Test Plan
CI
```
❯ docker build . -f docker/Dockerfile --build-arg DISTRO_NAME=starter --build-arg INSTALL_MODE=editable --tag test_starter
❯ docker run -p 8321:8321 test_starter
❯ curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
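
The same smoke test can be scripted. Below is a sketch using only the Python standard library; it builds the request shown in the `curl` call above without sending it. `build_chat_request` is a hypothetical helper name, and the port and model are taken from the commands above.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text):
    """Build a POST to /v1/chat/completions mirroring the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:8321", "gpt-4o-mini", "Hello!")
print(req.get_method(), req.full_url)
# To actually send it against the running container:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```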





---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3839).
* #3855
* __->__ #3839
2025-10-20 10:23:01 -07:00
Charlie Doern
573e783ff0
docs: fix sidebar of Detailed Tutorial (#3856)
# What does this PR do?

The sidebar currently shows an extra `ii. Run the Script` entry because that
heading is marked up in the doc as an H3 rather than an H4 (like the other ones).


<img width="239" height="218" alt="Screenshot 2025-10-20 at 1 04 54 PM"
src="https://github.com/user-attachments/assets/eb8cb26e-7ea9-4b61-9101-d64965b39647"
/>

This PR fixes the heading level, which updates the sidebar.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-20 13:10:50 -04:00
Jiayi Ni
165b8b07f4
docs: Documentation update for NVIDIA Inference Provider (#3840)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
- Fix examples in the NVIDIA inference documentation to align with
current API requirements.

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
N/A
2025-10-20 09:51:43 -07:00
dependabot[bot]
f675fdda0f
chore(ui-deps): bump jest and @types/jest in /llama_stack/ui (#3853)
Bumps [jest](https://github.com/jestjs/jest/tree/HEAD/packages/jest) and
[@types/jest](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/jest).
These dependencies needed to be updated together.
Updates `jest` from 29.7.0 to 30.2.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/releases">jest's
releases</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore &amp; Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
<h2>30.1.2</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Correct snapshot header regexp to
work with newline across OSes (<a
href="https://redirect.github.com/jestjs/jest/pull/15803">#15803</a>)</li>
</ul>
<h2>30.1.1</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
warning not handling Windows end-of-line sequences (<a
href="https://redirect.github.com/jestjs/jest/pull/15800">#15800</a>)</li>
</ul>
<h2>30.1.0</h2>
<h2>Features</h2>
<ul>
<li><code>[jest-leak-detector]</code> Configurable GC aggressiveness
regarding to V8 heap snapshot generation (<a
href="https://redirect.github.com/jestjs/jest/pull/15793/">#15793</a>)</li>
<li><code>[jest-runtime]</code> Reduce redundant ReferenceError
messages</li>
<li><code>[jest-core]</code> Include test modules that failed to load
when --onlyFailures is active</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot guide link not
getting replaced with fully canonical URL (<a
href="https://redirect.github.com/jestjs/jest/pull/15787">#15787</a>)</li>
<li><code>[jest-circus]</code> Fix <code>it.concurrent</code> not
working with <code>describe.skip</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15765">#15765</a>)</li>
<li><code>[jest-snapshot]</code> Fix mangled inline snapshot updates
when used with Prettier 3 and CRLF line endings</li>
<li><code>[jest-runtime]</code> Importing from
<code>@jest/globals</code> in more than one file no longer breaks
relative paths (<a
href="https://redirect.github.com/jestjs/jest/issues/15772">#15772</a>)</li>
</ul>
<h1>Chore</h1>
<ul>
<li><code>[expect]</code> Update docblock for <code>toContain()</code>
to display info on substring check (<a
href="https://redirect.github.com/jestjs/jest/pull/15789">#15789</a>)</li>
</ul>
<h2>30.0.2</h2>
<h2>What's Changed</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/blob/main/CHANGELOG.md">jest's
changelog</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore &amp; Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-matcher-utils]</code> Fix infinite recursion with
self-referential getters in <code>deepCyclicCopyReplaceable</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15831">#15831</a>)</li>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
<h2>30.1.2</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Correct snapshot header regexp to
work with newline across OSes (<a
href="https://redirect.github.com/jestjs/jest/pull/15803">#15803</a>)</li>
</ul>
<h2>30.1.1</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
warning not handling Windows end-of-line sequences (<a
href="https://redirect.github.com/jestjs/jest/pull/15800">#15800</a>)</li>
<li><code>[jest-snapshot-utils]</code> Improve messaging about goo.gl
snapshot link change (<a
href="https://redirect.github.com/jestjs/jest/pull/15821">#15821</a>)</li>
</ul>
<h2>30.1.0</h2>
<h2>Features</h2>
<ul>
<li><code>[jest-leak-detector]</code> Configurable GC aggressiveness
regarding to V8 heap snapshot generation (<a
href="https://redirect.github.com/jestjs/jest/pull/15793/">#15793</a>)</li>
<li><code>[jest-runtime]</code> Reduce redundant ReferenceError
messages</li>
<li><code>[jest-core]</code> Include test modules that failed to load
when --onlyFailures is active</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
guide link not getting replaced with fully canonical URL (<a
href="https://redirect.github.com/jestjs/jest/pull/15787">#15787</a>)</li>
<li><code>[jest-circus]</code> Fix <code>it.concurrent</code> not
working with <code>describe.skip</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15765">#15765</a>)</li>
<li><code>[jest-snapshot]</code> Fix mangled inline snapshot updates
when used with Prettier 3 and CRLF line endings</li>
<li><code>[jest-runtime]</code> Importing from
<code>@jest/globals</code> in more than one file no longer breaks
relative paths (<a
href="https://redirect.github.com/jestjs/jest/issues/15772">#15772</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="855864e3f9"><code>855864e</code></a>
v30.2.0</li>
<li><a
href="da9b532f04"><code>da9b532</code></a>
v30.1.3</li>
<li><a
href="ebfa31cc97"><code>ebfa31c</code></a>
v30.1.2</li>
<li><a
href="d347c0f3f8"><code>d347c0f</code></a>
v30.1.1</li>
<li><a
href="4d5f41d088"><code>4d5f41d</code></a>
v30.1.0</li>
<li><a
href="22236cf58b"><code>22236cf</code></a>
v30.0.5</li>
<li><a
href="f4296d2bc8"><code>f4296d2</code></a>
v30.0.4</li>
<li><a
href="d4a6c94daf"><code>d4a6c94</code></a>
v30.0.3</li>
<li><a
href="393acbfac3"><code>393acbf</code></a>
v30.0.2</li>
<li><a
href="5ce865b406"><code>5ce865b</code></a>
v30.0.1</li>
<li>Additional commits viewable in <a
href="https://github.com/jestjs/jest/commits/v30.2.0/packages/jest">compare
view</a></li>
</ul>
</details>
<br />

Updates `@types/jest` from 29.5.14 to 30.0.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/jest">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-18 21:57:57 -04:00
dependabot[bot]
7a256895aa
chore(ui-deps): bump jest-environment-jsdom from 30.1.2 to 30.2.0 in /llama_stack/ui (#3852)
Bumps
[jest-environment-jsdom](https://github.com/jestjs/jest/tree/HEAD/packages/jest-environment-jsdom)
from 30.1.2 to 30.2.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/releases">jest-environment-jsdom's
releases</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore &amp; Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/blob/main/CHANGELOG.md">jest-environment-jsdom's
changelog</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore &amp; Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-matcher-utils]</code> Fix infinite recursion with
self-referential getters in <code>deepCyclicCopyReplaceable</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15831">#15831</a>)</li>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="855864e3f9"><code>855864e</code></a>
v30.2.0</li>
<li>See full diff in <a
href="https://github.com/jestjs/jest/commits/v30.2.0/packages/jest-environment-jsdom">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=jest-environment-jsdom&package-manager=npm_and_yarn&previous-version=30.1.2&new-version=30.2.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-18 21:53:58 -04:00
dependabot[bot]
83d2193077
chore(ui-deps): bump eslint-config-next from 15.5.2 to 15.5.6 in /llama_stack/ui (#3849)
Bumps
[eslint-config-next](https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next)
from 15.5.2 to 15.5.6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">eslint-config-next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.6</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Turbopack: don't define process.cwd() in node_modules <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83452">#83452</a></li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/mischnic"><code>@​mischnic</code></a> for
helping!</p>
<h2>v15.5.5</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Split code-frame into separate compiled package (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84238">#84238</a>)</li>
<li>Add deprecation warning to Runtime config (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84650">#84650</a>)</li>
<li>fix: unstable_cache should perform blocking revalidation during ISR
revalidation (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84716">#84716</a>)</li>
<li>feat: <code>experimental.middlewareClientMaxBodySize</code> body
cloning limit (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84722">#84722</a>)</li>
<li>fix: missing next/link types with typedRoutes (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84779">#84779</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>docs: early October improvements and fixes (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84334">#84334</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/devjiwonchoi"><code>@​devjiwonchoi</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, and <a
href="https://github.com/icyJoseph"><code>@​icyJoseph</code></a> for
helping!</p>
<h2>v15.5.4</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: ensure onRequestError is invoked when otel enabled (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83343">#83343</a>)</li>
<li>fix: devtools initial position should be from next config (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83571">#83571</a>)</li>
<li>[devtool] fix overlay styles are missing (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83721">#83721</a>)</li>
<li>Turbopack: don't match dynamic pattern for node_modules packages (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83176">#83176</a>)</li>
<li>Turbopack: don't treat metadata routes as RSC (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82911">#82911</a>)</li>
<li>[turbopack] Improve handling of symlink resolution errors in
track_glob and read_glob (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83357">#83357</a>)</li>
<li>Turbopack: throw large static metadata error earlier (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82939">#82939</a>)</li>
<li>fix: error overlay not closing when backdrop clicked (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83981">#83981</a>)</li>
<li>Turbopack: flush Node.js worker IPC on error (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84077">#84077</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>[CNA] use linter preference (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83194">#83194</a>)</li>
<li>CI: use KV for test timing data (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83745">#83745</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="55ef0e3ebc"><code>55ef0e3</code></a>
v15.5.6</li>
<li><a
href="81f530db26"><code>81f530d</code></a>
v15.5.5</li>
<li><a
href="40f1d7814d"><code>40f1d78</code></a>
v15.5.4</li>
<li><a
href="07d1cbc9c6"><code>07d1cbc</code></a>
v15.5.3</li>
<li>See full diff in <a
href="https://github.com/vercel/next.js/commits/v15.5.6/packages/eslint-config-next">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=eslint-config-next&package-manager=npm_and_yarn&previous-version=15.5.2&new-version=15.5.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-18 21:52:17 -04:00
ehhuang
316b76db7a
chore: add telemetry setup to install.sh (#3821)
# What does this PR do?


## Test Plan

.venv ❯ sh ./scripts/install.sh 
⚠️  Found existing container(s) for 'ollama-server', removing...
⚠️  Found existing container(s) for 'llama-stack', removing...
⚠️  Found existing container(s) for 'jaeger', removing...
⚠️  Found existing container(s) for 'otel-collector', removing...
⚠️  Found existing container(s) for 'prometheus', removing...
⚠️  Found existing container(s) for 'grafana', removing...
📡 Starting telemetry stack...
🦙 Starting Ollama...
 Waiting for Ollama daemon...

📦 Ensuring model is pulled: llama3.2:3b...
🦙 Starting Llama Stack...
 Waiting for Llama Stack API...
..

🎉 Llama Stack is ready!
👉  API endpoint: http://localhost:8321
📖 Documentation:
https://llamastack.github.io/latest/references/api_reference/index.html
💻 To access the llama stack CLI, exec into the container:
   docker exec -ti llama-stack bash
📡 Telemetry dashboards:
   Jaeger UI:      http://localhost:16686
   Prometheus UI:  http://localhost:9090
   Grafana UI:     http://localhost:3000 (admin/admin)
   OTEL Collector: http://localhost:4318
🐛 Report an issue @ https://github.com/llamastack/llama-stack/issues if
you think it's a bug
2025-10-18 06:05:56 -07:00
Charlie Doern
b11bcfde11
refactor(build): rework CLI commands and build process (1/2) (#2974)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 22s
Test llama stack list-deps / show-single-provider (push) Failing after 53s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 18s
Python Package Build Test / build (3.13) (push) Failing after 24s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 26s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 27s
Unit Tests / unit-tests (3.12) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (push) Failing after 44s
API Conformance Tests / check-schema-compatibility (push) Successful in 52s
Test llama stack list-deps / generate-matrix (push) Successful in 52s
Test Llama Stack Build / build (push) Failing after 29s
Test External API and Providers / test-external (venv) (push) Failing after 53s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1m2s
Unit Tests / unit-tests (3.13) (push) Failing after 1m30s
Test llama stack list-deps / list-deps-from-config (push) Failing after 1m59s
Test llama stack list-deps / list-deps (push) Failing after 1m10s
UI Tests / ui-tests (22) (push) Successful in 2m26s
Pre-commit / pre-commit (push) Successful in 3m8s
# What does this PR do?

This PR does a few things outlined in #2878, namely:
1. adds `llama stack list-deps`, a command that takes the build logic
and, instead of executing one of the `build_...` scripts, displays all
of the providers' dependencies using the `module` and `uv`.
2. deprecates `llama stack build` in favor of `llama stack list-deps`
3. updates all tests to use `list-deps` alongside `build`.

PR 2/2 will migrate `llama stack run`'s default behavior to be `llama
stack build --run` and use the new `list-deps` command under the hood
before running the server.

Example output of `llama stack list-deps starter`:

```
llama stack list-deps starter --format json
{
  "name": "starter",
  "description": "Quick start template for running Llama Stack with several popular providers. This distribution is intended for CPU-only environments.",
  "apis": [
    {
      "api": "inference",
      "provider": "remote::cerebras"
    },
    {
      "api": "inference",
      "provider": "remote::ollama"
    },
    {
      "api": "inference",
      "provider": "remote::vllm"
    },
    {
      "api": "inference",
      "provider": "remote::tgi"
    },
    {
      "api": "inference",
      "provider": "remote::fireworks"
    },
    {
      "api": "inference",
      "provider": "remote::together"
    },
    {
      "api": "inference",
      "provider": "remote::bedrock"
    },
    {
      "api": "inference",
      "provider": "remote::nvidia"
    },
    {
      "api": "inference",
      "provider": "remote::openai"
    },
    {
      "api": "inference",
      "provider": "remote::anthropic"
    },
    {
      "api": "inference",
      "provider": "remote::gemini"
    },
    {
      "api": "inference",
      "provider": "remote::vertexai"
    },
    {
      "api": "inference",
      "provider": "remote::groq"
    },
    {
      "api": "inference",
      "provider": "remote::sambanova"
    },
    {
      "api": "inference",
      "provider": "remote::azure"
    },
    {
      "api": "inference",
      "provider": "inline::sentence-transformers"
    },
    {
      "api": "vector_io",
      "provider": "inline::faiss"
    },
    {
      "api": "vector_io",
      "provider": "inline::sqlite-vec"
    },
    {
      "api": "vector_io",
      "provider": "inline::milvus"
    },
    {
      "api": "vector_io",
      "provider": "remote::chromadb"
    },
    {
      "api": "vector_io",
      "provider": "remote::pgvector"
    },
    {
      "api": "files",
      "provider": "inline::localfs"
    },
    {
      "api": "safety",
      "provider": "inline::llama-guard"
    },
    {
      "api": "safety",
      "provider": "inline::code-scanner"
    },
    {
      "api": "agents",
      "provider": "inline::meta-reference"
    },
    {
      "api": "telemetry",
      "provider": "inline::meta-reference"
    },
    {
      "api": "post_training",
      "provider": "inline::torchtune-cpu"
    },
    {
      "api": "eval",
      "provider": "inline::meta-reference"
    },
    {
      "api": "datasetio",
      "provider": "remote::huggingface"
    },
    {
      "api": "datasetio",
      "provider": "inline::localfs"
    },
    {
      "api": "scoring",
      "provider": "inline::basic"
    },
    {
      "api": "scoring",
      "provider": "inline::llm-as-judge"
    },
    {
      "api": "scoring",
      "provider": "inline::braintrust"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::brave-search"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::tavily-search"
    },
    {
      "api": "tool_runtime",
      "provider": "inline::rag-runtime"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::model-context-protocol"
    },
    {
      "api": "batches",
      "provider": "inline::reference"
    }
  ],
  "pip_dependencies": [
    "pandas",
    "opentelemetry-exporter-otlp-proto-http",
    "matplotlib",
    "opentelemetry-sdk",
    "sentence-transformers",
    "datasets",
    "pymilvus[milvus-lite]>=2.4.10",
    "codeshield",
    "scipy",
    "torchvision",
    "tree_sitter",
    "h11>=0.16.0",
    "aiohttp",
    "pymongo",
    "tqdm",
    "pythainlp",
    "pillow",
    "torch",
    "emoji",
    "grpcio>=1.67.1,<1.71.0",
    "fireworks-ai",
    "langdetect",
    "psycopg2-binary",
    "asyncpg",
    "redis",
    "together",
    "torchao>=0.12.0",
    "openai",
    "sentencepiece",
    "aiosqlite",
    "google-cloud-aiplatform",
    "faiss-cpu",
    "numpy",
    "sqlite-vec",
    "nltk",
    "scikit-learn",
    "mcp>=1.8.1",
    "transformers",
    "boto3",
    "huggingface_hub",
    "ollama",
    "autoevals",
    "sqlalchemy[asyncio]",
    "torchtune>=0.5.0",
    "chromadb-client",
    "pypdf",
    "requests",
    "anthropic",
    "chardet",
    "aiosqlite",
    "fastapi",
    "fire",
    "httpx",
    "uvicorn",
    "opentelemetry-sdk",
    "opentelemetry-exporter-otlp-proto-http"
  ]
}
```
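
The sample output above lists a few packages more than once (for example `aiosqlite` and `opentelemetry-sdk`). A minimal sketch, assuming only the JSON shape shown above, of how a consumer might collapse those repeats while keeping first-seen order. `unique_deps` is a hypothetical helper, not part of the CLI.

```python
import json


def unique_deps(list_deps_json: str) -> list[str]:
    """Return pip dependencies in first-seen order, without repeats."""
    payload = json.loads(list_deps_json)
    # dict.fromkeys preserves insertion order and drops later duplicates.
    return list(dict.fromkeys(payload.get("pip_dependencies", [])))


# Trimmed-down stand-in for the real `--format json` output.
sample = json.dumps(
    {
        "name": "starter",
        "pip_dependencies": [
            "aiosqlite",
            "fastapi",
            "aiosqlite",
            "opentelemetry-sdk",
            "opentelemetry-sdk",
        ],
    }
)
print(unique_deps(sample))
```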

<img width="1500" height="420" alt="Screenshot 2025-10-16 at 5 53 03 PM"
src="https://github.com/user-attachments/assets/765929fb-93e2-44d7-9c3d-8918b70fc721"
/>

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-17 19:52:14 -07:00
Emilio Garcia
943558af36
test(telemetry): Telemetry Tests (#3805)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 10s
Python Package Build Test / build (3.13) (push) Failing after 10s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 11s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 20s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Test External API and Providers / test-external (venv) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (push) Failing after 30s
API Conformance Tests / check-schema-compatibility (push) Successful in 38s
UI Tests / ui-tests (22) (push) Successful in 1m32s
Pre-commit / pre-commit (push) Successful in 3m16s
# What does this PR do?
Adds a test and a standardized way to build future tests out for
telemetry in llama stack.
Contributes to https://github.com/llamastack/llama-stack/issues/3806

## Test Plan
This is the test plan 😎
2025-10-17 10:43:33 -07:00
Alexey Rybak
224c99560c
docs: update docstrings for better formatting (#3838)
# What does this PR do?
Updates docstrings for Conversations and Eval APIs to render better in
the docs nav sidebar.

Before: 
<img width="363" height="233" alt="Screenshot 2025-10-17 at 9 52 17 AM"
src="https://github.com/user-attachments/assets/3a77f9e3-3b03-43ae-8584-a21d1f44d54d"
/>

After:
<img width="410" height="206" alt="Screenshot 2025-10-17 at 9 52 11 AM"
src="https://github.com/user-attachments/assets/fa5d428d-2bde-4453-84fd-9aceebe712e8"
/>


## Test Plan
* Manual testing
2025-10-17 10:41:50 -07:00
Alexey Rybak
c9f0bebcb7
chore: update API leveling docs with deprecation flag (#3837)
# What does this PR do?
Adds information on the `deprecated=True` flags to the documentation for
extra clarity.

## Test Plan
* Manual testing
2025-10-17 10:17:58 -07:00
Ashwin Bharambe
a701f68bd7
feat(ci): enable docker based server tests (#3833)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (push) Failing after 22s
API Conformance Tests / check-schema-compatibility (push) Successful in 31s
UI Tests / ui-tests (22) (push) Successful in 1m35s
Pre-commit / pre-commit (push) Successful in 2m27s
2025-10-17 09:19:25 +02:00
Ashwin Bharambe
4c9d944380
fix(perf): make batches tests finish 30x faster (#3834)
In replay mode, inference is instantaneous. We don't need to wait 15
seconds for the batch to be done. Fixing polling to use exponential
backoff makes things work super fast.
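
A minimal sketch of polling with exponential backoff, not the code from this PR. `wait_until_done`, `is_done`, and the delay and timeout values are hypothetical and chosen only for illustration.

```python
import time
from collections.abc import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    initial_delay: float = 0.1,
    max_delay: float = 5.0,
    timeout: float = 30.0,
) -> bool:
    """Poll is_done() with doubling sleeps; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        if is_done():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline, and cap the growth of the delay.
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Because the first check happens immediately and the first sleep is short, a batch that is already complete (as in replay mode) returns almost at once instead of waiting for a fixed interval.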
2025-10-17 09:16:44 +02:00
Ashwin Bharambe
cd152f4240
feat(ci): add support for docker:distro in tests (#3832)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test Llama Stack Build / build-single-provider (push) Failing after 9s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Vector IO Integration Tests / test-matrix (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 12s
API Conformance Tests / check-schema-compatibility (push) Successful in 19s
Test Llama Stack Build / build (push) Failing after 7s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 26s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 25s
Python Package Build Test / build (3.12) (push) Failing after 33s
UI Tests / ui-tests (22) (push) Successful in 1m26s
Pre-commit / pre-commit (push) Successful in 2m18s
Also a critical bug fix so test recordings can be found inside docker
2025-10-16 19:33:13 -07:00
ehhuang
b3099d40e2
fix(telemetry): remove dependency on old telemetry config (#3830)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 10s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 12s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Python Package Build Test / build (3.12) (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Python Package Build Test / build (3.13) (push) Failing after 13s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 21s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 57s
Vector IO Integration Tests / test-matrix (push) Failing after 1m13s
API Conformance Tests / check-schema-compatibility (push) Successful in 1m22s
UI Tests / ui-tests (22) (push) Successful in 1m33s
Pre-commit / pre-commit (push) Successful in 1m55s
# What does this PR do?
old telemetry config was removed in #3815

## Test Plan

❯ OTEL_SERVICE_NAME=aloha
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 uv run llama stack run
starter
<img width="1888" height="605" alt="image"
src="https://github.com/user-attachments/assets/dd5cc9f0-213a-4dc6-9385-f61a3a13b4c3"
/>
2025-10-16 12:05:10 -07:00
ehhuang
07ff15d917
chore: distrogen enables telemetry by default (#3828)
# What does this PR do?
leftover from #3815

## Test Plan
CI


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3828).
* #3830
* __->__ #3828
2025-10-16 11:29:51 -07:00
Charlie Doern
f22aaef42f
chore!: remove telemetry API usage (#3815)
# What does this PR do?

remove telemetry as a providable API from the codebase. This includes
removing it from generated distributions but also the provider registry,
the router, etc

since `setup_logger` is tied pretty strictly to `Api.telemetry` being in
impls we still need an "instantiated provider" in our implementations.
However it should not be auto-routed or provided. So in
validate_and_prepare_providers (called from resolve_impls) I made it so
that if run_config.telemetry.enabled, we set up the meta-reference
"provider" internally to be used so that log_event will work when
called.

This is the neatest way I think we can remove telemetry from the
provider configs but also not need to rip apart the whole "telemetry is
a provider" logic just yet, but we can do it internally later without
disrupting users.

so telemetry is removed from the registry such that if a user puts
`telemetry:` as an API in their build/run config it will err out, but
can still be used by us internally as we go through this transition.


relates to #3806

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-16 10:39:32 -07:00
slekkala1
8c5705d39e
fix: test id not being set in headers (#3827)
# What does this PR do?
When stack config is set to server in docker
(STACK_CONFIG_ARG=--stack-config=http://localhost:8321), the env variable
was not getting set correctly and the test ID was not set, causing:

E openai.BadRequestError: Error code: 400 - {'detail': 'Invalid value:
Test ID is required for file ID allocation'}

This is needed for test-and-cut to work.



5286461406

## Test Plan
CI
2025-10-16 10:29:07 -07:00
Bill Murdock
c19eb9854d
docs: Document known limitations of Responses (#3776)
# What does this PR do?

Adds a subpage of the OpenAI compatibility page in the documentation.
This subpage documents known limitations of the Responses API.

Closes #3575

---------

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
2025-10-16 10:26:23 -07:00
Ashwin Bharambe
185de61d8e
fix(openai_mixin): no yelling for model listing if API keys are not provided (#3826)
As indicated in the title. Our `starter` distribution enables all remote
providers _very intentionally_ because we believe it creates an easier,
more welcoming experience to new folks using the software. If we do
that, and then slam the logs with errors making them question their life
choices, it is not so good :)

Note that this fix is limited in scope. If you ever try to actually
instantiate the OpenAI client from a code path without an API key being
present, you deserve to fail hard.

## Test Plan

Run `llama stack run starter` with `OPENAI_API_KEY` set. No more wall of
text, just one message saying "listed 96 models".
2025-10-16 10:12:13 -07:00
Ashwin Bharambe
07fc8013eb
fix(tests): reduce some test noise (#3825)
a bunch of logger.info()s are good for server code to help debug in
production, but we don't want them killing our unit test output :)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-16 09:52:16 -07:00
Sébastien Han
0c368492b7
chore: update agent call (#3824)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (push) Failing after 11s
API Conformance Tests / check-schema-compatibility (push) Successful in 17s
UI Tests / ui-tests (22) (push) Successful in 1m49s
Pre-commit / pre-commit (push) Successful in 2m51s
followup on https://github.com/llamastack/llama-stack/pull/3810

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-16 16:04:43 +02:00
Derek Higgins
edb8afb219
chore: remove test_cases/openai/responses.json (#3823)
It's unused

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-16 06:59:29 -07:00
Ashwin Bharambe
f70aa99c97
fix(models)!: always prefix models with provider_id when registering (#3822)
**!!BREAKING CHANGE!!**

The lookup is also straightforward -- we always look for this identifier
and don't try to find a match for something without the provider_id
prefix.

Note that this ideally means we also need to update the
`register_model()` API (we should kill "identifier" from there), but I
am not doing that as part of this PR.
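
A minimal sketch of the behavior described above, not the actual registry code. It assumes the `provider_id/model_id` form that appears elsewhere in this log (for example `gemini/models/gemini-embedding-001`). `registered_identifier` and `lookup` are hypothetical names.

```python
def registered_identifier(provider_id: str, provider_model_id: str) -> str:
    """Build the identifier a model is stored under: always provider-prefixed."""
    return f"{provider_id}/{provider_model_id}"


def lookup(registry: dict[str, dict], identifier: str) -> dict:
    """Exact-match lookup only; there is no fallback to an unprefixed name."""
    if identifier not in registry:
        raise KeyError(f"model not found: {identifier!r}")
    return registry[identifier]


registry = {
    registered_identifier("ollama", "llama3.2:3b"): {"model_type": "llm"},
}
print(lookup(registry, "ollama/llama3.2:3b"))
```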

## Test Plan

Existing unit tests
2025-10-16 06:47:39 -07:00
Ashwin Bharambe
f205ab6f6c
fix(responses): fixes, re-record tests (#3820)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 17s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 1m43s
Wanted to re-enable Responses CI but it seems to hang for some reason
due to some interactions with conversations_store or responses_store.

## Test Plan

```
# library client
./scripts/integration-tests.sh --stack-config ci-tests --suite responses

# server
./scripts/integration-tests.sh --stack-config server:ci-tests --suite responses
```
2025-10-15 16:37:42 -07:00
slekkala1
99141c29b1
feat: Add responses and safety impl extra_body (#3781)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s
Test External API and Providers / test-external (venv) (push) Failing after 8s
Test Llama Stack Build / build (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
API Conformance Tests / check-schema-compatibility (push) Successful in 19s
UI Tests / ui-tests (22) (push) Successful in 37s
Pre-commit / pre-commit (push) Successful in 1m33s
# What does this PR do?

Closed the previous PR due to merge conflicts with multiple PRs.
Addressed all comments from
https://github.com/llamastack/llama-stack/pull/3768 (sorry for carrying
over to this one).


## Test Plan
Added UTs and integration tests
2025-10-15 15:01:37 -07:00
Ashwin Bharambe
8e7e0ddfec
fix(responses): use conversation items when no stored messages exist (#3819)
Handle a base case when no stored messages exist because no Response
call has been made.

## Test Plan

```
./scripts/integration-tests.sh --stack-config server:ci-tests \
   --suite responses   --inference-mode record-if-missing --pattern test_conversation_responses
```
2025-10-15 14:43:44 -07:00
ehhuang
6ba9db3929
chore!: BREAKING CHANGE: remove sqlite from telemetry config (#3808)
# What does this PR do?
- Removed sqlite sink from telemetry config.
- Removed related code
- Updated doc related to telemetry

## Test Plan
CI
2025-10-15 14:24:45 -07:00
Ashwin Bharambe
0a96a7faa5
fix(responses): fix subtle bugs in non-function tool calling (#3817)
We were generating "FunctionToolCall" items even for MCP (and
file-search, etc.) server-side calls. ID mismatches, etc. galore.
2025-10-15 13:57:37 -07:00
ehhuang
d709eeb33f
chore: mark recordings as generated files (#3816)
# What does this PR do?


## Test Plan
<img width="1506" height="653" alt="image"
src="https://github.com/user-attachments/assets/6c28b8e8-effe-41ab-8e31-72482c05662d"
/>
2025-10-15 11:06:42 -07:00
Sumanth Kamenani
bc8b377a7c
fix(vector-io): handle missing document_id in insert_chunks (#3521)
Fixed KeyError when chunks don't have document_id in metadata or
chunk_metadata. Updated logging to safely extract document_id using
getattr and RAG memory to handle different document_id locations. Added
test for missing document_id scenarios.

Fixes issue #3494 where /v1/vector-io/insert would crash with KeyError.

 # What does this PR do?

Fixes a KeyError crash in `/v1/vector-io/insert` when chunks are missing
`document_id` fields. The API
was failing even though `document_id` is optional according to the
schema.

  Closes #3494

  ## Test Plan

  **Before fix:**
  - POST to `/v1/vector-io/insert` with chunks → 500 KeyError
  - Happened regardless of where `document_id` was placed

  **After fix:**
  - Same request works fine → 200 OK
  - Tested with Postman using FAISS backend
  - Added unit test covering missing `document_id` scenarios
2025-10-15 11:02:48 -07:00
Ashwin Bharambe
e9b4278a51
feat(responses)!: improve responses + conversations implementations (#3810)
This PR updates the Conversation item related types and improves a
couple of critical parts of the implementation:

- it creates a streaming output item for the final assistant message
  output by the model. Until now we only added content parts and
  included that message in the final response.

- rewrites the conversation update code completely to account for items
  other than messages (tool calls, outputs, etc.)

## Test Plan

Used the test script from
https://github.com/llamastack/llama-stack-client-python/pull/281 for
this

```
TEST_API_BASE_URL=http://localhost:8321/v1 \
  pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```
2025-10-15 09:36:11 -07:00
Juan Pérez de Algaba
add8cd801b
feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys (#3813)
# Add support for Google Gemini `gemini-embedding-001` embedding model and correctly register model type

MR message created with the assistance of Claude-4.5-sonnet

This resolves https://github.com/llamastack/llama-stack/issues/3755

## What does this PR do?

This PR adds support for the `gemini-embedding-001` Google embedding
model to the llama-stack Gemini provider. This model provides
high-dimensional embeddings (3072 dimensions) compared to the existing
`text-embedding-004` model (768 dimensions). Older embedding models
(such as `text-embedding-004`) will be deprecated soon, according to Google
([Link](https://developers.googleblog.com/en/gemini-embedding-available-gemini-api/)).

## Problem

The Gemini provider only supported the `text-embedding-004` embedding
model. The newer `gemini-embedding-001` model, which provides
higher-dimensional embeddings for improved semantic representation, was
not available through llama-stack.

## Solution

This PR consists of three commits that add the model, fix its
registration, and enable embedding generation:

### Commit 1: Initial addition of gemini-embedding-001

Added metadata for `gemini-embedding-001` to the
`embedding_model_metadata` dictionary:

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # NEW
}
```

**Issue discovered:** The model was not being registered correctly
because the dictionary keys didn't match the model IDs returned by
Gemini's API.

### Commit 2: Fix model ID matching with `models/` prefix

Updated both dictionary keys to include the `models/` prefix to match
Gemini's OpenAI-compatible API response format:

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},      # UPDATED
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # UPDATED
}
```

**Root cause:** Gemini's OpenAI-compatible API returns model IDs with
the `models/` prefix (e.g., `models/text-embedding-004`). The
`OpenAIMixin.list_models()` method directly matches these IDs against
the `embedding_model_metadata` dictionary keys. Without the prefix, the
models were being registered as LLMs instead of embedding models.

### Commit 3: Fix embedding generation for providers without usage stats

Fixed a bug in `OpenAIMixin.openai_embeddings()` that prevented
embedding generation for providers (like Gemini) that don't return usage
statistics:

```python
# Before (Lines 351-354):
usage = OpenAIEmbeddingUsage(
    prompt_tokens=response.usage.prompt_tokens,  # ← Crashed with AttributeError
    total_tokens=response.usage.total_tokens,
)

# After (Lines 351-362):
if response.usage:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=response.usage.prompt_tokens,
        total_tokens=response.usage.total_tokens,
    )
else:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=0,  # Default when not provided
        total_tokens=0,   # Default when not provided
    )
```

**Impact:** This fix enables embedding generation for **all** Gemini
embedding models, not just the newly added one.

## Changes

### Modified Files

**`llama_stack/providers/remote/inference/gemini/gemini.py`**
- Line 17: Updated `text-embedding-004` key to
`models/text-embedding-004`
- Line 18: Added `models/gemini-embedding-001` with correct metadata

**`llama_stack/providers/utils/inference/openai_mixin.py`**
- Lines 351-362: Added null check for `response.usage` to handle
providers without usage statistics

## Key Technical Details

### Model ID Matching Flow

1. `list_provider_model_ids()` calls Gemini's `/v1/models` endpoint
2. API returns model IDs like: `models/text-embedding-004`,
`models/gemini-embedding-001`
3. `OpenAIMixin.list_models()` (line 410) checks: `if metadata :=
self.embedding_model_metadata.get(provider_model_id)`
4. If matched, registers as `model_type: "embedding"` with metadata;
otherwise registers as `model_type: "llm"`
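
The matching flow above can be sketched as follows. This is a simplified illustration, not the `OpenAIMixin` implementation: `classify` is a hypothetical function, and only the metadata dictionary and the walrus-style lookup are taken from the description.

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},
}


def classify(provider_model_id: str) -> dict:
    """Register as an embedding model only on an exact key match."""
    if metadata := embedding_model_metadata.get(provider_model_id):
        return {"model_type": "embedding", "metadata": metadata}
    return {"model_type": "llm", "metadata": {}}


# The ID as returned by the API matches a key exactly.
print(classify("models/gemini-embedding-001")["model_type"])
# Without the `models/` prefix the lookup misses and the model is treated as an LLM.
print(classify("gemini-embedding-001")["model_type"])
```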

### Why Both Keys Needed the Prefix

The `text-embedding-004` model was already working because there was
likely separate configuration or manual registration handling it. For
auto-discovery to work correctly for **both** models, both keys must
match the API's model ID format exactly.

## How to test this PR

Verified the changes by:

1. **Model Auto-Discovery**: Started llama-stack server and confirmed
models are auto-discovered from Gemini API

2. **Model Registration**: Confirmed both embedding models are correctly
registered and visible
```bash
curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'
```

**Results:**
-  `gemini/models/text-embedding-004` - 768 dimensions - `model_type:
"embedding"`
-  `gemini/models/gemini-embedding-001` - 3072 dimensions -
`model_type: "embedding"`

3. **Before Fix (Commit 1)**: Models appeared as `model_type: "llm"`
without embedding metadata

4. **After Fix (Commit 2)**: Models correctly identified as `model_type:
"embedding"` with proper metadata

5. **Generate Embeddings**: Verified embedding generation works
```bash
curl -X POST http://localhost:8325/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
  jq '.data[0].embedding | length'
```
2025-10-15 12:22:10 -04:00
slekkala1
ce8ea2f505
chore: Support embedding params from metadata for Vector Store (#3811)
# What does this PR do?
Adds support for reading the embedding model and dimension from metadata
for vector stores.
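
A minimal sketch of what reading these parameters from metadata could look like. The metadata key names (`embedding_model`, `embedding_dimension`) and the helper name are assumptions made for illustration, not taken from the actual implementation:

```python
from __future__ import annotations

from typing import Any


def embedding_params_from_metadata(
    metadata: dict[str, Any] | None,
) -> tuple[str | None, int | None]:
    """Return (embedding_model, embedding_dimension) found in metadata, if any.

    Hypothetical helper: the key names are assumptions for this sketch.
    """
    metadata = metadata or {}
    model = metadata.get("embedding_model")
    dimension = metadata.get("embedding_dimension")
    # Metadata values are often strings, so coerce the dimension to int.
    return model, int(dimension) if dimension is not None else None


model, dimension = embedding_params_from_metadata(
    {"embedding_model": "sentence-transformers/all-MiniLM-L6-v2", "embedding_dimension": "384"}
)
```

Missing keys simply yield `None`, so a caller can fall back to whatever other source it has.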

## Test Plan
Unit Tests
2025-10-15 15:53:36 +02:00
Francisco Arceo
ef4bc70bbe
feat: Enable setting a default embedding model in the stack (#3803)
# What does this PR do?

Enables automatic embedding model detection for vector stores by using
a `default_configured` boolean that can be defined in the `run.yaml`.


## Test Plan
- Unit tests
- Integration tests
- Simple example below:

Spin up the stack:
```bash
uv run llama stack build --distro starter --image-type venv --run
```
Then test with OpenAI's client:
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
```
Previously you needed:

```python
vs = client.vector_stores.create(
    extra_body={
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
    }
)
```

The `extra_body` is now unnecessary.
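
The resolution order this implies can be sketched roughly as follows. This is an illustration only: the `EmbeddingModel` dataclass, the `resolve_embedding_model` helper, and the idea that `default_configured` is a per-model flag are assumptions, not the actual implementation:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EmbeddingModel:
    identifier: str
    dimension: int
    default_configured: bool = False  # assumed to mirror the flag set in run.yaml


def resolve_embedding_model(
    extra_body: dict | None, models: list[EmbeddingModel]
) -> EmbeddingModel:
    """Pick the embedding model for a new vector store (hypothetical helper)."""
    requested = (extra_body or {}).get("embedding_model")
    if requested is not None:
        # An explicit request always wins over the configured default.
        for candidate in models:
            if candidate.identifier == requested:
                return candidate
        raise ValueError(f"unknown embedding model: {requested}")
    defaults = [m for m in models if m.default_configured]
    if len(defaults) != 1:
        raise ValueError("expected exactly one default embedding model")
    return defaults[0]


models = [
    EmbeddingModel("sentence-transformers/all-MiniLM-L6-v2", 384, default_configured=True),
    EmbeddingModel("ollama/nomic-embed-text:latest", 768),
]
chosen = resolve_embedding_model(None, models)
```

An explicit `embedding_model` in `extra_body` still takes precedence in this sketch, so existing callers keep working.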

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-14 18:25:13 -07:00
Jiayi Ni
d875e427bf
refactor: use extra_body to pass in input_type params for asymmetric embedding models for NVIDIA Inference Provider (#3804)
# What does this PR do?
Previously, the NVIDIA inference provider implemented a custom
`openai_embeddings` method with a hardcoded `input_type="query"`
parameter, which is required by NVIDIA asymmetric embedding models
([https://github.com/llamastack/llama-stack/pull/3205](https://github.com/llamastack/llama-stack/pull/3205)).
An `extra_body` parameter was recently added to the embeddings API
([https://github.com/llamastack/llama-stack/pull/3794](https://github.com/llamastack/llama-stack/pull/3794)).
This PR therefore updates the NVIDIA inference provider to use the base
`OpenAIMixin.openai_embeddings` method and to pass `input_type`
through the `extra_body` parameter for asymmetric embedding models.


## Test Plan
Run the following command for each `embedding_model`:
`nvidia/llama-3.2-nv-embedqa-1b-v2`, `nvidia/nv-embedqa-e5-v5`,
`nvidia/nv-embedqa-mistral-7b-v2`, and `snowflake/arctic-embed-l`.
```
pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model={embedding_model} --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" --inference-mode=record
```
2025-10-14 13:52:55 -07:00
ehhuang
866c13cdc2
chore(api)!: BREAKING CHANGE: remove ALL telemetry APIs (#3740)
# What does this PR do?
As discussed on Discord, we do not need to reinvent the wheel for
telemetry. Instead we'll lean into the canonical OTEL stack.
Logs/traces/metrics will still be sent via OTEL - they just won't be
stored on or queried through Stack.

This is the first of many PRs to remove the telemetry API from Stack.
1) removed webmethod decorators so the endpoints drop out of the API spec
2) removed tests as @iamemilio is adding them on otel directly.

## Test Plan
2025-10-14 13:48:40 -07:00
Bill Murdock
15900472ad
docs: Update CONTRIBUTING: py 3.12 and pre-commit==4.3.0 (#3807)
# What does this PR do?

Updates CONTRIBUTING.md with the following changes:
- Use Python 3.12 (and why)
- Use pre-commit==4.3.0
- Recommend using `-v` with pre-commit to get detailed info about why it
is failing if it fails.
- Instruct users to go to the `docs/` directory before rebuilding the
docs (it doesn't work unless you do that).

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
2025-10-14 15:47:38 -04:00
IAN MILLER
007efa6eb5
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183)
# What does this PR do?
The purpose of this PR is to replace Llama Stack's default embedding
model with nomic-embed-text-v1.5.

These are the key reasons why the Llama Stack community decided to switch
from all-MiniLM-L6-v2 to nomic-embed-text-v1.5:
1. The training data for
[all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data)
includes many datasets with various licensing terms, so it is
tricky to know when/whether it is appropriate to use this model for
commercial applications.
2. The model is not particularly competitive on major benchmarks. For
example, if you look at the [MTEB
Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click
on Miscellaneous/BEIR to see English information retrieval accuracy, you
see that the top of the leaderboard is dominated by enormous models but
also that there are many, many models of relatively modest size with
much higher Retrieval scores. If you want to look closely at the data, I
recommend clicking "Download Table" because it is easier to browse that
way.

More discussion can be found
[here](https://github.com/llamastack/llama-stack/issues/2418).

Closes #2418 

## Test Plan
1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI workflow

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-10-14 10:44:20 -04:00
Cesare Pompeiano
0dbf79c328
fix: Fixed WatsonX remote inference provider (#3801)
# What does this PR do?
This PR fixes issues with the WatsonX provider so it works correctly
with LiteLLM.

The main problem was that WatsonX requests failed because the provider
data validator didn’t properly handle the API key and project ID. This
was fixed by updating the `WatsonXProviderDataValidator` and ensuring the
provider data is loaded correctly.

The `openai_chat_completion` method was also updated to match the behavior
of other providers while adding WatsonX-specific fields like `project_id`.
It still calls `await super().openai_chat_completion.__func__(self, params)`
to keep the existing setup and tracing logic.

After these changes, WatsonX requests now run correctly.

## Test Plan
The changes were tested by running chat completion requests and
confirming that credentials and project parameters are passed correctly.
I tested with my WatsonX credentials using the CLI: `uv run
llama-stack-client inference chat-completion --session`

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-10-14 14:52:32 +02:00
Sébastien Han
1136daf310
fix: replace python-jose with PyJWT for JWT handling (#3756)
# What does this PR do?

This commit migrates the authentication system from python-jose to PyJWT
to eliminate the dependency on the archived rsa package. The migration
includes:

- Refactored OAuth2TokenAuthProvider to use PyJWT's PyJWKClient for
clean JWKS handling
- Removed manual JWKS fetching, caching and key extraction logic in
favor of PyJWT's built-in functionality

The new implementation is cleaner, more maintainable, and follows PyJWT
best practices while maintaining full backward compatibility.

## Test Plan

Unit tests. Auth CI.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-14 09:35:48 +02:00
Francisco Arceo
968c364a3e
chore: Auto-detect Provider ID when only 1 Vector Store Provider avai… (#3802)
# What does this PR do?
Two main changes:

1. Removes the `provider_id` requirement in calls to vector stores.
2. Removes the "register first embedding model" logic.
   - The embedding model ID is now required on Vector Store creation.

Simplifies the UX for OpenAI to:

```python
vs = client.vector_stores.create(
    name="my_citations_db",
    extra_body={
        "embedding_model": "ollama/nomic-embed-text:latest",
    }
)
```
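
The provider auto-detection described in change 1 can be sketched roughly like this. The function name and error messages are hypothetical; only the rule itself (use the single available provider when none is given) comes from this PR:

```python
from __future__ import annotations


def pick_vector_store_provider(provider_id: str | None, available: list[str]) -> str:
    """Choose the vector store provider for a request (hypothetical helper)."""
    if provider_id is not None:
        # An explicitly requested provider must actually be configured.
        if provider_id not in available:
            raise ValueError(f"unknown provider_id: {provider_id}")
        return provider_id
    if len(available) == 1:
        # Only one provider is configured, so there is nothing to disambiguate.
        return available[0]
    raise ValueError("provider_id is required when multiple providers are configured")


selected = pick_vector_store_provider(None, ["faiss"])
```

With more than one provider configured, this sketch still requires the caller to say which one to use.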



## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-13 10:25:36 -07:00
Derek Higgins
642126e13b
fix: record job checking wrong directory (#3799)
Fixed the CI job to check the correct directories for file changes.
Artifacts are now stored in multiple directories, not just
`./tests/integration/recordings`.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-13 09:55:55 -07:00
raghotham
b95f095a54
feat: Allow :memory: for kvstore (#3696)
## Test Plan
added unit tests
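
For context, `:memory:` is SQLite's special name for a database that lives only in process memory. A minimal sketch of a key-value store that accepts it, assuming an SQLite-backed store (the class name, table name, and schema are hypothetical, not the actual kvstore implementation):

```python
from __future__ import annotations

import sqlite3


class SqliteKVStore:
    """Tiny key-value store on top of sqlite3 (hypothetical, for illustration)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        # ":memory:" makes sqlite3 keep the whole database in RAM.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kvstore (key TEXT PRIMARY KEY, value TEXT)"
        )

    def set(self, key: str, value: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO kvstore (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def get(self, key: str) -> str | None:
        row = self.conn.execute(
            "SELECT value FROM kvstore WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


store = SqliteKVStore(":memory:")
store.set("greeting", "hello")
```

Data in such a store disappears when the connection closes, which is convenient for tests and throwaway runs.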
2025-10-13 11:19:27 +02:00
Ashwin Bharambe
ecc8a554d2
feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)
Applies the same pattern from
https://github.com/llamastack/llama-stack/pull/3777 to embeddings and
vector_stores.create() endpoints.

This should _not_ be a breaking change: (a) our tests were already
passing these parameters to the backend via `extra_body`, but (b) the
backend probably wasn't extracting them correctly. This PR fixes that.

Updated APIs: `openai_embeddings(), openai_create_vector_store(),
openai_create_vector_store_file_batch()`
2025-10-12 19:01:52 -07:00
slekkala1
3bb6ef351b
chore!: Safety api refactoring to use OpenAIMessageParam (#3796)
# What does this PR do?
Remove usage of the deprecated `Message` type from the Safety APIs


## Test Plan
CI
2025-10-12 08:01:00 -07:00
dependabot[bot]
82cbcada39
chore(ui-deps): bump lucide-react from 0.542.0 to 0.545.0 in /llama_stack/ui (#3788)
Bumps
[lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react)
from 0.542.0 to 0.545.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/lucide-icons/lucide/releases">lucide-react's
releases</a>.</em></p>
<blockquote>
<h2>Version 0.545.0</h2>
<h2>What's Changed</h2>
<ul>
<li>fix(icons): changed <code>flame</code> icon by <a
href="https://github.com/jamiemlaw"><code>@​jamiemlaw</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3600">lucide-icons/lucide#3600</a></li>
<li>fix(icons): arcified <code>square-m</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3549">lucide-icons/lucide#3549</a></li>
<li>chore(deps-dev): bump vite from 6.3.5 to 6.3.6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3611">lucide-icons/lucide#3611</a></li>
<li>fix(icons): changed <code>combine</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3200">lucide-icons/lucide#3200</a></li>
<li>fix(icons): changed <code>building-2</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3509">lucide-icons/lucide#3509</a></li>
<li>chore(deps): bump devalue from 5.1.1 to 5.3.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3638">lucide-icons/lucide#3638</a></li>
<li>feat(icons): Add <code>motorbike</code> icon by <a
href="https://github.com/jamiemlaw"><code>@​jamiemlaw</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3371">lucide-icons/lucide#3371</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.544.0...0.545.0">https://github.com/lucide-icons/lucide/compare/0.544.0...0.545.0</a></p>
<h2>Version 0.544.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update lucide-static documentation about raw string imports by
<a href="https://github.com/pascalduez"><code>@​pascalduez</code></a> in
<a
href="https://redirect.github.com/lucide-icons/lucide/pull/3524">lucide-icons/lucide#3524</a></li>
<li>feat(icons): added <code>ev-charger</code> icon by <a
href="https://github.com/UsamaKhan"><code>@​UsamaKhan</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/2781">lucide-icons/lucide#2781</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/pascalduez"><code>@​pascalduez</code></a> made
their first contribution in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3524">lucide-icons/lucide#3524</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.543.0...0.544.0">https://github.com/lucide-icons/lucide/compare/0.543.0...0.544.0</a></p>
<h2>Version 0.543.0</h2>
<h2>What's Changed</h2>
<ul>
<li>feat(preview-comment): put x-ray at top if there are more than 7
changed icons to prevent them from being cut of by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3589">lucide-icons/lucide#3589</a></li>
<li>fix(icons): changed <code>church</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/2971">lucide-icons/lucide#2971</a></li>
<li>chore(metadata): Added tags to <code>messages-square</code> by <a
href="https://github.com/jamiemlaw"><code>@​jamiemlaw</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3529">lucide-icons/lucide#3529</a></li>
<li>fix(icons): Optimise <code>bug</code> icons by <a
href="https://github.com/jamiemlaw"><code>@​jamiemlaw</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3574">lucide-icons/lucide#3574</a></li>
<li>fix(icons): changed list/text &amp; derived icons by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3568">lucide-icons/lucide#3568</a></li>
<li>fix(icons): changed <code>panel-top-bottom-dashed</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3584">lucide-icons/lucide#3584</a></li>
<li>fix(icons): changed <code>message-square-quote</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3550">lucide-icons/lucide#3550</a></li>
<li>fix(meta): added tag to <code>ship</code> metadata by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3559">lucide-icons/lucide#3559</a></li>
<li>fix(meta): add tags to <code>id-card-lanyard</code> metadata by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3534">lucide-icons/lucide#3534</a></li>
<li>fix(icons): changed <code>calendar-cog</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3583">lucide-icons/lucide#3583</a></li>
<li>chore(deps): bump astro from 5.5.2 to 5.13.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3564">lucide-icons/lucide#3564</a></li>
<li>feat(packages): add new package for flutter by <a
href="https://github.com/vqh2602"><code>@​vqh2602</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3536">lucide-icons/lucide#3536</a></li>
<li>feat(icons): added <code>house-heart</code> icon by <a
href="https://github.com/danielbayley"><code>@​danielbayley</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3239">lucide-icons/lucide#3239</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.542.0...0.543.0">https://github.com/lucide-icons/lucide/compare/0.542.0...0.543.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1cfb3ff70e"><code>1cfb3ff</code></a>
chore(deps-dev): bump vite from 6.3.5 to 6.3.6 (<a
href="https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react/issues/3611">#3611</a>)</li>
<li>See full diff in <a
href="https://github.com/lucide-icons/lucide/commits/0.545.0/packages/lucide-react">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=lucide-react&package-manager=npm_and_yarn&previous-version=0.542.0&new-version=0.545.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 21:40:48 -04:00
dependabot[bot]
e94840d298
chore(ui-deps): bump framer-motion from 12.23.12 to 12.23.24 in /llama_stack/ui (#3792)
Bumps [framer-motion](https://github.com/motiondivision/motion) from
12.23.12 to 12.23.24.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/motiondivision/motion/blob/main/CHANGELOG.md">framer-motion's
changelog</a>.</em></p>
<blockquote>
<h2>[12.23.24] 2025-10-10</h2>
<h3>Fixed</h3>
<ul>
<li>Ensure that when a component remounts, it continues to fire
animations even when <code>initial={false}</code>.</li>
</ul>
<h2>[12.23.23] 2025-10-10</h2>
<h3>Added</h3>
<ul>
<li>Exporting <code>PresenceChild</code> and <code>PopChild</code> type
for internal use.</li>
</ul>
<h2>[12.23.22] 2025-09-25</h2>
<h3>Added</h3>
<ul>
<li>Exporting <code>HTMLElements</code> and <code>useComposedRefs</code>
type for internal use.</li>
</ul>
<h2>[12.23.21] 2025-09-24</h2>
<h3>Fixed</h3>
<ul>
<li>Fixing main-thread <code>scroll</code> with animations that contain
<code>delay</code>.</li>
</ul>
<h2>[12.23.20] 2025-09-24</h2>
<h3>Fixed</h3>
<ul>
<li>Suppress non-animatable value warning for instant animations.</li>
</ul>
<h2>[12.23.19] 2025-09-23</h2>
<h3>Fixed</h3>
<ul>
<li>Remove support for changing <code>ref</code> prop.</li>
</ul>
<h2>[12.23.18] 2025-09-19</h2>
<h3>Fixed</h3>
<ul>
<li><code>&lt;motion /&gt;</code> components now support changing
<code>ref</code> prop.</li>
</ul>
<h2>[12.23.17] 2025-09-19</h2>
<h3>Fixed</h3>
<ul>
<li>Ensure <code>animate()</code> <code>onComplete</code> only fires
once, when all values are complete.</li>
</ul>
<h2>[12.23.16] 2025-09-19</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b5df740a46"><code>b5df740</code></a>
v12.23.24</li>
<li><a
href="808ebce630"><code>808ebce</code></a>
Updating changelog</li>
<li><a
href="237eee2246"><code>237eee2</code></a>
v12.23.23</li>
<li><a
href="834965c803"><code>834965c</code></a>
Updating changelog</li>
<li><a
href="40690864e9"><code>4069086</code></a>
Update README.md</li>
<li><a
href="6da6b61e94"><code>6da6b61</code></a>
Update README.md with new sponsor links</li>
<li><a
href="e36683149d"><code>e366831</code></a>
Update README.md</li>
<li><a
href="7796f4f1e0"><code>7796f4f</code></a>
Update Gold section with new links and images</li>
<li><a
href="d1bb93757c"><code>d1bb937</code></a>
Update sponsor section in README.md</li>
<li><a
href="97fba16059"><code>97fba16</code></a>
Update sponsorship logos in README</li>
<li>Additional commits viewable in <a
href="https://github.com/motiondivision/motion/compare/v12.23.12...v12.23.24">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=framer-motion&package-manager=npm_and_yarn&previous-version=12.23.12&new-version=12.23.24)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>
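
The `ignore this major version` and `ignore this minor version` commands above depend on which component of the version number a bump changes. A minimal sketch of that distinction, applied to bump titles taken from this log, is given below. It assumes plain dotted numeric versions and is not Dependabot's own logic; `BUMP_RE` and `classify_bump` are hypothetical names introduced for illustration.

```python
import re

# Matches the "bump <name> from <old> to <new>" phrase used in Dependabot titles.
# Assumption: versions are plain dotted integers such as 9.26.0.
BUMP_RE = re.compile(
    r"bump (?P<name>\S+) from (?P<old>\d+(?:\.\d+)*) to (?P<new>\d+(?:\.\d+)*)"
)


def classify_bump(title: str) -> tuple[str, str]:
    """Return (dependency name, 'major' | 'minor' | 'patch' | 'none') for a bump title."""
    match = BUMP_RE.search(title)
    if match is None:
        raise ValueError(f"not a recognised bump title: {title!r}")

    old = [int(part) for part in match["old"].split(".")]
    new = [int(part) for part in match["new"].split(".")]
    # Pad short versions (for example "6" or "1.9") to three components.
    old += [0] * (3 - len(old))
    new += [0] * (3 - len(new))

    # The first component that differs decides the kind of bump.
    for label, before, after in zip(("major", "minor", "patch"), old, new):
        if before != after:
            return match["name"], label
    return match["name"], "none"


if __name__ == "__main__":
    for title in (
        "chore(ui-deps): bump eslint from 9.26.0 to 9.37.0 in /llama_stack/ui (#3791)",
        "chore(ui-deps): bump @types/react-dom from 19.2.0 to 19.2.1 in /llama_stack/ui (#3789)",
        "chore(python-deps): bump ollama from 0.5.1 to 0.6.0 (#3786)",
    ):
        print(classify_bump(title))
```

Under this reading, the eslint and ollama bumps are minor-version changes and the `@types/react-dom` bump is a patch-level change.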

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 21:36:01 -04:00
dependabot[bot]
25ea94fcf7
chore(ui-deps): bump eslint from 9.26.0 to 9.37.0 in /llama_stack/ui (#3791)
Bumps [eslint](https://github.com/eslint/eslint) from 9.26.0 to 9.37.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/eslint/eslint/releases">eslint's
releases</a>.</em></p>
<blockquote>
<h2>v9.37.0</h2>
<h2>Features</h2>
<ul>
<li><a
href="39f7fb493a"><code>39f7fb4</code></a>
feat: <code>preserve-caught-error</code> should recognize all static
&quot;cause&quot; keys (<a
href="https://redirect.github.com/eslint/eslint/issues/20163">#20163</a>)
(Pixel998)</li>
<li><a
href="f81eabc584"><code>f81eabc</code></a>
feat: support TS syntax in <code>no-restricted-imports</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/19562">#19562</a>)
(Nitin Kumar)</li>
</ul>
<h2>Bug Fixes</h2>
<ul>
<li><a
href="a129cced7a"><code>a129cce</code></a>
fix: correct <code>no-loss-of-precision</code> false positives for
leading zeros (<a
href="https://redirect.github.com/eslint/eslint/issues/20164">#20164</a>)
(Francesco Trotta)</li>
<li><a
href="09e04fcc3f"><code>09e04fc</code></a>
fix: add missing AST token types (<a
href="https://redirect.github.com/eslint/eslint/issues/20172">#20172</a>)
(Pixel998)</li>
<li><a
href="861c6da2bd"><code>861c6da</code></a>
fix: correct <code>ESLint</code> typings (<a
href="https://redirect.github.com/eslint/eslint/issues/20122">#20122</a>)
(Pixel998)</li>
</ul>
<h2>Documentation</h2>
<ul>
<li><a
href="b950359c5f"><code>b950359</code></a>
docs: fix typos across the docs (<a
href="https://redirect.github.com/eslint/eslint/issues/20182">#20182</a>)
(루밀LuMir)</li>
<li><a
href="42498a2798"><code>42498a2</code></a>
docs: improve ToC accessibility by hiding non-semantic character (<a
href="https://redirect.github.com/eslint/eslint/issues/20181">#20181</a>)
(Percy Ma)</li>
<li><a
href="29ea092b93"><code>29ea092</code></a>
docs: Update README (GitHub Actions Bot)</li>
<li><a
href="5c97a04578"><code>5c97a04</code></a>
docs: show <code>availableUntil</code> in deprecated rule banner (<a
href="https://redirect.github.com/eslint/eslint/issues/20170">#20170</a>)
(Pixel998)</li>
<li><a
href="90a71bf502"><code>90a71bf</code></a>
docs: update <code>README</code> files to add badge and instructions (<a
href="https://redirect.github.com/eslint/eslint/issues/20115">#20115</a>)
(루밀LuMir)</li>
<li><a
href="1603ae1526"><code>1603ae1</code></a>
docs: update references from <code>master</code> to <code>main</code>
(<a
href="https://redirect.github.com/eslint/eslint/issues/20153">#20153</a>)
(루밀LuMir)</li>
</ul>
<h2>Chores</h2>
<ul>
<li><a
href="afe8a13469"><code>afe8a13</code></a>
chore: update <code>@eslint/js</code> dependency to version 9.37.0 (<a
href="https://redirect.github.com/eslint/eslint/issues/20183">#20183</a>)
(Francesco Trotta)</li>
<li><a
href="abee4ca1fa"><code>abee4ca</code></a>
chore: package.json update for <code>@eslint/js</code> release
(Jenkins)</li>
<li><a
href="fc9381f6ca"><code>fc9381f</code></a>
chore: fix typos in comments (<a
href="https://redirect.github.com/eslint/eslint/issues/20175">#20175</a>)
(overlookmotel)</li>
<li><a
href="e1574a22d3"><code>e1574a2</code></a>
chore: unpin jiti (<a
href="https://redirect.github.com/eslint/eslint/issues/20173">#20173</a>)
(renovate[bot])</li>
<li><a
href="e1ac05e2fa"><code>e1ac05e</code></a>
refactor: mark <code>ESLint.findConfigFile()</code> as
<code>async</code>, add missing docs (<a
href="https://redirect.github.com/eslint/eslint/issues/20157">#20157</a>)
(Pixel998)</li>
<li><a
href="347906d627"><code>347906d</code></a>
chore: update eslint (<a
href="https://redirect.github.com/eslint/eslint/issues/20149">#20149</a>)
(renovate[bot])</li>
<li><a
href="0cb5897e24"><code>0cb5897</code></a>
test: remove tmp dir created for circular fixes in multithread mode test
(<a
href="https://redirect.github.com/eslint/eslint/issues/20146">#20146</a>)
(Milos Djermanovic)</li>
<li><a
href="bb995665e3"><code>bb99566</code></a>
ci: pin <code>jiti</code> to version 2.5.1 (<a
href="https://redirect.github.com/eslint/eslint/issues/20151">#20151</a>)
(Pixel998)</li>
<li><a
href="177f669adc"><code>177f669</code></a>
perf: improve worker count calculation for <code>&quot;auto&quot;</code>
concurrency (<a
href="https://redirect.github.com/eslint/eslint/issues/20067">#20067</a>)
(Francesco Trotta)</li>
<li><a
href="448b57bca3"><code>448b57b</code></a>
chore: Mark deprecated formatting rules as available until v11.0.0 (<a
href="https://redirect.github.com/eslint/eslint/issues/20144">#20144</a>)
(Milos Djermanovic)</li>
</ul>
<h2>v9.36.0</h2>
<h2>Features</h2>
<ul>
<li><a
href="47afcf668d"><code>47afcf6</code></a>
feat: correct <code>preserve-caught-error</code> edge cases (<a
href="https://redirect.github.com/eslint/eslint/issues/20109">#20109</a>)
(Francesco Trotta)</li>
</ul>
<h2>Bug Fixes</h2>
<ul>
<li><a
href="75b74d865d"><code>75b74d8</code></a>
fix: add missing rule option types (<a
href="https://redirect.github.com/eslint/eslint/issues/20127">#20127</a>)
(ntnyq)</li>
<li><a
href="1c0d85049e"><code>1c0d850</code></a>
fix: update <code>eslint-all.js</code> to use <code>Object.freeze</code>
for <code>rules</code> object (<a
href="https://redirect.github.com/eslint/eslint/issues/20116">#20116</a>)
(루밀LuMir)</li>
<li><a
href="7d61b7fadc"><code>7d61b7f</code></a>
fix: add missing scope types to <code>Scope.type</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/20110">#20110</a>)
(Pixel998)</li>
<li><a
href="7a670c301b"><code>7a670c3</code></a>
fix: correct rule option typings in <code>rules.d.ts</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/20084">#20084</a>)
(Pixel998)</li>
</ul>
<h2>Documentation</h2>
<ul>
<li><a
href="b73ab12acd"><code>b73ab12</code></a>
docs: update examples to use <code>defineConfig</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/20131">#20131</a>)
(sethamus)</li>
<li><a
href="31d9392699"><code>31d9392</code></a>
docs: fix typos (<a
href="https://redirect.github.com/eslint/eslint/issues/20118">#20118</a>)
(Pixel998)</li>
<li><a
href="c7f861b3f8"><code>c7f861b</code></a>
docs: Update README (GitHub Actions Bot)</li>
<li><a
href="6b0c08b106"><code>6b0c08b</code></a>
docs: Update README (GitHub Actions Bot)</li>
<li><a
href="91f97c5046"><code>91f97c5</code></a>
docs: Update README (GitHub Actions Bot)</li>
</ul>
<h2>Chores</h2>
<ul>
<li><a
href="12411e8d45"><code>12411e8</code></a>
chore: upgrade <code>@eslint/js@9.36.0</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/20139">#20139</a>)
(Milos Djermanovic)</li>
<li><a
href="488cba6b39"><code>488cba6</code></a>
chore: package.json update for <code>@eslint/js</code> release
(Jenkins)</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d5d1bdf5fd"><code>d5d1bdf</code></a>
9.37.0</li>
<li><a
href="94865ff41c"><code>94865ff</code></a>
Build: changelog update for 9.37.0</li>
<li><a
href="afe8a13469"><code>afe8a13</code></a>
chore: update <code>@eslint/js</code> dependency to version 9.37.0 (<a
href="https://redirect.github.com/eslint/eslint/issues/20183">#20183</a>)</li>
<li><a
href="abee4ca1fa"><code>abee4ca</code></a>
chore: package.json update for <code>@eslint/js</code> release</li>
<li><a
href="b950359c5f"><code>b950359</code></a>
docs: fix typos across the docs (<a
href="https://redirect.github.com/eslint/eslint/issues/20182">#20182</a>)</li>
<li><a
href="42498a2798"><code>42498a2</code></a>
docs: improve ToC accessibility by hiding non-semantic character (<a
href="https://redirect.github.com/eslint/eslint/issues/20181">#20181</a>)</li>
<li><a
href="fc9381f6ca"><code>fc9381f</code></a>
chore: fix typos in comments (<a
href="https://redirect.github.com/eslint/eslint/issues/20175">#20175</a>)</li>
<li><a
href="e1574a22d3"><code>e1574a2</code></a>
chore: unpin jiti (<a
href="https://redirect.github.com/eslint/eslint/issues/20173">#20173</a>)</li>
<li><a
href="29ea092b93"><code>29ea092</code></a>
docs: Update README</li>
<li><a
href="a129cced7a"><code>a129cce</code></a>
fix: correct <code>no-loss-of-precision</code> false positives for
leading zeros (<a
href="https://redirect.github.com/eslint/eslint/issues/20164">#20164</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/eslint/eslint/compare/v9.26.0...v9.37.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=eslint&package-manager=npm_and_yarn&previous-version=9.26.0&new-version=9.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 18:00:29 -07:00
dependabot[bot]
190b96ea62
chore(ui-deps): bump @types/react-dom from 19.2.0 to 19.2.1 in /llama_stack/ui (#3789)
Bumps
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom)
from 19.2.0 to 19.2.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/react-dom&package-manager=npm_and_yarn&previous-version=19.2.0&new-version=19.2.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 18:00:22 -07:00
dependabot[bot]
4fb39f0a6a
chore(ui-deps): bump @types/react from 19.2.0 to 19.2.2 in /llama_stack/ui (#3790)
Bumps
[@types/react](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react)
from 19.2.0 to 19.2.2.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/react&package-manager=npm_and_yarn&previous-version=19.2.0&new-version=19.2.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 18:00:18 -07:00
dependabot[bot]
cfd2e303db
chore(python-deps): bump black from 25.1.0 to 25.9.0 (#3783)
Bumps [black](https://github.com/psf/black) from 25.1.0 to 25.9.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/psf/black/releases">black's
releases</a>.</em></p>
<blockquote>
<h2>25.9.0</h2>
<h3>Highlights</h3>
<ul>
<li>Remove support for pre-Python 3.7 <code>await/async</code> as soft
keywords/variable names
(<a
href="https://redirect.github.com/psf/black/issues/4676">#4676</a>)</li>
</ul>
<h3>Stable style</h3>
<ul>
<li>Fix crash while formatting a long <code>del</code> statement
containing tuples (<a
href="https://redirect.github.com/psf/black/issues/4628">#4628</a>)</li>
<li>Fix crash while formatting expressions using the walrus operator in
complex <code>with</code>
statements (<a
href="https://redirect.github.com/psf/black/issues/4630">#4630</a>)</li>
<li>Handle <code># fmt: skip</code> followed by a comment at the end of
file (<a
href="https://redirect.github.com/psf/black/issues/4635">#4635</a>)</li>
<li>Fix crash when a tuple appears in the <code>as</code> clause of a
<code>with</code> statement (<a
href="https://redirect.github.com/psf/black/issues/4634">#4634</a>)</li>
<li>Fix crash when tuple is used as a context manager inside a
<code>with</code> statement (<a
href="https://redirect.github.com/psf/black/issues/4646">#4646</a>)</li>
<li>Fix crash when formatting a <code>\</code> followed by a
<code>\r</code> followed by a comment (<a
href="https://redirect.github.com/psf/black/issues/4663">#4663</a>)</li>
<li>Fix crash on a <code>\\r\n</code> (<a
href="https://redirect.github.com/psf/black/issues/4673">#4673</a>)</li>
<li>Fix crash on <code>await ...</code> (where <code>...</code> is a
literal <code>Ellipsis</code>) (<a
href="https://redirect.github.com/psf/black/issues/4676">#4676</a>)</li>
<li>Fix crash on parenthesized expression inside a type parameter bound
(<a
href="https://redirect.github.com/psf/black/issues/4684">#4684</a>)</li>
<li>Fix crash when using line ranges excluding indented single line
decorated items
(<a
href="https://redirect.github.com/psf/black/issues/4670">#4670</a>)</li>
</ul>
<h3>Preview style</h3>
<ul>
<li>Fix a bug where one-liner functions/conditionals marked with <code>#
fmt: skip</code> would still
be formatted (<a
href="https://redirect.github.com/psf/black/issues/4552">#4552</a>)</li>
<li>Improve <code>multiline_string_handling</code> with ternaries and
dictionaries (<a
href="https://redirect.github.com/psf/black/issues/4657">#4657</a>)</li>
<li>Fix a bug where <code>string_processing</code> would not split
f-strings directly after
expressions (<a
href="https://redirect.github.com/psf/black/issues/4680">#4680</a>)</li>
<li>Wrap the <code>in</code> clause of comprehensions across lines if
necessary (<a
href="https://redirect.github.com/psf/black/issues/4699">#4699</a>)</li>
<li>Remove parentheses around multiple exception types in
<code>except</code> and <code>except*</code> without
<code>as</code>. (<a
href="https://redirect.github.com/psf/black/issues/4720">#4720</a>)</li>
<li>Add <code>\r</code>-style newlines to the set of newline styles that
file newlines can be normalized from and to (<a
href="https://redirect.github.com/psf/black/issues/4710">#4710</a>)</li>
</ul>
<h3>Parser</h3>
<ul>
<li>Rewrite tokenizer to improve performance and compliance (<a
href="https://redirect.github.com/psf/black/issues/4536">#4536</a>)</li>
<li>Fix bug where certain unusual expressions (e.g., lambdas) were not
accepted in type
parameter bounds and defaults. (<a
href="https://redirect.github.com/psf/black/issues/4602">#4602</a>)</li>
</ul>
<h3>Performance</h3>
<ul>
<li>Avoid using an extra process when running with only one worker (<a
href="https://redirect.github.com/psf/black/issues/4734">#4734</a>)</li>
</ul>
<h3>Integrations</h3>
<ul>
<li>Fix the version check in the vim file to reject Python 3.8 (<a
href="https://redirect.github.com/psf/black/issues/4567">#4567</a>)</li>
<li>Enhance GitHub Action <code>psf/black</code> to read Black version
from an additional section in
pyproject.toml: <code>[project.dependency-groups]</code> (<a
href="https://redirect.github.com/psf/black/issues/4606">#4606</a>)</li>
<li>Build gallery docker image with python3-slim and reduce image size
(<a
href="https://redirect.github.com/psf/black/issues/4686">#4686</a>)</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
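
The release notes above mention normalizing `\r`-style newlines alongside the other newline styles. A minimal sketch of what such normalization involves follows. This is not Black's implementation; `normalize_newlines` is a hypothetical helper written only to illustrate the idea.

```python
def normalize_newlines(text: str, newline: str = "\n") -> str:
    """Rewrite every CRLF, CR, or LF line ending in `text` to `newline`."""
    # Collapse CRLF first so it is not counted as two separate line endings,
    # then fold the remaining bare CR characters into LF.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    # Convert from the unified LF form to the requested target style.
    return unified if newline == "\n" else unified.replace("\n", newline)


if __name__ == "__main__":
    mixed = "a = 1\r\nb = 2\rc = 3\n"
    print(repr(normalize_newlines(mixed)))
    print(repr(normalize_newlines(mixed, "\r")))
```

Handling `\r\n` before `\r` is the one ordering detail that matters here; reversing the two replacements would turn each CRLF into two line breaks.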
<details>
<summary>Commits</summary>
<ul>
<li><a
href="af0ba72a73"><code>af0ba72</code></a>
Prepare docs for release 25.9.0 (<a
href="https://redirect.github.com/psf/black/issues/4751">#4751</a>)</li>
<li><a
href="ffc01a0275"><code>ffc01a0</code></a>
Fix schema generation error caused by new click version (<a
href="https://redirect.github.com/psf/black/issues/4750">#4750</a>)</li>
<li><a
href="626b32fe2b"><code>626b32f</code></a>
Add normalizing for <code>\r</code> style newlines (<a
href="https://redirect.github.com/psf/black/issues/4710">#4710</a>)</li>
<li><a
href="57a461258f"><code>57a4612</code></a>
Fix mypy type issue (<a
href="https://redirect.github.com/psf/black/issues/4745">#4745</a>)</li>
<li><a
href="4f6ad7cf8c"><code>4f6ad7c</code></a>
Wrap the <code>in</code> clause of comprehensions across lines if
necessary (<a
href="https://redirect.github.com/psf/black/issues/4699">#4699</a>)</li>
<li><a
href="24f5169617"><code>24f5169</code></a>
ci: Run diff-shades on unstable instead of preview (<a
href="https://redirect.github.com/psf/black/issues/4741">#4741</a>)</li>
<li><a
href="4d55e60179"><code>4d55e60</code></a>
Bump actions/setup-python from 5 to 6 (<a
href="https://redirect.github.com/psf/black/issues/4744">#4744</a>)</li>
<li><a
href="0cf39efdbc"><code>0cf39ef</code></a>
Improve the performance of get_string_prefix (<a
href="https://redirect.github.com/psf/black/issues/4742">#4742</a>)</li>
<li><a
href="1f779dec01"><code>1f779de</code></a>
Fix line ranges decorator edge case (<a
href="https://redirect.github.com/psf/black/issues/4670">#4670</a>)</li>
<li><a
href="203fd6b5cd"><code>203fd6b</code></a>
Optimize Line string method (<a
href="https://redirect.github.com/psf/black/issues/4739">#4739</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/psf/black/compare/25.1.0...25.9.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=black&package-manager=uv&previous-version=25.1.0&new-version=25.9.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 16:48:53 -07:00
dependabot[bot]
055a7664f0
chore(python-deps): bump blobfile from 3.0.0 to 3.1.0 (#3784)
Bumps [blobfile](https://github.com/christopher-hesse/blobfile) from
3.0.0 to 3.1.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/blobfile/blobfile/blob/master/CHANGES.md">blobfile's
changelog</a>.</em></p>
<blockquote>
<h2>3.1.0</h2>
<ul>
<li>Improve <code>bf.join</code></li>
<li>Add option to support blind writes</li>
<li>Treat <code>EAI_NODATA</code> similarly to <code>EAI_NONAME</code>
in DNS retry logic</li>
</ul>
</blockquote>
</details>
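
The changelog entry about `EAI_NODATA` and `EAI_NONAME` concerns how DNS lookup failures are classified. A minimal sketch of treating the two error codes alike follows. It is not blobfile's code: `is_unresolvable_host` is a hypothetical helper, and whether a caller should then retry or fail fast is left open because the changelog does not say.

```python
import socket

# EAI_NODATA is not defined on every platform, so both codes are looked up
# defensively and only the ones that exist are kept.
_UNRESOLVABLE_CODES = {
    code
    for code in (
        getattr(socket, "EAI_NONAME", None),
        getattr(socket, "EAI_NODATA", None),
    )
    if code is not None
}


def is_unresolvable_host(err: OSError) -> bool:
    """Return True when a lookup failed because the name has no usable record."""
    return isinstance(err, socket.gaierror) and err.errno in _UNRESOLVABLE_CODES


if __name__ == "__main__":
    try:
        # ".invalid" is a reserved top-level domain that should never resolve.
        socket.getaddrinfo("example.invalid", 443)
    except OSError as err:
        print(is_unresolvable_host(err), err)
```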
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ff0cd5d8ce"><code>ff0cd5d</code></a>
Release 3.1 (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/259">#259</a>)</li>
<li><a
href="395973ae2d"><code>395973a</code></a>
Handle EAI_NODATA in _bad_hostname_check (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/258">#258</a>)</li>
<li><a
href="cdc6e6a5a4"><code>cdc6e6a</code></a>
Improve bf.join (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/255">#255</a>)</li>
<li><a
href="90cb2436a7"><code>90cb243</code></a>
Add option to support blind writes (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/254">#254</a>)</li>
<li><a
href="4a2d011363"><code>4a2d011</code></a>
Add .git-blame-ignore-revs (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/253">#253</a>)</li>
<li><a
href="ab888d0679"><code>ab888d0</code></a>
Replace all CRLF with LF (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/252">#252</a>)</li>
<li><a
href="7eeb2aea87"><code>7eeb2ae</code></a>
Do not ignore warnings in tests (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/250">#250</a>)</li>
<li><a
href="0717345283"><code>0717345</code></a>
Run isort (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/249">#249</a>)</li>
<li>See full diff in <a
href="https://github.com/christopher-hesse/blobfile/compare/v3.0.0...v3.1.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=blobfile&package-manager=uv&previous-version=3.0.0&new-version=3.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 16:48:47 -07:00
dependabot[bot]
13518e7562
chore(python-deps): bump ollama from 0.5.1 to 0.6.0 (#3786)
Bumps [ollama](https://github.com/ollama/ollama-python) from 0.5.1 to
0.6.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/ollama/ollama-python/releases">ollama's
releases</a>.</em></p>
<blockquote>
<h2>v0.6.0</h2>
<h2>What's Changed</h2>
<ul>
<li>
<p>client: add web search and web crawl capabilities by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/578">ollama/ollama-python#578</a></p>
</li>
<li>
<p>client: load OLLAMA_API_KEY on init by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/583">ollama/ollama-python#583</a></p>
</li>
<li>
<p>client/types: update web search and fetch API by <a
href="https://github.com/npardal"><code>@​npardal</code></a> in <a
href="https://redirect.github.com/ollama/ollama-python/pull/584">ollama/ollama-python#584</a></p>
</li>
<li>
<p>examples: add mcp server for web_search web_crawl by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/585">ollama/ollama-python#585</a></p>
</li>
<li>
<p>examples: gpt oss browser tool by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/588">ollama/ollama-python#588</a></p>
</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/npardal"><code>@​npardal</code></a> made
their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/584">ollama/ollama-python#584</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/ollama/ollama-python/compare/v0.5.4...v0.6.0">https://github.com/ollama/ollama-python/compare/v0.5.4...v0.6.0</a></p>
<h2>v0.5.4</h2>
<h2>What's Changed</h2>
<ul>
<li>examples: add gpt-oss browser example by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/558">ollama/ollama-python#558</a></li>
<li>build(deps): bump actions/checkout from 4 to 5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/ollama/ollama-python/pull/559">ollama/ollama-python#559</a></li>
<li>examples/gpt-oss: fix examples by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/566">ollama/ollama-python#566</a></li>
<li>Fix link for thinking-levels.py in documentation by <a
href="https://github.com/btjanaka"><code>@​btjanaka</code></a> in <a
href="https://redirect.github.com/ollama/ollama-python/pull/567">ollama/ollama-python#567</a></li>
<li>examples: fix gpt-oss-tools-stream for adding tool calls by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/568">ollama/ollama-python#568</a></li>
<li>examples: resolve invalid tool usage status code 400 if llm makes a
mistake gpt-oss by <a
href="https://github.com/MarkWard0110"><code>@​MarkWard0110</code></a>
in <a
href="https://redirect.github.com/ollama/ollama-python/pull/569">ollama/ollama-python#569</a></li>
<li>build(deps): bump actions/setup-python from 5 to 6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/ollama/ollama-python/pull/571">ollama/ollama-python#571</a></li>
<li>feat: add dimensions to embed request by <a
href="https://github.com/mxyng"><code>@​mxyng</code></a> in <a
href="https://redirect.github.com/ollama/ollama-python/pull/574">ollama/ollama-python#574</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/btjanaka"><code>@​btjanaka</code></a>
made their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/567">ollama/ollama-python#567</a></li>
<li><a
href="https://github.com/MarkWard0110"><code>@​MarkWard0110</code></a>
made their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/569">ollama/ollama-python#569</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/ollama/ollama-python/compare/v0.5.3...v0.5.4">https://github.com/ollama/ollama-python/compare/v0.5.3...v0.5.4</a></p>
<h2>v0.5.3</h2>
<h2>What's Changed</h2>
<ul>
<li>add support for 'high'/'medium'/'low' think values by <a
href="https://github.com/drifkin"><code>@​drifkin</code></a> in <a
href="https://redirect.github.com/ollama/ollama-python/pull/553">ollama/ollama-python#553</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/ollama/ollama-python/compare/v0.5.2...v0.5.3">https://github.com/ollama/ollama-python/compare/v0.5.2...v0.5.3</a></p>
<h2>v0.5.2</h2>
<h2>What's Changed</h2>
<ul>
<li>
<p>types/examples: add tool_name to message and examples by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/537">ollama/ollama-python#537</a></p>
</li>
<li>
<p>types: add <code>context_length</code> to ProcessResponse by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/538">ollama/ollama-python#538</a></p>
</li>
<li>
<p>types: relax type for tools by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/550">ollama/ollama-python#550</a></p>
</li>
<li>
<p>add license metadata to package by <a
href="https://github.com/ViViDboarder"><code>@​ViViDboarder</code></a>
in <a
href="https://redirect.github.com/ollama/ollama-python/pull/526">ollama/ollama-python#526</a></p>
</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/hwittenborn"><code>@​hwittenborn</code></a>
made their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/525">ollama/ollama-python#525</a></li>
<li><a
href="https://github.com/ViViDboarder"><code>@​ViViDboarder</code></a>
made their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/526">ollama/ollama-python#526</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d967f048d9"><code>d967f04</code></a>
examples: gpt oss browser tool (<a
href="https://redirect.github.com/ollama/ollama-python/issues/588">#588</a>)</li>
<li><a
href="ab49a669cd"><code>ab49a66</code></a>
examples: add mcp server for web_search web_crawl (<a
href="https://redirect.github.com/ollama/ollama-python/issues/585">#585</a>)</li>
<li><a
href="16f344f635"><code>16f344f</code></a>
client/types: update web search and fetch API (<a
href="https://redirect.github.com/ollama/ollama-python/issues/584">#584</a>)</li>
<li><a
href="d0f71bc8b8"><code>d0f71bc</code></a>
client: load OLLAMA_API_KEY on init (<a
href="https://redirect.github.com/ollama/ollama-python/issues/583">#583</a>)</li>
<li><a
href="b22c5fdabb"><code>b22c5fd</code></a>
init: fix export for web_search (<a
href="https://redirect.github.com/ollama/ollama-python/issues/581">#581</a>)</li>
<li><a
href="4d0b81b37a"><code>4d0b81b</code></a>
client: add web search and web crawl capabilities (<a
href="https://redirect.github.com/ollama/ollama-python/issues/578">#578</a>)</li>
<li><a
href="a1d04f04f2"><code>a1d04f0</code></a>
feat: add dimensions to embed request (<a
href="https://redirect.github.com/ollama/ollama-python/issues/574">#574</a>)</li>
<li><a
href="8af6cac86b"><code>8af6cac</code></a>
build(deps): bump actions/setup-python from 5 to 6 (<a
href="https://redirect.github.com/ollama/ollama-python/issues/571">#571</a>)</li>
<li><a
href="9f41447f20"><code>9f41447</code></a>
examples: make gpt-oss resilient for failed tool calls (<a
href="https://redirect.github.com/ollama/ollama-python/issues/569">#569</a>)</li>
<li><a
href="da79e987f0"><code>da79e98</code></a>
examples: fix gpt-oss-tools-stream for adding toolcalls (<a
href="https://redirect.github.com/ollama/ollama-python/issues/568">#568</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/ollama/ollama-python/compare/v0.5.1...v0.6.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ollama&package-manager=uv&previous-version=0.5.1&new-version=0.6.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 16:48:42 -07:00
Ashwin Bharambe
e6378872c7 fix(misc): pre-commit fix for server.py 2025-10-11 16:47:59 -07:00
Ashwin Bharambe
7c63aebd64
feat(responses)!: add reasoning and annotation added events (#3793)
Implements streaming events from the OpenAI Responses API spec that were
previously missing:
- reasoning text and summary events for o1/o3 models
- refusal events for safety moderation
- annotation events for citations
- file search streaming events
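
A minimal sketch of how a consumer might group the event families listed above. The event-type prefixes follow the naming pattern of the public OpenAI Responses streaming spec, but they are assumptions here, not strings confirmed by this change; `classify_event` and `summarize_stream` are hypothetical helpers.

```python
from collections import Counter

# Assumed event-type prefixes (illustrative only; not taken from this PR).
EVENT_FAMILIES = {
    "response.reasoning_text.": "reasoning",
    "response.reasoning_summary_text.": "reasoning_summary",
    "response.refusal.": "refusal",
    "response.output_text.annotation.": "annotation",
    "response.file_search_call.": "file_search",
}


def classify_event(event_type: str) -> str:
    """Map a streaming event type string to one of the families above."""
    for prefix, family in EVENT_FAMILIES.items():
        if event_type.startswith(prefix):
            return family
    return "other"


def summarize_stream(event_types) -> Counter:
    """Count how many events of each family appear in a stream."""
    return Counter(classify_event(t) for t in event_types)
```

A test could pass the `type` field of every streamed event to `summarize_stream` and assert that the expected families are present.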
 
Added optional reasoning_content field to chat completion chunks to
support non-standard provider extensions.

**NOTE:** OpenAI does _not_ populate reasoning_content in its
chat_completion APIs, so we cannot implement Responses (with reasoning)
on top of OpenAI chat completions. To support that, we would need to
transparently forward to OpenAI's responses endpoints. For other
providers (vLLM, etc.) we can use it.
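
As a rough illustration of the optional field, the sketch below separates reasoning text from answer text while consuming chunk deltas. `ChunkDelta` and `split_reasoning` are hypothetical names used only for this example; the real chunk types in the codebase may look different.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ChunkDelta:
    """Hypothetical stand-in for the delta of a chat completion chunk."""
    content: Optional[str] = None
    # Optional, non-standard extension; stays None when a provider omits it.
    reasoning_content: Optional[str] = None


def split_reasoning(deltas: Iterable[ChunkDelta]) -> tuple[str, str]:
    """Accumulate reasoning text and answer text separately."""
    reasoning: list[str] = []
    answer: list[str] = []
    for delta in deltas:
        if delta.reasoning_content:
            reasoning.append(delta.reasoning_content)
        if delta.content:
            answer.append(delta.content)
    return "".join(reasoning), "".join(answer)
```

When a provider never sets the field, the first element of the result is simply an empty string, which is the case described in the note above for OpenAI chat completions.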

## Test Plan

File search streaming test passes:
```
./scripts/integration-tests.sh --stack-config server:ci-tests \
   --suite responses --setup gpt --inference-mode replay --pattern test_response_file_search_streaming_events
```

The reasoning tests need a more complex setup and validation: a
vLLM-powered open-source model (possibly gpt-oss) that can return
reasoning_content. I will do that in a follow-up PR.
2025-10-11 16:47:14 -07:00
Ashwin Bharambe
f365961731 fix(tests): handle TEST_CONTEXT not being set 2025-10-11 15:31:08 -07:00
dependabot[bot]
dac1d7be1c
chore(python-deps): bump fire from 0.7.0 to 0.7.1 (#3787)
Bumps [fire](https://github.com/google/python-fire) from 0.7.0 to 0.7.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/google/python-fire/releases">fire's
releases</a>.</em></p>
<blockquote>
<h2>Python Fire v0.7.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Use Neutral theme for IPython Inspector, supporting newer IPython
versions in <a
href="https://redirect.github.com/google/python-fire/pull/588">google/python-fire#588</a></li>
<li>Call inspectutils.GetClassAttrsDict on component, not None in <a
href="https://redirect.github.com/google/python-fire/pull/606">google/python-fire#606</a></li>
<li>Move to pyproject.toml, adding wheel support in pypi</li>
<li>Use ty in place of pytype</li>
<li>Update requirements <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]</li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/google/python-fire/compare/v0.7.0...v0.7.1">https://github.com/google/python-fire/compare/v0.7.0...v0.7.1</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8ea2f631e6"><code>8ea2f63</code></a>
Update email address</li>
<li><a
href="ea8c7f5e74"><code>ea8c7f5</code></a>
Remove unused MANIFEST</li>
<li><a
href="86bf4ca693"><code>86bf4ca</code></a>
Update pylint requirement from &lt;3.3.7 to &lt;3.3.8 (<a
href="https://redirect.github.com/google/python-fire/issues/614">#614</a>)</li>
<li><a
href="8c62e05569"><code>8c62e05</code></a>
Update pytest requirement from &lt;=8.3.5 to &lt;=8.4.1 (<a
href="https://redirect.github.com/google/python-fire/issues/615">#615</a>)</li>
<li><a
href="cec0119b10"><code>cec0119</code></a>
Update hypothesis requirement from &lt;6.133.0 to &lt;6.136.0 (<a
href="https://redirect.github.com/google/python-fire/issues/616">#616</a>)</li>
<li><a
href="8449619604"><code>8449619</code></a>
Use ty in place of pytype (<a
href="https://redirect.github.com/google/python-fire/issues/617">#617</a>)</li>
<li><a
href="d33056cb32"><code>d33056c</code></a>
Move to pyproject.toml (<a
href="https://redirect.github.com/google/python-fire/issues/613">#613</a>)</li>
<li><a
href="2e6f8d2b24"><code>2e6f8d2</code></a>
Bump version to 0.7.1 (<a
href="https://redirect.github.com/google/python-fire/issues/609">#609</a>)</li>
<li><a
href="dba7e1d0da"><code>dba7e1d</code></a>
Update hypothesis requirement in /.github/scripts (<a
href="https://redirect.github.com/google/python-fire/issues/608">#608</a>)</li>
<li><a
href="51974c67bf"><code>51974c6</code></a>
Update pylint requirement from &lt;3.3.5 to &lt;3.3.7 in
/.github/scripts (<a
href="https://redirect.github.com/google/python-fire/issues/591">#591</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/google/python-fire/compare/v0.7.0...v0.7.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=fire&package-manager=uv&previous-version=0.7.0&new-version=0.7.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 14:15:23 -07:00
dependabot[bot]
2cb1b19efe
chore(python-deps): bump psycopg2-binary from 2.9.10 to 2.9.11 (#3785)
Bumps [psycopg2-binary](https://github.com/psycopg/psycopg2) from 2.9.10
to 2.9.11.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/psycopg/psycopg2/blob/master/NEWS">psycopg2-binary's
changelog</a>.</em></p>
<blockquote>
<h2>Current release</h2>
<p>What's new in psycopg 2.9.11</p>
<ul>
<li>Add support for Python 3.14.</li>
<li>Avoid a segfault passing more arguments than placeholders if Python
is built with assertions enabled
(<a href="https://github.com/psycopg/psycopg2/issues/1791">#1791</a>).</li>
<li><code>psycopg2.errorcodes</code> map and
<code>psycopg2.errors</code> classes updated to
PostgreSQL 18.</li>
<li>Drop support for Python 3.8.</li>
</ul>
<p>What's new in psycopg 2.9.10</p>
<ul>
<li>Add support for Python 3.13.</li>
<li>Receive notifications on commit
(<a href="https://github.com/psycopg/psycopg2/issues/1728">#1728</a>).</li>
<li><code>psycopg2.errorcodes</code> map and
<code>psycopg2.errors</code> classes updated to
PostgreSQL 17.</li>
<li>Drop support for Python 3.7.</li>
</ul>
<p>What's new in psycopg 2.9.9</p>
<ul>
<li>Add support for Python 3.12.</li>
<li>Drop support for Python 3.6.</li>
</ul>
<p>What's new in psycopg 2.9.8</p>
<ul>
<li>Wheel package bundled with PostgreSQL 16 libpq in order to add
support for recent features, such as <code>sslcertmode</code>.</li>
</ul>
<p>What's new in psycopg 2.9.7</p>
<ul>
<li>Fix propagation of exceptions raised during module initialization
(<a href="https://github.com/psycopg/psycopg2/issues/1598">#1598</a>).</li>
<li>Fix building when pg_config returns an empty string
(<a href="https://github.com/psycopg/psycopg2/issues/1599">#1599</a>).</li>
<li>Wheel package bundled with OpenSSL 1.1.1v.</li>
</ul>
<p>What's new in psycopg 2.9.6</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="fd9ae8cad2"><code>fd9ae8c</code></a>
chore: bump to version 2.9.11</li>
<li><a
href="d923840546"><code>d923840</code></a>
chore: update docs requirements</li>
<li><a
href="d42dc7169d"><code>d42dc71</code></a>
Merge branch 'fix-1791'</li>
<li><a
href="4fde6560c3"><code>4fde656</code></a>
fix: avoid failed assert passing more arguments than placeholders</li>
<li><a
href="8308c19d6a"><code>8308c19</code></a>
fix: drop warning about the use of deprecated PyWeakref_GetObject
function</li>
<li><a
href="1a1eabf098"><code>1a1eabf</code></a>
build(deps): bump actions/github-script from 7 to 8</li>
<li><a
href="897af8b38b"><code>897af8b</code></a>
build(deps): bump peter-evans/repository-dispatch from 3 to 4</li>
<li><a
href="ceefd30511"><code>ceefd30</code></a>
build(deps): bump actions/checkout from 4 to 5</li>
<li><a
href="4dc585430c"><code>4dc5854</code></a>
build(deps): bump actions/setup-python from 5 to 6</li>
<li><a
href="1945788dcf"><code>1945788</code></a>
Merge pull request <a
href="https://redirect.github.com/psycopg/psycopg2/issues/1802">#1802</a>
from edgarrmondragon/cp314-wheels</li>
<li>Additional commits viewable in <a
href="https://github.com/psycopg/psycopg2/compare/2.9.10...2.9.11">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=psycopg2-binary&package-manager=uv&previous-version=2.9.10&new-version=2.9.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 14:15:17 -07:00
dependabot[bot]
f15d865a3e
chore(github-deps): bump astral-sh/setup-uv from 6.8.0 to 7.0.0 (#3782)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.8.0 to 7.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.0.0 🌈 node24 and a lot of bugfixes</h2>
<h2>Changes</h2>
<p>This release comes with a load of bug fixes and a speedup. Because
it switches from node20 to node24, it is also a breaking change. If you
are running on GitHub-hosted runners, this will just work; if you are
using self-hosted runners, make sure that your runners are up to date.
If you followed the normal installation instructions, your self-hosted
runner will keep itself updated.</p>
<p>This release also removes the deprecated input
<code>server-url</code> which was used to download uv releases from a
different server.
The <a
href="https://github.com/astral-sh/setup-uv?tab=readme-ov-file#manifest-file">manifest-file</a>
input supersedes that functionality by adding a flexible way to define
available versions and where they should be downloaded from.</p>
<h3>Fixes</h3>
<ul>
<li>The action now respects when the environment variable
<code>UV_CACHE_DIR</code> is already set and does not overwrite it. It
now also finds <a
href="https://docs.astral.sh/uv/reference/settings/#cache-dir">cache-dir</a>
settings in config files if you set them.</li>
<li>Some users found that <a
href="https://github.com/astral-sh/setup-uv?tab=readme-ov-file#disable-cache-pruning">cache
pruning</a> took forever because they had some <code>uv</code> processes
running in the background. Starting with uv version <code>0.8.24</code>,
this action uses <code>uv cache prune --ci --force</code> to ignore the
running processes.</li>
<li>If you just want to install uv but not have it available in path,
this action now respects <code>UV_NO_MODIFY_PATH</code></li>
<li>Some other actions also set the env var <code>UV_CACHE_DIR</code>.
This action can now deal with that, but because this could lead to
unwanted behavior in some edge cases, a warning is now displayed.</li>
</ul>
<h3>Improvements</h3>
<p>If you are using minimum version specifiers for the version of uv to
install, for example:</p>
<pre lang="toml"><code>[tool.uv]
required-version = &quot;&gt;=0.8.17&quot;
</code></pre>
<p>This action now detects that and directly uses the latest version.
Previously it would download all available releases from the uv repo
to determine the highest matching candidate for the version specifier,
which took much more time.</p>
<p>If you are using other specifiers like <code>0.8.x</code> this action
still needs to download all available releases because the specifier
defines an upper bound (not 0.9.0 or later) and &quot;latest&quot; would
possibly not satisfy that.</p>
<h2>🚨 Breaking changes</h2>
<ul>
<li>Use node24 instead of node20 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/608">#608</a>)</li>
<li>Remove deprecated input server-url <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/607">#607</a>)</li>
</ul>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Respect UV_CACHE_DIR and cache-dir <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/612">#612</a>)</li>
<li>Use --force when pruning cache <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/611">#611</a>)</li>
<li>Respect UV_NO_MODIFY_PATH <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/603">#603</a>)</li>
<li>Warn when <code>UV_CACHE_DIR</code> has changed <a
href="https://github.com/jamesbraza"><code>@​jamesbraza</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/601">#601</a>)</li>
</ul>
<h2>🚀 Enhancements</h2>
<ul>
<li>Shortcut to latest version for minimum version specifier <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/598">#598</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>Bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/613">#613</a>)</li>
<li>Fix test-uv-no-modify-path <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/604">#604</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="eb1897b8dc"><code>eb1897b</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/613">#613</a>)</li>
<li><a
href="d78d791822"><code>d78d791</code></a>
Bump github/codeql-action from 3.30.5 to 3.30.6 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/605">#605</a>)</li>
<li><a
href="535dc2664c"><code>535dc26</code></a>
Respect UV_CACHE_DIR and cache-dir (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/612">#612</a>)</li>
<li><a
href="f610be5ff9"><code>f610be5</code></a>
Use --force when pruning cache (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/611">#611</a>)</li>
<li><a
href="3deccc0075"><code>3deccc0</code></a>
Use node24 instead of node20 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/608">#608</a>)</li>
<li><a
href="d9ee7e2f26"><code>d9ee7e2</code></a>
Remove deprecated input server-url (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/607">#607</a>)</li>
<li><a
href="59a0868fea"><code>59a0868</code></a>
Bump github/codeql-action from 3.30.3 to 3.30.5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/594">#594</a>)</li>
<li><a
href="c952556164"><code>c952556</code></a>
Bump <code>@​renovatebot/pep440</code> from 4.2.0 to 4.2.1 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/581">#581</a>)</li>
<li><a
href="51c3328db2"><code>51c3328</code></a>
Fix test-uv-no-modify-path (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/604">#604</a>)</li>
<li><a
href="f2859da213"><code>f2859da</code></a>
Respect UV_NO_MODIFY_PATH (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/603">#603</a>)</li>
<li>Additional commits viewable in <a
href="d0cc045d04...eb1897b8dc">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.8.0&new-version=7.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 14:14:43 -07:00
Francisco Arceo
a165b8b5bb
chore!: BREAKING CHANGE removing VectorDB APIs (#3774)
# What does this PR do?
Removes VectorDBs from API surface and our tests.

Moves tests to Vector Stores.


## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-11 14:07:08 -07:00
ehhuang
06e4cd8e02
feat(api)!: BREAKING CHANGE: support passing extra_body through to providers (#3777)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 38s
Pre-commit / pre-commit (push) Successful in 1m27s
# What does this PR do?
Allows passing through extra_body parameters to inference providers.

With this, we moved the two vLLM-specific parameters out of the completions
API and into `extra_body`.
Before/After
<img width="1883" height="324" alt="image"
src="https://github.com/user-attachments/assets/acb27c08-c748-46c9-b1da-0de64e9908a1"
/>
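
A minimal sketch of the pass-through idea, not the actual implementation: the helper name `build_provider_request` is hypothetical, and the rule that explicitly declared parameters win over `extra_body` keys is an assumption. `guided_choice` is taken from the test name in the Test Plan below.

```python
from typing import Any, Optional


def build_provider_request(
    params: dict[str, Any],
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Merge provider-specific extras into the request sent to a provider.

    Hypothetical helper: declared API parameters take precedence, and any
    unknown keys from extra_body are forwarded untouched (assumed policy).
    """
    request = dict(params)
    for key, value in (extra_body or {}).items():
        # setdefault keeps the declared parameter if both are present.
        request.setdefault(key, value)
    return request


request = build_provider_request(
    {"model": "vllm/Qwen/Qwen3-0.6B", "prompt": "I am feeling really sad today."},
    extra_body={"guided_choice": ["joy", "sadness"]},
)
```

With this shape, the API surface stays provider-neutral while a vLLM-only option such as `guided_choice` still reaches the provider.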



closes #2720

## Test Plan
CI and added new test
```
❯ uv run pytest -s -v tests/integration/ --stack-config=server:starter --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag ) and test_openai_completion_guided_choice' --setup=vllm --suite=base --color=yes
Uninstalled 3 packages in 125ms
Installed 3 packages in 19ms
INFO     2025-10-10 14:29:54,317 tests.integration.conftest:118 tests: Applying setup 'vllm' for suite base
INFO     2025-10-10 14:29:54,331 tests.integration.conftest:47 tests: Test stack config type: server
         (stack_config=server:starter)
============================================================================================================== test session starts ==============================================================================================================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/erichuang/projects/llama-stack-1/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/erichuang/projects/llama-stack-1
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 285 items / 284 deselected / 1 selected

tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
instantiating llama_stack_client
Starting llama stack server with config 'starter' on port 8321...
Waiting for server at http://localhost:8321... (0.0s elapsed)
Waiting for server at http://localhost:8321... (0.5s elapsed)
Waiting for server at http://localhost:8321... (5.1s elapsed)
Waiting for server at http://localhost:8321... (5.6s elapsed)
Waiting for server at http://localhost:8321... (10.1s elapsed)
Waiting for server at http://localhost:8321... (10.6s elapsed)
Server is ready at http://localhost:8321
llama_stack_client instantiated in 11.773s
PASSEDTerminating llama stack server process...
Terminating process 98444 and its group...
Server process and children terminated gracefully


============================================================================================================= slowest 10 durations ==============================================================================================================
11.88s setup    tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
3.02s call     tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
0.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
================================================================================================ 1 passed, 284 deselected, 3 warnings in 16.21s =================================================================================================
```
2025-10-10 16:21:44 -07:00
ehhuang
80d58ab519
chore: refactor (chat)completions endpoints to use shared params struct (#3761)
# What does this PR do?

Converts openai(_chat)_completions params to pydantic BaseModel to
reduce code duplication across all providers.

## Test Plan
CI









---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3761).
* #3777
* __->__ #3761
2025-10-10 15:46:34 -07:00
Derek Higgins
6954fe2274
fix(auth): allow unauthenticated access to health and version endpoints (#3736)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test Llama Stack Build / build (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 37s
Pre-commit / pre-commit (push) Successful in 2m1s
The AuthenticationMiddleware was blocking all requests without an
Authorization header, including health and version endpoints that are
needed by monitoring tools, load balancers, and Kubernetes probes.

This commit allows endpoints ending in /health or /version to bypass
authentication, enabling operational tooling to function properly
without requiring credentials.
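
As a rough illustration of the bypass rule described above (a sketch only; `PUBLIC_SUFFIXES` and `requires_auth` are hypothetical names, not the middleware's real ones):

```python
# Hypothetical names; the suffix list mirrors the commit message.
PUBLIC_SUFFIXES = ("/health", "/version")


def requires_auth(path: str) -> bool:
    """Return False for request paths that may skip the Authorization check."""
    # str.endswith accepts a tuple, so one call covers both suffixes.
    return not path.endswith(PUBLIC_SUFFIXES)


probe_needs_auth = requires_auth("/v1/health")
models_needs_auth = requires_auth("/v1/models")
```

Matching on the suffix rather than the full path keeps the rule independent of the API version prefix.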

Closes: #3735

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-10 13:41:43 -07:00
Varsha
32fde8d9a8
feat: Add /v1/embeddings endpoint to batches API (#3384)
# What does this PR do?
This PR extends the Llama Stack Batches API to support the
/v1/embeddings endpoint, enabling efficient batch processing of
embedding requests alongside the existing /v1/chat/completions and
/v1/completions support.
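
For illustration, one line of a batch input file targeting the new endpoint might be built as follows. This is a sketch: the helper name and the model name are hypothetical, the `custom_id`/`method`/`url`/`body` fields come from the test names below, and the `input` field follows the OpenAI embeddings request shape.

```python
import json


def embeddings_batch_line(custom_id: str, model: str, text: str) -> str:
    """Serialize one embeddings request as a JSONL line for a batch input file."""
    request = {
        "custom_id": custom_id,   # caller-chosen ID used to match results
        "method": "POST",
        "url": "/v1/embeddings",  # must match the endpoint the batch was created for
        "body": {"model": model, "input": text},
    }
    return json.dumps(request)


# "example-embedding-model" is a placeholder, not a real model ID.
line = embeddings_batch_line("req-1", "example-embedding-model", "Hello, world!")
```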

Closes: https://github.com/llamastack/llama-stack/issues/3145

## Test Plan
```
(stack-client) ➜  llama-stack git:(support/embeddings-api) conda activate stack-client && python -m pytest tests/unit/providers/batches/test_reference.py -v                             
============================================================================================================================================ test session starts =============================================================================================================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'xdist': '3.8.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, xdist-3.8.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0
asyncio: mode=Mode.AUTO
collected 46 items                                                                                                                                                                                                                                                                                           

tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_and_retrieve_batch_success PASSED                                                                                                                                                                                [  2%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_without_metadata PASSED                                                                                                                                                                                    [  4%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_completion_window PASSED                                                                                                                                                                                   [  6%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[/v1/invalid/endpoint] PASSED                                                                                                                                                             [  8%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[] PASSED                                                                                                                                                                                 [ 10%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_metadata PASSED                                                                                                                                                                                    [ 13%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_retrieve_batch_not_found PASSED                                                                                                                                                                                         [ 15%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_success PASSED                                                                                                                                                                                             [ 17%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[failed] PASSED                                                                                                                                                                            [ 19%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[expired] PASSED                                                                                                                                                                           [ 21%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[completed] PASSED                                                                                                                                                                         [ 23%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_not_found PASSED                                                                                                                                                                                           [ 26%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_empty PASSED                                                                                                                                                                                               [ 28%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_single_batch PASSED                                                                                                                                                                                        [ 30%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_multiple_batches PASSED                                                                                                                                                                                    [ 32%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_limit PASSED                                                                                                                                                                                          [ 34%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_pagination PASSED                                                                                                                                                                                     [ 36%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_invalid_after PASSED                                                                                                                                                                                       [ 39%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_kvstore_persistence PASSED                                                                                                                                                                                              [ 41%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_not_found PASSED                                                                                                                                                                                    [ 43%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_exists_empty_content PASSED                                                                                                                                                                         [ 45%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_mixed_valid_invalid_json PASSED                                                                                                                                                                     [ 47%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_model PASSED                                                                                                                                                                                     [ 50%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED                                                                         [ 52%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED                                                                                  [ 54%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED                                                                                           [ 56%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED                                                                                        [ 58%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[model-body.model-invalid_request-Model parameter is required] PASSED                                                                                                 [ 60%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[messages-body.messages-invalid_request-Messages parameter is required] PASSED                                                                                        [ 63%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED                                                                              [ 65%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED                                                                                       [ 67%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED                                                                                                [ 69%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED                                                                                             [ 71%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[model-body.model-invalid_request-Model parameter is required] PASSED                                                                                                      [ 73%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[prompt-body.prompt-invalid_request-Prompt parameter is required] PASSED                                                                                                   [ 76%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_url_mismatch PASSED                                                                                                                                                                                      [ 78%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_multiple_errors_per_request PASSED                                                                                                                                                                       [ 80%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_request_format PASSED                                                                                                                                                                            [ 82%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[custom_id-custom_id-12345-Custom_id must be a string] PASSED                                                                                                                     [ 84%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[url-url-123-URL must be a string] PASSED                                                                                                                                         [ 86%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[method-method-invalid_value2-Method must be a string] PASSED                                                                                                                     [ 89%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[body-body-invalid_value3-Body must be a JSON dictionary object] PASSED                                                                                                           [ 91%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[model-body.model-123-Model must be a string] PASSED                                                                                                                              [ 93%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[messages-body.messages-invalid messages format-Messages must be an array] PASSED                                                                                                 [ 95%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_max_concurrent_batches PASSED                                                                                                                                                                                           [ 97%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_embeddings_endpoint PASSED                                                                                                                                                                                 [100%]

```

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 13:25:58 -07:00
Ashwin Bharambe
1394403360
feat(responses): implement usage tracking in streaming responses (#3771)
Implements usage accumulation in StreamingResponseOrchestrator.

The most important part was to pass `stream_options = { "include_usage":
true }` to the chat_completion call. This means I will have to re-record
all responses tests because the request hash will change :)

Test changes:
- Add usage assertions to streaming and non-streaming tests
- Update test recordings with actual usage data from OpenAI
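
A rough sketch of the accumulation step, assuming chunks arrive as plain dicts (the real orchestrator works with typed objects, and `UsageTotals` and `accumulate_usage` are hypothetical names). With `include_usage` enabled, only the final chunk of each chat completion stream carries a non-null `usage`:

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class UsageTotals:
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def accumulate_usage(chunks: Iterable[dict[str, Any]]) -> UsageTotals:
    """Sum token usage across chunks; chunks without usage are skipped."""
    totals = UsageTotals()
    for chunk in chunks:
        usage = chunk.get("usage")
        if not usage:
            continue  # ordinary content chunk, nothing to add
        totals.input_tokens += usage.get("prompt_tokens", 0)
        totals.output_tokens += usage.get("completion_tokens", 0)
    return totals
```

Summing rather than overwriting matters when one response triggers several chat completion calls, each of which ends with its own usage chunk.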
2025-10-10 12:27:03 -07:00
Francisco Arceo
e7d21e1ee3
feat: Add support for Conversations in Responses API (#3743)
# What does this PR do?
This PR adds support for Conversations in Responses.


## Test Plan
Unit tests
Integration tests

<Details>
<Summary>Manual testing with this script: (click to expand)</Summary>

```python
from openai import OpenAI

# Point the OpenAI client at a local Llama Stack server.
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")

def test_conversation_create():
    print("Testing conversation create...")
    conversation = client.conversations.create(
        metadata={"topic": "demo"},
        items=[
            {"type": "message", "role": "user", "content": "Hello!"}
        ]
    )
    print(f"Created: {conversation}")
    return conversation

def test_conversation_retrieve(conv_id):
    print(f"Testing conversation retrieve for {conv_id}...")
    retrieved = client.conversations.retrieve(conv_id)
    print(f"Retrieved: {retrieved}")
    return retrieved

def test_conversation_update(conv_id):
    print(f"Testing conversation update for {conv_id}...")
    updated = client.conversations.update(
        conv_id,
        metadata={"topic": "project-x"}
    )
    print(f"Updated: {updated}")
    return updated

def test_conversation_delete(conv_id):
    print(f"Testing conversation delete for {conv_id}...")
    deleted = client.conversations.delete(conv_id)
    print(f"Deleted: {deleted}")
    return deleted

def test_conversation_items_create(conv_id):
    print(f"Testing conversation items create for {conv_id}...")
    items = client.conversations.items.create(
        conv_id,
        items=[
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hello!"}]
            },
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "How are you?"}]
            }
        ]
    )
    print(f"Items created: {items}")
    return items

def test_conversation_items_list(conv_id):
    print(f"Testing conversation items list for {conv_id}...")
    items = client.conversations.items.list(conv_id, limit=10)
    print(f"Items list: {items}")
    return items

def test_conversation_item_retrieve(conv_id, item_id):
    print(f"Testing conversation item retrieve for {conv_id}/{item_id}...")
    item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id)
    print(f"Item retrieved: {item}")
    return item

def test_conversation_item_delete(conv_id, item_id):
    print(f"Testing conversation item delete for {conv_id}/{item_id}...")
    deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id)
    print(f"Item deleted: {deleted}")
    return deleted

def test_conversation_responses_create():
    print("\nTesting conversation create for a responses example...")
    conversation = client.conversations.create()
    print(f"Created: {conversation}")

    response = client.responses.create(
      model="gpt-4.1",
      input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}],
      conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")

    return response, conversation

def test_conversations_responses_create_followup(
        conversation,
        content="Repeat what you just said but add 'this is my second time saying this'",
    ):
    print(f"Using: {conversation.id}")

    response = client.responses.create(
      model="gpt-4.1",
      input=[{"role": "user", "content": content}],
      conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")

    conv_items = client.conversations.items.list(conversation.id)
    print(f"\nRetrieving list of items for conversation {conversation.id}:")
    print(conv_items.model_dump_json(indent=2))

def test_response_with_fake_conv_id():
    fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44"
    print(f"Using {fake_conv_id}")
    try:
        response = client.responses.create(
          model="gpt-4.1",
          input=[{"role": "user", "content": "say hello"}],
          conversation=fake_conv_id,
        )
        print(f"Created response: {response} for conversation {fake_conv_id}")
    except Exception as e:
        print(f"failed to create response for conversation {fake_conv_id} with error {e}")


def main():
    print("Testing OpenAI Conversations API...")

    # Create conversation
    conversation = test_conversation_create()
    conv_id = conversation.id

    # Retrieve conversation
    test_conversation_retrieve(conv_id)

    # Update conversation
    test_conversation_update(conv_id)

    # Create items
    items = test_conversation_items_create(conv_id)

    # List items
    items_list = test_conversation_items_list(conv_id)

    # Retrieve specific item
    if items_list.data:
        item_id = items_list.data[0].id
        test_conversation_item_retrieve(conv_id, item_id)

        # Delete item
        test_conversation_item_delete(conv_id, item_id)

    # Delete conversation
    test_conversation_delete(conv_id)

    response, conversation2 = test_conversation_responses_create()
    print('\ntesting response retrieval')
    test_conversation_retrieve(conversation2.id)

    print('\ntesting responses follow up')
    test_conversations_responses_create_followup(conversation2)

    print('\ntesting responses follow up x2!')

    test_conversations_responses_create_followup(
        conversation2,
        content="Repeat what you just said but add 'this is my third time saying this'",
    )

    test_response_with_fake_conv_id()

    print("All tests completed!")


if __name__ == "__main__":
    main()
```
</Details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 11:57:40 -07:00
Ashwin Bharambe
932fea813a
fix(ci): remove responses from CI for now (#3773)
There are many changes to responses which are landing. They are
introducing fundamental new types. This means re-recordings even from
the inference calls. Let's avoid that for now.

Once everything lands I will re-record everything, make things pass and
re-enable.
2025-10-10 11:52:17 -07:00
Ashwin Bharambe
548ccff368
fix(mypy): fix wrong attribute access (#3770) 2025-10-10 09:30:43 -07:00
grs
8bf07f91cb
feat: reuse previous mcp tool listings where possible (#3710)
# What does this PR do?
This PR checks whether, if a previous response is linked, there are
mcp_list_tools objects that can be reused instead of listing the tools
explicitly every time.
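
The reuse check described above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `McpListTools` and `split_reusable_listings` are hypothetical names, and matching on `server_label` alone is an assumption (the real check may compare more fields).

```python
from dataclasses import dataclass, field


@dataclass
class McpListTools:
    """Hypothetical stand-in for an mcp_list_tools output item."""

    server_label: str
    tools: list = field(default_factory=list)
    type: str = "mcp_list_tools"


def split_reusable_listings(previous_output, requested_labels):
    """Return (listings reusable by label, labels that still need a live listing)."""
    reused = {}
    for item in previous_output:
        is_listing = getattr(item, "type", None) == "mcp_list_tools"
        if is_listing and item.server_label in requested_labels:
            # Keep the first listing seen for each server label.
            reused.setdefault(item.server_label, item)
    missing = [label for label in requested_labels if label not in reused]
    return reused, missing


previous = [McpListTools("nps", ["search_parks"])]
reused, missing = split_reusable_listings(previous, ["nps", "weather"])
print(sorted(reused), missing)
```

Only the servers left in `missing` would then need an explicit tool listing call.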

 Closes #3106 

## Test Plan
Tested manually.
Added unit tests to cover new behaviour.

---------

Signed-off-by: Gordon Sim <gsim@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 09:28:25 -07:00
Matthew Farrellee
0066d986c5
feat: use SecretStr for inference provider auth credentials (#3724)
# What does this PR do?

use SecretStr for OpenAIMixin providers

- RemoteInferenceProviderConfig now has auth_credential: SecretStr
- the default alias is api_key (most common name)
- some providers override to use api_token (RunPod, vLLM, Databricks)
- some providers exclude it (Ollama, TGI, Vertex AI)
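
A minimal sketch of the alias pattern listed above, assuming pydantic v2. `ExampleProviderConfig` is a hypothetical class for illustration only, not the actual `RemoteInferenceProviderConfig`.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, SecretStr


class ExampleProviderConfig(BaseModel):
    """Hypothetical config: the credential is stored as a SecretStr under an alias."""

    # Allow population by field name as well as by the alias.
    model_config = ConfigDict(populate_by_name=True)

    auth_credential: Optional[SecretStr] = Field(default=None, alias="api_key")


config = ExampleProviderConfig(api_key="sk-not-a-real-key")
print(config.auth_credential)                     # masked when printed
print(config.auth_credential.get_secret_value())  # explicit opt-in to the raw value
```

A provider that prefers `api_token` would, under this assumption, declare the same field with a different alias.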

addresses #3517 

## Test Plan

ci w/ new tests
2025-10-10 07:32:50 -07:00
Derek Higgins
6d8f61206e
fix: update normalize to search all recordings dirs (#3767)
Updated scripts/normalize_recordings.py to dynamically find and process
all 'recordings' directories under tests/ using pathlib.rglob() instead
of hardcoding a single path.
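
For illustration, the directory discovery could look roughly like this. `find_recordings_dirs` is a hypothetical helper, not the function in `scripts/normalize_recordings.py`, and the directory layout is made up for the demo.

```python
import tempfile
from pathlib import Path


def find_recordings_dirs(tests_root: Path) -> list:
    """Return every directory named 'recordings' below tests_root, sorted."""
    return sorted(p for p in tests_root.rglob("recordings") if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "tests"
    # Illustrative layout only.
    (root / "integration" / "recordings").mkdir(parents=True)
    (root / "integration" / "responses" / "recordings").mkdir(parents=True)
    found = [p.relative_to(root).as_posix() for p in find_recordings_dirs(root)]
    print(found)
```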

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-10 07:32:14 -07:00
Ashwin Bharambe
e039b61d26
feat(responses)!: add in_progress, failed, content part events (#3765)
## Summary
- add schema + runtime support for response.in_progress /
response.failed / response.incomplete
- stream content parts with proper indexes and reasoning slots
- align tests + docs with the richer event payloads

## Testing
- uv run pytest
tests/unit/providers/agents/meta_reference/test_openai_responses.py::test_create_openai_response_with_string_input
- uv run pytest
tests/unit/providers/agents/meta_reference/test_response_conversion_utils.py
2025-10-10 07:27:34 -07:00
Akram Ben Aissi
a548169b99
fix: allow skipping model availability check for vLLM (#3739)
# What does this PR do?
Allows model check to fail gracefully instead of crashing on startup.



## Test Plan

Set VLLM_URL to your vLLM server.

```
(base) akram@Mac llama-stack % LAMA_STACK_LOGGING="all=debug" VLLM_ENABLE_MODEL_DISCOVERY=false  MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter  --image-type venv --run
```



```

INFO     2025-10-08 20:11:24,637 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-08 20:11:24,866 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-08 20:11:26,160 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a
         href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&amp;client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&amp;redirect_uri=ht
         tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&amp;response_type=code&amp;scope=user%3Ainfo+user%3Acheck-access&amp;state=9fba207425
         5851c718aca717a5887d76%3A%2Fmodels">Found</a>.
         
[...]
INFO     2025-10-08 20:11:26,295 uvicorn.error:84 uncategorized: Started server process [83144]
INFO     2025-10-08 20:11:26,296 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-10-08 20:11:26,297 llama_stack.core.server.server:170 core::server: Starting up
INFO     2025-10-08 20:11:26,297 llama_stack.core.stack:399 core: starting registry refresh task
INFO     2025-10-08 20:11:26,311 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-10-08 20:11:26,312 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
ERROR    2025-10-08 20:11:26,791 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a
         href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&amp;client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&amp;redirect_uri=ht
         tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&amp;response_type=code&amp;scope=user%3Ainfo+user%3Acheck-access&amp;state=8ef0cba3e1
         71a4f8b04cb445cfb91a4c%3A%2Fmodels">Found</a>.

```
2025-10-10 07:23:13 -07:00
Ashwin Bharambe
aaf5036235
feat(responses): add usage types to inference and responses APIs (#3764)
## Summary
Adds OpenAI-compatible usage tracking types to enable reporting token
consumption for both streaming and non-streaming responses.

## Type Definitions
**Chat Completion Usage** (inference API):
```python
class OpenAIChatCompletionUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None
    completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None
```

**Response Usage** (responses API):
```python
class OpenAIResponseUsage(BaseModel):
    input_tokens: int
    output_tokens: int
    total_tokens: int
    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None
    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None
```

This matches OpenAI's usage reporting format and enables PR #3766 to
implement usage tracking in streaming responses.
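
To show how the two shapes line up, here is a small sketch that maps chat completion usage onto response usage. The dataclasses are simplified stand-ins (the `*_details` fields are omitted), and `to_response_usage` is a hypothetical helper, not part of this PR.

```python
from dataclasses import dataclass


@dataclass
class ChatCompletionUsage:
    """Simplified stand-in for OpenAIChatCompletionUsage (details omitted)."""

    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class ResponseUsage:
    """Simplified stand-in for OpenAIResponseUsage (details omitted)."""

    input_tokens: int
    output_tokens: int
    total_tokens: int


def to_response_usage(usage: ChatCompletionUsage) -> ResponseUsage:
    # prompt/completion on the inference side correspond to input/output here.
    return ResponseUsage(
        input_tokens=usage.prompt_tokens,
        output_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
    )


converted = to_response_usage(ChatCompletionUsage(12, 30, 42))
print(converted)
```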

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-10 09:22:59 -04:00
Ashwin Bharambe
ebae0385bb
fix: update dangling references to llama download command (#3763)
## Summary
After removing model management CLI in #3700, this PR updates remaining
references to the old `llama download` command to use `huggingface-cli
download` instead.

## Changes
- Updated error messages in `meta_reference/common.py` to recommend
`huggingface-cli download`
- Updated error messages in
`torchtune/recipes/lora_finetuning_single_device.py` to use
`huggingface-cli download`
- Updated post-training notebook to use `huggingface-cli download`
instead of `llama download`
- Fixed typo: "you model" -> "your model"

## Test Plan
- Verified error messages provide correct guidance for users
- Checked that notebook instructions are up-to-date with current tooling
2025-10-09 18:35:02 -07:00
Ashwin Bharambe
8fe4a216b5
fix(inference): propagate 401/403 errors from remote providers (#3762)
## Summary
Fixes #2990

Remote provider authentication errors (401/403) were being converted to
500 Internal Server Error, preventing users from understanding why their
requests failed.

## The Problem
When a request with an invalid API key was sent to a remote provider:
- Provider correctly returns 401 with error details
- Llama Stack's `translate_exception()` didn't recognize provider SDK
exceptions
- Fell through to generic 500 error handler
- User received: "Internal server error: An unexpected error occurred."

## The Fix
Added handler in `translate_exception()` that checks for exceptions with
a `status_code` attribute and preserves the original HTTP status code
and error message.

**Before:**
```json
HTTP 500
{"detail": "Internal server error: An unexpected error occurred."}
```

**After:**
```json
HTTP 401
{"detail": "Error code: 401 - {'error': {'message': 'Invalid API Key', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}"}
```
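
The handler described under "The Fix" might look roughly like the sketch below. It assumes only what the description states (exceptions exposing a `status_code` attribute); `translate_exception_sketch` and `FakeProviderAuthError` are hypothetical names, and the real `translate_exception()` handles more cases.

```python
class FakeProviderAuthError(Exception):
    """Hypothetical stand-in for a provider SDK exception carrying a status code."""

    def __init__(self, message: str, status_code: int):
        super().__init__(message)
        self.status_code = status_code


def translate_exception_sketch(exc: Exception):
    """Return (http_status, body), preserving provider status codes when present."""
    status_code = getattr(exc, "status_code", None)
    if isinstance(status_code, int) and 400 <= status_code < 600:
        # Keep the provider's status code and message instead of masking them.
        return status_code, {"detail": str(exc)}
    # Fallback: the generic handler that previously swallowed auth errors.
    return 500, {"detail": "Internal server error: An unexpected error occurred."}


auth_result = translate_exception_sketch(FakeProviderAuthError("Invalid API Key", 401))
generic_result = translate_exception_sketch(RuntimeError("boom"))
print(auth_result)
print(generic_result)
```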

## Tested With
- groq: 401 "Invalid API Key"
- openai: 401 "Incorrect API key provided"
- together: 401 "Invalid API key provided"
- fireworks: 403 "unauthorized"

## Test Plan

**Automated test script:**
https://gist.github.com/ashwinb/1199dd7585ffa3f4be67b111cc65f2f3

The test script:
1. Builds separate stacks for each provider
2. Registers models (with validation temporarily disabled for testing)
3. Sends requests with invalid API keys via `x-llamastack-provider-data`
header
4. Verifies HTTP status codes are 401/403 (not 500)

**Results before fix:** All providers returned 500  
**Results after fix:** All providers correctly return 401/403

**Manual verification:**
```bash
# 1. Build stack
llama stack build --image-type venv --providers inference=remote::groq

# 2. Start stack
llama stack run

# 3. Send request with invalid API key
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H 'x-llamastack-provider-data: {"groq_api_key": "invalid-key"}' \
  -d '{"model": "groq/llama3-70b-8192", "messages": [{"role": "user", "content": "test"}]}'

# Expected: HTTP 401 with provider error message (not 500)
```

## Impact
- Works with all remote providers using OpenAI SDK (groq, openai,
together, fireworks, etc.)
- Works with any provider SDK that follows the pattern of exceptions
with `status_code` attribute
- No breaking changes - only affects error responses
2025-10-09 18:34:39 -07:00
Matthew Farrellee
145b2bcf25
feat: make object registration idempotent (#3752)
# What does this PR do?

objects (vector dbs, models, scoring functions, etc) have an identifier
and associated object values.

we allow exact duplicate registrations.

we reject registrations when the identifier exists and the associated
object values differ.

note: models are namespaced, i.e. {provider_id}/{identifier}, while other
object types are not
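
The rules above can be sketched with a toy in-memory registry. `RegistrySketch` and `model_key` are hypothetical and do not reflect the actual registry classes or storage.

```python
class RegistrySketch:
    """Toy registry illustrating idempotent registration."""

    def __init__(self):
        self._objects = {}

    def register(self, identifier, values):
        """Return True if newly stored, False for an exact duplicate."""
        existing = self._objects.get(identifier)
        if existing is None:
            self._objects[identifier] = dict(values)
            return True
        if existing == values:
            # Exact duplicate registration is allowed and changes nothing.
            return False
        raise ValueError(f"'{identifier}' is already registered with different values")


def model_key(provider_id, identifier):
    # Models are namespaced by provider; other object types use the bare identifier.
    return f"{provider_id}/{identifier}"


registry = RegistrySketch()
key = model_key("openai", "gpt-4o")
first = registry.register(key, {"model_type": "llm"})
second = registry.register(key, {"model_type": "llm"})
try:
    registry.register(key, {"model_type": "embedding"})
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
print(first, second, conflict_rejected)
```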

## Test Plan

ci w/ new tests
2025-10-09 17:04:28 -07:00
Sébastien Han
7ee0ee7843
chore!: remove model mgmt from CLI for Hugging Face CLI (#3700)
This change removes the `llama model` and `llama download` subcommands
from the CLI, replacing them with recommendations to use the Hugging
Face CLI instead.

Rationale for this change:
- The model management functionality was largely duplicating what
Hugging Face CLI already provides, leading to unnecessary maintenance
overhead (except the download source from Meta?)
- Maintaining our own implementation required fixing bugs and keeping up
with changes in model repositories and download mechanisms
- The Hugging Face CLI is more mature, widely adopted, and better
maintained
- This allows us to focus on the core Llama Stack functionality rather
than reimplementing model management tools

Changes made:
- Removed all model-related CLI commands and their implementations
- Updated documentation to recommend using `huggingface-cli` for model
downloads
- Removed Meta-specific download logic and statements
- Simplified the CLI to focus solely on stack management operations

Users should now use:
- `huggingface-cli download` for downloading models
- `huggingface-cli scan-cache` for listing downloaded models

This is a breaking change as it removes previously available CLI
commands.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-09 16:50:33 -07:00
Ashwin Bharambe
841d0c3583
fix(testing): improve api_recorder error messages for missing recordings (#3760)
Replaces opaque error messages when recordings are not found with
somewhat better guidance

Before:
```
No recorded response found for request hash: abc123...
To record this response, run with LLAMA_STACK_TEST_INFERENCE_MODE=record
```

After:
```
Recording not found for request hash: abc123
Model: gpt-4 | Request: POST https://api.openai.com/v1/chat/completions

Run './scripts/integration-tests.sh --inference-mode record-if-missing' with required API keys to generate.
```
2025-10-09 15:04:16 -07:00
Ashwin Bharambe
a055a32ee4
fix(tests): remove chroma and qdrant from vector io unit tests (#3759)
These vector databases are already thoroughly tested in integration
tests.
Unit tests now focus on sqlite_vec, faiss, and pgvector with mocked
dependencies, removing the need for external service dependencies.

## Changes:
- Deleted test_qdrant.py unit test file
- Removed chroma/qdrant fixtures and parametrization from conftest.py
- Fixed SqliteKVStoreConfig import to use correct location
- Removed chromadb, qdrant-client, pymilvus, milvus-lite, and
  weaviate-client from unit test dependencies in pyproject.toml
2025-10-09 14:36:34 -07:00
Ashwin Bharambe
f50ce11a3b
feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403)
Renames `inference_recorder.py` to `api_recorder.py` and extends it to
support recording/replaying tool invocations in addition to inference
calls.

This allows us to record web-search, etc. tool calls and thereafter
apply recordings for `tests/integration/responses`

## Test Plan

```
export OPENAI_API_KEY=...
export TAVILY_SEARCH_API_KEY=...

./scripts/integration-tests.sh --stack-config ci-tests \
   --suite responses --inference-mode record-if-missing
```
2025-10-09 14:27:51 -07:00
grs
26fd5dbd34
fix: add traces for tool calls and mcp tool listing (#3722)
# What does this PR do?
Adds traces around tool execution and mcp tool listing for better
observability.

Closes #3108 

## Test Plan
Manually examined traces in jaeger to verify the added information was
available.

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-10-09 09:59:09 -07:00
Sébastien Han
4b9ebbf6a2
chore: revert "fix: Raising an error message to the user when registering an existing provider." (#3750)
Reverts llamastack/llama-stack#3624
Causing https://github.com/llamastack/llama-stack/issues/3749
2025-10-09 09:17:37 -04:00
ehhuang
05a62a6ffb
chore: print integration tests command (#3747)
# What does this PR do?


## Test Plan

<img width="1104" height="60" alt="image"
src="https://github.com/user-attachments/assets/d4691a2e-c5ec-4df5-a15a-f86e667fdf8c"
/>
2025-10-08 15:12:13 -07:00
Ashwin Bharambe
16db42e7e5
feat(tests): add --collect-only option to integration test script (#3745)
Adds --collect-only flag to scripts/integration-tests.sh that skips
server startup and passes the flag to pytest for test collection only.
When specified, minimal flags are required (no --stack-config or --setup
needed).

## Changes
- Added `--collect-only` flag that skips server startup
- Made `--stack-config` and `--setup` optional when using
`--collect-only`
- Skip `llama` command check when collecting tests only

## Usage
```bash
# Collect tests without starting server
./scripts/integration-tests.sh --subdirs inference --collect-only
```
2025-10-08 14:20:34 -07:00
Francisco Arceo
b96640eca3
chore: Removing Weaviate, PGVector, and Milvus from unit tests (#3742)
# What does this PR do?
Removes the Weaviate, PGVector (Postgres), and Milvus unit tests.


## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-08 12:25:51 -07:00
Ashwin Bharambe
79bed44b04
fix(tests): ensure test isolation in server mode (#3737)
Propagate test IDs from client to server via HTTP headers to maintain
proper test isolation when running with server-based stack configs.
Without this, recorded/replayed inference requests in server mode would
leak across tests.

Changes:
- Patch client _prepare_request to inject test ID into provider data
header
- Sync test context from provider data on server side before storage
operations
- Set LLAMA_STACK_TEST_STACK_CONFIG_TYPE env var based on stack config
- Configure console width for cleaner log output in CI
- Add SQLITE_STORE_DIR temp directory for test data isolation
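
A rough sketch of the client-to-server propagation, under stated assumptions: the header name `x-llamastack-provider-data` is the one used elsewhere in this log, while the `__test_id` key and both helper functions are hypothetical and only illustrate the idea.

```python
import json

PROVIDER_DATA_HEADER = "x-llamastack-provider-data"
TEST_ID_KEY = "__test_id"  # hypothetical key name, chosen for this sketch


def inject_test_id(headers, test_id):
    """Client side: merge the test ID into the provider data header."""
    merged = dict(headers)
    provider_data = json.loads(merged.get(PROVIDER_DATA_HEADER, "{}"))
    provider_data[TEST_ID_KEY] = test_id
    merged[PROVIDER_DATA_HEADER] = json.dumps(provider_data)
    return merged


def extract_test_id(headers):
    """Server side: recover the test ID before any storage operation."""
    raw = headers.get(PROVIDER_DATA_HEADER)
    if raw is None:
        return None
    return json.loads(raw).get(TEST_ID_KEY)


sent = inject_test_id(
    {PROVIDER_DATA_HEADER: json.dumps({"groq_api_key": "invalid-key"})},
    "tests/integration/inference/test_basic.py::test_hello",
)
print(extract_test_id(sent))
```

Existing provider data in the header is preserved, so the test ID rides along with whatever the client already sends.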
2025-10-08 12:03:36 -07:00
grs
96886afaca
fix(responses): fix regression in support for mcp tool require_approval argument (#3731)
# What does this PR do?

It prevents a tool call message from being added to the chat completions
messages without a corresponding tool call result, which is needed when
an approval is required first or when the approval request is denied. In
both of these cases the tool call message is popped off the next-turn
messages.

Closes #3728
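
The behaviour described above can be illustrated with a small sketch. `settle_tool_call` is a hypothetical helper and the message dicts are simplified; the real code operates on the provider's typed message objects.

```python
def settle_tool_call(next_turn_messages, tool_call_message, tool_result=None):
    """Add a tool call plus its result, or drop the call when no result exists.

    tool_result is None when approval is still pending or was denied.
    """
    next_turn_messages.append(tool_call_message)
    if tool_result is None:
        # No result will follow, so remove the dangling tool call message.
        next_turn_messages.pop()
        return next_turn_messages
    next_turn_messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call_message["tool_calls"][0]["id"],
            "content": tool_result,
        }
    )
    return next_turn_messages


call = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{}"}}
    ],
}
pending = settle_tool_call([{"role": "user", "content": "hi"}], call)
approved = settle_tool_call([{"role": "user", "content": "hi"}], call, tool_result="sunny")
print(len(pending), len(approved))
```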

## Test Plan
Ran the integration tests
Manual check of both approval and denial against gpt-4o

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-10-08 10:47:17 -04:00
Bill Murdock
5d711d4bcb
fix: Update watsonx.ai provider to use LiteLLM mixin and list all models (#3674)
# What does this PR do?

- The watsonx.ai provider now uses the LiteLLM mixin instead of using
IBM's library, which does not seem to be working (see #3165 for
context).
- The watsonx.ai provider now lists all the models available by calling
the watsonx.ai server instead of having a hard coded list of known
models. (That list gets out of date quickly)
- An edge case in
[llama_stack/core/routers/inference.py](https://github.com/llamastack/llama-stack/pull/3674/files#diff-a34bc966ed9befd9f13d4883c23705dff49be0ad6211c850438cdda6113f3455)
is addressed that was causing my manual tests to fail.
- Fixes `b64_encode_openai_embeddings_response` which was trying to
enumerate over a dictionary and then reference elements of the
dictionary using .field instead of ["field"]. That method is called by
the LiteLLM mixin for embedding models, so it is needed to get the
watsonx.ai embedding models to work.
- A unit test along the lines of the one in #3348 is added. A more
comprehensive plan for automatically testing the end-to-end
functionality for inference providers would be a good idea, but is out
of scope for this PR.
- Updates to the watsonx distribution. Some were in response to the
switch to LiteLLM (e.g., updating the Python packages needed). Others
seem to be things that were already broken that I found along the way
(e.g., a reference to a watsonx specific doc template that doesn't seem
to exist).
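
To illustrate the dictionary-access point from the `b64_encode_openai_embeddings_response` bullet, here is a hedged sketch. `b64_encode_embeddings_sketch` is a hypothetical function, and the float32 packing is an assumption about the encoding, not a copy of the fixed helper.

```python
import base64
import struct


def b64_encode_embeddings_sketch(data):
    """Encode each item's embedding (a list of floats) as base64 float32 bytes.

    Items are plain dicts, so fields are read with item["field"], not item.field.
    """
    encoded = []
    for index, item in enumerate(data):
        floats = item["embedding"]
        packed = struct.pack(f"{len(floats)}f", *floats)
        encoded.append(
            {"index": index, "embedding": base64.b64encode(packed).decode("utf-8")}
        )
    return encoded


result = b64_encode_embeddings_sketch([{"embedding": [0.5, -1.25]}])
decoded = struct.unpack("2f", base64.b64decode(result[0]["embedding"]))
print(result[0]["index"], decoded)
```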

Closes #3165

Also it is related to a line-item in #3387 but doesn't really address
that goal (because it uses the LiteLLM mixin, not the OpenAI one). I
tried the OpenAI one and it doesn't work with watsonx.ai, presumably
because the watsonx.ai service is not OpenAI compatible. It works with
LiteLLM because LiteLLM has a provider implementation for watsonx.ai.

## Test Plan

The test script below goes back and forth between the OpenAI and watsonx
providers. The idea is that the OpenAI provider shows how it should work
and then the watsonx provider output shows that it is also working with
watsonx. Note that the result from the MCP test is not as good (the
Llama 3.3 70b model does not choose tools as wisely as gpt-4o), but it
is still working and providing a valid response. For more details on
setup and the MCP server being used for testing, see [the AI Alliance
sample
notebook](https://github.com/The-AI-Alliance/llama-stack-examples/blob/main/notebooks/01-responses/)
that these examples are drawn from.

```python
#!/usr/bin/env python3

import json
from llama_stack_client import LlamaStackClient
from litellm import completion
import http.client


def print_response(response):
    """Print response in a nicely formatted way"""
    print(f"ID: {response.id}")
    print(f"Status: {response.status}")
    print(f"Model: {response.model}")
    print(f"Created at: {response.created_at}")
    print(f"Output items: {len(response.output)}")
    
    for i, output_item in enumerate(response.output):
        if len(response.output) > 1:
            print(f"\n--- Output Item {i+1} ---")
        print(f"Output type: {output_item.type}")
        
        if output_item.type in ("text", "message"):
            print(f"Response content: {output_item.content[0].text}")
        elif output_item.type == "file_search_call":
            print(f"  Tool Call ID: {output_item.id}")
            print(f"  Tool Status: {output_item.status}")
            # 'queries' is a list, so we join it for clean printing
            print(f"  Queries: {', '.join(output_item.queries)}")
            # Display results if they exist, otherwise note they are empty
            print(f"  Results: {output_item.results if output_item.results else 'None'}")
        elif output_item.type == "mcp_list_tools":
            print_mcp_list_tools(output_item)
        elif output_item.type == "mcp_call":
            print_mcp_call(output_item)
        else:
            print(f"Response content: {output_item.content}")


def print_mcp_call(mcp_call):
    """Print MCP call in a nicely formatted way"""
    print(f"\n🛠️  MCP Tool Call: {mcp_call.name}")
    print(f"   Server: {mcp_call.server_label}")
    print(f"   ID: {mcp_call.id}")
    print(f"   Arguments: {mcp_call.arguments}")
    
    if mcp_call.error:
        print(f"Error: {mcp_call.error}")
    elif mcp_call.output:
        print("Output:")
        # Try to format JSON output nicely
        try:
            parsed_output = json.loads(mcp_call.output)
            print(json.dumps(parsed_output, indent=4))
        except (json.JSONDecodeError, TypeError):
            # If not valid JSON, print as-is
            print(f"   {mcp_call.output}")
    else:
        print("    No output yet")


def print_mcp_list_tools(mcp_list_tools):
    """Print MCP list tools in a nicely formatted way"""
    print(f"\n🔧 MCP Server: {mcp_list_tools.server_label}")
    print(f"   ID: {mcp_list_tools.id}")
    print(f"   Available Tools: {len(mcp_list_tools.tools)}")
    print("=" * 80)
    
    for i, tool in enumerate(mcp_list_tools.tools, 1):
        print(f"\n{i}. {tool.name}")
        print(f"   Description: {tool.description}")
        
        # Parse and display input schema
        schema = tool.input_schema
        if schema and 'properties' in schema:
            properties = schema['properties']
            required = schema.get('required', [])
            
            print("   Parameters:")
            for param_name, param_info in properties.items():
                param_type = param_info.get('type', 'unknown')
                param_desc = param_info.get('description', 'No description')
                required_marker = " (required)" if param_name in required else " (optional)"
                print(f"     • {param_name} ({param_type}){required_marker}")
                if param_desc:
                    print(f"       {param_desc}")
        
        if i < len(mcp_list_tools.tools):
            print("-" * 40)


def main():
    """Main function to run all the tests"""
    
    # Configuration
    LLAMA_STACK_URL = "http://localhost:8321/"
    LLAMA_STACK_MODEL_IDS = [
        "openai/gpt-3.5-turbo",
        "openai/gpt-4o",
        "llama-openai-compat/Llama-3.3-70B-Instruct",
        "watsonx/meta-llama/llama-3-3-70b-instruct"
    ]
    
    # Using gpt-4o for this demo, but feel free to try one of the others or add more to run.yaml.
    OPENAI_MODEL_ID = LLAMA_STACK_MODEL_IDS[1]
    WATSONX_MODEL_ID = LLAMA_STACK_MODEL_IDS[-1]
    NPS_MCP_URL = "http://localhost:3005/sse/"
    
    print("=== Llama Stack Testing Script ===")
    print(f"Using OpenAI model: {OPENAI_MODEL_ID}")
    print(f"Using WatsonX model: {WATSONX_MODEL_ID}")
    print(f"MCP URL: {NPS_MCP_URL}")
    print()
    
    # Initialize client
    print("Initializing LlamaStackClient...")
    client = LlamaStackClient(base_url="http://localhost:8321")
    
    # Test 1: List models
    print("\n=== Test 1: List Models ===")
    try:
        models = client.models.list()
        print(f"Found {len(models)} models")
    except Exception as e:
        print(f"Error listing models: {e}")
        raise e
    
    # Test 2: Basic chat completion with OpenAI
    print("\n=== Test 2: Basic Chat Completion (OpenAI) ===")
    try:
        chat_completion_response = client.chat.completions.create(
            model=OPENAI_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}]
        )
        
        print("OpenAI Response:")
        for chunk in chat_completion_response.choices[0].message.content:
            print(chunk, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with OpenAI chat completion: {e}")
        raise e
    
    # Test 3: Basic chat completion with WatsonX
    print("\n=== Test 3: Basic Chat Completion (WatsonX) ===")
    try:
        chat_completion_response_wxai = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
        )
        
        print("WatsonX Response:")
        # The non-streaming response holds the full text, so print it directly.
        print(chat_completion_response_wxai.choices[0].message.content)
    except Exception as e:
        print(f"Error with WatsonX chat completion: {e}")
        raise e
    
    # Test 4: Tool calling with OpenAI
    print("\n=== Test 4: Tool Calling (OpenAI) ===")
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather for a specific location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g., San Francisco, CA",
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        },
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    
    messages = [
        {"role": "user", "content": "What's the weather like in Boston, MA?"}
    ]
    
    try:
        print("--- Initial API Call ---")
        response = client.chat.completions.create(
            model=OPENAI_MODEL_ID,
            messages=messages,
            tools=tools,
            tool_choice="auto",  # "auto" is the default
        )
        print("OpenAI tool calling response received")
    except Exception as e:
        print(f"Error with OpenAI tool calling: {e}")
        raise e
    
    # Test 5: Tool calling with WatsonX
    print("\n=== Test 5: Tool Calling (WatsonX) ===")
    try:
        wxai_response = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=messages,
            tools=tools,
            tool_choice="auto",  # "auto" is the default
        )
        print("WatsonX tool calling response received")
    except Exception as e:
        print(f"Error with WatsonX tool calling: {e}")
        raise e
    
    # Test 6: Streaming with WatsonX
    print("\n=== Test 6: Streaming Response (WatsonX) ===")
    try:
        chat_completion_response_wxai_stream = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
            stream=True
        )
        print("Model response: ", end="")
        for chunk in chat_completion_response_wxai_stream:
            # Each 'chunk' is a ChatCompletionChunk object.
            # We want the content from the 'delta' attribute.
            # Skip chunks whose 'choices' list is missing, None, or empty.
            if getattr(chunk, "choices", None):
                content = chunk.choices[0].delta.content
                # The first few chunks might have None content, so we check for it.
                if content is not None:
                    print(content, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with streaming: {e}")
        raise e
    
    # Test 7: MCP with OpenAI
    print("\n=== Test 7: MCP Integration (OpenAI) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=OPENAI_MODEL_ID,
            input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
            tools=[
                {
                    "type": "mcp",
                    "server_url": NPS_MCP_URL,
                    "server_label": "National Parks Service tools",
                    "allowed_tools": ["search_parks", "get_park_events"],
                }
            ]
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (OpenAI): {e}")
        raise e
    
    # Test 8: MCP with WatsonX
    print("\n=== Test 8: MCP Integration (WatsonX) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=WATSONX_MODEL_ID,
            input="What is the capital of France?"
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (WatsonX): {e}")
        raise e
    
    # Test 9: MCP with Llama 3.3
    print("\n=== Test 9: MCP Integration (Llama 3.3) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=WATSONX_MODEL_ID,
            input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
            tools=[
                {
                    "type": "mcp",
                    "server_url": NPS_MCP_URL,
                    "server_label": "National Parks Service tools",
                    "allowed_tools": ["search_parks", "get_park_events"],
                }
            ]
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (Llama 3.3): {e}")
        raise e
    
    # Test 10: Embeddings
    print("\n=== Test 10: Embeddings ===")
    try:
        conn = http.client.HTTPConnection("localhost:8321")
        payload = json.dumps({
            "model": "watsonx/ibm/granite-embedding-278m-multilingual",
            "input": "Hello, world!",
        })
        headers = {
            'Content-Type': 'application/json',
            'Accept': 'application/json'
        }
        conn.request("POST", "/v1/openai/v1/embeddings", payload, headers)
        res = conn.getresponse()
        data = res.read()
        print(data.decode("utf-8"))
    except Exception as e:
        print(f"Error with Embeddings: {e}")
        raise e

    print("\n=== Testing Complete ===")


if __name__ == "__main__":
    main()
```
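
The tool-calling tests in the script above only confirm that a response came back. A minimal sketch of inspecting the returned tool calls follows, assuming the response follows the OpenAI-compatible chat completion shape (`choices[0].message.tool_calls`, with arguments as a JSON string). The `extract_tool_calls` helper and the hand-built `fake_response` are hypothetical and exist only so the example runs without a server.

```python
import json
from types import SimpleNamespace


def extract_tool_calls(response):
    """Return (name, arguments) pairs from an OpenAI-style chat completion."""
    message = response.choices[0].message
    calls = []
    # tool_calls is None (or absent) when the model answered in plain text.
    for call in getattr(message, "tool_calls", None) or []:
        # Arguments arrive as a JSON-encoded string, so decode them.
        calls.append((call.function.name, json.loads(call.function.arguments)))
    return calls


# Hand-built stand-in for a server response (hypothetical data).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                content=None,
                tool_calls=[
                    SimpleNamespace(
                        function=SimpleNamespace(
                            name="get_current_weather",
                            arguments='{"location": "Boston, MA"}',
                        )
                    )
                ],
            )
        )
    ]
)

print(extract_tool_calls(fake_response))
```

With a live server, the same helper could be applied to `response` and `wxai_response` from Tests 4 and 5.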

---------

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-08 07:29:43 -04:00
dependabot[bot]
62bac0aad4
chore(github-deps): bump actions/stale from 10.0.0 to 10.1.0 (#3684)
Bumps [actions/stale](https://github.com/actions/stale) from 10.0.0 to
10.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/stale/releases">actions/stale's
releases</a>.</em></p>
<blockquote>
<h2>v10.1.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add <code>only-issue-types</code> option to filter issues by type by
<a href="https://github.com/Bibo-Joshi"><code>@​Bibo-Joshi</code></a> in
<a
href="https://redirect.github.com/actions/stale/pull/1255">actions/stale#1255</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/Bibo-Joshi"><code>@​Bibo-Joshi</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1255">actions/stale#1255</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/stale/compare/v10...v10.1.0">https://github.com/actions/stale/compare/v10...v10.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5f858e3efb"><code>5f858e3</code></a>
Add <code>only-issue-types</code> option to filter issues by type (<a
href="https://redirect.github.com/actions/stale/issues/1255">#1255</a>)</li>
<li>See full diff in <a
href="3a9db7e6a4...5f858e3efb">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-08 12:16:54 +02:00
Omar Abdelwahab
702fcd1abf
fix: Raising an error message to the user when registering an existing provider. (#3624)
When a user tries to change the attributes (for example, the model name
or dimensions) of an already registered provider, they now get an error
message asking them to unregister the provider before registering a new
one.

# What does this PR do?
This PR updates the register function to raise an error when the user
attempts to register a provider that is already registered, asking them
to unregister the existing provider first.

<!-- If resolving an issue, uncomment and update the line below -->
#2313
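
The behavior described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `ProviderRegistry`, `AlreadyRegisteredError`, and the attribute names are hypothetical, and the sketch assumes any re-registration of an existing id is rejected.

```python
class AlreadyRegisteredError(ValueError):
    """Raised when an id is registered a second time (hypothetical name)."""


class ProviderRegistry:
    """Toy in-memory registry; the real registry is persistent."""

    def __init__(self):
        self._providers = {}

    def register(self, provider_id, **attributes):
        if provider_id in self._providers:
            # Refuse to overwrite silently; the caller must unregister first.
            raise AlreadyRegisteredError(
                f"Provider '{provider_id}' is already registered. "
                "Unregister it before registering it again."
            )
        self._providers[provider_id] = attributes

    def unregister(self, provider_id):
        self._providers.pop(provider_id, None)

    def get(self, provider_id):
        return self._providers[provider_id]


registry = ProviderRegistry()
registry.register("embedder", model="model-a", dimensions=384)

try:
    registry.register("embedder", model="model-b", dimensions=768)
except AlreadyRegisteredError as err:
    print(err)

# Changing attributes requires an explicit unregister first.
registry.unregister("embedder")
registry.register("embedder", model="model-b", dimensions=768)
print(registry.get("embedder"))
```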

## Test Plan
Tested the change with /tests/unit/registry/test_registry.py

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
2025-10-08 12:09:23 +02:00
ehhuang
0cde3d956d
chore: require valid logging category (#3712)
# What does this PR do?
Grepped for and audited every usage of `get_logger` with the help of Claude.
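
As a rough illustration of what requiring a valid category could look like, here is a sketch that rejects unknown categories instead of falling back to a default. Everything in it is an assumption for illustration: `get_checked_logger` and `VALID_CATEGORIES` are hypothetical names, not the project's actual `get_logger` signature or category list.

```python
import logging

# Hypothetical set of accepted categories (not the project's real list).
VALID_CATEGORIES = {"core", "cli", "inference", "providers"}


def get_checked_logger(name, category):
    """Return a logger tagged with `category`, rejecting unknown categories."""
    # Treat "core::server" as belonging to the root category "core".
    root_category = category.split("::", 1)[0]
    if root_category not in VALID_CATEGORIES:
        raise ValueError(
            f"Unknown logging category: {category!r}. "
            f"Expected one of: {sorted(VALID_CATEGORIES)}"
        )
    # LoggerAdapter attaches the category to every record it emits.
    return logging.LoggerAdapter(logging.getLogger(name), {"category": category})


log = get_checked_logger(__name__, "core::server")
print(log.extra["category"])

try:
    get_checked_logger(__name__, "uncategorized")
except ValueError as err:
    print(err)
```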

## Test Plan
CI
2025-10-08 11:10:33 +02:00
ehhuang
a3f5072776
chore!: remove --env from llama stack run (#3711)
# What does this PR do?
Users can simply set env vars at the beginning of the command, e.g.
`FOO=BAR llama stack run ...`
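
For scripts that launch a command programmatically, the equivalent of prefixing it with `FOO=BAR` is to pass a modified environment to the child process. A minimal sketch using only the standard library follows. The `run_with_env` helper is hypothetical, and the child here is a small Python one-liner rather than `llama stack run`, so the example runs anywhere.

```python
import os
import subprocess
import sys


def run_with_env(extra_env, argv):
    """Run argv with extra_env layered over the current environment."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Stand-in child process that echoes the variable it was given.
child = [sys.executable, "-c", "import os; print(os.environ['FOO'])"]
print(run_with_env({"FOO": "BAR"}, child))  # prints "BAR"
```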

## Test Plan
Run
TELEMETRY_SINKS=console uv run --with llama-stack llama stack build
--distro=starter --image-type=venv --run




2025-10-07 20:58:15 -07:00
slekkala1
1ac320b7e6
chore: remove dead code (#3729)
# What does this PR do?
Removes dead code found by vulture. Claude was used to confirm that
nothing references or imports it.


## Test Plan
CI
2025-10-07 20:26:02 -07:00
ehhuang
b6e9f41041
chore: Revert "fix: fix nvidia provider (#3716)" (#3730)
This reverts commit c940fe7938.

@wukaixingxp I stamped too fast. Let's wait for @mattf's review.
2025-10-07 19:16:51 -07:00
Kai Wu
c940fe7938
fix: fix nvidia provider (#3716)
# What does this PR do?
(Used Claude to solve #3715; the code was written with Claude but tested by me.)
## Summary from Claude:
**Problem**: The `NVIDIAInferenceAdapter` class was missing the
`alias_to_provider_id_map` attribute, which caused the error:

`ERROR 'NVIDIAInferenceAdapter' object has no attribute
'alias_to_provider_id_map'`

**Root Cause**: The `NVIDIAInferenceAdapter` only inherited from
`OpenAIMixin`, but some parts of the system expected it to have the
`alias_to_provider_id_map` attribute, which is provided by the
`ModelRegistryHelper` class.

**Solution**:

1. **Added ModelRegistryHelper import**: Imported the
`ModelRegistryHelper` class from
`llama_stack.providers.utils.inference.model_registry`
2. **Updated inheritance**: Changed the class declaration to inherit
from both `OpenAIMixin` and `ModelRegistryHelper`
3. **Added proper initialization**: Added an `__init__` method that
properly initializes the `ModelRegistryHelper` with empty model entries
(since NVIDIA uses dynamic model discovery) and the allowed models from
the configuration

**Key Changes**:

* Added `from llama_stack.providers.utils.inference.model_registry
import ModelRegistryHelper`
* Changed class declaration from `class
NVIDIAInferenceAdapter(OpenAIMixin):` to `class
NVIDIAInferenceAdapter(OpenAIMixin, ModelRegistryHelper):`
* Added `__init__` method that calls `ModelRegistryHelper.__init__(self,
model_entries=[], allowed_models=config.allowed_models)`

The inheritance order is important - `OpenAIMixin` comes first to ensure
its `check_model_availability()` method takes precedence over the
`ModelRegistryHelper` version, as mentioned in the class documentation.
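
The effect of the inheritance order can be shown with a small standalone sketch. The classes below are simplified stand-ins and not the real `OpenAIMixin` or `ModelRegistryHelper`; only the method and attribute names are taken from the description above.

```python
class OpenAIMixinLike:
    """Stand-in for the mixin whose method should take precedence."""

    def check_model_availability(self, model):
        return "mixin"


class RegistryHelperLike:
    """Stand-in for the helper that provides alias_to_provider_id_map."""

    def __init__(self, model_entries, allowed_models=None):
        self.alias_to_provider_id_map = {entry: entry for entry in model_entries}
        self.allowed_models = allowed_models

    def check_model_availability(self, model):
        return "helper"


class Adapter(OpenAIMixinLike, RegistryHelperLike):
    def __init__(self, allowed_models=None):
        # Initialize the helper explicitly so its attributes exist.
        RegistryHelperLike.__init__(
            self, model_entries=[], allowed_models=allowed_models
        )


adapter = Adapter(allowed_models=["some-model"])
# The first base class in the declaration wins method resolution.
print(adapter.check_model_availability("some-model"))  # prints "mixin"
print(adapter.alias_to_provider_id_map)  # prints {}
print([cls.__name__ for cls in Adapter.__mro__])
```

Swapping the base classes in the declaration would make the helper's `check_model_availability()` win instead, which is the situation the ordering is meant to avoid.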

This fix ensures that the `NVIDIAInferenceAdapter` has the required
`alias_to_provider_id_map` attribute while maintaining all existing
functionality.

## Test Plan
The llama-stack server launches successfully; see the logs:
```
NVIDIA_API_KEY=dummy NVIDIA_BASE_URL=http://localhost:8912 llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --image-type venv &
[2] 3753042
(venv) nvidia@nv-meta-H100-testing-gpu01:~/kai/llama-stack$ WARNING  2025-10-07 00:29:09,848 root:266 uncategorized: Unknown logging category:
         openai::conversations. Falling back to default 'root' level: 20
WARNING  2025-10-07 00:29:09,932 root:266 uncategorized: Unknown logging category: cli.
         Falling back to default 'root' level: 20
INFO     2025-10-07 00:29:09,937 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:09,937 llama_stack.cli.stack.run:136 cli: Using run
         configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml
Using virtual environment: /home/nvidia/kai/venv
Virtual environment already activated
+ '[' -n /home/nvidia/.llama/distributions/starter/starter-run.yaml ']'
+ yaml_config_arg=/home/nvidia/.llama/distributions/starter/starter-run.yaml
+ llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --port 8321
WARNING  2025-10-07 00:29:11,432 root:266 uncategorized: Unknown logging category:
         openai::conversations. Falling back to default 'root' level: 20
WARNING  2025-10-07 00:29:11,593 root:266 uncategorized: Unknown logging category: cli.
         Falling back to default 'root' level: 20
INFO     2025-10-07 00:29:11,603 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,604 llama_stack.cli.stack.run:136 cli: Using run
         configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,624 llama_stack.cli.stack.run:155 cli: No image type or
         image name provided. Assuming environment packages.
INFO     2025-10-07 00:29:11,625 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,644 llama_stack.cli.stack.run:230 cli: HTTPS enabled with
         certificates:
           Key: None
           Cert: None
INFO     2025-10-07 00:29:11,645 llama_stack.cli.stack.run:232 cli: Listening on ['::',
         '0.0.0.0']:8321
INFO     2025-10-07 00:29:11,816 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,836 llama_stack.core.server.server:480 core::server: Run
         configuration:
INFO     2025-10-07 00:29:11,845 llama_stack.core.server.server:483 core::server: apis:
         - agents
         - batches
         - datasetio
         - eval
         - files
         - inference
         - post_training
         - safety
         - scoring
         - telemetry
         - tool_runtime
         - vector_io
         benchmarks: []
         datasets: []
         image_name: starter
         inference_store:
           db_path: /home/nvidia/.llama/distributions/starter/inference_store.db
           type: sqlite
         metadata_store:
           db_path: /home/nvidia/.llama/distributions/starter/registry.db
           type: sqlite
         models: []
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /home/nvidia/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /home/nvidia/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           batches:
           - config:
               kvstore:
                 db_path: /home/nvidia/.llama/distributions/starter/batches.db
                 type: sqlite
             provider_id: reference
             provider_type: inline::reference
           datasetio:
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /home/nvidia/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /home/nvidia/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               api_key: '********'
               url: https://api.fireworks.ai/inference/v1
             provider_id: fireworks
             provider_type: remote::fireworks
           - config:
               api_key: '********'
               url: https://api.together.xyz/v1
             provider_id: together
             provider_type: remote::together
           - config: {}
             provider_id: bedrock
             provider_type: remote::bedrock
           - config:
               api_key: '********'
               append_api_version: true
               url: http://localhost:8912
             provider_id: nvidia
             provider_type: remote::nvidia
           - config:
               api_key: '********'
               base_url: https://api.openai.com/v1
             provider_id: openai
             provider_type: remote::openai
           - config:
               api_key: '********'
             provider_id: anthropic
             provider_type: remote::anthropic
           - config:
               api_key: '********'
             provider_id: gemini
             provider_type: remote::gemini
           - config:
               api_key: '********'
               url: https://api.groq.com
             provider_id: groq
             provider_type: remote::groq
           - config:
               api_key: '********'
               url: https://api.sambanova.ai/v1
             provider_id: sambanova
             provider_type: remote::sambanova
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: meta
             provider_id: torchtune-cpu
             provider_type: inline::torchtune-cpu
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           - config: {}
             provider_id: code-scanner
             provider_type: inline::code-scanner
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               service_name: "\u200B"
               sinks: sqlite
               sqlite_db_path: /home/nvidia/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /home/nvidia/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
           - config:
               db_path: /home/nvidia/.llama/distributions/starter/sqlite_vec.db
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/sqlite_vec_registry.db
                 type: sqlite
             provider_id: sqlite-vec
             provider_type: inline::sqlite-vec
         scoring_fns: []
         server:
           port: 8321
         shields: []
         tool_groups:
         - provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2
INFO     2025-10-07 00:29:12,138
         llama_stack.providers.remote.inference.nvidia.nvidia:49 inference::nvidia:
         Initializing NVIDIAInferenceAdapter(http://localhost:8912)...
INFO     2025-10-07 00:29:12,921
         llama_stack.providers.utils.inference.inference_store:74 inference: Write
         queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-07 00:29:13,524
         llama_stack.providers.utils.responses.responses_store:96 openai_responses:
         Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-07 00:29:13,679 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:13,681 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider fireworks: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:13,682 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed
         with: Pass Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
WARNING  2025-10-07 00:29:13,684 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider together: Pass
         Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
Handling connection for 8912
INFO     2025-10-07 00:29:14,047 llama_stack.providers.utils.inference.openai_mixin:448
         providers::utils: NVIDIAInferenceAdapter.list_provider_model_ids() returned 3
         models
ERROR    2025-10-07 00:29:14,062 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,063 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider openai: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,099 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed
         with: "Could not resolve authentication method. Expected either api_key or
         auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers
         to be explicitly omitted"
WARNING  2025-10-07 00:29:14,100 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider anthropic: "Could not
         resolve authentication method. Expected either api_key or auth_token to be
         set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly
         omitted"
ERROR    2025-10-07 00:29:14,102 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,103 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider gemini: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,105 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with:
         API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in
         the provider config.
WARNING  2025-10-07 00:29:14,106 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider groq: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider
         config.
ERROR    2025-10-07 00:29:14,107 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,109 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider sambanova: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,454 uvicorn.error:84 uncategorized: Started server process
         [3753046]
INFO     2025-10-07 00:29:14,455 uvicorn.error:48 uncategorized: Waiting for
         application startup.
INFO     2025-10-07 00:29:14,457 llama_stack.core.server.server:170 core::server:
         Starting up
INFO     2025-10-07 00:29:14,458 llama_stack.core.stack:415 core: starting registry
         refresh task
ERROR    2025-10-07 00:29:14,459 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,461 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider fireworks: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,462 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed
         with: Pass Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
WARNING  2025-10-07 00:29:14,463 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider together: Pass
         Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
ERROR    2025-10-07 00:29:14,465 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,466 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider openai: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,500 uvicorn.error:62 uncategorized: Application startup
         complete.
ERROR    2025-10-07 00:29:14,502 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed
         with: "Could not resolve authentication method. Expected either api_key or
         auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers
         to be explicitly omitted"
WARNING  2025-10-07 00:29:14,503 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider anthropic: "Could not
         resolve authentication method. Expected either api_key or auth_token to be
         set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly
         omitted"
ERROR    2025-10-07 00:29:14,504 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,506 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider gemini: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,507 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with:
         API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in
         the provider config.
WARNING  2025-10-07 00:29:14,508 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider groq: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider
         config.
ERROR    2025-10-07 00:29:14,510 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,511 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider sambanova: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,513 uvicorn.error:216 uncategorized: Uvicorn running on
         http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

Tested the models endpoint with curl; it also works:
```
curl http://localhost:8321/v1/models
{"data":[{"identifier":"bedrock/meta.llama3-1-8b-instruct-v1:0","provider_resource_id":"meta.llama3-1-8b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-70b-instruct-v1:0","provider_resource_id":"meta.llama3-1-70b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-405b-instruct-v1:0","provider_resource_id":"meta.llama3-1-405b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/bigcode/starcoder2-7b","provider_resource_id":"bigcode/starcoder2-7b","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/meta/llama-3.3-70b-instruct","provider_resource_id":"meta/llama-3.3-70b-instruct","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/nvidia/llama-3.2-nv-embedqa-1b-v2","provider_resource_id":"nvidia/llama-3.2-nv-embedqa-1b-v2","provider_id":"nvidia","type":"model","metadata":{"embedding_dimension":2048,"context_length":8192},"model_type":"embedding"},{"identifier":"sentence-transformers/all-MiniLM-L6-v2","provider_resource_id":"all-MiniLM-L6-v2","provider_id":"sentence-transformers","type":"model","metadata":{"embedding_dimension":384},"model_type":"embedding"}]}%
```
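
For a quicker read of that response, the payload can be grouped by provider. This is a minimal sketch: `models_by_provider` is a hypothetical helper (not part of Llama Stack), and the sample is a trimmed copy of the curl output above.

```python
import json
from collections import defaultdict


def models_by_provider(payload: str) -> dict[str, list[str]]:
    """Group model identifiers from a /v1/models response by provider_id."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for model in json.loads(payload)["data"]:
        grouped[model["provider_id"]].append(model["identifier"])
    return dict(grouped)


# Trimmed sample in the same shape as the curl output above.
sample = json.dumps({"data": [
    {"identifier": "bedrock/meta.llama3-1-8b-instruct-v1:0", "provider_id": "bedrock", "model_type": "llm"},
    {"identifier": "nvidia/bigcode/starcoder2-7b", "provider_id": "nvidia", "model_type": "llm"},
    {"identifier": "nvidia/meta/llama-3.3-70b-instruct", "provider_id": "nvidia", "model_type": "llm"},
]})

for provider, identifiers in models_by_provider(sample).items():
    print(provider, identifiers)
```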

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-07 18:23:12 -07:00
Emilio Garcia
bc7d4b423b
fix(scripts): select container runtime for telemetry (#3727)
# What does this PR do?
The script runs with either docker or podman.

## Test Plan
passes when run
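
The runtime selection can be sketched as below. This is not the script's actual code: `select_container_runtime` and its `which` parameter are hypothetical names, and the docker-first preference order is an assumption.

```python
import shutil
from typing import Callable, Optional


def select_container_runtime(
    candidates: tuple[str, ...] = ("docker", "podman"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first candidate found on PATH, or raise if none is installed."""
    for name in candidates:
        if which(name):
            return name
    raise RuntimeError(f"none of {', '.join(candidates)} found on PATH")
```

Calling `select_container_runtime()` with no arguments checks the real PATH; passing a custom `which` makes the choice testable without either runtime installed.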
2025-10-07 14:59:53 -07:00
slekkala1
c2d97a9db9
chore: fix flaky unit test and add proper shutdown for file batches (#3725)
# What does this PR do?
Unit tests have been failing intermittently:
5217035494
This PR fixes them by:
1. Shutting down properly by cancelling any stale file-batch tasks
still running in the background.
2. Using unique_kvstore_config so that tests don't share a db path
and stay isolated.
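
The shutdown step in item 1 follows a common asyncio pattern, sketched here under stated assumptions: `BatchWorker`, `start`, and `shutdown` are hypothetical names rather than the provider's real API, and `asyncio.sleep` stands in for a long-running batch.

```python
import asyncio


class BatchWorker:
    """Hypothetical owner of background file-batch tasks."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def start(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # Drop finished tasks so the set only holds live work.
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self) -> None:
        # Cancel whatever is still running, then wait for the cancellations to land.
        pending = [t for t in self._tasks if not t.done()]
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        self._tasks.clear()


async def demo() -> bool:
    worker = BatchWorker()
    task = worker.start(asyncio.sleep(3600))  # stands in for a long-running batch
    await worker.shutdown()
    return task.cancelled()


print(asyncio.run(demo()))
```

Awaiting the cancelled tasks with `return_exceptions=True` keeps shutdown from raising while still ensuring no task outlives the test that created it.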
## Test Plan
Ran unit test locally and CI
2025-10-07 14:23:14 -07:00
Akram Ben Aissi
1970b4aa4b
fix: improve model availability checks: Allows use of unavailable models on startup (#3717)
- Allows use of unavailable models on startup
- Add has_model method to ModelsRoutingTable for checking pre-registered
models
- Update check_model_availability to check model_store before provider
APIs
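
The lookup order described above can be sketched as follows. All names here are hypothetical stand-ins, not the real `ModelsRoutingTable` or adapter classes, and returning `False` when the provider is unreachable is an assumption about the intended behavior.

```python
import asyncio


class ModelRegistry:
    """Hypothetical stand-in for the routing table's model store."""

    def __init__(self, registered: set[str]) -> None:
        self._registered = registered

    async def has_model(self, model_id: str) -> bool:
        return model_id in self._registered


class UnreachableProvider:
    """Hypothetical provider whose API cannot be reached at startup."""

    async def list_model_ids(self) -> list[str]:
        raise TimeoutError("Request timed out.")


async def check_model_availability(model_id: str, registry, provider) -> bool:
    # Pre-registered models are accepted without contacting the provider.
    if await registry.has_model(model_id):
        return True
    try:
        return model_id in await provider.list_model_ids()
    except Exception:
        # Provider is down: report the model as unavailable rather than crash.
        return False


registry = ModelRegistry({"vllm/my-model"})
provider = UnreachableProvider()
print(asyncio.run(check_model_availability("vllm/my-model", registry, provider)))
print(asyncio.run(check_model_availability("vllm/other", registry, provider)))
```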

# What does this PR do?

## Test Plan


Start Llama Stack pointed at an unavailable vLLM endpoint:

```
VLLM_URL=https://my-unavailable-vllm/v1 MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```

Llama Stack starts without crashing and only logs the error.

```


         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2

INFO     2025-10-07 06:40:41,804 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-07 06:40:42,066 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-07 06:40:58,882 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING  2025-10-07 06:40:58,883 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
[...]
INFO     2025-10-07 06:40:59,036 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO     2025-10-07 06:41:04,064 openai._base_client:1618 uncategorized: Retrying request to /models in 0.398814 seconds
INFO     2025-10-07 06:41:09,497 openai._base_client:1618 uncategorized: Retrying request to /models in 0.781908 seconds
ERROR    2025-10-07 06:41:15,282 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING  2025-10-07 06:41:15,283 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
```
2025-10-07 14:27:24 -04:00
Francisco Arceo
d5b136ac66
feat: Enabling Annotations in Responses (#3698)
# What does this PR do?
Implements annotations for `file_search` tool.

Also adds some logs and tests.

## How does this work? 
1. **Citation Markers**: Following instructions included with the search
results, models insert `<|file-id|>` tokens during generation
2. **Post-Processing**: Extract the markers with a regex to calculate
character positions and create `AnnotationFileCitation` objects
3. **File Mapping**: Store filename metadata during vector store
operations so citations display the proper filename
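
The post-processing step can be sketched as below. This is an illustration under assumptions, not the PR's implementation: the marker pattern (`<|file-...|>`), the `Citation` dataclass, and `extract_citations` are hypothetical, and falling back to the file ID when no filename is stored is a guess.

```python
import re
from dataclasses import dataclass

# Assumed marker shape: <|file-XXXX|>; the real token format may differ.
MARKER = re.compile(r"<\|(file-[A-Za-z0-9_-]+)\|>")


@dataclass
class Citation:
    file_id: str
    filename: str
    index: int  # character position in the cleaned text


def extract_citations(text: str, filenames: dict[str, str]) -> tuple[str, list[Citation]]:
    """Strip markers from text and record where each one pointed."""
    citations: list[Citation] = []
    parts: list[str] = []
    cursor = 0      # position in the original text
    clean_len = 0   # length of the cleaned text built so far
    for match in MARKER.finditer(text):
        chunk = text[cursor:match.start()]
        parts.append(chunk)
        clean_len += len(chunk)
        file_id = match.group(1)
        # Fall back to the file ID when no filename was stored (assumption).
        citations.append(Citation(file_id, filenames.get(file_id, file_id), clean_len))
        cursor = match.end()
    parts.append(text[cursor:])
    return "".join(parts), citations


raw = "Work hard<|file-abc123|>. Stay curious<|file-def456|>."
clean, found = extract_citations(raw, {"file-abc123": "greatwork.html"})
print(clean)
print(found)
```

Positions are tracked against the cleaned text, so each index stays valid after the markers are removed from the message shown to the user.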

## Example 
This is the updated `quickstart.py` script, which uses `extra_body` to
register the embedding model.

```python
import io

import requests
from openai import OpenAI

url = "https://www.paulgraham.com/greatwork.html"
model = "gpt-4o-mini"
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

# Create a vector store; extra_body carries the embedding model settings.
vs = client.vector_stores.create(
    name="my_citations_db",
    extra_body={
        "embedding_model": "ollama/nomic-embed-text:latest",
        "embedding_dimension": 768,
    },
)

# Download the essay, upload it as a file, and attach it to the vector store.
response = requests.get(url)
# str(bytes) yields the bytes repr (b'...'), so the stored text keeps escaped
# \n and \' sequences; pass response.content directly to upload the raw HTML.
pseudo_file = io.BytesIO(str(response.content).encode("utf-8"))
file_id = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants").id
client.vector_stores.files.create(vector_store_id=vs.id, file_id=file_id)

# Ask a question with file_search enabled and include the raw search results.
resp = client.responses.create(
    model=model,
    input="How do you do great work? Use our existing knowledge_search tool.",
    tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
    include=["file_search_call.results"],
)

print(resp)
```

<details>
<summary> Example of the full response </summary>

```
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/files "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores/vs_0f6f7e35-f48b-4850-8604-8117d9a50e0a/files "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"
Response(id='resp-28f5793d-3272-4de3-81f6-8cbf107d5bcd', created_at=1759797954.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-4o-mini', object='response', output=[ResponseFileSearchToolCall(id='call_xWtvEQETN5GNiRLLiBIDKntg', queries=['how to do great work tips'], status='completed', type='file_search_call', results=[Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.3722624322210302, text='\\\'re looking where few have looked before.<br /><br />One sign that you\\\'re suited for some kind of work is when you like\\neven the parts that other people find tedious or frightening.<br /><br />But fields aren\\\'t people; you don\\\'t owe them any loyalty. If in the\\ncourse of working on one thing you discover another that\\\'s more\\nexciting, don\\\'t be afraid to switch.<br /><br />If you\\\'re making something for people, make sure it\\\'s something\\nthey actually want. The best way to do this is to make something\\nyou yourself want. Write the story you want to read; build the tool\\nyou want to use. Since your friends probably have similar interests,\\nthis will also get you your initial audience.<br /><br />This <i>should</i> follow from the excitingness rule. Obviously the most\\nexciting story to write will be the one you want to read. The reason\\nI mention this case explicitly is that so many people get it wrong.\\nInstead of making what they want, they try to make what some\\nimaginary, more sophisticated audience wants. And once you go down\\nthat route, you\\\'re lost.\\n<font color=#dddddd>[<a href="#f6n"><font color=#dddddd>6</font></a>]</font><br /><br />There are a lot of forces that will lead you astray when you\\\'re\\ntrying to figure out what to work on. Pretentiousness, fashion,\\nfear, money, politics, other people\\\'s wishes, eminent frauds. 
But\\nif you stick to what you find genuinely interesting, you\\\'ll be proof\\nagainst all of them. If you\\\'re interested, you\\\'re not astray.<br /><br /><br /><br /><br /><br />\\nFollowing your interests may sound like a rather passive strategy,\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. 
So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.2532794607643494, text=' with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. 
But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good colleagues\\noffer <i>surprising</i> insights. They can see and do things that you\\ncan\\\'t. So if you have a handful of colleagues good enough to keep\\nyou on your toes in this sense, you\\\'re probably over the threshold.<br /><br />Most of us can benefit from collaborating with colleagues, but some\\nprojects require people on a larger scale, and starting one of those\\nis not for everyone. If you want to run a project like that, you\\\'ll\\nhave to become a manager, and managing well takes aptitude and\\ninterest like any other kind of work. If you don\\\'t have them, there\\nis no middle path: you must either force yourself to learn management\\nas a second language, or avoid such projects.\\n<font color=#dddddd>[<a href="#f27n"><font color=#dddddd>27</font></a>]</font><br /><br /><br /><br /><br /><br />\\nHusband your morale. It\\\'s the basis of everything when you\\\'re working\\non ambitious projects. You have to nurture and protect it like a\\nliving organism.<br /><br />Morale starts with your view of life. You\\\'re more likely to do great\\nwork if you\\\'re an optimist, and more likely to if you think of\\nyourself as lucky than if you think of yourself as a victim.<br /><br />Indeed, work can to some extent protect you from your problems. If\\nyou choose work that\\\'s pure, its very difficulties will serve as a\\nrefuge from the difficulties of everyday life. If this is escapism,\\nit\\\'s a very productive form of it, and one that has been used by\\nsome of the greatest minds in history.<br /><br />Morale compounds via work: high morale helps you do good work, which\\nincreases your morale and helps you do even'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1973485818164222, text=' your\\nability and interest can take you. 
And you can only answer that by\\ntrying.<br /><br />Many more people could try to do great work than do. What holds\\nthem back is a combination of modesty and fear. It seems presumptuous\\nto try to be Newton or Shakespeare. It also seems hard; surely if\\nyou tried something like that, you\\\'d fail. Presumably the calculation\\nis rarely explicit. Few people consciously decide not to try to do\\ngreat work. But that\\\'s what\\\'s going on subconsciously; they shy\\naway from the question.<br /><br />So I\\\'m going to pull a sneaky trick on you. Do you want to do great\\nwork, or not? Now you have to decide consciously. Sorry about that.\\nI wouldn\\\'t have done it to a general audience. But we already know\\nyou\\\'re interested.<br /><br />Don\\\'t worry about being presumptuous. You don\\\'t have to tell anyone.\\nAnd if it\\\'s too hard and you fail, so what? Lots of people have\\nworse problems than that. In fact you\\\'ll be lucky if it\\\'s the worst\\nproblem you have.<br /><br />Yes, you\\\'ll have to work hard. But again, lots of people have to\\nwork hard. And if you\\\'re working on something you find very\\ninteresting, which you necessarily will if you\\\'re on the right path,\\nthe work will probably feel less burdensome than a lot of your\\npeers\\\'.<br /><br />The discoveries are out there, waiting to be made. Why not by you?<br /><br /><br /><br /><br /><br /><br /><br /><br /><br />\\n<b>Notes</b><br /><br />[<a name="f1n"><font color=#000000>1</font></a>]\\nI don\\\'t think you could give a precise definition of what\\ncounts as great work. Doing great work means doing something important\\nso well that you expand people\\\'s ideas of what\\\'s possible. But\\nthere\\\'s no threshold for importance. It\\\'s a matter of degree, and\\noften hard to judge at the time anyway. So I\\\'d rather people focused\\non developing their interests rather than worrying about whether\\nthey\\\'re important or not. 
Just try to do something amazing, and\\nleave it to future generations to say if you succeeded.<br /><br />[<a name="f2n"><font color=#000000>2</font></a>]\\nA lot of standup comedy is based on noticing anomalies in\\neveryday life. "Did you ever notice...?" New ideas come from doing\\nthis about nontrivial things. Which may help explain why people\\\'s\\nreaction to a new idea is often the first half of laughing: Ha!<br /><br />[<a name="f3n"><font color=#000000>3</font></a>]\\nThat second qualifier is critical. If you\\\'re excited about\\nsomething most authorities discount, but you can\\\'t give a more\\nprecise explanation than "they don\\\'t get it," then you\\\'re starting\\nto drift into the territory of cranks.<br /><br />[<a name="f4n"><font color=#000000>4</font></a>]\\nFinding something to work on is not simply a matter of finding\\na match between the current version of you and a list of known\\nproblems. You\\\'ll often have to coevolve with the problem. That\\\'s\\nwhy it can sometimes be so hard to figure out what to work on. The\\nsearch space is huge. It\\\'s the cartesian product of all possible\\nt'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1764591706535943, text='\\noptimistic, and even though one of the sources of their optimism\\nis ignorance, in this case ignorance can sometimes beat knowledge.<br /><br />Try to finish what you start, though, even if it turns out to be\\nmore work than you expected. Finishing things is not just an exercise\\nin tidiness or self-discipline. In many projects a lot of the best\\nwork happens in what was meant to be the final stage.<br /><br />Another permissible lie is to exaggerate the importance of what\\nyou\\\'re working on, at least in your own mind. 
If that helps you\\ndiscover something new, it may turn out not to have been a lie after\\nall.\\n<font color=#dddddd>[<a href="#f7n"><font color=#dddddd>7</font></a>]</font><br /><br /><br /><br /><br /><br />\\nSince there are two senses of starting work &mdash; per day and per\\nproject &mdash; there are also two forms of procrastination. Per-project\\nprocrastination is far the more dangerous. You put off starting\\nthat ambitious project from year to year because the time isn\\\'t\\nquite right. When you\\\'re procrastinating in units of years, you can\\nget a lot not done.\\n<font color=#dddddd>[<a href="#f8n"><font color=#dddddd>8</font></a>]</font><br /><br />One reason per-project procrastination is so dangerous is that it\\nusually camouflages itself as work. You\\\'re not just sitting around\\ndoing nothing; you\\\'re working industriously on something else. So\\nper-project procrastination doesn\\\'t set off the alarms that per-day\\nprocrastination does. You\\\'re too busy to notice it.<br /><br />The way to beat it is to stop occasionally and ask yourself: Am I\\nworking on what I most want to work on? When you\\\'re young it\\\'s ok\\nif the answer is sometimes no, but this gets increasingly dangerous\\nas you get older.\\n<font color=#dddddd>[<a href="#f9n"><font color=#dddddd>9</font></a>]</font><br /><br /><br /><br /><br /><br />\\nGreat work usually entails spending what would seem to most people\\nan unreasonable amount of time on a problem. You can\\\'t think of\\nthis time as a cost, or it will seem too high. You have to find the\\nwork sufficiently engaging as it\\\'s happening.<br /><br />There may be some jobs where you have to work diligently for years\\nat things you hate before you get to the good part, but this is not\\nhow great work happens. Great work happens by focusing consistently\\non something you\\\'re genuinely interested in. 
When you pause to take\\nstock, you\\\'re surprised how far you\\\'ve come.<br /><br />The reason we\\\'re surprised is that we underestimate the cumulative\\neffect of work. Writing a page a day doesn\\\'t sound like much, but\\nif you do it every day you\\\'ll write a book a year. That\\\'s the key:\\nconsistency. People who do great things don\\\'t get a lot done every\\nday. They get something done, rather than nothing.<br /><br />If you do work that compounds, you\\\'ll get exponential growth. Most\\npeople who do this do it unconsciously, but it\\\'s worth stopping to\\nthink about. Learning, for example, is an instance of this phenomenon:\\nthe more you learn about something, the easier it is to learn more.\\nGrowing an audience is another: the more fans you have, the more\\nnew fans they\\\'ll bring you.<br /><br />'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.174069664815369, text='\\ninside.<br /><br /><br /><br /><br /><br />Let\\\'s talk a little more about the complicated business of figuring\\nout what to work on. The main reason it\\\'s hard is that you can\\\'t\\ntell what most kinds of work are like except by doing them. Which\\nmeans the four steps overlap: you may have to work at something for\\nyears before you know how much you like it or how good you are at\\nit. And in the meantime you\\\'re not doing, and thus not learning\\nabout, most other kinds of work. So in the worst case you choose\\nlate based on very incomplete information.\\n<font color=#dddddd>[<a href="#f4n"><font color=#dddddd>4</font></a>]</font><br /><br />The nature of ambition exacerbates this problem. Ambition comes in\\ntwo forms, one that precedes interest in the subject and one that\\ngrows out of it. 
Most people who do great work have a mix, and the\\nmore you have of the former, the harder it will be to decide what\\nto do.<br /><br />The educational systems in most countries pretend it\\\'s easy. They\\nexpect you to commit to a field long before you could know what\\nit\\\'s really like. And as a result an ambitious person on an optimal\\ntrajectory will often read to the system as an instance of breakage.<br /><br />It would be better if they at least admitted it &mdash; if they admitted\\nthat the system not only can\\\'t do much to help you figure out what\\nto work on, but is designed on the assumption that you\\\'ll somehow\\nmagically guess as a teenager. They don\\\'t tell you, but I will:\\nwhen it comes to figuring out what to work on, you\\\'re on your own.\\nSome people get lucky and do guess correctly, but the rest will\\nfind themselves scrambling diagonally across tracks laid down on\\nthe assumption that everyone does.<br /><br />What should you do if you\\\'re young and ambitious but don\\\'t know\\nwhat to work on? What you should <i>not</i> do is drift along passively,\\nassuming the problem will solve itself. You need to take action.\\nBut there is no systematic procedure you can follow. When you read\\nbiographies of people who\\\'ve done great work, it\\\'s remarkable how\\nmuch luck is involved. They discover what to work on as a result\\nof a chance meeting, or by reading a book they happen to pick up.\\nSo you need to make yourself a big target for luck, and the way to\\ndo that is to be curious. Try lots of things, meet lots of people,\\nread lots of books, ask lots of questions.\\n<font color=#dddddd>[<a href="#f5n"><font color=#dddddd>5</font></a>]</font><br /><br />When in doubt, optimize for interestingness. Fields change as you\\nlearn more about them. What mathematicians do, for example, is very\\ndifferent from what you do in high school math classes. 
So you need\\nto give different types of work a chance to show you what they\\\'re\\nlike. But a field should become <i>increasingly</i> interesting as you\\nlearn more about it. If it doesn\\\'t, it\\\'s probably not for you.<br /><br />Don\\\'t worry if you find you\\\'re interested in different things than\\nother people. The stranger your tastes in interestingness, the\\nbetter. Strange tastes are often strong ones, and a strong taste\\nfor work means you\\\'ll be productive. And you\\\'re more likely to find\\nnew things if you'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.158095578895721, text='. Don\\\'t copy the manner of\\nan eminent 50 year old professor if you\\\'re 18, for example, or the\\nidiom of a Renaissance poem hundreds of years later.<br /><br />Some of the features of things you admire are flaws they succeeded\\ndespite. Indeed, the features that are easiest to imitate are the\\nmost likely to be the flaws.<br /><br />This is particularly true for behavior. Some talented people are\\njerks, and this sometimes makes it seem to the inexperienced that\\nbeing a jerk is part of being talented. It isn\\\'t; being talented\\nis merely how they get away with it.<br /><br />One of the most powerful kinds of copying is to copy something from\\none field into another. History is so full of chance discoveries\\nof this type that it\\\'s probably worth giving chance a hand by\\ndeliberately learning about other kinds of work. You can take ideas\\nfrom quite distant fields if you let them be metaphors.<br /><br />Negative examples can be as inspiring as positive ones. 
In fact you\\ncan sometimes learn more from things done badly than from things\\ndone well; sometimes it only becomes clear what\\\'s needed when it\\\'s\\nmissing.<br /><br /><br /><br /><br /><br />\\nIf a lot of the best people in your field are collected in one\\nplace, it\\\'s usually a good idea to visit for a while. It will\\nincrease your ambition, and also, by showing you that these people\\nare human, increase your self-confidence.\\n<font color=#dddddd>[<a href="#f26n"><font color=#dddddd>26</font></a>]</font><br /><br />If you\\\'re earnest you\\\'ll probably get a warmer welcome than you\\nmight expect. Most people who are very good at something are happy\\nto talk about it with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. 
In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1566747762241967, text=',\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. 
But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if you do that you\\\'ll find you get diminishing returns:\\nfatigue will make you stupid, and eventually even damage your health.\\nThe point at which work yields diminishing returns depends on the\\ntype. Some of the hardest types you might only be able to do for\\nfour or five hours a day.<br /><br />Ideally those hours will be contiguous. To the extent you can, try\\nto arrange your life so you have big blocks of time to work in.\\nYou\\\'ll shy away from hard tasks if you know you might be interrupted.<br /><br />It will probably be harder to start working than to keep working.\\nYou\\\'ll often have to trick yourself to get over that initial\\nthreshold. Don\\\'t worry about this; it\\\'s the nature of work, not a\\nflaw in your character. Work has a sort of activation energy, both\\nper day and per project. And since this threshold is fake in the\\nsense that it\\\'s higher than the energy required to keep going, it\\\'s\\nok to tell yourself a lie of corresponding magnitude to get over\\nit.<br /><br />It\\\'s usually a mistake to lie to yourself if you want to do great\\nwork, but this is one of the rare cases where it isn\\\'t. When I\\\'m\\nreluctant to start work in the morning, I often trick myself by\\nsaying "I\\\'ll just read over what I\\\'ve got so far." Five minutes\\nlater I\\\'ve found something that seems mistaken or incomplete, and\\nI\\\'m off.<br /><br />Similar techniques work for starting new projects. 
It\\\'s ok to lie\\nto yourself about how much work a project will entail, for example.\\nLots of great things began with someone saying "How hard could it\\nbe?"<br /><br />This is one case where the young have an advantage. They\\\'re more'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1349744395573516, text=' audience\\nin the traditional sense. Either way it doesn\\\'t need to be big.\\nThe value of an audience doesn\\\'t grow anything like linearly with\\nits size. Which is bad news if you\\\'re famous, but good news if\\nyou\\\'re just starting out, because it means a small but dedicated\\naudience can be enough to sustain you. If a handful of people\\ngenuinely love what you\\\'re doing, that\\\'s enough.<br /><br />To the extent you can, avoid letting intermediaries come between\\nyou and your audience. In some types of work this is inevitable,\\nbut it\\\'s so liberating to escape it that you might be better off\\nswitching to an adjacent type if that will let you go direct.\\n<font color=#dddddd>[<a href="#f28n"><font color=#dddddd>28</font></a>]</font><br /><br />The people you spend time with will also have a big effect on your\\nmorale. You\\\'ll find there are some who increase your energy and\\nothers who decrease it, and the effect someone has is not always\\nwhat you\\\'d expect. Seek out the people who increase your energy and\\navoid those who decrease it. Though of course if there\\\'s someone\\nyou need to take care of, that takes precedence.<br /><br />Don\\\'t marry someone who doesn\\\'t understand that you need to work,\\nor sees your work as competition for your attention. If you\\\'re\\nambitious, you need to work; it\\\'s almost like a medical condition;\\nso someone who won\\\'t let you work either doesn\\\'t understand you,\\nor does and doesn\\\'t care.<br /><br />Ultimately morale is physical. 
You think with your body, so it\\\'s\\nimportant to take care of it. That means exercising regularly,\\neating and sleeping well, and avoiding the more dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. 
If you asked an oracle the\\nsecret to doing great work and the oracle replied'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.123214818076958, text='b\'<html><head><meta name="Keywords" content="" /><title>How to Do Great Work</title><!-- <META NAME="ROBOTS" CONTENT="NOODP"> -->\\n<link rel="shortcut icon" href="http://ycombinator.com/arc/arc.png">\\n</head><body bgcolor="#ffffff" background="https://s.turbifycdn.com/aah/paulgraham/bel-6.gif" text="#000000" link="#000099" vlink="#464646"><table border="0" cellspacing="0" cellpadding="0"><tr valign="top"><td><map name=118ab66adb24b4f><area shape=rect coords="0,0,67,21" href="index.html"><area shape=rect coords="0,21,67,42" href="articles.html"><area shape=rect coords="0,42,67,63" href="http://www.amazon.com/gp/product/0596006624"><area shape=rect coords="0,63,67,84" href="books.html"><area shape=rect coords="0,84,67,105" href="http://ycombinator.com"><area shape=rect coords="0,105,67,126" href="arc.html"><area shape=rect coords="0,126,67,147" href="bel.html"><area shape=rect coords="0,147,67,168" href="lisp.html"><area shape=rect coords="0,168,67,189" href="antispam.html"><area shape=rect coords="0,189,67,210" href="kedrosky.html"><area shape=rect coords="0,210,67,231" href="faq.html"><area shape=rect coords="0,231,67,252" href="raq.html"><area shape=rect coords="0,252,67,273" href="quo.html"><area shape=rect coords="0,273,67,294" href="rss.html"><area shape=rect coords="0,294,67,315" href="bio.html"><area shape=rect coords="0,315,67,336" href="https://twitter.com/paulg"><area shape=rect coords="0,336,67,357" href="https://mas.to/@paulg"></map><img src="https://s.turbifycdn.com/aah/paulgraham/bel-7.gif" width="69" height="357" usemap=#118ab66adb24b4f border="0" hspace="0" vspace="0" ismap /></td><td><img src="https://sep.turbifycdn.com/ca/Img/trans_1x1.gif" height="1" width="26" border="0" /></td><td><a href="index.html"><img 
src="https://s.turbifycdn.com/aah/paulgraham/bel-8.gif" width="410" height="45" border="0" hspace="0" vspace="0" /></a><br /><br /><table border="0" cellspacing="0" cellpadding="0" width="435"><tr valign="top"><td width="435"><img src="https://s.turbifycdn.com/aah/paulgraham/how-to-do-great-work-2.gif" width="185" height="18" border="0" hspace="0" vspace="0" alt="How to Do Great Work" /><br /><br /><font size="2" face="verdana">July 2023<br /><br />If you collected lists of techniques for doing great work in a lot\\nof different fields, what would the intersection look like? I decided\\nto find out'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1193194369249235, text=' dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. 
So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied with a single\\nword, my bet would be on "curiosity."<br /><br />That doesn\\\'t translate directly to advice. It\\\'s not enough just to\\nbe curious, and you can\\\'t command curiosity anyway. But you can\\nnurture it and let it drive you.<br /><br />Curiosity is the key to all four steps in doing great work: it will\\nchoose the field for you, get you to the frontier, cause you to\\nnotice the gaps in it, and drive you to explore them. The whole\\nprocess is a kind of dance with curiosity.<br /><br /><br /><br /><br /><br />\\nBelieve it or not, I tried to make this essay as short as I could.\\nBut its length at least means it acts as a filter. If you made it\\nthis far, you must be interested in doing great work. And if so\\nyou\\\'re already further along than you might realize, because the\\nset of people willing to want to is small.<br /><br />The factors in doing great work are factors in the literal,\\nmathematical sense, and they are: ability, interest, effort, and\\nluck. Luck by definition you can\\\'t do anything about, so we can\\nignore that. And we can assume effort, if you do in fact want to\\ndo great work. 
So the problem boils down to ability and interest.\\nCan you find a kind of work where your ability and interest will\\ncombine to yield an explosion of new ideas?<br /><br />Here there are grounds for optimism. There are so many different\\nways to do great work, and even more that are still undiscovered.\\nOut of all those different types of work, the one you\\\'re most suited\\nfor is probably a pretty close match. Probably a comically close\\nmatch. It\\\'s just a question of finding it, and how far into it')]), ResponseOutputMessage(id='msg_3591ea71-8b35-4efd-a5ad-c1c250801971', content=[ResponseOutputText(annotations=[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')], text='To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. 
**Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. 
Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message')], parallel_tool_calls=False, temperature=None, tool_choice=None, tools=None, top_p=None, background=None, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier=None, status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity=None), top_logprobs=None, truncation=None, usage=None, user=None)

In [34]: resp.output[1].content[0].text
Out[34]: 'To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.'
```
</details>

The relevant output looks like this:

```python
>resp.output[1].content[0].annotations
[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')]
```

And the answer text:
```python
In [144]: print(resp.output[1].content[0].text)
To do great work, consider the following principles:

1. **Follow Your Interests**: Engage in work that genuinely excites you.
If you find an area intriguing, pursue it without being overly concerned
about external pressures or norms. You should create things that you
would want for yourself, as this often aligns with what others in your
circle might want too.

2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should
be tempered by genuine interest. Instead of detailed planning for the
future, focus on exciting projects that keep your options open. This
approach, known as "staying upwind," allows for adaptability and can
lead to unforeseen achievements.

3. **Choose Quality Colleagues**: Collaborating with talented colleagues
can significantly affect your own work. Seek out individuals who offer
surprising insights and whom you admire. The presence of good colleagues
can elevate the quality of your work and inspire you.

4. **Maintain High Morale**: Your attitude towards work and life affects
your performance. Cultivating optimism and viewing yourself as lucky
rather than victimized can boost your productivity. It’s essential to
care for your physical health as well since it directly impacts your
mental faculties and morale.

5. **Be Consistent**: Great work often comes from cumulative effort.
Daily progress, even in small amounts, can result in substantial
achievements over time. Emphasize consistency and make the work
engaging, as this reduces the perceived burden of hard labor.

6. **Embrace Curiosity**: Curiosity is a driving force that can guide
you in selecting fields of interest, pushing you to explore uncharted
territories. Allow it to shape your work and continually seek knowledge
and insights.

By focusing on these aspects, you can create an environment conducive to
great work and personal fulfillment.
```

The code below prints only periods, which shows that the position/index behaves as expected: each annotation lands at the end of a sentence.

```python
print([resp.output[1].content[0].text[j.index] for j in
resp.output[1].content[0].annotations])
Out[41]: ['.', '.', '.', '.', '.', '.']
```
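
The same check can be reproduced without a running server. The snippet below is only a sketch: `FileCitation` is a hypothetical stand-in for `AnnotationFileCitation`, and the text is made up rather than taken from a real response.

```python
from dataclasses import dataclass


@dataclass
class FileCitation:
    """Hypothetical stand-in for AnnotationFileCitation; only `index` matters here."""

    index: int
    filename: str = "example.html"


def chars_at_annotations(text: str, annotations: list[FileCitation]) -> list[str]:
    """Return the character of `text` found at each annotation's index."""
    return [text[a.index] for a in annotations]


# Made-up text; in the PR this would be resp.output[1].content[0].text.
text = "Follow your interests. Work hard. Stay curious."
# Place one illustrative citation at every sentence-ending period.
annotations = [FileCitation(index=i) for i, ch in enumerate(text) if ch == "."]

print(chars_at_annotations(text, annotations))
```

If an index were off by one, the list would contain letters or spaces instead of periods, which makes this a cheap sanity check for citation placement.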

## Test Plan
Unit tests added.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-07 14:00:56 -04:00
Charlie Doern
6389bf5ffb
fix: make telemetry optional for agents (#3705)
# What does this PR do?

A lot of code in the agents API uses the telemetry API and its
helpers without checking whether that API is even enabled.

The agents API is the only API besides inference that actively uses
telemetry code, so after this change telemetry can be optional for the
entire stack.
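
As a rough illustration of the pattern (not the actual agents code), a call site can fall back to a no-op context when telemetry is disabled. `Telemetry`, `maybe_span`, and `run_turn` are hypothetical names invented for this sketch.

```python
from contextlib import contextmanager, nullcontext
from typing import Optional


class Telemetry:
    """Hypothetical telemetry client that just records span names."""

    def __init__(self) -> None:
        self.spans: list[str] = []

    @contextmanager
    def span(self, name: str):
        self.spans.append(name)
        yield


def maybe_span(telemetry: Optional[Telemetry], name: str):
    """Return a real span when telemetry is enabled, otherwise a no-op context."""
    return telemetry.span(name) if telemetry is not None else nullcontext()


def run_turn(telemetry: Optional[Telemetry]) -> str:
    """Hypothetical agent step that works with or without telemetry."""
    with maybe_span(telemetry, "agent_turn"):
        return "done"


print(run_turn(None))         # telemetry disabled
print(run_turn(Telemetry()))  # telemetry enabled
```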


resolves #3665


## Test Plan

existing agent tests.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-07 16:09:03 +02:00
Matthew Farrellee
e892a3f7f4
feat: add refresh_models support to inference adapters (default: false) (#3719)
# What does this PR do?

Inference adapters can now configure `refresh_models: bool` to control
periodic model listing from their providers.

BREAKING CHANGE: the together inference adapter default changed. Previously
it always refreshed; now it follows the config.
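
A minimal sketch of the opt-in behavior, assuming a config object with a `refresh_models` flag. `AdapterConfig` and `should_refresh` are hypothetical names for illustration; only the field name and its default of `false` come from this PR.

```python
from dataclasses import dataclass


@dataclass
class AdapterConfig:
    """Hypothetical adapter config; only `refresh_models` comes from the PR text."""

    refresh_models: bool = False  # default stated in the PR title


def should_refresh(config: AdapterConfig) -> bool:
    """Periodic model listing runs only when the config opts in."""
    return config.refresh_models


print(should_refresh(AdapterConfig()))                     # default config
print(should_refresh(AdapterConfig(refresh_models=True)))  # explicit opt-in
```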

addresses "models: refresh" on #3517

## Test Plan

ci w/ new tests
2025-10-07 15:19:56 +02:00
Charlie Doern
8b9af03a1b
fix: refresh log should be debug (#3720)
# What does this PR do?

When using a distro like starter, where a bunch of providers are disabled,
I should not see logs like:

```
         in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:38:52,117 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,123 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: Pass Fireworks API Key in the header
         X-LlamaStack-Provider-Data as { "fireworks_api_key": <your api key>}
WARNING  2025-10-07 08:43:52,126 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider together: Pass Together API Key in the header
         X-LlamaStack-Provider-Data as { "together_api_key": <your api key>}
WARNING  2025-10-07 08:43:52,129 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider openai: API key is not set. Please provide a valid API
         key in the provider data header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,132 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"anthropic_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,136 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is not set. Please provide a valid API
         key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,139 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is not set. Please provide a valid API key
         in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,142 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config.
^CINFO     2025-10-07 08:46:11,996 llama_stack.core.utils.exec:75 core:
```

at WARNING level. Switch them to DEBUG.
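
A minimal sketch of the idea using the standard `logging` module. The logger name and `report_refresh_failure` are hypothetical and are not the actual routing-table code.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("routing_tables_sketch")


def report_refresh_failure(provider_id: str, error: Exception) -> None:
    """Log a failed model refresh at DEBUG so unconfigured providers stay quiet."""
    logger.debug("Model refresh failed for provider %s: %s", provider_id, error)


# With the logger at WARNING (a typical default), this call emits nothing.
logger.setLevel(logging.WARNING)
report_refresh_failure("groq", ValueError("API key is not set"))
```

The message is still available when debugging by lowering the logger level to `DEBUG`.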

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-07 09:04:07 -04:00
Sumanth Kamenani
1fcde5fc2f
fix: update pyproject.toml dependencies for vector processing (#3555)
# What does this PR do?

Updates pyproject.toml dependencies to fix vector processing
compatibility issues.

closes: #3495

## Test Plan

Tested llama stack server with faiss vector database:

1. Built and ran server: `llama stack build --distro starter --image-type venv --image-name llamastack-faiss`
2. Tested file upload: Successfully uploaded PDF via /v1/openai/v1/files
3. Tested vector operations:
    - Created vector store with faiss backend
    - Added PDF to vector store
    - Performed semantic search queries
2025-10-07 15:01:36 +02:00
Justin
509ac4a659
feat: enable Runpod inference adapter (#3707)
# What does this PR do?
Apologies to @mattf: I thought I could close the other PR and reopen it,
but I no longer had the option to reopen it. I just didn't want it to keep
notifying maintainers whenever I pushed more commits for testing.

Continuation of: https://github.com/llamastack/llama-stack/pull/3641

PR fixes Runpod Adapter
https://github.com/llamastack/llama-stack/issues/3517

## What I fixed from before:

1. Made it all OpenAI
2. Fixed the class up since the OpenAIMixin had a couple changes with
the pydantic base model stuff.
3. Tested that we can dynamically discover models and use the
resulting identifier to make requests:
```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

# RunPod Provider Quick Start

## Prerequisites
- Python 3.10+
- Git
- RunPod API token

## Setup for Development

```bash
# 1. Clone and enter the repository
cd llama-stack  # adjust to the path of your local clone

# 2. Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Remove any existing llama-stack installation
pip uninstall llama-stack llama-stack-client -y

# 4. Install llama-stack in development mode
pip install -e .

# 5. Build using local development code (found this through the Discord)
LLAMA_STACK_DIR=. llama stack build

# When prompted during build:
# - Name: runpod-dev
# - Image type: venv
# - Inference provider: remote::runpod
# - Safety provider: "llama-guard" 
# - Other providers: first defaults
```

## Configure the Stack

The RunPod adapter automatically discovers models from your endpoint via the `/v1/models` API.
No manual model configuration is required - just set your environment variables.

## Run the Server

### Important: Use the Build-Created Virtual Environment

```bash
# Exit the development venv if you're in it
deactivate

# Activate the build-created venv (NOT .venv)
cd llama-stack  # adjust to the path of your local clone of the GitHub repo
source llamastack-runpod-dev/bin/activate
```

### For Qwen3-32B-AWQ Public Endpoint (Recommended)

```bash
# Set environment variables
export RUNPOD_URL="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1"
export RUNPOD_API_TOKEN="your_runpod_api_key"

# Start server
llama stack run ~/.llama/distributions/llamastack-runpod-dev/llamastack-runpod-dev-run.yaml
```

## Quick Test

### 1. List Available Models (Dynamic Discovery)

First, check which models are available on your RunPod endpoint:

```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

**Example Response:**
```json
{
  "data": [
    {
      "identifier": "qwen3-32b-awq",
      "provider_resource_id": "Qwen/Qwen3-32B-AWQ",
      "provider_id": "runpod",
      "type": "model",
      "metadata": {},
      "model_type": "llm"
    }
  ]
}
```

**Note:** Use the `identifier` value from the response above in your requests below.
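
For scripted use, the identifier can be pulled out of the `/v1/models` response instead of copied by hand. This is only a sketch: `build_chat_request` is a hypothetical helper, and the JSON is the example response from above rather than live server output.

```python
import json

# Example /v1/models response copied from the section above.
models_response = json.loads("""
{
  "data": [
    {
      "identifier": "qwen3-32b-awq",
      "provider_resource_id": "Qwen/Qwen3-32B-AWQ",
      "provider_id": "runpod",
      "type": "model",
      "metadata": {},
      "model_type": "llm"
    }
  ]
}
""")


def build_chat_request(models: dict, prompt: str, stream: bool = False) -> dict:
    """Build a /v1/chat/completions payload from the first listed LLM's identifier."""
    llms = [m for m in models["data"] if m.get("model_type") == "llm"]
    if not llms:
        raise ValueError("no LLM models returned by /v1/models")
    return {
        "model": llms[0]["identifier"],
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


print(json.dumps(build_chat_request(models_response, "Hello, count to 3"), indent=2))
```

The printed payload can be passed to `curl -d` or any HTTP client in the steps below.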

### 2. Chat Completion (Non-streaming)

Replace `qwen3-32b-awq` with your model identifier from step 1:

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Hello, count to 3"}],
    "stream": false
  }'
```

### 3. Chat Completion (Streaming)

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

**Clean streaming output:**
```bash
curl -N -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-32b-awq", "messages": [{"role": "user", "content": "Count to 5"}], "stream": true}' \
  2>/dev/null | while read -r line; do
    echo "$line" | grep "^data: " | sed 's/^data: //' | jq -r '.choices[0].delta.content // empty' 2>/dev/null
  done
```

**Expected Output:**
```
1
2
3
4
5
```
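
The same filtering can be sketched in Python. This assumes the stream consists of server-sent-event lines of the form `data: {json}` and may end with a `data: [DONE]` marker (the usual OpenAI-style convention, assumed here rather than confirmed by this guide). The sample lines are invented for illustration, and `extract_deltas` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def extract_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by `data: ...` lines of a streamed response."""
    prefix = "data: "
    for raw in lines:
        line = raw.strip()
        if not line.startswith(prefix):
            continue  # same role as `grep "^data: "`
        body = line[len(prefix):]  # same role as `sed 's/^data: //'`
        try:
            chunk = json.loads(body)
        except json.JSONDecodeError:
            continue  # e.g. a "[DONE]" marker; the shell version discards jq errors
        if not isinstance(chunk, dict):
            continue
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content is not None:  # same role as `// empty` for missing or null content
            yield content


# Invented sample chunks, shaped like OpenAI-style streaming deltas.
SAMPLE_LINES = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "1"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " 2"}}]}',
    "data: [DONE]",
]

print("".join(extract_deltas(SAMPLE_LINES)))
```

Against a live server, the lines would come from the HTTP response body instead of `SAMPLE_LINES`.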
2025-10-07 12:24:50 +02:00
ehhuang
50f9ca3541
chore: remove dead code (#3713)
# What does this PR do?


## Test Plan
2025-10-07 12:13:11 +02:00
slekkala1
bba9957edd
feat(api): Add vector store file batches api (#3642)
# What does this PR do?

Add an OpenAI-compatible vector store file batches API. This functionality
is needed to attach many files to a vector store as a batch.
https://github.com/llamastack/llama-stack/issues/3533

The API stubs were merged in
https://github.com/llamastack/llama-stack/pull/3615
This PR adds persistence for file batches, as discussed in
https://github.com/llamastack/llama-stack/pull/3544
(Generated with Claude Code and reviewed by me)


## Test Plan
1. Unit tests pass
2. Also verified that the cc-vec integration with LlamaStackClient works with
the file batches API. https://github.com/raghotham/cc-vec
3. Integration tests pass
2025-10-06 16:58:22 -07:00
ehhuang
597d405e13
chore: fix closing error (#3709)
# What does this PR do?
Gets rid of this error message below (disclaimer: not sure why, but it
does).

ERROR 2025-10-06 12:04:22,837 asyncio:118 uncategorized: Task exception
was never retrieved
future: <Task finished name='Task-36' coro=<AsyncClient.aclose() done,
defined at

/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_client.py:1978>
exception=RuntimeError('unable to perform operation on <TCPTransport
closed=True reading=False 0x122dc7ad0>; the handler is closed')>
╭───────────────────────────────────────────────────────────────────
Traceback (most recent call last)
───────────────────────────────────────────────────────────────────╮
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_client.py:1985
in aclose │
│ │
│ 1982 │ │ if self._state != ClientState.CLOSED: │
│ 1983 │ │ │ self._state = ClientState.CLOSED │
│ 1984 │ │ │ │
│ ❱ 1985 │ │ │ await self._transport.aclose() │
│ 1986 │ │ │ for proxy in self._mounts.values(): │
│ 1987 │ │ │ │ if proxy is not None: │
│ 1988 │ │ │ │ │ await proxy.aclose() │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_transports/default.py:406
in aclose │
│ │
│ 403 │ │ ) │
│ 404 │ │
│ 405 │ async def aclose(self) -> None: │
│ ❱ 406 │ │ await self._pool.aclose() │
│ 407 │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py:353
in aclose │
│ │
│ 350 │ │ with self._optional_thread_lock: │
│ 351 │ │ │ closing_connections = list(self._connections) │
│ 352 │ │ │ self._connections = [] │
│ ❱ 353 │ │ await self._close_connections(closing_connections) │
│ 354 │ │
│ 355 │ async def __aenter__(self) -> AsyncConnectionPool: │
│ 356 │ │ return self │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py:345
in _close_connections │
│ │
│ 342 │ │ # Close connections which have been removed from the pool. │
│ 343 │ │ with AsyncShieldCancellation(): │
│ 344 │ │ │ for connection in closing: │
│ ❱ 345 │ │ │ │ await connection.aclose() │
│ 346 │ │
│ 347 │ async def aclose(self) -> None: │
│ 348 │ │ # Explicitly close the connection pool. │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection.py:173
in aclose │
│ │
│ 170 │ async def aclose(self) -> None: │
│ 171 │ │ if self._connection is not None: │
│ 172 │ │ │ async with Trace("close", logger, None, {}): │
│ ❱ 173 │ │ │ │ await self._connection.aclose() │
│ 174 │ │
│ 175 │ def is_available(self) -> bool: │
│ 176 │ │ if self._connection is None: │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py:258
in aclose │
│ │
│ 255 │ │ # Note that this method unilaterally closes the connection,
and does │
│ 256 │ │ # not have any kind of locking in place around it. │
│ 257 │ │ self._state = HTTPConnectionState.CLOSED │
│ ❱ 258 │ │ await self._network_stream.aclose() │
│ 259 │ │
│ 260 │ # The AsyncConnectionInterface methods provide information about
the state of │
│ 261 │ # the connection, allowing for a connection pooling
implementation to │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_backends/anyio.py:53
in aclose │
│ │
│ 50 │ │ │ │ await self._stream.send(item=buffer) │
│ 51 │ │
│ 52 │ async def aclose(self) -> None: │
│ ❱ 53 │ │ await self._stream.aclose() │
│ 54 │ │
│ 55 │ async def start_tls( │
│ 56 │ │ self, │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/anyio/streams/tls.py:216
in aclose │
│ │
│ 213 │ │ │ │ await aclose_forcefully(self.transport_stream) │
│ 214 │ │ │ │ raise │
│ 215 │ │ │
│ ❱ 216 │ │ await self.transport_stream.aclose() │
│ 217 │ │
│ 218 │ async def receive(self, max_bytes: int = 65536) -> bytes: │
│ 219 │ │ data = await
self._call_sslobject_method(self._ssl_object.read, max_bytes) │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py:1310
in aclose │
│ │
│ 1307 │ │ if not self._transport.is_closing(): │
│ 1308 │ │ │ self._closed = True │
│ 1309 │ │ │ try: │
│ ❱ 1310 │ │ │ │ self._transport.write_eof() │
│ 1311 │ │ │ except OSError: │
│ 1312 │ │ │ │ pass │
│ 1313 │
│ │
│ in uvloop.loop.UVStream.write_eof:703 │
│ │
│ in uvloop.loop.UVHandle._ensure_alive:159 │

╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: unable to perform operation on <TCPTransport closed=True
reading=False 0x122dc7ad0>; the handler is closed

## Test Plan
Run `uv run --with llama-stack llama stack build --distro=starter --image-type=venv --run`

No more error
2025-10-06 14:44:01 -07:00
ehhuang
696fefbf17
chore: logger category fix (#3706)
# What does this PR do?
WARNING 2025-10-06 12:01:45,137 root:266 uncategorized: Unknown logging
category: tokenizer_utils. Falling back to default 'root' level: 20

## Test Plan
2025-10-06 12:16:26 -07:00
Alexey Rybak
a8da6ba3a7
docs: API docstrings cleanup for better documentation rendering (#3661)
# What does this PR do?
* Cleans up API docstrings for better documentation rendering

<img width="2346" height="1126" alt="image"
src="https://github.com/user-attachments/assets/516b09a1-2d5b-4614-a3a9-13431fc21fc1"
/>

## Test Plan
* Manual testing

---------

Signed-off-by: Doug Edgar <dedgar@redhat.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Christian Zaccaria <73656840+ChristianZaccaria@users.noreply.github.com>
Co-authored-by: Anastas Stoyanovsky <contact@anastas.eu>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Young Han <110819238+seyeong-han@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 10:46:33 -07:00
Matthew Farrellee
892ea759fa
chore: remove together inference adapter's custom check_model_availability (#3702)
# What does this PR do?

remove Together inference adapter's check_model_availability impl, rely
on standard impl instead


## Test Plan

ci
2025-10-06 13:28:36 -04:00
Matthew Farrellee
de9940c697
chore: disable openai_embeddings on inference=remote::llama-openai-compat (#3704)
# What does this PR do?

api.llama.com does not provide embedding models; this makes that clear


## Test Plan

ci
2025-10-06 13:27:40 -04:00
Matthew Farrellee
ae74b31ae3
chore: remove vLLM inference adapter's custom list_models (#3703)
# What does this PR do?

remove vLLM inference adapter's custom list_models impl, rely on
standard impl instead

## Test Plan

ci
2025-10-06 13:27:30 -04:00
Matthew Farrellee
d23ed26238
chore: turn OpenAIMixin into a pydantic.BaseModel (#3671)
# What does this PR do?

- implement get_api_key instead of relying on
  LiteLLMOpenAIMixin.get_api_key
- remove use of LiteLLMOpenAIMixin
- add default initialize/shutdown methods to OpenAIMixin
- remove __init__s to allow proper pydantic construction
- remove dead code from vllm adapter and associated / duplicate unit
  tests
- update vllm adapter to use openaimixin for model registration
- remove ModelRegistryHelper from fireworks & together adapters
- remove Inference from nvidia adapter
- complete type hints on embedding_model_metadata
- allow extra fields on OpenAIMixin, for model_store, __provider_id__,
  etc
- new recordings for ollama
- enhance the list models error handling
- update cerebras (remove cerebras-cloud-sdk) and anthropic (custom
  model listing) inference adapters
- parametrized test_inference_client_caching
- remove cerebras, databricks, fireworks, together from blanket mypy
  exclude
- removed unnecessary litellm deps

## Test Plan

ci
2025-10-06 11:33:19 -04:00
Matthew Farrellee
724dac498c
chore: give OpenAIMixin subclasses a chance to list models without leaking _model_cache details (#3682)
# What does this PR do?

close the _model_cache abstraction leak

## Test Plan

ci w/ new tests
2025-10-06 09:44:33 -04:00
Charlie Doern
f00bcd9561
feat: allow for multiple external provider specs (#3341)
# What does this PR do?

When using the providers.d method of installation, users could hand-craft
their `AdapterSpec`s to share overlapping code, so one repo could
contain both an inline and a remote impl. Installing a provider via
module currently does not allow this, because each repo may have only one
`get_provider_spec` method, which returns a single spec.

Add an optional way for `get_provider_spec` to return a list of
`ProviderSpec`, where each entry can be either an inline or a remote impl.

Note: the `adapter_type` in `get_provider_spec` MUST match the
`provider_type` in the build/run yaml for this to work.
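
A minimal sketch of the list-returning form, using stand-in dataclasses rather than the real llama-stack classes. `InlineSpec`, `RemoteSpec`, their fields, and the `my_provider` names are all hypothetical, and `normalize_specs` only illustrates how a loader could accept either a single spec or a list.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class InlineSpec:
    """Stand-in for an inline provider spec (hypothetical fields)."""
    provider_type: str
    module: str


@dataclass
class RemoteSpec:
    """Stand-in for a remote provider spec (hypothetical fields)."""
    adapter_type: str
    module: str


Spec = Union[InlineSpec, RemoteSpec]


def get_provider_spec() -> list[Spec]:
    # One repo exposing both an inline and a remote implementation.
    return [
        InlineSpec(provider_type="inline::my_provider", module="my_provider.inline"),
        RemoteSpec(adapter_type="my_provider", module="my_provider.remote"),
    ]


def normalize_specs(result: Union[Spec, list[Spec]]) -> list[Spec]:
    """Accept either the old single-spec return value or the new list form."""
    return result if isinstance(result, list) else [result]


for spec in normalize_specs(get_provider_spec()):
    print(type(spec).__name__, spec.module)
```

Wrapping a single spec in a list keeps existing providers that return one spec working unchanged.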

resolves #3226

## Test Plan

Once this merges, we need to re-enable the external provider test and
account for this functionality. The external provider repos also need
work to support it.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-06 15:26:38 +02:00
ehhuang
426cac078b
chore: use uvicorn to start llama stack server everywhere (#3625)
# What does this PR do?
https://github.com/llamastack/llama-stack/pull/3462 allows using uvicorn
to start the llama stack server, which supports spawning multiple workers.

This PR makes it possible to launch more than one worker from
`llama stack run` by removing the old way of launching the stack server
and consolidating on `uvicorn.run`. The parameter itself will be added in
a follow-up PR, to keep this one focused on simplification.


## Test Plan
ran `llama stack run starter`
CI
2025-10-06 14:27:40 +02:00
dependabot[bot]
92219fd8fb
chore(python-deps): bump pandas from 2.3.1 to 2.3.3 (#3689)
Bumps [pandas](https://github.com/pandas-dev/pandas) from 2.3.1 to
2.3.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pandas-dev/pandas/releases">pandas's
releases</a>.</em></p>
<blockquote>
<h2>Pandas 2.3.3</h2>
<p>We are pleased to announce the release of pandas 2.3.3.
This release includes some improvements and fixes to the future string
data type (preview feature for the upcoming pandas 3.0). We recommend
that all users upgrade to this version.</p>
<p>See the <a
href="https://pandas.pydata.org/pandas-docs/version/2.3/whatsnew/v2.3.3.html">full
whatsnew</a> for a list of all the changes.
Pandas 2.3.3 supports Python 3.9 and higher, and is the first release to
support Python 3.14.</p>
<p>The release will be available on the conda-forge channel:</p>
<pre><code>conda install pandas --channel conda-forge
</code></pre>
<p>Or via PyPI:</p>
<pre><code>python3 -m pip install --upgrade pandas
</code></pre>
<p>Please report any issues with the release on the <a
href="https://github.com/pandas-dev/pandas/issues">pandas issue
tracker</a>.</p>
<p>Thanks to all the contributors who made this release possible.</p>
<h2>Pandas 2.3.2</h2>
<p>We are pleased to announce the release of pandas 2.3.2.
This release includes some improvements and fixes to the future string
data type (preview feature for the upcoming pandas 3.0). We recommend
that all users upgrade to this version.</p>
<p>See the <a
href="https://pandas.pydata.org/pandas-docs/version/2.3/whatsnew/v2.3.2.html">full
whatsnew</a> for a list of all the changes.
Pandas 2.3.2 supports Python 3.9 and higher.</p>
<p>The release will be available on the conda-forge channel:</p>
<pre><code>conda install pandas --channel conda-forge
</code></pre>
<p>Or via PyPI:</p>
<pre><code>python3 -m pip install --upgrade pandas
</code></pre>
<p>Please report any issues with the release on the <a
href="https://github.com/pandas-dev/pandas/issues">pandas issue
tracker</a>.</p>
<p>Thanks to all the contributors who made this release possible.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9c8bc3e551"><code>9c8bc3e</code></a>
RLS: 2.3.3</li>
<li><a
href="6aa788a00b"><code>6aa788a</code></a>
[backport 2.3.x] DOC: prepare 2.3.3 whatsnew notes for release (<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62499">#62499</a>)
(<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62508">#62508</a>)</li>
<li><a
href="b64f0df403"><code>b64f0df</code></a>
[backport 2.3.x] BUG: avoid validation error for ufunc with
string[python] ar...</li>
<li><a
href="058eb2b0ed"><code>058eb2b</code></a>
[backport 2.3.x] BUG: String[pyarrow] comparison with mixed object (<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62424">#62424</a>)
(...</li>
<li><a
href="2ca088daef"><code>2ca088d</code></a>
[backport 2.3.x] DEPR: remove the Period resampling deprecation (<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62480">#62480</a>)
(<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62">#62</a>...</li>
<li><a
href="92bf98f623"><code>92bf98f</code></a>
[backport 2.3.x] BUG: fix .str.isdigit to honor unicode superscript for
older...</li>
<li><a
href="e57c7d6a22"><code>e57c7d6</code></a>
Backport PR <a
href="https://redirect.github.com/pandas-dev/pandas/issues/62452">#62452</a>
on branch 2.3.x (TST: Adjust tests for numexpr 2.13) (<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62454">#62454</a>)</li>
<li><a
href="e0fe9a03c9"><code>e0fe9a0</code></a>
Backport to 2.3.x: REGR: from_records not initializing subclasses
properly (#...</li>
<li><a
href="23a1085e64"><code>23a1085</code></a>
BUG: improve future warning for boolean operations with missaligned
indexes (...</li>
<li><a
href="61136969fb"><code>6113696</code></a>
Backport PR <a
href="https://redirect.github.com/pandas-dev/pandas/issues/62396">#62396</a>
on branch 2.3.x (PKG/DOC: indicate Python 3.14 support in ...</li>
<li>Additional commits viewable in <a
href="https://github.com/pandas-dev/pandas/compare/v2.3.1...v2.3.3">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-05 21:20:29 -07:00
dependabot[bot]
198536f136
chore(github-deps): bump actions/github-script from 7.0.1 to 8.0.0 (#3685)
Bumps [actions/github-script](https://github.com/actions/github-script)
from 7.0.1 to 8.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/github-script/releases">actions/github-script's
releases</a>.</em></p>
<blockquote>
<h2>v8.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update Node.js version support to 24.x by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/637">actions/github-script#637</a></li>
<li>README for updating actions/github-script from v7 to v8 by <a
href="https://github.com/sneha-krip"><code>@​sneha-krip</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/653">actions/github-script#653</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/637">actions/github-script#637</a></li>
<li><a
href="https://github.com/sneha-krip"><code>@​sneha-krip</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/653">actions/github-script#653</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/github-script/compare/v7.1.0...v8.0.0">https://github.com/actions/github-script/compare/v7.1.0...v8.0.0</a></p>
<h2>v7.1.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Upgrade husky to v9 by <a
href="https://github.com/benelan"><code>@​benelan</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/482">actions/github-script#482</a></li>
<li>Add workflow file for publishing releases to immutable action
package by <a
href="https://github.com/Jcambass"><code>@​Jcambass</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/485">actions/github-script#485</a></li>
<li>Upgrade IA Publish by <a
href="https://github.com/Jcambass"><code>@​Jcambass</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/486">actions/github-script#486</a></li>
<li>Fix workflow status badges by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/497">actions/github-script#497</a></li>
<li>Update usage of <code>actions/upload-artifact</code> by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/512">actions/github-script#512</a></li>
<li>Clear up package name confusion by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/514">actions/github-script#514</a></li>
<li>Update dependencies with <code>npm audit fix</code> by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/515">actions/github-script#515</a></li>
<li>Specify that the used script is JavaScript by <a
href="https://github.com/timotk"><code>@​timotk</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/478">actions/github-script#478</a></li>
<li>chore: Add Dependabot for NPM and Actions by <a
href="https://github.com/nschonni"><code>@​nschonni</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/472">actions/github-script#472</a></li>
<li>Define <code>permissions</code> in workflows and update actions by
<a href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in
<a
href="https://redirect.github.com/actions/github-script/pull/531">actions/github-script#531</a></li>
<li>chore: Add Dependabot for .github/actions/install-dependencies by <a
href="https://github.com/nschonni"><code>@​nschonni</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/532">actions/github-script#532</a></li>
<li>chore: Remove .vscode settings by <a
href="https://github.com/nschonni"><code>@​nschonni</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/533">actions/github-script#533</a></li>
<li>ci: Use github/setup-licensed by <a
href="https://github.com/nschonni"><code>@​nschonni</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/473">actions/github-script#473</a></li>
<li>make octokit instance available as octokit on top of github, to make
it easier to seamlessly copy examples from GitHub rest api or octokit
documentations by <a
href="https://github.com/iamstarkov"><code>@​iamstarkov</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/508">actions/github-script#508</a></li>
<li>Remove <code>octokit</code> README updates for v7 by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/557">actions/github-script#557</a></li>
<li>docs: add &quot;exec&quot; usage examples by <a
href="https://github.com/neilime"><code>@​neilime</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/546">actions/github-script#546</a></li>
<li>Bump ruby/setup-ruby from 1.213.0 to 1.222.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/github-script/pull/563">actions/github-script#563</a></li>
<li>Bump ruby/setup-ruby from 1.222.0 to 1.229.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/github-script/pull/575">actions/github-script#575</a></li>
<li>Clearly document passing inputs to the <code>script</code> by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/603">actions/github-script#603</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/610">actions/github-script#610</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/benelan"><code>@​benelan</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/482">actions/github-script#482</a></li>
<li><a href="https://github.com/Jcambass"><code>@​Jcambass</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/485">actions/github-script#485</a></li>
<li><a href="https://github.com/timotk"><code>@​timotk</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/478">actions/github-script#478</a></li>
<li><a
href="https://github.com/iamstarkov"><code>@​iamstarkov</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/508">actions/github-script#508</a></li>
<li><a href="https://github.com/neilime"><code>@​neilime</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/546">actions/github-script#546</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/610">actions/github-script#610</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/github-script/compare/v7...v7.1.0">https://github.com/actions/github-script/compare/v7...v7.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ed597411d8"><code>ed59741</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/github-script/issues/653">#653</a>
from actions/sneha-krip/readme-for-v8</li>
<li><a
href="2dc352e4ba"><code>2dc352e</code></a>
Bold minimum Actions Runner version in README</li>
<li><a
href="01e118c8d0"><code>01e118c</code></a>
Update README for Node 24 runtime requirements</li>
<li><a
href="8b222ac82e"><code>8b222ac</code></a>
Apply suggestion from <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a></li>
<li><a
href="adc0eeac99"><code>adc0eea</code></a>
README for updating actions/github-script from v7 to v8</li>
<li><a
href="20fe497b3f"><code>20fe497</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/github-script/issues/637">#637</a>
from actions/node24</li>
<li><a
href="e7b7f222b1"><code>e7b7f22</code></a>
update licenses</li>
<li><a
href="2c81ba05f3"><code>2c81ba0</code></a>
Update Node.js version support to 24.x</li>
<li><a
href="f28e40c7f3"><code>f28e40c</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/github-script/issues/610">#610</a>
from actions/nebuk89-patch-1</li>
<li><a
href="1ae9958572"><code>1ae9958</code></a>
Update README.md</li>
<li>Additional commits viewable in <a
href="60a0d83039...ed597411d8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/github-script&package-manager=github_actions&previous-version=7.0.1&new-version=8.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>
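
The `ignore this major version` and `ignore this minor version` commands above depend on which part of the version number a bump changes. As a rough illustration only, the sketch below classifies a few of the bumps recorded in this log. `parse_version` and `bump_kind` are hypothetical helpers that assume plain dotted-integer versions; they are not Dependabot's own matching logic, which this log does not show.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '6.7.0' into (6, 7, 0). Assumes plain dotted integers (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def bump_kind(previous: str, new: str) -> str:
    """Return 'major', 'minor', or 'patch' for a dotted-integer version change."""
    old_parts, new_parts = parse_version(previous), parse_version(new)
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"


# Version pairs copied from the commit subjects and badge URLs in this log.
bumps = {
    "actions/github-script": ("7.0.1", "8.0.0"),
    "astral-sh/setup-uv": ("6.7.0", "6.8.0"),
    "requests": ("2.32.4", "2.32.5"),
}

for name, (previous, new) in bumps.items():
    print(f"{name}: {previous} -> {new} is a {bump_kind(previous, new)} bump")
```

Under these assumptions, the `actions/github-script` bump counts as major, `astral-sh/setup-uv` as minor, and `requests` as patch.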

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-05 21:20:00 -07:00
dependabot[bot]
59e5bde991
chore(github-deps): bump astral-sh/setup-uv from 6.7.0 to 6.8.0 (#3686)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.7.0 to 6.8.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d0cc045d04"><code>d0cc045</code></a>
Always show prune cache output (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/597">#597</a>)</li>
<li><a
href="2841f9f5c1"><code>2841f9f</code></a>
Bump zizmorcore/zizmor-action from 0.1.2 to 0.2.0 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/571">#571</a>)</li>
<li><a
href="e554b93b80"><code>e554b93</code></a>
Add **/*.py.lock to cache-dependency-glob (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/590">#590</a>)</li>
<li><a
href="c7d85d9988"><code>c7d85d9</code></a>
chore: update known versions for 0.8.20</li>
<li><a
href="07f2cb5db9"><code>07f2cb5</code></a>
persist credentials for version update (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/584">#584</a>)</li>
<li><a
href="208b0c0ee4"><code>208b0c0</code></a>
README.md: Fix Python versions and update checkout action (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/572">#572</a>)</li>
<li>See full diff in <a
href="b75a909f75...d0cc045d04">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.7.0&new-version=6.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-05 21:19:50 -07:00
dependabot[bot]
45cf74db33
chore(python-deps): bump requests from 2.32.4 to 2.32.5 (#3691)
Bumps [requests](https://github.com/psf/requests) from 2.32.4 to 2.32.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/psf/requests/releases">requests's
releases</a>.</em></p>
<blockquote>
<h2>v2.32.5</h2>
<h2>2.32.5 (2025-08-18)</h2>
<p><strong>Bugfixes</strong></p>
<ul>
<li>The SSLContext caching feature originally introduced in 2.32.0 has
created
a new class of issues in Requests that have had negative impact across a
number
of use cases. The Requests team has decided to revert this feature as
long term
maintenance of it is proving to be unsustainable in its current
iteration.</li>
</ul>
<p><strong>Deprecations</strong></p>
<ul>
<li>Added support for Python 3.14.</li>
<li>Dropped support for Python 3.8 following its end of support.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/psf/requests/blob/main/HISTORY.md">requests's
changelog</a>.</em></p>
<blockquote>
<h2>2.32.5 (2025-08-18)</h2>
<p><strong>Bugfixes</strong></p>
<ul>
<li>The SSLContext caching feature originally introduced in 2.32.0 has
created
a new class of issues in Requests that have had negative impact across a
number
of use cases. The Requests team has decided to revert this feature as
long term
maintenance of it is proving to be unsustainable in its current
iteration.</li>
</ul>
<p><strong>Deprecations</strong></p>
<ul>
<li>Added support for Python 3.14.</li>
<li>Dropped support for Python 3.8 following its end of support.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b25c87d7cb"><code>b25c87d</code></a>
v2.32.5</li>
<li><a
href="131e506079"><code>131e506</code></a>
Merge pull request <a
href="https://redirect.github.com/psf/requests/issues/7010">#7010</a>
from psf/dependabot/github_actions/actions/checkout-...</li>
<li><a
href="b336cb2bc6"><code>b336cb2</code></a>
Bump actions/checkout from 4.2.0 to 5.0.0</li>
<li><a
href="46e939b552"><code>46e939b</code></a>
Update publish workflow to use <code>artifact-id</code> instead of
<code>name</code></li>
<li><a
href="4b9c546aa3"><code>4b9c546</code></a>
Merge pull request <a
href="https://redirect.github.com/psf/requests/issues/6999">#6999</a>
from psf/dependabot/github_actions/step-security/har...</li>
<li><a
href="7618dbef01"><code>7618dbe</code></a>
Bump step-security/harden-runner from 2.12.0 to 2.13.0</li>
<li><a
href="2edca11103"><code>2edca11</code></a>
Add support for Python 3.14 and drop support for Python 3.8 (<a
href="https://redirect.github.com/psf/requests/issues/6993">#6993</a>)</li>
<li><a
href="fec96cd597"><code>fec96cd</code></a>
Update Makefile rules (<a
href="https://redirect.github.com/psf/requests/issues/6996">#6996</a>)</li>
<li><a
href="d58d8aa2f4"><code>d58d8aa</code></a>
docs: clarify timeout parameter uses seconds in Session.request (<a
href="https://redirect.github.com/psf/requests/issues/6994">#6994</a>)</li>
<li><a
href="91a3eabd3d"><code>91a3eab</code></a>
Bump github/codeql-action from 3.28.5 to 3.29.0</li>
<li>Additional commits viewable in <a
href="https://github.com/psf/requests/compare/v2.32.4...v2.32.5">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=requests&package-manager=uv&previous-version=2.32.4&new-version=2.32.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-05 21:19:19 -07:00
dependabot[bot]
c0f0a03529
chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui (#3693)
Bumps
[react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom)
and
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom).
These dependencies needed to be updated together.
Updates `react-dom` from 19.1.1 to 19.2.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/releases">react-dom's
releases</a>.</em></p>
<blockquote>
<h2>19.2.0 (Oct 1, 2025)</h2>
<p>Below is a list of all new features, APIs, and bug fixes.</p>
<p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React
19.2 release post</a> for more information.</p>
<h2>New React Features</h2>
<ul>
<li><a
href="https://react.dev/reference/react/Activity"><code>&lt;Activity&gt;</code></a>:
A new API to hide and restore the UI and internal state of its
children.</li>
<li><a
href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a>
is a React Hook that lets you extract non-reactive logic into an <a
href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect
Event</a>.</li>
<li><a
href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a>
(for RSCs) lets you know when the <code>cache()</code> lifetime is
over.</li>
<li><a
href="https://react.dev/reference/developer-tooling/react-performance-tracks">React
Performance tracks</a> appear on the Performance panel’s timeline in
your browser developer tools</li>
</ul>
<h2>New React DOM Features</h2>
<ul>
<li>Added resume APIs for partial pre-rendering with Web Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Added resume APIs for partial pre-rendering with Node Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Updated <a
href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a>
APIs to return a <code>postponed</code> state that can be passed to the
<code>resume</code> APIs.</li>
</ul>
<h2>Notable changes</h2>
<ul>
<li>React DOM now batches suspense boundary reveals, matching the
behavior of client side rendering. This change is especially noticeable
when animating the reveal of Suspense boundaries e.g. with the upcoming
<code>&lt;ViewTransition&gt;</code> Component. React will batch as many
reveals as possible before the first paint while trying to hit popular
first-contentful paint metrics.</li>
<li>Add Node Web Streams (<code>prerender</code>,
<code>renderToReadableStream</code>) to server-side-rendering APIs for
Node.js</li>
<li>Use underscore instead of <code>:</code> for IDs generated by useId</li>
</ul>
<h2>All Changes</h2>
<h3>React</h3>
<ul>
<li><code>&lt;Activity /&gt;</code> was developed over many years,
starting before <code>ClassComponent.setState</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> and
many others)</li>
<li>Stringify context as &quot;SomeContext&quot; instead of
&quot;SomeContext.Provider&quot; (<a
href="https://github.com/kassens"><code>@​kassens</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li>
<li>Include stack of cause of React instrumentation errors with
<code>%o</code> placeholder (<a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li>
<li>Fix infinite <code>useDeferredValue</code> loop in popstate event
(<a href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li>
<li>Fix a bug when an initial value was passed to
<code>useDeferredValue</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li>
<li>Fix a crash when submitting forms with Client Actions (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li>
<li>Hide/unhide the content of dehydrated suspense boundaries if they
resuspend (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li>
<li>Avoid stack overflow on wide trees during Hot Reload (<a
href="https://github.com/sophiebits"><code>@​sophiebits</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li>
<li>Improve Owner and Component stacks in various places (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>,
<a
href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li>
<li>Add <code>cacheSignal</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li>
</ul>
<h3>React DOM</h3>
<ul>
<li>Block on Suspensey Fonts during reveal of server-side-rendered
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li>
<li>Use underscore instead of <code>:</code> for IDs generated by
<code>useId</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a>, <a
href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li>
<li>Stop warning when ARIA 1.3 attributes are used (<a
href="https://github.com/Abdul-Omira"><code>@​Abdul-Omira</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li>
<li>Allow <code>nonce</code> to be used on hoistable styles (<a
href="https://github.com/Andarist"><code>@​Andarist</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li>
<li>Warn for using a React owned node as a Container if it also has text
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32774">#32774</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/blob/main/CHANGELOG.md">react-dom's
changelog</a>.</em></p>
<blockquote>
<h2>19.2.0 (October 1st, 2025)</h2>
<p>Below is a list of all new features, APIs, and bug fixes.</p>
<p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React
19.2 release post</a> for more information.</p>
<h3>New React Features</h3>
<ul>
<li><a
href="https://react.dev/reference/react/Activity"><code>&lt;Activity&gt;</code></a>:
A new API to hide and restore the UI and internal state of its
children.</li>
<li><a
href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a>
is a React Hook that lets you extract non-reactive logic into an <a
href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect
Event</a>.</li>
<li><a
href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a>
(for RSCs) lets you know when the <code>cache()</code> lifetime is
over.</li>
<li><a
href="https://react.dev/reference/developer-tooling/react-performance-tracks">React
Performance tracks</a> appear on the Performance panel’s timeline in
your browser developer tools</li>
</ul>
<h3>New React DOM Features</h3>
<ul>
<li>Added resume APIs for partial pre-rendering with Web Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Added resume APIs for partial pre-rendering with Node Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Updated <a
href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a>
APIs to return a <code>postponed</code> state that can be passed to the
<code>resume</code> APIs.</li>
</ul>
<h3>Notable changes</h3>
<ul>
<li>React DOM now batches suspense boundary reveals, matching the
behavior of client side rendering. This change is especially noticeable
when animating the reveal of Suspense boundaries e.g. with the upcoming
<code>&lt;ViewTransition&gt;</code> Component. React will batch as many
reveals as possible before the first paint while trying to hit popular
first-contentful paint metrics.</li>
<li>Add Node Web Streams (<code>prerender</code>,
<code>renderToReadableStream</code>) to server-side-rendering APIs for
Node.js</li>
<li>Use underscore instead of <code>:</code> for IDs generated by useId</li>
</ul>
<h3>All Changes</h3>
<h4>React</h4>
<ul>
<li><code>&lt;Activity /&gt;</code> was developed over many years,
starting before <code>ClassComponent.setState</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> and
many others)</li>
<li>Stringify context as &quot;SomeContext&quot; instead of
&quot;SomeContext.Provider&quot; (<a
href="https://github.com/kassens"><code>@​kassens</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li>
<li>Include stack of cause of React instrumentation errors with
<code>%o</code> placeholder (<a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li>
<li>Fix infinite <code>useDeferredValue</code> loop in popstate event
(<a href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li>
<li>Fix a bug when an initial value was passed to
<code>useDeferredValue</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li>
<li>Fix a crash when submitting forms with Client Actions (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li>
<li>Hide/unhide the content of dehydrated suspense boundaries if they
resuspend (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li>
<li>Avoid stack overflow on wide trees during Hot Reload (<a
href="https://github.com/sophiebits"><code>@​sophiebits</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li>
<li>Improve Owner and Component stacks in various places (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>,
<a
href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li>
<li>Add <code>cacheSignal</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li>
</ul>
<h4>React DOM</h4>
<ul>
<li>Block on Suspensey Fonts during reveal of server-side-rendered
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li>
<li>Use underscore instead of <code>:</code> for IDs generated by
<code>useId</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a>, <a
href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li>
<li>Stop warning when ARIA 1.3 attributes are used (<a
href="https://github.com/Abdul-Omira"><code>@​Abdul-Omira</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li>
<li>Allow <code>nonce</code> to be used on hoistable styles (<a
href="https://github.com/Andarist"><code>@​Andarist</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="861811347b"><code>8618113</code></a>
Bump scheduler version (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34671">#34671</a>)</li>
<li><a
href="1bd1f01f2a"><code>1bd1f01</code></a>
Ship partial-prerendering APIs to Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34633">#34633</a>)</li>
<li><a
href="2f0649a0b2"><code>2f0649a</code></a>
[Fizz] Remove <code>nonce</code> option from resume-and-prerender APIs
(<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34664">#34664</a>)</li>
<li><a
href="5667a41fe4"><code>5667a41</code></a>
Bump next prerelease version numbers (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34639">#34639</a>)</li>
<li><a
href="e08f53b182"><code>e08f53b</code></a>
Match <code>react-dom/static</code> test entrypoints and published
entrypoints (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34599">#34599</a>)</li>
<li><a
href="8bb7241f4c"><code>8bb7241</code></a>
Bump useEffectEvent to Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34610">#34610</a>)</li>
<li><a
href="83c88ad470"><code>83c88ad</code></a>
Handle fabric root level fragment with compareDocumentPosition (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34533">#34533</a>)</li>
<li><a
href="68f00c901c"><code>68f00c9</code></a>
Release Activity in Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34374">#34374</a>)</li>
<li><a
href="3168e08f83"><code>3168e08</code></a>
[flags] enable opt-in for enableDefaultTransitionIndicator (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34373">#34373</a>)</li>
<li><a
href="3434ff4f4b"><code>3434ff4</code></a>
Add scrollIntoView to fragment instances (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/32814">#32814</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/facebook/react/commits/v19.2.0/packages/react-dom">compare
view</a></li>
</ul>
</details>
<br />

Updates `@types/react-dom` from 19.1.9 to 19.2.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 00:02:31 -04:00
dependabot[bot]
91c6a8a3a3
chore(ui-deps): bump next from 15.5.3 to 15.5.4 in /llama_stack/ui (#3694)
Bumps [next](https://github.com/vercel/next.js) from 15.5.3 to 15.5.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.4</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: ensure onRequestError is invoked when otel enabled (<a
href="https://redirect.github.com/vercel/next.js/issues/83343">#83343</a>)</li>
<li>fix: devtools initial position should be from next config (<a
href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)</li>
<li>[devtool] fix overlay styles are missing (<a
href="https://redirect.github.com/vercel/next.js/issues/83721">#83721</a>)</li>
<li>Turbopack: don't match dynamic pattern for node_modules packages (<a
href="https://redirect.github.com/vercel/next.js/issues/83176">#83176</a>)</li>
<li>Turbopack: don't treat metadata routes as RSC (<a
href="https://redirect.github.com/vercel/next.js/issues/82911">#82911</a>)</li>
<li>[turbopack] Improve handling of symlink resolution errors in
track_glob and read_glob (<a
href="https://redirect.github.com/vercel/next.js/issues/83357">#83357</a>)</li>
<li>Turbopack: throw large static metadata error earlier (<a
href="https://redirect.github.com/vercel/next.js/issues/82939">#82939</a>)</li>
<li>fix: error overlay not closing when backdrop clicked (<a
href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)</li>
<li>Turbopack: flush Node.js worker IPC on error (<a
href="https://redirect.github.com/vercel/next.js/issues/84077">#84077</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>[CNA] use linter preference (<a
href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)</li>
<li>CI: use KV for test timing data (<a
href="https://redirect.github.com/vercel/next.js/issues/83745">#83745</a>)</li>
<li>docs: september improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/yiminghe"><code>@​yiminghe</code></a>, <a
href="https://github.com/huozhi"><code>@​huozhi</code></a>, <a
href="https://github.com/devjiwonchoi"><code>@​devjiwonchoi</code></a>,
<a href="https://github.com/mischnic"><code>@​mischnic</code></a>, <a
href="https://github.com/lukesandberg"><code>@​lukesandberg</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, <a
href="https://github.com/icyJoseph"><code>@​icyJoseph</code></a>, <a
href="https://github.com/leerob"><code>@​leerob</code></a>, <a
href="https://github.com/fufuShih"><code>@​fufuShih</code></a>, <a
href="https://github.com/dwrth"><code>@​dwrth</code></a>, <a
href="https://github.com/aymericzip"><code>@​aymericzip</code></a>, <a
href="https://github.com/obendev"><code>@​obendev</code></a>, <a
href="https://github.com/molebox"><code>@​molebox</code></a>, <a
href="https://github.com/OoMNoO"><code>@​OoMNoO</code></a>, <a
href="https://github.com/pontasan"><code>@​pontasan</code></a>, <a
href="https://github.com/styfle"><code>@​styfle</code></a>, <a
href="https://github.com/HondaYt"><code>@​HondaYt</code></a>, <a
href="https://github.com/ryuapp"><code>@​ryuapp</code></a>, <a
href="https://github.com/lpalmes"><code>@​lpalmes</code></a>, and <a
href="https://github.com/ijjk"><code>@​ijjk</code></a> for helping!</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="40f1d7814d"><code>40f1d78</code></a>
v15.5.4</li>
<li><a
href="cb30f0a176"><code>cb30f0a</code></a>
[backport] docs: september improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li>
<li><a
href="b6a32bb579"><code>b6a32bb</code></a>
[backport] [CNA] use linter preference (<a
href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/84087">#84087</a>)</li>
<li><a
href="26d61f1e9a"><code>26d61f1</code></a>
[backport] Turbopack: flush Node.js worker IPC on error (<a
href="https://redirect.github.com/vercel/next.js/issues/84079">#84079</a>)</li>
<li><a
href="e11e87a547"><code>e11e87a</code></a>
[backport] fix: error overlay not closing when backdrop clicked (<a
href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83">#83</a>...</li>
<li><a
href="0a29888575"><code>0a29888</code></a>
[backport] fix: devtools initial position should be from next config (<a
href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)...</li>
<li><a
href="7a53950c13"><code>7a53950</code></a>
[backport] Turbopack: don't treat metadata routes as RSC (<a
href="https://redirect.github.com/vercel/next.js/issues/83804">#83804</a>)</li>
<li><a
href="050bdf1ae7"><code>050bdf1</code></a>
[backport] Turbopack: throw large static metadata error earlier (<a
href="https://redirect.github.com/vercel/next.js/issues/83816">#83816</a>)</li>
<li><a
href="1f6ea09f85"><code>1f6ea09</code></a>
[backport] Turbopack: Improve handling of symlink resolution errors (<a
href="https://redirect.github.com/vercel/next.js/issues/83805">#83805</a>)</li>
<li><a
href="c7d1855499"><code>c7d1855</code></a>
[backport] CI: use KV for test timing data (<a
href="https://redirect.github.com/vercel/next.js/issues/83860">#83860</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v15.5.3...v15.5.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.5.3&new-version=15.5.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 00:01:38 -04:00
Matthew Farrellee
351c4b98e4
chore: inference=remote::llama-openai-compat does not support /v1/completion (#3683)
## What does this PR do?

skip completion tests for inference=remote::llama-openai-compat

## Test Plan

ci
2025-10-04 11:36:48 -07:00
Ashwin Bharambe
045a0c1d57
feat(tests): implement test isolation for inference recordings (#3681)
Uses test_id in request hashes and test-scoped subdirectories to prevent
cross-test contamination. Model list endpoints exclude test_id to enable
merging recordings from different servers.

Additionally, this PR adds a `record-if-missing` mode, which we will use
instead of `record` (which re-records everything).
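
A minimal sketch of the hashing idea described above, not the actual recorder code. The function name `request_hash`, the `SHARED_ENDPOINTS` set, and the endpoint paths are assumptions made for illustration:

```python
import hashlib
import json
from typing import Optional

# Hypothetical set of endpoints whose recordings are shared across tests.
SHARED_ENDPOINTS = {"/v1/models"}


def request_hash(endpoint: str, body: dict, test_id: Optional[str]) -> str:
    """Return a stable hash for a request, scoped to a test where appropriate."""
    payload = {"endpoint": endpoint, "body": body}
    # Model-list style endpoints leave test_id out so that recordings made
    # against different servers can be merged.
    if test_id is not None and endpoint not in SHARED_ENDPOINTS:
        payload["test_id"] = test_id
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this shape, two tests that send an identical chat request get different hashes, while model-list requests hash identically regardless of the test.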

🤖 Co-authored with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-04 11:34:18 -07:00
Young Han
f176196fba
docs: Update links in README for quick start and documentation (#3678)
The previous quick start and documentation links led to a `Page Not Found` page.

# What does this PR do?
<img width="900" height="316" alt="image"
src="https://github.com/user-attachments/assets/60ceac27-18db-4a3b-852f-8d139309f4cb"
/>
2025-10-03 20:51:46 -07:00
ehhuang
c21bb0e837
chore: fix setup_telemetry script (#3680)
# What does this PR do?
Added missing configuration files

## Test Plan
run ./scripts/telemetry/setup_telemetry.sh
```
OTEL_SERVICE_NAME=llama_stack OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 TELEMETRY_SINKS=otel_trace,otel_metric uv run --with llama-stack llama stack build --distro=starter --image-type=venv --run
```
Navigate to Grafana at localhost:3000, then query metrics and traces
2025-10-03 17:36:35 -07:00
Ashwin Bharambe
3f36bfaeaa
chore(tests): normalize recording IDs and timestamps to reduce git diff noise (#3676)
IDs are now deterministic hashes based on request content, and
timestamps are normalized to constants, eliminating spurious changes
when re-recording tests.

## Changes
- Updated `inference_recorder.py` to normalize IDs and timestamps during
recording
- Added `scripts/normalize_recordings.py` utility to re-normalize
existing recordings
- Created documentation in `tests/integration/recordings/README.md`
- Normalized 350 existing recording files
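
As a rough illustration of the normalization described above (not the code in `inference_recorder.py`), the sketch below derives the ID from the request content and pins the timestamp. The `id` and `created` field names, the `rec-` prefix, and the constant `0` are assumptions:

```python
import hashlib
import json

FIXED_TIMESTAMP = 0  # assumed constant; the PR does not state the real value


def normalize_recording(request: dict, response: dict) -> dict:
    """Return a copy of response with a content-derived ID and fixed timestamp."""
    canonical = json.dumps(request, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(canonical).hexdigest()
    normalized = dict(response)
    if "id" in normalized:
        # Same request content always yields the same ID.
        normalized["id"] = f"rec-{digest[:12]}"
    if "created" in normalized:
        normalized["created"] = FIXED_TIMESTAMP
    return normalized
```

Re-recording the same request then produces byte-identical `id` and `created` values, so only real response changes show up in the diff.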
2025-10-03 17:26:11 -07:00
Alexey Rybak
6bcd3e25f2
chore: update CODEOWNERS (#3613)
# What does this PR do?

Update CODEOWNERS file 

## Test Plan
N/A
2025-10-03 17:12:34 -07:00
Francisco Arceo
7ec7e0c1ac
chore: Add weaviate client to unit group in pyproject.toml and uv.lock (#3675)
# What does this PR do?
`uv add "weaviate-client>=4.16.4" --group unit`

## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-03 14:02:20 -07:00
Ashwin Bharambe
61b4238912
feat(api): add extra_body parameter support with shields example (#3670)
## Summary
Introduce `ExtraBodyField` annotation to enable parameters that arrive
via extra_body in client SDKs but are accessible server-side with full
typing.

These parameters are documented in OpenAPI specs under
**`x-llama-stack-extra-body-params`** but excluded from generated SDK
signatures.

Add `shields` parameter to `create_openai_response` as the first
implementation using this pattern.
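
The flow can be sketched roughly as follows. This is a stand-in under stated assumptions, not the `ExtraBodyField` implementation: `build_request_body` and `read_shields` are hypothetical helpers, and the shield name is made up. The only behavior relied on is that OpenAI-style SDKs merge `extra_body` into the JSON request body:

```python
from typing import Optional


def build_request_body(typed_params: dict, extra_body: Optional[dict] = None) -> dict:
    """Client side: merge extra_body into the typed parameters of a request."""
    body = dict(typed_params)
    body.update(extra_body or {})
    return body


def read_shields(body: dict) -> list:
    """Server side: read the extra parameter with a basic type check."""
    shields = body.get("shields") or []
    if not all(isinstance(s, str) for s in shields):
        raise TypeError("shields must be a list of strings")
    return list(shields)


# The generated SDK signature has no `shields` argument, so the client
# passes it through extra_body instead.
body = build_request_body(
    {"model": "example-model", "input": "hello"},
    extra_body={"shields": ["example-shield"]},
)
```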

## Test Plan
- added an integration test which checks that shields parameter passed
via extra_body reaches server implementation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-03 13:25:09 -07:00
Ashwin Bharambe
188a56af5c fix: merge workflows to avoid GITHUB_TOKEN limitation
2025-10-03 12:04:02 -07:00
Ashwin Bharambe
f232b78ad6 fix(ci): update hashes 2025-10-03 11:58:49 -07:00
Ashwin Bharambe
5a44b9ff82
feat: add comment-triggered pre-commit bot for PRs (#3672)
## Summary

This PR adds a comment-triggered GitHub Actions workflow that allows
running pre-commit hooks on-demand for any pull request. When someone
comments `@github-actions run precommit` on a PR, the bot automatically
runs all pre-commit hooks and commits any formatting or linting fixes
directly to the PR branch.

The implementation uses a secure two-workflow approach: a trigger
workflow validates permissions and dispatches to an execution workflow
that runs pre-commit in a privileged context. This works safely for both
same-repo and fork PRs, with permission checks ensuring only PR authors
or repository collaborators can trigger the bot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-03 11:51:40 -07:00
Alexey Rybak
9f6c658f2a
docs: update OG image (#3669)
# What does this PR do?
* Updates OG image for docs preview

## Test Plan
* Manual testing
2025-10-03 10:22:54 -07:00
Matthew Farrellee
ce77c27ff8
chore: use remoteinferenceproviderconfig for remote inference providers (#3668)
# What does this PR do?

on the path to maintainable impls of inference providers. make all
configs instances of RemoteInferenceProviderConfig.

## Test Plan

ci
2025-10-03 08:48:42 -07:00
Francisco Arceo
a20e8eac8c
feat: Add OpenAI Conversations API (#3429)
# What does this PR do?

Initial implementation for `Conversations` and `ConversationItems` using
`AuthorizedSqlStore` with endpoints to:
- CREATE
- UPDATE
- GET/RETRIEVE/LIST
- DELETE

Set `level=LLAMA_STACK_API_V1`.

NOTE: This does not currently incorporate changes for Responses; that
will be done in a subsequent PR.

Closes https://github.com/llamastack/llama-stack/issues/3235

## Test Plan
- Unit tests
- Integration tests

Also comparison of [OpenAPI spec for OpenAI
API](https://github.com/openai/openai-openapi/tree/manual_spec)
```bash
oasdiff breaking --fail-on ERR docs/static/llama-stack-spec.yaml https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/manual_spec/openapi.yaml --strip-prefix-base "/v1/openai/v1" \
--match-path '(^/v1/openai/v1/conversations.*|^/conversations.*)'
```

Note: I still have some uncertainty about this. I borrowed this approach
from @cdoern on https://github.com/llamastack/llama-stack/pull/3514 but
need to spend more time confirming that it works; at the moment it
suggests it does.

UPDATE on `oasdiff`: I investigated the OpenAI spec further, and it
currently does not list Conversations, so that analysis is not
meaningful. Noting for future reference.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-03 08:47:18 -07:00
Charlie Doern
a09e30bd87
docs!: adjust external provider docs (#3484)
# What does this PR do?

now that we consolidated the providerspec types and got rid of
`AdapterSpec`, adjust external.md

BREAKING CHANGE: external providers must update their
`get_provider_spec` function to use `RemoteProviderSpec` properly

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-03 15:48:41 +02:00
Matthew Farrellee
d266c59c2a
chore: remove deprecated inference.chat_completion implementations (#3654)
# What does this PR do?

remove unused chat_completion implementations

vllm features ported -
 - requires max_tokens be set, use config value
 - set tool_choice to none if no tools provided
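
A minimal sketch of the two ported behaviors, assuming request parameters are held in a plain dict. The helper name `prepare_vllm_params` and its arguments are hypothetical:

```python
def prepare_vllm_params(params: dict, config_max_tokens: int) -> dict:
    """Apply the two vLLM-specific adjustments to a chat completion request."""
    prepared = dict(params)
    # max_tokens must be set; fall back to the configured value.
    if prepared.get("max_tokens") is None:
        prepared["max_tokens"] = config_max_tokens
    # With no tools provided, tool_choice is forced to "none".
    if not prepared.get("tools"):
        prepared["tool_choice"] = "none"
    return prepared
```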


## Test Plan

ci
2025-10-03 07:55:34 -04:00
Anastas Stoyanovsky
4dfbe46954
fix(docs): Correct indentation in documented example for access_policy (#3652)
`access_policy` needs to be inside the `auth` section in config; this PR
corrects indentation in a documented example of configuring that
section.
2025-10-03 12:19:52 +02:00
Christian Zaccaria
bcdbb53be3
feat: implement keyword and hybrid search for Weaviate provider (#3264)
# What does this PR do?
- This PR implements keyword and hybrid search for Weaviate DB based on
its inbuilt functions.
- Added fixtures to conftest.py for Weaviate.
- Enabled integration tests for remote Weaviate on all 3 search modes.

Closes #3010 

## Test Plan
Unit tests and integration tests should pass on this PR.
2025-10-03 10:22:30 +02:00
Doug Edgar
52c8df2322
feat: auto-detect Console width (#3327)
# What does this PR do?
Addresses Issue #3271 - "Starting LLS server locally on a terminal with
120 chars width results in an output with empty lines".

This removes the specific 150-character width limit specified for the
Console, and will now auto-detect the terminal width instead. Now the
formatting of Console output is consistent across different sizes of
terminal windows.
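
For illustration only, width auto-detection can be sketched with the standard library. The PR itself changes the Console configuration; `shutil.get_terminal_size` is used here as a stand-in, and the fallback of 80 columns is an assumption:

```python
import shutil


def console_width(fallback: int = 80) -> int:
    """Return the current terminal width, or fallback when it cannot be detected."""
    return shutil.get_terminal_size(fallback=(fallback, 24)).columns
```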

Closes #3271

## Test Plan
Launching the server with several different sizes of terminal windows
results in Console output without unexpected spacing. e.g. `python -m
llama_stack.core.server.server /tmp/run.yaml --port 8321`

---------

Signed-off-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-10-03 10:19:31 +02:00
Matthew Farrellee
0a41c4ead0
chore: OpenAIMixin implements ModelsProtocolPrivate (#3662)
# What does this PR do?

add ModelsProtocolPrivate methods to OpenAIMixin

this will allow providers using OpenAIMixin to use a common interface


## Test Plan

ci w/ new tests
2025-10-02 21:32:02 -07:00
ehhuang
14a94e9894
fix: responses <> chat completion input conversion (#3645)
# What does this PR do?

closes #3268
closes #3498

When resuming from a previous response ID, we currently attempt to
convert the stored responses input to chat completion messages, which is
not always possible, e.g. for tool calls, where some data is lost once
converted from the chat completion message to the responses input format.

This PR stores the chat completion messages that correspond to the
_last_ call to chat completion, which is sufficient to be resumed from
in the next responses API call, where we load these saved messages and
skip conversion entirely.
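
The approach can be sketched as follows. This is an illustrative in-memory model, not the actual store: `ResponsesStore`, `StoredResponse`, and `build_messages` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredResponse:
    response_id: str
    # Messages exactly as sent in the last chat completion call.
    chat_messages: List[dict] = field(default_factory=list)


class ResponsesStore:
    def __init__(self) -> None:
        self._by_id: Dict[str, StoredResponse] = {}

    def save(self, response_id: str, chat_messages: List[dict]) -> None:
        self._by_id[response_id] = StoredResponse(response_id, list(chat_messages))

    def load_messages(self, response_id: str) -> List[dict]:
        return list(self._by_id[response_id].chat_messages)


def build_messages(
    store: ResponsesStore,
    new_message: dict,
    previous_response_id: Optional[str] = None,
) -> List[dict]:
    """Resume from saved chat messages instead of converting responses input."""
    prior = store.load_messages(previous_response_id) if previous_response_id else []
    return prior + [new_message]
```

Because the saved messages are reused verbatim, tool-call details that the responses input format would drop are still present on the next call.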

Separate issue to optimize storage:
https://github.com/llamastack/llama-stack/issues/3646

## Test Plan
existing CI tests
2025-10-02 16:01:08 -07:00
Ashwin Bharambe
ef0736527d
feat(tools)!: substantial clean up of "Tool" related datatypes (#3627)
This is a sweeping change to clean up some gunk around our "Tool"
definitions.

First, we had two types, `Tool` and `ToolDef`. The first of these was a
"Resource" type for the registry, but we had stopped registering tools
inside the Registry long ago (and only registered ToolGroups). The
latter was for specifying tools for the Agents API. This PR removes the
former and adds an optional `toolgroup_id` field to the latter.

Secondly, as pointed out by @bbrowning in
https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132,
we were doing a lossy conversion from a full JSON schema from the MCP
tool specification into our ToolDefinition to send it to the model.
There is no necessity to do this -- we ourselves aren't doing any
execution at all but merely passing it to the chat completions API which
supports this. By doing this (and by doing it poorly), we encountered
limitations like not supporting array items, or not resolving $refs,
etc.

To fix this, we replaced the `parameters` field with `{ input_schema,
output_schema }`, which can be full-blown JSON schemas.
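
A rough sketch of the resulting shape, assuming a dataclass stand-in for the real type. Only `toolgroup_id`, `input_schema`, and `output_schema` come from this description; the other fields and the `to_chat_tool` helper are assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolDef:
    name: str
    description: Optional[str] = None
    toolgroup_id: Optional[str] = None
    input_schema: Optional[dict] = None   # full JSON schema, passed through as-is
    output_schema: Optional[dict] = None


def to_chat_tool(tool: ToolDef) -> dict:
    """Build a chat completions tool entry without rewriting the schema."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description or "",
            "parameters": tool.input_schema or {"type": "object", "properties": {}},
        },
    }
```

Since the schema is forwarded untouched, constructs such as array `items` and `$ref` are preserved.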

Finally, there were some types in our llama-related chat format
conversion which needed some cleanup. We are taking this opportunity to
clean those up.

This PR is a substantial breaking change to the API. However, given our
window for introducing breaking changes, this suits us just fine. I will
be landing a concurrent `llama-stack-client` change as well since API
shapes are changing.
2025-10-02 15:12:03 -07:00
ehhuang
1f5003d50e
chore: fix precommit (#3663)
# What does this PR do?


## Test Plan
2025-10-02 14:51:41 -07:00
ehhuang
ceca3c056f
chore: fix/add logging categories (#3658)
# What does this PR do?
These aren't controllable by LLAMA_STACK_LOGGING

```

tests/integration/agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This ...) [  3%]
tests/integration/agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This t...) [  7%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True] 
instantiating llama_stack_client
WARNING  2025-10-02 13:14:33,472 root:258 uncategorized: Unknown logging category: testing. Falling back to default 'root' level: 20                  
WARNING  2025-10-02 13:14:33,477 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,960 root:258 uncategorized: Unknown logging category: tokenizer_utils. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:33,962 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,963 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,968 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,974 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,978 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,350 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,366 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,489 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,490 root:258 uncategorized: Unknown logging category: inference_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:35,697 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,918 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
INFO     2025-10-02 13:14:35,945 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,172 root:258 uncategorized: Unknown logging category: files. Falling back to default 'root' level: 20                    
WARNING  2025-10-02 13:14:36,218 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,219 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,231 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,255 root:258 uncategorized: Unknown logging category: tool_runtime. Falling back to default 'root' level: 20             
WARNING  2025-10-02 13:14:36,486 root:258 uncategorized: Unknown logging category: responses_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:36,503 root:258 uncategorized: Unknown logging category: openai::responses. Falling back to default 'root' level: 20        
INFO     2025-10-02 13:14:36,524 llama_stack.providers.utils.responses.responses_store:80 responses_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,528 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,703 root:258 uncategorized: Unknown logging category: uncategorized. Falling back to default 'root' level: 20 
```

## Test Plan
2025-10-02 13:10:13 -07:00
Ashwin Bharambe
6afa96b0b9 fix(api): fix a mistake from #3636 which overwrote POST /responses 2025-10-02 13:03:17 -07:00
Matthew Farrellee
0e13512dd7
chore: fix agents tests for non-ollama providers, provide max_tokens (#3657)
# What does this PR do?

closes #3656 


## Test Plan

openai is not enabled in ci, so manual testing with:

```
$ ./scripts/integration-tests.sh --stack-config ci-tests --suite base --setup gpt --subdirs agents --inference-mode live                            
=== Llama Stack Integration Test Runner ===
Stack Config: ci-tests
Setup: gpt
Inference Mode: live
Test Suite: base
Test Subdirs: agents
Test Pattern: 

Checking llama packages
llama-stack                              0.2.23          .../llama-stack
llama-stack-client                       0.3.0a3
ollama                                   0.5.1
=== System Resources Before Tests ===
...
=== Applying Setup Environment Variables ===
Setting up environment variables:
=== Running Integration Tests ===
Test subdirs to run: agents
Added test files from agents: 3 files

=== Running all collected tests in a single pytest command ===
Total test files: 3
+ pytest -s -v tests/integration/agents/test_persistence.py tests/integration/agents/test_openai_responses.py tests/integration/agents/test_agents.py --stack-config=ci-tests --inference-mode=live -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag )' --setup=gpt --color=yes --capture=tee-sys
WARNING  2025-10-02 13:14:32,653 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,043 root:258 uncategorized: Unknown logging category: tests. Falling back to default 'root' level: 20                    
INFO     2025-10-02 13:14:33,063 tests.integration.conftest:86 tests: Applying setup 'gpt'                                                            
========================================= test session starts ==========================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- .../.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'Linux-6.16.7-200.fc42.x86_64-x86_64-with-glibc2.41', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'html': '4.1.1', 'anyio': '4.9.0', 'timeout': '2.4.0', 'cov': '6.2.1', 'asyncio': '1.1.0', 'nbval': '0.11.0', 'socket': '0.7.0', 'json-report': '1.5.0', 'metadata': '3.1.1'}}
rootdir: ...
configfile: pyproject.toml
plugins: html-4.1.1, anyio-4.9.0, timeout-2.4.0, cov-6.2.1, asyncio-1.1.0, nbval-0.11.0, socket-0.7.0, json-report-1.5.0, metadata-3.1.1
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 32 items / 6 deselected / 26 selected                                                        

tests/integration/agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This ...) [  3%]
tests/integration/agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This t...) [  7%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True] 
instantiating llama_stack_client
WARNING  2025-10-02 13:14:33,472 root:258 uncategorized: Unknown logging category: testing. Falling back to default 'root' level: 20                  
WARNING  2025-10-02 13:14:33,477 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,960 root:258 uncategorized: Unknown logging category: tokenizer_utils. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:33,962 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,963 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,968 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,974 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,978 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,350 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,366 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,489 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,490 root:258 uncategorized: Unknown logging category: inference_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:35,697 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,918 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
INFO     2025-10-02 13:14:35,945 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,172 root:258 uncategorized: Unknown logging category: files. Falling back to default 'root' level: 20                    
WARNING  2025-10-02 13:14:36,218 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,219 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,231 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,255 root:258 uncategorized: Unknown logging category: tool_runtime. Falling back to default 'root' level: 20             
WARNING  2025-10-02 13:14:36,486 root:258 uncategorized: Unknown logging category: responses_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:36,503 root:258 uncategorized: Unknown logging category: openai::responses. Falling back to default 'root' level: 20        
INFO     2025-10-02 13:14:36,524 llama_stack.providers.utils.responses.responses_store:80 responses_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,528 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,703 root:258 uncategorized: Unknown logging category: uncategorized. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:36,726 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: Pass    
         Fireworks API Key in the header X-LlamaStack-Provider-Data as { "fireworks_api_key": <your api key>}                                         
WARNING  2025-10-02 13:14:36,727 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider together: Pass     
         Together API Key in the header X-LlamaStack-Provider-Data as { "together_api_key": <your api key>}                                           
WARNING  2025-10-02 13:14:38,404 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: API key 
         is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"anthropic_api_key": "<API_KEY>"}, 
         or in the provider config.                                                                                                                   
WARNING  2025-10-02 13:14:38,406 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is 
         not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in 
         the provider config.                                                                                                                         
WARNING  2025-10-02 13:14:38,408 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is   
         not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in   
         the provider config.                                                                                                                         
WARNING  2025-10-02 13:14:38,411 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key 
         is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, 
         or in the provider config.                                                                                                                   
llama_stack_client instantiated in 5.237s
SKIPPED [ 11%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items[openai_client-txt=openai/gpt-4o] SKIPPED [ 15%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items_with_limit_and_order[txt=openai/gpt-4o] SKIPPED [ 19%]
tests/integration/agents/test_openai_responses.py::test_function_call_output_response[txt=openai/gpt-4o] SKIPPED [ 23%]
tests/integration/agents/test_openai_responses.py::test_function_call_output_response_with_none_arguments[txt=openai/gpt-4o] SKIPPED [ 26%]
tests/integration/agents/test_agents.py::test_agent_simple[openai/gpt-4o] PASSED                 [ 30%]
tests/integration/agents/test_agents.py::test_agent_name[txt=openai/gpt-4o] SKIPPED (this te...) [ 34%]
tests/integration/agents/test_agents.py::test_tool_config[openai/gpt-4o] PASSED                  [ 38%]
tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o] FAILED                  [ 42%]
tests/integration/agents/test_agents.py::test_custom_tool_infinite_loop[openai/gpt-4o] PASSED    [ 46%]
tests/integration/agents/test_agents.py::test_tool_choice_required[openai/gpt-4o] INFO     2025-10-02 13:14:51,559 llama_stack.providers.inline.agents.meta_reference.agent_instance:691 agents::meta_reference: done with MAX          
         iterations (2), exiting.                                                                                                                     
PASSED         [ 50%]
tests/integration/agents/test_agents.py::test_tool_choice_none[openai/gpt-4o] PASSED             [ 53%]
tests/integration/agents/test_agents.py::test_tool_choice_get_boiling_point[openai/gpt-4o] XFAIL [ 57%]
tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools0] PASSED [ 61%]
tests/integration/agents/test_agents.py::test_multi_tool_calls[openai/gpt-4o] PASSED             [ 65%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-False] SKIPPED [ 69%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items[client_with_models-txt=openai/gpt-4o] PASSED [ 73%]
tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools1] PASSED [ 76%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools1-True] SKIPPED [ 80%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools1-False] SKIPPED [ 84%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools0-True] SKIPPED [ 88%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools0-False] SKIPPED [ 92%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools1-True] SKIPPED [ 96%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools1-False] SKIPPED [100%]

=============================================== FAILURES ===============================================
___________________________________ test_custom_tool[openai/gpt-4o] ____________________________________
tests/integration/agents/test_agents.py:370: in test_custom_tool
    assert "-100" in logs_str
E   assert '-100' in "inference> Polyjuice Potion is a fictional substance from the Harry Potter series, and it doesn't have a scientifically defined boiling point. If you have any other real liquid in mind, feel free to ask!"
========================================= slowest 10 durations =========================================
5.47s setup    tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True]
4.78s call     tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o]
3.01s call     tests/integration/agents/test_agents.py::test_tool_choice_required[openai/gpt-4o]
2.97s call     tests/integration/agents/test_agents.py::test_agent_simple[openai/gpt-4o]
2.85s call     tests/integration/agents/test_agents.py::test_tool_choice_none[openai/gpt-4o]
2.06s call     tests/integration/agents/test_agents.py::test_multi_tool_calls[openai/gpt-4o]
1.83s call     tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools0]
1.83s call     tests/integration/agents/test_agents.py::test_custom_tool_infinite_loop[openai/gpt-4o]
1.29s call     tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools1]
0.57s call     tests/integration/agents/test_openai_responses.py::test_list_response_input_items[client_with_models-txt=openai/gpt-4o]
======================================= short test summary info ========================================
FAILED tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o] - assert '-100' in "inference> Polyjuice Potion is a fictional substance from the Harry Potter series...
=========== 1 failed, 9 passed, 15 skipped, 6 deselected, 1 xfailed, 139 warnings in 27.18s ============
```
note: the failure is separate from the issue being fixed
2025-10-02 14:30:13 -04:00
Alexey Rybak
24ee577cb0
docs: API spec generation for Stainless (#3655)
# What does this PR do?
* Adds stainless-llama-stack-spec.yaml for Stainless client generation,
which comprises stable + experimental APIs

## Test Plan
* Manual generation
2025-10-02 09:25:09 -07:00
Kelly Brown
1d02385e48
docs: Update docs navbar config (#3653)
## Description

Currently, the docs page has the home book opened by default. This PR
updates the .ts so that the sidebar books are collapsed when you first
open the webpage
2025-10-02 16:48:38 +02:00
Sébastien Han
4161102100
chore!: add double routes for v1/openai/v1 (#3636)
So that users get a warning in 0.3.0 and we remove them in 0.4.0.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-02 16:11:05 +02:00
Charlie Doern
f1748e2f92
fix: re-enable conformance skipping ability (#3651)
# What does this PR do?

this was broken by #3631, re-enable this ability by only using oasdiff
when .skip != 'true'

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-02 15:04:26 +02:00
Aakanksha Duggal
7e48cc48bc
refactor(agents): migrate to OpenAI chat completions API (#3323)
2025-10-02 06:50:32 -04:00
Chacksu
426dc54883
docs: Fix Dell distro documentation code snippets (#3640)
# What does this PR do?
* Updates code snippets for Dell distribution, fixing specific user home
directory in code (replacing with $HOME) and updates docker instructions
to use `docker` instead of `podman`.

## Test Plan
N.A.

Co-authored-by: Connor Hack <connorhack@fb.com>
2025-10-02 11:11:30 +02:00
Alexey Rybak
382eb25398
docs: fix more broken links (#3649)
# What does this PR do?
* Fixes some more documentation links

## Test Plan
* Manual testing
2025-10-02 10:43:49 +02:00
Alexey Rybak
cb36b3bab1
docs: add favicon and mobile styling (#3650)
# What does this PR do?
* Adds favicon
* Replaces old llama-stack theme image 
* Adds some mobile styling

## Test Plan
* Manual testing
2025-10-02 10:42:54 +02:00
Alexey Rybak
267f658968
docs: fix broken links (#3647)
# What does this PR do?
* Fixes numerous broken links in the new documentation 

## Test Plan
* Server builds
2025-10-01 16:48:13 -07:00
ehhuang
5adcf0e0cb
chore: Remove debug logging from telemetry adapter (#3643)
# What does this PR do?
Spammy

## Test Plan
n/a
2025-10-01 15:16:23 -07:00
Matthew Farrellee
4dbe0593f9
chore: add provider-data-api-key support to openaimixin (#3639)
# What does this PR do?

the LiteLLMOpenAIMixin provides support for reading key from provider
data (headers users send).

this adds the same functionality to the OpenAIMixin.

this is infrastructure for migrating providers.


## Test Plan

ci w/ new tests
2025-10-01 13:44:59 -07:00
Alexey Rybak
28bbbcf2c1
docs: adding supplementary markdown content to API specs (#3632)
# What does this PR do?

Adds supplementary static content to root API spec pages. This is useful for giving context behind a specific API group, adding information on supported features or work in progress, etc.

This PR introduces supplementary information for Agents (experimental, deprecated) and Responses (stable) APIs.

## Test Plan

Documentation server renders rich static content for the Agents API group:

![image.png](https://app.graphite.dev/user-attachments/assets/fc521619-0320-4a22-9409-8ee3fb57ed0e.png)

2025-10-01 10:15:30 -07:00
Alexey Rybak
b6a5bccadf
docs: api separation (#3630)
# What does this PR do?

First step towards cleaning up the API reference section of the docs.

- Separates API reference into 3 sections: stable (`v1`), experimental (`v1alpha` and `v1beta`), and deprecated (`deprecated=True`)
- Each section is accessible via the dropdown menu and `docs/api-overview`

<img width="1237" height="321" alt="Screenshot 2025-09-30 at 5 47 30 PM" src="https://github.com/user-attachments/assets/fe0e498c-b066-46ed-a48e-4739d3b6724c" />

<img width="860" height="510" alt="Screenshot 2025-09-30 at 5 47 49 PM" src="https://github.com/user-attachments/assets/a92a8d8c-94bf-42d5-9f5b-b47bb2b14f9c" />

- Deprecated APIs: Added styling to the sidebar, and a notice on the endpoint pages

<img width="867" height="428" alt="Screenshot 2025-09-30 at 5 47 43 PM" src="https://github.com/user-attachments/assets/9e6e050d-c782-461b-8084-5ff6496d7bd9" />

Closes #3628

TODO in follow-up PRs:

- Add the ability to annotate API groups with supplementary content  (so we can have longer descriptions of complex APIs like Responses)
- Clean up docstrings to show API endpoints (or short semantic titles) in the sidebar

## Test Plan

- Local testing
- Made sure API conformance test still passes
2025-10-01 10:13:31 -07:00
Alexey Rybak
7f1a33f51c
docs: update API conformance test (#3631)
# What does this PR do?

Given the rapidly changing nature of Llama Stack's APIs and the need to have clean, user-friendly API documentation, we want to split the API reference into 3 main buckets: stable, experimental, and deprecated. The most straightforward way to do it is to have several automatically generated doctrees, which introduces some complexity in testing APIs for backwards compatibility.

This PR updates the API conformance test to handle cases where the API schema is split into several files; it does not change the testing criteria. 

## Test Plan

No developer-facing changes (all existing tests should pass)

2025-10-01 10:11:31 -07:00
ehhuang
853e9b3b0a
fix: log level (#3637)
# What does this PR do?
- categories like "core::server" were not recognized, so their level was
not set by 'all=debug'
- removed spammy telemetry debug logging

## Test Plan
test server launched with LLAMA_STACK_LOGGING='all=debug'
2025-10-01 09:51:39 -07:00
Charlie Doern
4819a2e0ee
feat(conformance): skip test if breaking change is ack (#3619)
# What does this PR do?

if the PR title has `!` or the footer of the commit has `BREAKING
CHANGE:`, skip conformance. This is documented in the API leveling
proposal
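
The check described above could be sketched roughly as follows. This is an illustration only: `acknowledges_breaking_change` is a hypothetical name, and reading "the PR title has `!`" as the Conventional Commits form (`type!:` or `type(scope)!:`) is an assumption about how the workflow matches titles.

```python
import re

# "chore!: ..." or "feat(api)!: ..." (Conventional Commits breaking-change marker).
_BANG_TITLE = re.compile(r"^\w+(\([^)]*\))?!:")


def acknowledges_breaking_change(pr_title: str, commit_message: str) -> bool:
    """Return True when the author has explicitly flagged a breaking change."""
    has_bang = _BANG_TITLE.match(pr_title) is not None
    has_footer = any(
        line.startswith("BREAKING CHANGE:") for line in commit_message.splitlines()
    )
    return has_bang or has_footer
```

A conformance job would then skip the schema comparison only when this returns True, so an unacknowledged breaking change still fails the check.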

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-01 09:22:42 -07:00
Charlie Doern
d167101e70
feat(api): implement v1beta leveling, and additional alpha (#3594)
# What does this PR do?

level the following APIs, keeping their old routes around as well until
0.4.0

1. datasetio to v1beta: used primarily by eval and training. Given that
training and eval are both v1alpha, datasetio is likely to change in
structure as real usages of the API spin up. Register, unregister, and
iter dataset are sparsely implemented, meaning the shape of those routes
is likely to change.

2. telemetry to v1alpha: telemetry has been going through many changes.
For example, query_metrics was not even implemented until recently and
had to change its shape to work. Putting this in v1alpha will allow us to
fix functionality like OTEL, sqlite, etc. The routes themselves are set,
but the structure might change a bit.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-01 09:18:11 -07:00
Matthew Farrellee
f7c5ef4ec0
chore: remove /v1/inference/completion and implementations (#3622)
# What does this PR do?

the /inference/completion route is gone. this removes the
implementations.

## Test Plan

ci
2025-10-01 11:36:53 -04:00
Matthew Farrellee
ea15f2a270
chore: use openai_chat_completion for llm as a judge scoring (#3635)
# What does this PR do?

update llm as a judge to use openai_chat_completion, instead of
deprecated chat_completion


## Test Plan

ci
2025-10-01 09:44:31 -04:00
Jaideep Rao
ca47d90926
fix: Ensure that tool calls with no arguments get handled correctly (#3560)
# What does this PR do?
When a model decides to use an MCP tool call that requires no arguments,
it sets the `arguments` field to `None`. This causes the user to see a
`400 bad request` error due to validation errors down the stack, because
this field gets removed when being parsed by an OpenAI-compatible
inference provider like vLLM.
This PR ensures that, as soon as the tool call args are accumulated
while streaming, we check that no tool call function arguments are
set to None; if they are, we replace them with "{}".
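
As a rough sketch of that normalization step (not the code from this PR): the dict layout follows the OpenAI chat-completions tool call shape, and `normalize_tool_call_arguments` is a hypothetical helper name.

```python
def normalize_tool_call_arguments(tool_calls: list[dict]) -> list[dict]:
    """Replace missing function arguments with an empty JSON object string.

    Runs once the streamed tool call fragments have been accumulated, so that
    downstream validation always sees a string in `function.arguments`.
    """
    for call in tool_calls:
        function = call.get("function")
        if isinstance(function, dict) and function.get("arguments") is None:
            function["arguments"] = "{}"
    return tool_calls
```

Using the string `"{}"` rather than an empty dict keeps the field in the same serialized form that populated tool calls use.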

Closes #3456

## Test Plan
Added new unit test to verify that any tool calls with function
arguments set to `None` get handled correctly

---------

Signed-off-by: Jaideep Rao <jrao@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-01 08:36:57 -04:00
Ashwin Bharambe
42414a1a1b
fix(logging): disable console telemetry sink by default (#3623)
The current span processing dumps so much junk on the console that it
makes actual understanding of what is going on in the server impossible.
I am killing the console sink as a default. If you want, you are always
free to change your run.yaml to add it.

Before: 
<img width="1877" height="1107" alt="image"
src="https://github.com/user-attachments/assets/3a7ad261-e2ba-4d40-9820-fcc282c8df37"
/>

After:
<img width="1919" height="470" alt="image"
src="https://github.com/user-attachments/assets/bc7cf763-fba9-4e95-a4b5-f65f6d1c5332"
/>
2025-09-30 14:58:05 -07:00
ehhuang
ac7c35fbe6
fix: don't pass default response format in Responses (#3614)
# What does this PR do?
Fireworks doesn't allow response_format with tool use. The default
response format is 'text' anyway, so we can safely omit it.


## Test Plan
The script below failed without the change and runs with it.

```
#!/usr/bin/env python3
"""
Script to test Responses API with kubernetes-mcp-server.

This script:
1. Connects to the llama stack server
2. Uses the Responses API with MCP tools
3. Asks for the list of Kubernetes namespaces using the kubernetes-mcp-server
"""

import json

from openai import OpenAI

# Connect to the llama stack server
base_url = "http://localhost:8321/v1"
client = OpenAI(base_url=base_url, api_key="fake")

# Define the MCP tool pointing to the kubernetes-mcp-server
# The kubernetes-mcp-server is running on port 3000 with SSE endpoint at /sse
mcp_server_url = "http://localhost:3000/sse"

tools = [
    {
        "type": "mcp",
        "server_label": "k8s",
        "server_url": mcp_server_url,
    }
]

# Create a response request asking for k8s namespaces
print("Sending request to list Kubernetes namespaces...")
print(f"Using MCP server at: {mcp_server_url}")
print("Available tools will be listed automatically by the MCP server.")
print()

response = client.responses.create(
    # model="meta-llama/Llama-3.2-3B-Instruct",  # Using the vllm model
    model="fireworks/accounts/fireworks/models/llama4-scout-instruct-basic",
    # model="openai/gpt-4o",
    input="what are all the Kubernetes namespaces? Use tool call to `namespaces_list`. make sure to adhere to the tool calling format UNDER ALL CIRCUMSTANCES.",
    tools=tools,
    stream=False,
)

print("\n" + "=" * 80)
print("RESPONSE OUTPUT:")
print("=" * 80)

# Print the output
for i, output in enumerate(response.output):
    print(f"\n[Output {i + 1}] Type: {output.type}")
    if output.type == "mcp_list_tools":
        print(f"  Server: {output.server_label}")
        print(f"  Tools available: {[t.name for t in output.tools]}")
    elif output.type == "mcp_call":
        print(f"  Tool called: {output.name}")
        print(f"  Arguments: {output.arguments}")
        print(f"  Result: {output.output}")
        if output.error:
            print(f"  Error: {output.error}")
    elif output.type == "message":
        print(f"  Role: {output.role}")
        print(f"  Content: {output.content}")

print("\n" + "=" * 80)
print("FINAL RESPONSE TEXT:")
print("=" * 80)
print(response.output_text)
```
2025-09-30 14:52:24 -07:00
grs
d350e3662b
feat: add support for require_approval argument when creating response (#3608)
# What does this PR do?
This PR adds support for the require_approval on an mcp tool definition
passed to create response in the Responses API. This allows the caller
to indicate whether they want to approve calls to that server, or let
them be called without approval.
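
For illustration, a tool definition carrying the new argument might be built like this. The `"always"` and `"never"` values come from the OpenAI Responses API, which this feature mirrors; whether Llama Stack accepts other forms is not covered here, and `mcp_tool` is a hypothetical helper.

```python
def mcp_tool(server_label: str, server_url: str, require_approval: str) -> dict[str, str]:
    """Build an MCP tool definition with an explicit approval policy."""
    if require_approval not in ("always", "never"):
        raise ValueError(f"unsupported require_approval value: {require_approval!r}")
    return {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        "require_approval": require_approval,
    }


# Calls to this server would wait for the caller's approval before running.
tools = [mcp_tool("k8s", "http://localhost:3000/sse", require_approval="always")]
```

The resulting list can be passed as `tools=` to `client.responses.create(...)`, as in the test script for #3614.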

Closes #3443

## Test Plan
Tested both approval and denial.
Added automated integration test for both cases.

---------

Signed-off-by: Gordon Sim <gsim@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-09-30 14:18:34 -07:00
Alexey Rybak
0837fa7bef
docs: update safety notebook (#3617)
# What does this PR do?
* Updates the safety guide in Zero to Hero series to use Moderations API
and the latest safety models
* Fixes an image link

Closes #2557

## Test Plan
* Manual testing
2025-09-30 14:11:12 -07:00
Alexey Rybak
c4c980b056
docs: frontpage update (#3620)
# What does this PR do?
* Adds canonical project information and links to client SDK / k8s
operator / app examples repos to the front page
* Fixes some button rendering errors

Closes #3618  

## Test Plan
Local rebuild of the documentation server
2025-09-30 14:11:00 -07:00
Ashwin Bharambe
606f4cf281
fix(expires_after): make sure multipart/form-data is properly parsed (#3612)
https://github.com/llamastack/llama-stack/pull/3604 broke multipart form
data field parsing for the Files API since it changed its shape -- so as
to match the API exactly to the OpenAI spec even in the generated client
code.

The underlying reason is that multipart/form-data cannot transport
structured nested fields. Each field must be str-serialized. The client
(specifically the OpenAI client whose behavior we must match),
transports sub-fields as `expires_after[anchor]` and
`expires_after[seconds]`, etc. We must be able to handle these fields
somehow on the server without compromising the shape of the YAML spec.

This PR "fixes" this by adding a dependency to convert the data. The
main trade-off here is that we must add this `Depends()` annotation on
every provider implementation for Files. This is a headache, but a much
more reasonable one (in my opinion) given the alternatives.
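
The conversion step can be sketched independently of FastAPI. This shows only the reshaping of bracketed field names into a nested structure; the `Depends()` wiring is omitted, and `nest_form_fields` is a hypothetical name rather than the dependency added in this PR.

```python
import re

# Matches one level of nesting, e.g. "expires_after[anchor]".
_BRACKETED = re.compile(r"^(?P<parent>[^\[\]]+)\[(?P<child>[^\[\]]+)\]$")


def nest_form_fields(form: dict[str, str]) -> dict[str, object]:
    """Group `parent[child]` form fields under a nested `parent` dict.

    Values stay as strings, since multipart/form-data transports every field
    str-serialized; type coercion is left to later validation.
    """
    nested: dict[str, object] = {}
    for name, value in form.items():
        match = _BRACKETED.match(name)
        if match is None:
            nested[name] = value
            continue
        parent = nested.setdefault(match["parent"], {})
        if isinstance(parent, dict):
            parent[match["child"]] = value
    return nested
```

Given `expires_after[anchor]` and `expires_after[seconds]`, this yields a single `expires_after` dict, which is the shape the YAML spec describes.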

## Test Plan

Tests as shown in
https://github.com/llamastack/llama-stack/pull/3604#issuecomment-3351090653
pass.
2025-09-30 16:14:03 -04:00
Ashwin Bharambe
73de235ef1
fix(eval): use client.alpha for eval tests
2025-09-30 13:02:33 -07:00
slekkala1
cc64093ae4
feat(api): Add Vector Store File batches api stub (#3615)
# What does this PR do?
Adds API stubs for the vector store file batches APIs:
https://github.com/llamastack/llama-stack/issues/3533
API Ref:
https://platform.openai.com/docs/api-reference/vector-stores-file-batches

## Test Plan
CI
2025-09-30 12:07:33 -07:00
Charlie Doern
1e25a72ece
feat(api): level /agents as v1alpha (#3610)
# What does this PR do?

agents is likely to be deprecated in favor of responses. Let's level it
as alpha to indicate the lack of long-term support

keep v1 route for backwards compat.

Closes #3611

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-30 11:15:04 -07:00
Matthew Farrellee
2de4e6c900
feat: use /v1/chat/completions for safety model inference (#3591)
# What does this PR do?

migrate safety api implementation from /inference/chat-completion to
/v1/chat/completions

## Test Plan

ci w/ recordings

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-09-30 11:01:44 -07:00
Matthew Farrellee
cb33f45c11
chore: unpublish /inference/chat-completion (#3609)
# What does this PR do?

BREAKING CHANGE: removes /inference/chat-completion route and updates
relevant documentation

## Test Plan

🤷
2025-09-30 11:00:42 -07:00
Kai Wu
62e302613f
feat: add llamastack + CrewAI integration example notebook (#3275)
# What does this PR do?
Add llamastack + CrewAI integration example notebook



## Test Plan
Tested in a local Jupyter notebook and it works.
2025-09-30 10:23:57 -07:00
ehhuang
6cce553c93
fix: mcp tool with array type should include items (#3602)
# What does this PR do?
Fixes error:
```
[ERROR] Error executing endpoint route='/v1/openai/v1/responses'  
         method='post': Error code: 400 - {'error': {'message': "Invalid schema for function 'pods_exec': In context=('properties', 'command'), array 
         schema missing items.", 'type': 'invalid_request_error', 'param': 'tools[7].function.parameters', 'code': 'invalid_function_parameters'}} 
```

From script:
```
#!/usr/bin/env python3
"""
Script to test Responses API with kubernetes-mcp-server.

This script:
1. Connects to the llama stack server
2. Uses the Responses API with MCP tools
3. Asks for the list of Kubernetes namespaces using the kubernetes-mcp-server
"""

import json

from openai import OpenAI

# Connect to the llama stack server
base_url = "http://localhost:8321/v1/openai/v1"
client = OpenAI(base_url=base_url, api_key="fake")

# Define the MCP tool pointing to the kubernetes-mcp-server
# The kubernetes-mcp-server is running on port 3000 with SSE endpoint at /sse
mcp_server_url = "http://localhost:3000/sse"

tools = [
    {
        "type": "mcp",
        "server_label": "k8s",
        "server_url": mcp_server_url,
    }
]

# Create a response request asking for k8s namespaces
print("Sending request to list Kubernetes namespaces...")
print(f"Using MCP server at: {mcp_server_url}")
print("Available tools will be listed automatically by the MCP server.")
print()

response = client.responses.create(
    # model="meta-llama/Llama-3.2-3B-Instruct",  # Using the vllm model
    model="openai/gpt-4o",
    input="what are all the Kubernetes namespaces? Use tool call to `namespaces_list`. make sure to adhere to the tool calling format.",
    tools=tools,
    stream=False,
)

print("\n" + "=" * 80)
print("RESPONSE OUTPUT:")
print("=" * 80)

# Print the output
for i, output in enumerate(response.output):
    print(f"\n[Output {i + 1}] Type: {output.type}")
    if output.type == "mcp_list_tools":
        print(f"  Server: {output.server_label}")
        print(f"  Tools available: {[t.name for t in output.tools]}")
    elif output.type == "mcp_call":
        print(f"  Tool called: {output.name}")
        print(f"  Arguments: {output.arguments}")
        print(f"  Result: {output.output}")
        if output.error:
            print(f"  Error: {output.error}")
    elif output.type == "message":
        print(f"  Role: {output.role}")
        print(f"  Content: {output.content}")

print("\n" + "=" * 80)
print("FINAL RESPONSE TEXT:")
print("=" * 80)
print(response.output_text)
```


## Test Plan
new unit tests
script now runs successfully
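
The error above comes from an `array` parameter whose schema has no `items`. The PR presumably carries the `items` declared by the MCP server through to the function schema; the sketch below shows a defensive fallback instead, and is an assumption rather than the actual change. The helper name `ensure_array_items` and the `default_items` fallback are hypothetical, and whether a given backend accepts the fallback has not been verified.

```python
from typing import Any, Optional


def ensure_array_items(
    schema: dict[str, Any],
    default_items: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Return a copy of a JSON-Schema fragment in which every array has `items`.

    Hypothetical helper: `default_items` is an assumed fallback used only
    when the original schema declares an array without `items`.
    """
    fallback = default_items if default_items is not None else {"type": "string"}
    fixed = dict(schema)  # shallow copy so the caller's schema is not mutated
    if fixed.get("type") == "array" and "items" not in fixed:
        fixed["items"] = dict(fallback)
    if isinstance(fixed.get("items"), dict):
        fixed["items"] = ensure_array_items(fixed["items"], fallback)
    if isinstance(fixed.get("properties"), dict):
        fixed["properties"] = {
            name: ensure_array_items(sub, fallback) if isinstance(sub, dict) else sub
            for name, sub in fixed["properties"].items()
        }
    return fixed


# Shaped like the failing `pods_exec` parameters from the error message.
broken = {"type": "object", "properties": {"command": {"type": "array"}}}
repaired = ensure_array_items(broken)
```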
2025-09-29 23:11:41 -07:00
Ashwin Bharambe
56b625d18a
feat(openai_movement)!: Change URL structures to kill /openai/v1 (part 2) (#3605) 2025-09-29 22:57:37 -07:00
Ashwin Bharambe
3a09f00cdb
feat(files): fix expires_after API shape (#3604)
This was just quite incorrect. See source here:
https://platform.openai.com/docs/api-reference/files/create
2025-09-29 21:29:15 -07:00
Ashwin Bharambe
5e7fed8bbb
feat(openai_movement): Change URL structures to kill /openai/v1 (part 1) (#3587)
The `/v1/openai/v1` prefix is annoying and now unnecessary given our
clearer focus on how to think about the API surface.

Let's kill it for the 0.3.0 update.

To make client-side changes feasible, we will do this in two parts. This
part adds a new route (sans `/openai/v1`) so the existing client
continues to work since the server supports both.

The next PR will be client-side (Stainless) changes which I will be
making shortly.

The final PR will remove the `/openai/v1` routes. 

Note that all these changes will happen rapidly within this release
cycle. The entire set _will be backwards incompatible_.
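
As a rough illustration of the transition period, in which the server answers on both prefixes, a legacy path can be mapped to its new form as below. The function name `canonical_route` is hypothetical and the server's real routing code is not shown in this PR description; the sketch only assumes that the new route is the old one without `/openai/v1`.

```python
LEGACY_PREFIX = "/v1/openai/v1"
NEW_PREFIX = "/v1"


def canonical_route(path: str) -> str:
    """Map a legacy `/v1/openai/v1/...` path to its `/v1/...` equivalent.

    Paths that do not start with the legacy prefix are returned unchanged.
    """
    if path == LEGACY_PREFIX or path.startswith(LEGACY_PREFIX + "/"):
        return NEW_PREFIX + path[len(LEGACY_PREFIX):]
    return path


new_path = canonical_route("/v1/openai/v1/responses")
```

A client that still sends the legacy path keeps working while both routes exist; once the final PR lands, only the mapped form remains valid.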
2025-09-29 16:14:35 -07:00
Michael Dawson
ddf3f1735a
fix: ensure usage is requested if telemetry is enabled (#3571)
# What does this PR do?
Refs: https://github.com/llamastack/llama-stack/issues/3420

When telemetry is enabled the router unconditionally expects the usage
attribute to be available and fails if it is not present.

Usage is not currently being requested by litellm_openai_mixin.py for
streaming requests when using the responses API which means that
providers like vertexai fail if telemetry is enabled and streaming is
used.

This is one part of the required fix. The other part is in LiteLLM; I plan
to submit a PR for that soon.
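
A minimal sketch of the idea, assuming the request parameters follow the OpenAI Chat Completions shape, where `stream_options={"include_usage": True}` asks for usage on streamed responses. The helper name `with_usage_requested` is hypothetical, and this is not the code in `litellm_openai_mixin.py`.

```python
from typing import Any


def with_usage_requested(params: dict[str, Any], telemetry_enabled: bool) -> dict[str, Any]:
    """Return request params that ask for usage when streaming with telemetry on.

    Non-streaming requests and requests without telemetry are returned as-is.
    """
    if not (telemetry_enabled and params.get("stream")):
        return params
    # Copy so that caller-supplied stream options are preserved, not mutated.
    stream_options = dict(params.get("stream_options") or {})
    stream_options["include_usage"] = True
    return {**params, "stream_options": stream_options}


request = with_usage_requested({"model": "example-model", "stream": True}, telemetry_enabled=True)
```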

## Test Plan
I applied this change along with the change for litellm in a llama stack
deployment and validated that I could make streaming requests through
the responses API to a gemini model and they would succeed instead of
failing due to the missing usage attribute when telemetry is enabled.

Signed-off-by: Michael Dawson <midawson@redhat.com>
2025-09-29 14:09:08 -07:00
slekkala1
455579a88e
fix: Remove deprecated user param in OpenAIResponseObject (#3596)
# What does this PR do?
Just removing the deprecated User param in `OpenAIResponseObject`

Closing https://github.com/llamastack/llama-stack/issues/3482

## Test Plan
CI
2025-09-29 13:55:59 -07:00
Matthew Farrellee
e9eb004bf8
fix: remove inference.completion from docs (#3589)
# What does this PR do?

now that /v1/inference/completion has been removed, no docs should refer
to it

this cleans up remaining references

## Test Plan

ci

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-09-29 13:14:41 -07:00
Alexey Rybak
498be131a1
docs: update image paths (#3599)
# What does this PR do?
* Updates image paths for images in docs/resources/ to proper static
image locations

## Test Plan
* `npm run build` builds documentation properly
2025-09-29 13:14:05 -07:00
Matthew Farrellee
7c888fc0da
feat: update eval runner to use openai endpoints (#3588)
# What does this PR do?

move the eval=inline::meta-reference implementation to use
openai_completion/openai_chat_completion

note: this breaks backward compatibility if eval setup used sampling
params' repetition_penalty or strategy

## Test Plan

ci w/ new recordings

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-09-29 13:13:53 -07:00
Matthew Farrellee
45f438c027
chore: skip safety tests when shield not available (#3592)
# What does this PR do?

we skip embedding tests when the embedding_model_id isn't provided. same
for completion / chat tests when text_model_id isn't given.

instead of failing safety tests when a shield_id isn't provided, we'll
skip them too.

## Test Plan

ci

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-09-29 13:11:37 -07:00
Charlie Doern
aac42ddcc2
feat(api): level inference/rerank and remove experimental (#3565)
# What does this PR do?

inference/rerank is the one route in the API intended to not be
deprecated. Level it as v1alpha.

Additionally, remove `experimental` and opt to instead use `v1alpha`
which itself implies an experimental state based on the original
proposal

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-29 12:42:09 -07:00
Matthew Farrellee
975ead1d6a
chore(api): remove deprecated embeddings impls (#3301)
# What does this PR do?

remove deprecated embeddings implementations
2025-09-29 14:45:09 -04:00
Kai Wu
aab22dc759
fix: adding mime type of application/json support (#3452)
# What does this PR do?
This PR fixes #3300 by adding mime type of application/json support in
[agent_instance.py](4a59961a6c/llama_stack/providers/inline/agents/meta_reference/agent_instance.py (L923))

## Test Plan
All related pytest tests pass; see the log:
```
./scripts/unit-tests.sh tests/unit/providers/agent/test_get_raw_document_text.py -vvv

/Users/kaiwu/work/kaiwu/llama-stack/.venv/bin/python3
Uninstalled 22 packages in 5.65s
Installed 47 packages in 1.24s
================= test session starts =================
platform darwin -- Python 3.12.9, pytest-8.4.2, pluggy-1.6.0 -- /Users/kaiwu/work/kaiwu/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.9', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/kaiwu/work/kaiwu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 14 items

tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_text_mime_types PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_yaml_mime_type PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_deprecated_text_yaml_with_warning PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_deprecated_text_yaml_with_url PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_deprecated_text_yaml_with_text_content_item PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_json_mime_type PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_json_url PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_json_text_content_item PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unsupported_mime_types PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_url_content PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_yaml_url PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_text_content_item PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_yaml_text_content_item PASSED
tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unexpected_content_type PASSED

================ slowest 10 durations =================
0.00s call     tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_deprecated_text_yaml_with_url
0.00s call     tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unsupported_mime_types
0.00s call     tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unexpected_content_type
0.00s setup    tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_text_mime_types
0.00s teardown tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_text_mime_types
0.00s call     tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_yaml_url
0.00s call     tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_url_content
0.00s teardown tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unsupported_mime_types
0.00s call     tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_json_url
0.00s call     tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_text_mime_types
================= 14 passed in 0.14s ==================
Generating coverage report...
Wrote HTML report to htmlcov-3.12/index.html
```
2025-09-29 11:27:31 -07:00
Ashwin Bharambe
fdb144f009
revert: feat(ci): use @next branch from llama-stack-client (#3593)
Reverts llamastack/llama-stack#3576

When I edit Stainless and codegen succeeds, the `next` branch is updated
directly. It provides us no chance to see if there might be something
unideal going on. If something is wrong, all CI will start breaking
immediately. This is not ideal. I will likely create another staging
branch `next-release` or something to accommodate the special workflow
that Stainless requires.
2025-09-29 10:41:04 -07:00
ehhuang
8ab6684a94
chore: introduce write queue for response_store (#3497)
# What does this PR do?
Mirroring the same changes that were used for inference_store:
https://github.com/llamastack/llama-stack/pull/3383

Will follow up with a shared internal API for managing these write
queues.
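
For readers unfamiliar with the pattern, a write queue can be sketched with `asyncio.Queue` as below. This is an illustration under stated assumptions, not the `response_store` implementation: the class name `QueuedWriter`, the list used as a stand-in store, and the `max_size` default are all hypothetical.

```python
import asyncio
from typing import Optional


class QueuedWriter:
    """Buffer writes in a queue and apply them from one background task."""

    def __init__(self, store: list, max_size: int = 100) -> None:
        self._store = store
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_size)
        self._worker: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self._worker = asyncio.create_task(self._drain())

    async def enqueue(self, record: dict) -> None:
        # Blocks only when the queue is full, which applies backpressure.
        await self._queue.put(record)

    async def _drain(self) -> None:
        while True:
            record = await self._queue.get()
            try:
                self._store.append(record)  # stand-in for the real database write
            finally:
                self._queue.task_done()

    async def close(self) -> None:
        await self._queue.join()  # wait until every queued write is applied
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass


async def demo() -> list:
    store: list = []
    writer = QueuedWriter(store)
    await writer.start()
    for i in range(3):
        await writer.enqueue({"id": i})
    await writer.close()
    return store


result = asyncio.run(demo())
```

The request path only pays for `enqueue`, while the single worker serializes the writes; `close` drains the queue before cancelling the worker so that no queued record is dropped.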

## Test Plan
existing tests
2025-09-29 10:36:16 -07:00
Matthew Farrellee
7c466a7ec5
chore: skip nvidia datastore tests when nvidia datastore is not enabled (#3590)
# What does this PR do?

the nvidia datastore tests were running when the datastore was not
configured. they would always fail.

this introduces a skip when the nvidia datastore is not configured.


## Test Plan

ci
2025-09-29 05:15:58 -04:00
dependabot[bot]
90bb9cfb0a
chore(github-deps): bump actions/cache from 4.2.4 to 4.3.0 (#3577)
Bumps [actions/cache](https://github.com/actions/cache) from 4.2.4 to
4.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/cache/releases">actions/cache's
releases</a>.</em></p>
<blockquote>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add note on runner versions by <a
href="https://github.com/GhadimiR"><code>@​GhadimiR</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1642">actions/cache#1642</a></li>
<li>Prepare <code>v4.3.0</code> release by <a
href="https://github.com/Link"><code>@​Link</code></a>- in <a
href="https://redirect.github.com/actions/cache/pull/1655">actions/cache#1655</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/GhadimiR"><code>@​GhadimiR</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1642">actions/cache#1642</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v4...v4.3.0">https://github.com/actions/cache/compare/v4...v4.3.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/cache/blob/main/RELEASES.md">actions/cache's
changelog</a>.</em></p>
<blockquote>
<h1>Releases</h1>
<h3>4.3.0</h3>
<ul>
<li>Bump <code>@actions/cache</code> to <a
href="https://redirect.github.com/actions/toolkit/pull/2132">v4.1.0</a></li>
</ul>
<h3>4.2.4</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.5</li>
</ul>
<h3>4.2.3</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.3 (obfuscates SAS token in
debug logs for cache entries)</li>
</ul>
<h3>4.2.2</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.2</li>
</ul>
<h3>4.2.1</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.1</li>
</ul>
<h3>4.2.0</h3>
<p>TLDR; The cache backend service has been rewritten from the ground up
for improved performance and reliability. <a
href="https://github.com/actions/cache">actions/cache</a> now integrates
with the new cache service (v2) APIs.</p>
<p>The new service will gradually roll out as of <strong>February 1st,
2025</strong>. The legacy service will also be sunset on the same date.
Changes in these release are <strong>fully backward
compatible</strong>.</p>
<p><strong>We are deprecating some versions of this action</strong>. We
recommend upgrading to version <code>v4</code> or <code>v3</code> as
soon as possible before <strong>February 1st, 2025.</strong> (Upgrade
instructions below).</p>
<p>If you are using pinned SHAs, please use the SHAs of versions
<code>v4.2.0</code> or <code>v3.4.0</code></p>
<p>If you do not upgrade, all workflow runs using any of the deprecated
<a href="https://github.com/actions/cache">actions/cache</a> will
fail.</p>
<p>Upgrading to the recommended versions will not break your
workflows.</p>
<h3>4.1.2</h3>
<ul>
<li>Add GitHub Enterprise Cloud instances hostname filters to inform API
endpoint choices - <a
href="https://redirect.github.com/actions/cache/pull/1474">#1474</a></li>
<li>Security fix: Bump braces from 3.0.2 to 3.0.3 - <a
href="https://redirect.github.com/actions/cache/pull/1475">#1475</a></li>
</ul>
<h3>4.1.1</h3>
<ul>
<li>Restore original behavior of <code>cache-hit</code> output - <a
href="https://redirect.github.com/actions/cache/pull/1467">#1467</a></li>
</ul>
<h3>4.1.0</h3>
<ul>
<li>Ensure <code>cache-hit</code> output is set when a cache is missed -
<a
href="https://redirect.github.com/actions/cache/pull/1404">#1404</a></li>
<li>Deprecate <code>save-always</code> input - <a
href="https://redirect.github.com/actions/cache/pull/1452">#1452</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0057852bfa"><code>0057852</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1655">#1655</a>
from actions/Link-/prepare-4.3.0</li>
<li><a
href="4f5ea67f1c"><code>4f5ea67</code></a>
Update licensed cache</li>
<li><a
href="9fcad95d03"><code>9fcad95</code></a>
Upgrade actions/cache to 4.1.0 and prepare 4.3.0 release</li>
<li><a
href="638ed79f9d"><code>638ed79</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1642">#1642</a>
from actions/GhadimiR-patch-1</li>
<li><a
href="3862dccb17"><code>3862dcc</code></a>
Add note on runner versions</li>
<li>See full diff in <a
href="0400d5f644...0057852bfa">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/cache&package-manager=github_actions&previous-version=4.2.4&new-version=4.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-29 10:04:58 +02:00
dependabot[bot]
9fdfd3a2ad
chore(ui-deps): bump tw-animate-css from 1.2.9 to 1.4.0 in /llama_stack/ui (#3583)
Bumps [tw-animate-css](https://github.com/Wombosvideo/tw-animate-css)
from 1.2.9 to 1.4.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/Wombosvideo/tw-animate-css/releases">tw-animate-css's
releases</a>.</em></p>
<blockquote>
<h2>v1.4.0</h2>
<h2>Changelog</h2>
<p>902e37a019ffd165ba078e0b3c02634526c54bf0: fix: remove support for
prefix, add new export for prefixed version. Closes <a
href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/58">#58</a>.
fab2a5bf817605be1976e159976718a83489fc1c: chore: bump version to 1.4.0
and update dependencies
c20dc32e2b532a8e74546879b4ce7d9ce89ba710: fix(build): make transform.ts
accept two arguments</p>
<h2>⚠️ BREAKING CHANGE ⚠️</h2>
<p>Support for Tailwind CSS's prefix option was moved to
<code>tw-animate-css/prefix</code> because it was breaking the
<code>--spacing</code> function. Users requiring prefixes should replace
their import:</p>
<pre lang="diff"><code>- import &quot;tw-animate-css&quot;;
+ import &quot;tw-animate-css/prefix&quot;;
</code></pre>
<p><em>I do not plan to introduce breaking changes like this to
non-major releases in the future. But because more people use spacing
rather than prefixes, reverting the previous version's (obviously
breaking) change seems reasonable.</em></p>
<h2>v1.3.8</h2>
<h2>Changelog</h2>
<ul>
<li>b5ff23a: fix: add support for global CSS variable prefix. Closes <a
href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/48">#48</a></li>
<li>03e5f12: feat: add support for ng-primitives height variables <a
href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/56">#56</a>
(thanks <a
href="https://github.com/immohammadjaved"><code>@​immohammadjaved</code></a>)</li>
<li>b076cfb: docs: fix various issues in accordion and collapsible
docs</li>
<li>9485e33: chore: bump version to 1.3.8 and update dependencies</li>
</ul>
<h2>⚠️ BREAKING CHANGE ⚠️</h2>
<p>Adding support for prefixes broke custom spacing. It is recommended
that you skip this version if you do not use Tailwind CSS's prefix
option, and use v1.4.0 instead. If you are actually using prefixes, you
can use a special version supporting prefixes:</p>
<pre lang="diff"><code>- import &quot;tw-animate-css&quot;; /* Version
with spacing support */
+ import &quot;tw-animate-css/prefix&quot;; /* Version with prefix
support */
</code></pre>
<p><em>I do not plan to fix the incompatibility between the spacing and
prefix versions due to time constraints. Feel free to investigate and
open a pull request if you manage to fix it.</em></p>
<h2>v1.3.7</h2>
<h2>Changelog</h2>
<ul>
<li>80dbfcc: feat: add utilities for blur transitions <a
href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/54">#54</a>
(thanks <a
href="https://github.com/coffeeispower"><code>@​coffeeispower</code></a>)</li>
<li>dc294f9: docs: add upcoming changes warning</li>
<li>c640bb8: chore: update dependencies and package manager version</li>
<li>9e63e34: chore: bump version to 1.3.7</li>
</ul>
<h2>v1.3.6</h2>
<h2>Changelog</h2>
<ul>
<li>58f3396: fix: allow changing animation parameters for ready-to-use
animations</li>
<li>8313476: chore: update dependencies and package manager version</li>
<li>f81346c: chore: bump version to 1.3.6</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c20dc32e2b"><code>c20dc32</code></a>
fix(build): make transform.ts accept two arguments</li>
<li><a
href="fab2a5bf81"><code>fab2a5b</code></a>
chore: bump version to 1.4.0 and update dependencies</li>
<li><a
href="902e37a019"><code>902e37a</code></a>
fix: remove support for prefix, add new export for prefixed version</li>
<li><a
href="9485e33d99"><code>9485e33</code></a>
chore: bump version to 1.3.8 and update dependencies</li>
<li><a
href="b076cfb04a"><code>b076cfb</code></a>
docs: fix various issues in accordion and collapsible docs</li>
<li><a
href="03e5f12418"><code>03e5f12</code></a>
feat: add support for ng-primitives height variables (<a
href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/56">#56</a>)</li>
<li><a
href="b5ff23a0d5"><code>b5ff23a</code></a>
fix: add support for global CSS variable prefix. Closes <a
href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/48">#48</a></li>
<li><a
href="9e63e34286"><code>9e63e34</code></a>
chore: bump version to 1.3.7</li>
<li><a
href="c640bb8933"><code>c640bb8</code></a>
chore: update dependencies and package manager version</li>
<li><a
href="dc294f990a"><code>dc294f9</code></a>
docs: add upcoming changes warning</li>
<li>Additional commits viewable in <a
href="https://github.com/Wombosvideo/tw-animate-css/compare/v1.2.9...v1.4.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tw-animate-css&package-manager=npm_and_yarn&previous-version=1.2.9&new-version=1.4.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-29 10:03:26 +02:00
dependabot[bot]
d95853d784
chore(ui-deps): bump shiki from 1.29.2 to 3.13.0 in /llama_stack/ui (#3585)
Bumps [shiki](https://github.com/shikijs/shiki/tree/HEAD/packages/shiki)
from 1.29.2 to 3.13.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/shikijs/shiki/releases">shiki's
releases</a>.</em></p>
<blockquote>
<h2>v3.13.0</h2>
<h3>   🚀 Features</h3>
<ul>
<li><strong>transformers</strong>: Render indent guides  -  by <a
href="https://github.com/KazariEX"><code>@​KazariEX</code></a> and <a
href="https://github.com/antfu"><code>@​antfu</code></a> in <a
href="https://redirect.github.com/shikijs/shiki/issues/1060">shikijs/shiki#1060</a>
<a href="aecd1617"><!-- raw HTML
omitted -->(aecd1)<!-- raw HTML omitted --></a></li>
</ul>
<h5>    <a
href="https://github.com/shikijs/shiki/compare/v3.12.3...v3.13.0">View
changes on GitHub</a></h5>
<h2>v3.12.3</h2>
<h3>   🐞 Bug Fixes</h3>
<ul>
<li><code>@shikijs/twoslash</code> version specifier  -  by <a
href="https://github.com/9romise"><code>@​9romise</code></a> in <a
href="https://redirect.github.com/shikijs/shiki/issues/1078">shikijs/shiki#1078</a>
<a href="a1cdea41"><!-- raw HTML
omitted -->(a1cde)<!-- raw HTML omitted --></a></li>
</ul>
<h5>    <a
href="https://github.com/shikijs/shiki/compare/v3.12.2...v3.12.3">View
changes on GitHub</a></h5>
<h2>v3.12.2</h2>
<h3>   🐞 Bug Fixes</h3>
<ul>
<li><strong>twoslash</strong>: Fix <code>onTwoslashError</code> return
value handling  -  by <a
href="https://github.com/Karibash"><code>@​Karibash</code></a> in <a
href="https://redirect.github.com/shikijs/shiki/issues/1070">shikijs/shiki#1070</a>
<a href="e86b0a7c"><!-- raw HTML
omitted -->(e86b0)<!-- raw HTML omitted --></a></li>
</ul>
<h5>    <a
href="https://github.com/shikijs/shiki/compare/v3.12.1...v3.12.2">View
changes on GitHub</a></h5>
<h2>v3.12.1</h2>
<p><em>No significant changes</em></p>
<h5>    <a
href="https://github.com/shikijs/shiki/compare/v3.12.0...v3.12.1">View
changes on GitHub</a></h5>
<h2>v3.12.0</h2>
<h3>   🚀 Features</h3>
<ul>
<li><strong>vitepress-twoslash</strong>:
<ul>
<li>Improve UX for option customization  -  by <a
href="https://github.com/9romise"><code>@​9romise</code></a> in <a
href="https://redirect.github.com/shikijs/shiki/issues/1066">shikijs/shiki#1066</a>
<a href="e3cfdeca"><!-- raw HTML
omitted -->(e3cfd)<!-- raw HTML omitted --></a></li>
<li>Twoslash inline type cache for markdown  -  by <a
href="https://github.com/serkodev"><code>@​serkodev</code></a> and <a
href="https://github.com/antfu"><code>@​antfu</code></a> in <a
href="https://redirect.github.com/shikijs/shiki/issues/1063">shikijs/shiki#1063</a>
<a href="dc7fbc70"><!-- raw HTML
omitted -->(dc7fb)<!-- raw HTML omitted --></a></li>
</ul>
</li>
</ul>
<h3>   🐞 Bug Fixes</h3>
<ul>
<li><strong>remove-notation-escape</strong>: Correct escape sequence  - 
by <a href="https://github.com/sor4chi"><code>@​sor4chi</code></a> in <a
href="https://redirect.github.com/shikijs/shiki/issues/1065">shikijs/shiki#1065</a>
<a href="22d0c780"><!-- raw HTML
omitted -->(22d0c)<!-- raw HTML omitted --></a></li>
</ul>
<h5>    <a
href="https://github.com/shikijs/shiki/compare/v3.11.0...v3.12.0">View
changes on GitHub</a></h5>
<h2>v3.11.0</h2>
<h3>   🚀 Features</h3>
<ul>
<li><strong>core</strong>: Add <code>enforce</code> options to
<code>ShikiTransformer</code>  -  by <a
href="https://github.com/serkodev"><code>@​serkodev</code></a> and <a
href="https://github.com/antfu"><code>@​antfu</code></a> in <a
href="https://redirect.github.com/shikijs/shiki/issues/1062">shikijs/shiki#1062</a>
<a href="8ad05bd8"><!-- raw HTML
omitted -->(8ad05)<!-- raw HTML omitted --></a></li>
</ul>
<h5>    <a
href="https://github.com/shikijs/shiki/compare/v3.10.0...v3.11.0">View
changes on GitHub</a></h5>
<h2>v3.10.0</h2>
<h3>   🚀 Features</h3>
<ul>
<li>Add funding links to playground  -  by <a
href="https://github.com/jtbandes"><code>@​jtbandes</code></a> in <a
href="https://redirect.github.com/shikijs/shiki/issues/1054">shikijs/shiki#1054</a>
<a href="e36eb4d8"><!-- raw HTML
omitted -->(e36eb)<!-- raw HTML omitted --></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="fd7326a82f"><code>fd7326a</code></a>
chore: release v3.13.0</li>
<li><a
href="5cbb05219e"><code>5cbb052</code></a>
chore: release v3.12.3</li>
<li><a
href="e462618190"><code>e462618</code></a>
chore: release v3.12.2</li>
<li><a
href="793d71e68f"><code>793d71e</code></a>
chore: release v3.12.1</li>
<li><a
href="9260f3fd10"><code>9260f3f</code></a>
chore: release v3.12.0</li>
<li><a
href="d05f39b1e8"><code>d05f39b</code></a>
chore: release v3.11.0</li>
<li><a
href="bda1a76743"><code>bda1a76</code></a>
chore: release v3.10.0</li>
<li><a
href="09921f1cb8"><code>09921f1</code></a>
chore: release v3.9.2</li>
<li><a
href="854eddf2ed"><code>854eddf</code></a>
chore: release v3.9.1</li>
<li><a
href="950ede5ae5"><code>950ede5</code></a>
chore: release v3.9.0</li>
<li>Additional commits viewable in <a
href="https://github.com/shikijs/shiki/commits/v3.13.0/packages/shiki">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~GitHub">GitHub Actions</a>, a new releaser
for shiki since your current version.</p>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=shiki&package-manager=npm_and_yarn&previous-version=1.29.2&new-version=3.13.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-29 10:02:51 +02:00
Ashwin Bharambe
8dc9fd6844
feat(ci): use @next branch from llama-stack-client (#3576)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m16s
When we update Stainless (editor changes), the `next` branch gets
updated. Eventually, when we decide on a release, we land the changes in
`main`. This is the Stainless workflow.

This PR makes sure we follow that workflow by pulling from the `next`
branch for our integration tests.
2025-09-27 12:56:51 -07:00
Tami Takamiya
65f7b81e98
feat: Add items and title to ToolParameter/ToolParamDefinition (#3003)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s
Python Package Build Test / build (3.12) (push) Failing after 17s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 19s
Unit Tests / unit-tests (3.13) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (push) Failing after 20s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Python Package Build Test / build (3.13) (push) Failing after 16s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
API Conformance Tests / check-schema-compatibility (push) Successful in 25s
UI Tests / ui-tests (22) (push) Successful in 50s
Pre-commit / pre-commit (push) Successful in 1m16s
# What does this PR do?
Add items and title to ToolParameter/ToolParamDefinition. Adding items
will resolve the issue that occurs with Gemini LLM when an MCP tool has
array-type properties.
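
A minimal sketch of why `items` matters for array-type parameters. The `ToolParam` class and `to_json_schema` helper below are hypothetical stand-ins for illustration, not the actual ToolParameter/ToolParamDefinition code:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToolParam:
    """Hypothetical parameter definition (names are assumptions)."""

    name: str
    parameter_type: str
    description: str = ""
    items: Optional[Dict[str, Any]] = None  # element schema for arrays
    title: Optional[str] = None


def to_json_schema(param: ToolParam) -> Dict[str, Any]:
    """Build a JSON Schema fragment, keeping `items` and `title` when set."""
    schema: Dict[str, Any] = {
        "type": param.parameter_type,
        "description": param.description,
    }
    if param.items is not None:
        schema["items"] = param.items
    if param.title is not None:
        schema["title"] = param.title
    return schema


tags = ToolParam("tags", "array", "Tags to filter by", items={"type": "string"}, title="Tags")
print(to_json_schema(tags))
```

Without `items`, an array parameter would be emitted with only `"type": "array"`, which gives the model no element type to work with.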

## Test Plan
Unit test cases will be added.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-09-27 11:35:29 -07:00
Sébastien Han
1a8d3ed315
chore: MANIFEST maintenance (#3454)
b4789c59 chore: exclude ci-test distro from the package
86a85da8 chore: re-add files in the package

commit b4789c5941
Author: Sébastien Han <seb@redhat.com>
Date:   Tue Sep 16 14:34:06 2025 +0200

    chore: exclude ci-test distro from the package
    
    This is a CI artifact, we shouldn't package it.
    Proof it works, when building ci-tests is not added:
    
    ```
    adding 'llama_stack/core/utils/serialize.py'
    adding 'llama_stack/distributions/__init__.py'
    adding 'llama_stack/distributions/template.py'
    adding 'llama_stack/distributions/dell/__init__.py'
    adding 'llama_stack/distributions/dell/build.yaml'
    adding 'llama_stack/distributions/dell/dell.py'
    adding 'llama_stack/distributions/dell/run-with-safety.yaml'
    adding 'llama_stack/distributions/dell/run.yaml'
    adding 'llama_stack/distributions/meta-reference-gpu/__init__.py'
    adding 'llama_stack/distributions/meta-reference-gpu/build.yaml'
    adding 'llama_stack/distributions/meta-reference-gpu/meta_reference.py'
    adding 'llama_stack/distributions/meta-reference-gpu/run-with-safety.yaml'
    adding 'llama_stack/distributions/meta-reference-gpu/run.yaml'
    adding 'llama_stack/distributions/nvidia/__init__.py'
    adding 'llama_stack/distributions/nvidia/build.yaml'
    adding 'llama_stack/distributions/nvidia/nvidia.py'
    adding 'llama_stack/distributions/nvidia/run-with-safety.yaml'
    adding 'llama_stack/distributions/nvidia/run.yaml'
    adding 'llama_stack/distributions/open-benchmark/__init__.py'
    adding 'llama_stack/distributions/open-benchmark/build.yaml'
    adding 'llama_stack/distributions/open-benchmark/open_benchmark.py'
    adding 'llama_stack/distributions/open-benchmark/run.yaml'
    adding 'llama_stack/distributions/postgres-demo/__init__.py'
    adding 'llama_stack/distributions/postgres-demo/build.yaml'
    adding 'llama_stack/distributions/postgres-demo/postgres_demo.py'
    adding 'llama_stack/distributions/postgres-demo/run.yaml'
    adding 'llama_stack/distributions/starter/__init__.py'
    adding 'llama_stack/distributions/starter/build.yaml'
    adding 'llama_stack/distributions/starter/run.yaml'
    adding 'llama_stack/distributions/starter/starter.py'
    adding 'llama_stack/distributions/starter-gpu/__init__.py'
    adding 'llama_stack/distributions/starter-gpu/build.yaml'
    adding 'llama_stack/distributions/starter-gpu/run.yaml'
    adding 'llama_stack/distributions/starter-gpu/starter_gpu.py'
    adding 'llama_stack/distributions/watsonx/__init__.py'
    adding 'llama_stack/distributions/watsonx/build.yaml'
    adding 'llama_stack/distributions/watsonx/run.yaml'
    adding 'llama_stack/distributions/watsonx/watsonx.py'
    adding 'llama_stack/models/__init__.py'
    adding 'llama_stack/models/llama/__init__.py'
    ```
    
    Signed-off-by: Sébastien Han <seb@redhat.com>

commit 86a85da877
Author: Sébastien Han <seb@redhat.com>
Date:   Tue Sep 16 14:45:37 2025 +0200

    chore: re-add files in the package
    
    These files were not added anymore since the path changed.
    
    Signed-off-by: Sébastien Han <seb@redhat.com>

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-27 11:28:11 -07:00
ehhuang
c392f3a0f4
chore: remove extra logging (#3574)
# What does this PR do?
This is already logged by console processor as INFO
<img width="1093" height="280" alt="image"
src="https://github.com/user-attachments/assets/780b0ac2-6744-49d7-b1d4-b7204050a6dc"
/>


## Test Plan
2025-09-27 11:22:54 -07:00
Matthew Farrellee
0d94f3e2c0
chore: recordings for fireworks (inference + openai) (#3573)
# What does this PR do?

recorded for: ./scripts/integration-tests.sh --stack-config
server:ci-tests --suite base --setup fireworks --subdirs inference
--pattern openai

## Test Plan

./scripts/integration-tests.sh --stack-config server:ci-tests --suite
base --setup fireworks --subdirs inference --pattern openai
2025-09-27 11:22:30 -07:00
Matthew Farrellee
53b15725b6
chore(apis): unpublish deprecated /v1/inference apis (#3297)
# What does this PR do?

unpublish (make unavailable to users) the following apis -
 - `/v1/inference/completion`, replaced by `/v1/openai/v1/completions`
- `/v1/inference/chat-completion`, replaced by
`/v1/openai/v1/chat/completions`
 - `/v1/inference/embeddings`, replaced by `/v1/openai/v1/embeddings`
 - `/v1/inference/batch-completion`, replaced by `/v1/openai/v1/batches`
- `/v1/inference/batch-chat-completion`, replaced by
`/v1/openai/v1/batches`

note: the implementations are still available for internal use, e.g.
agents uses chat-completion.
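
For callers migrating off the unpublished routes, the replacements listed above can be captured in a small lookup table. This is only a client-side sketch; the `migrate_path` helper is hypothetical and not part of the PR:

```python
# Unpublished path -> replacement path, copied from the list above.
REPLACEMENTS = {
    "/v1/inference/completion": "/v1/openai/v1/completions",
    "/v1/inference/chat-completion": "/v1/openai/v1/chat/completions",
    "/v1/inference/embeddings": "/v1/openai/v1/embeddings",
    "/v1/inference/batch-completion": "/v1/openai/v1/batches",
    "/v1/inference/batch-chat-completion": "/v1/openai/v1/batches",
}


def migrate_path(path: str) -> str:
    """Return the replacement for an unpublished path, or the path unchanged."""
    return REPLACEMENTS.get(path, path)


print(migrate_path("/v1/inference/chat-completion"))
```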
2025-09-27 11:20:06 -07:00
Matthew Farrellee
60484c5c4e
chore(api): remove batch inference (#3261)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External API and Providers / test-external (venv) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m18s
# What does this PR do?

APIs removed:
 - POST /v1/batch-inference/completion
 - POST /v1/batch-inference/chat-completion
 - POST /v1/inference/batch-completion
 - POST /v1/inference/batch-chat-completion

note -
- batch-completion & batch-chat-completion were only implemented for
inference=inline::meta-reference
 - the batch-inference APIs were not implemented
2025-09-26 14:35:34 -07:00
Matthew Farrellee
b48d5cfed7
feat(internal): add image_url download feature to OpenAIMixin (#3516)
# What does this PR do?

simplify Ollama inference adapter by -
 - moving image_url download code to OpenAIMixin
- being a ModelRegistryHelper instead of having one (mypy blocks
check_model_availability method assignment)

## Test Plan

 - add unit tests for new download feature
- add integration tests for openai_chat_completion w/ image_url (close
test gap)
2025-09-26 17:32:16 -04:00
github-actions[bot]
4487b88ffe
build: Bump version to 0.2.23
2025-09-26 21:11:51 +00:00
Matthew Farrellee
7a25be633c
fix: Revert "fix: Added a bug fix when registering new models" (#3473)
the commit to be reverted is a public API behavior change that we
should not support.

instead of allowing silent updates (the caller cannot see the log
messages), we should be sending an error to the caller that they must
first unregister the model before reusing the same name w/ a different
backend.
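
a rough sketch of the intended behavior, using a hypothetical in-memory `ModelRegistry` (the class, method names, and error type are assumptions for illustration, not the actual routing table code):

```python
class ModelRegistry:
    """Hypothetical registry mapping a model name to its backend provider."""

    def __init__(self) -> None:
        self._providers: dict = {}

    def register(self, model_id: str, provider_id: str) -> None:
        existing = self._providers.get(model_id)
        if existing is not None and existing != provider_id:
            # Surface the conflict to the caller instead of silently updating.
            raise ValueError(
                f"Model '{model_id}' is already registered with provider "
                f"'{existing}'; unregister it before reusing the name."
            )
        self._providers[model_id] = provider_id

    def unregister(self, model_id: str) -> None:
        self._providers.pop(model_id, None)


registry = ModelRegistry()
registry.register("my-model", "ollama")
registry.unregister("my-model")
registry.register("my-model", "fireworks")  # allowed after unregistering
```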
2025-09-26 16:19:21 -04:00
Matthew Farrellee
da5ea107fc
fix: ensure ModelRegistryHelper init for together and fireworks (#3572)
# What does this PR do?

address -
```
ERROR    2025-09-26 10:44:29,450 main:527 core::server: Error creating app: 'FireworksInferenceAdapter' object has no attribute
         'alias_to_provider_id_map'
```

## Test Plan

manual startup w/ valid together & fireworks api keys
2025-09-26 16:18:32 -04:00
Ben Browning
b6e2934f7b
fix: Gracefully handle errors when listing MCP tools (#2544)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 38s
Pre-commit / pre-commit (push) Successful in 1m17s
# What does this PR do?

When listing (and lazily indexing) tools, it's possible for an error to
get thrown by individual toolgroups if for example an MCP toolgroup is
unable to connect to its `mcp_endpoint`.

This logs a warning in the server when that happens, logs a full stack
trace of the error if debug logging is enabled, and just returns the
list of tools from all working toolgroups instead of throwing an error
to the client when a single toolgroup is temporarily or permanently
misbehaving.

The exception to the above is authentication errors, which we
specifically send all the way back to the client as that's how we
indicate to the client that it needs to provide authentication data for
the remote MCP servers.
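
The handling described above can be sketched roughly as follows. The `AuthenticationRequiredError` class and the shape of `toolgroups` are assumptions made for this example, not the actual routing table code:

```python
import logging

logger = logging.getLogger("core")


class AuthenticationRequiredError(Exception):
    """Hypothetical stand-in for the auth error that must reach the client."""


def list_all_tools(toolgroups):
    """Collect tools from every toolgroup that responds.

    `toolgroups` maps a toolgroup id to a zero-argument callable that
    returns a list of tool names (an assumption for this sketch).
    """
    tools = []
    for group_id, list_tools in toolgroups.items():
        try:
            tools.extend(list_tools())
        except AuthenticationRequiredError:
            # Auth errors go all the way back to the client.
            raise
        except Exception as exc:
            # Warn, keep the stack trace at debug level, and move on.
            logger.warning("Error listing tools for toolgroup %s: %s", group_id, exc)
            logger.debug("Stack trace for toolgroup %s", group_id, exc_info=True)
    return tools
```

A single misbehaving toolgroup then costs only its own tools; the client still receives the tools from every working toolgroup.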

Closes #2540

## Test Plan

A new unit test was added to test this exception handling, which is run
as part of our regular test suite but also manually run to specifically
verify this fix via:

```
uv run pytest -sv --asyncio-mode=auto \
tests/unit/distribution/routers/test_routing_tables.py
```

To verify the additional debug logging is printing properly:

```
LLAMA_STACK_LOGGING=core=debug \
uv run pytest -sv --asyncio-mode=auto \
tests/unit/distribution/routers/test_routing_tables.py
```

The mcp integration tests were run as below (and by CI):

```
ollama run llama3.2:3b

ENABLE_OLLAMA="ollama" \
OLLAMA_INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=starter \
uv run pytest -sv tests/integration/tool_runtime/test_mcp.py \
  --text-model meta-llama/Llama-3.2-3B-Instruct
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-09-26 18:09:48 +02:00
Matthew Farrellee
926c3ada41
chore: prune mypy exclude list (#3561)
# What does this PR do?

prune the mypy exclude list, build a stronger foundation for quality
code


## Test Plan

ci
2025-09-26 11:44:43 -04:00
Charlie Doern
c88c4ff2c6
feat: introduce API leveling, post_training, eval to v1alpha (#3449)
# What does this PR do?

Rather than have a single `LLAMA_STACK_VERSION`, we need to have a
`_V1`, `_V1ALPHA`, and `_V1BETA` constant.

This also necessitated addition of `level` to the `WebMethod` so that
routing can be handled properly.


For backwards compat, the `v1` routes are being kept around and marked
as `deprecated`. When used, the server will log a deprecation warning.
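
A minimal sketch of how a per-route `level` could drive the URL prefix and the deprecation warning. The constant names, the simplified `WebMethod` dataclass, and the example route are assumptions for illustration; the real definitions may differ:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

# Hypothetical constant names; the PR only states that `_V1`, `_V1ALPHA`,
# and `_V1BETA` constants replace the single version constant.
LLAMA_STACK_API_V1 = "v1"
LLAMA_STACK_API_V1ALPHA = "v1alpha"
LLAMA_STACK_API_V1BETA = "v1beta"


@dataclass(frozen=True)
class WebMethod:
    """Simplified stand-in for the real route metadata."""

    route: str
    level: str
    deprecated: bool = False


def full_path(method: WebMethod) -> str:
    """Build the served path and warn when a deprecated route is used."""
    path = f"/{method.level}{method.route}"
    if method.deprecated:
        logger.warning("DEPRECATED route used: %s", path)
    return path


# The same (illustrative) route served at its new level and at the old v1 level.
print(full_path(WebMethod("/eval/benchmarks", LLAMA_STACK_API_V1ALPHA)))
print(full_path(WebMethod("/eval/benchmarks", LLAMA_STACK_API_V1, deprecated=True)))
```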

Deprecation log:

<img width="1224" height="134" alt="Screenshot 2025-09-25 at 2 43 36 PM"
src="https://github.com/user-attachments/assets/0cc7c245-dafc-48f0-be99-269fb9a686f9"
/>

move:
1. post_training to `v1alpha` as it is under heavy development and not
near its final state
2. eval: job scheduling is not implemented. It relies heavily on the
datasetio API, which is under development and is missing implementations
of specific routes, indicating that the structure of those routes might
change. Additionally, eval depends on the `inference` API, which is going
to be deprecated, so eval will likely need a major API surface change to
conform to using completions properly

implements leveling in #3317 

note: integration tests will fail until the SDK is regenerated with
v1alpha/inference as opposed to v1/inference

## Test Plan

existing tests should pass with newly generated schema. Conformance will
also pass as these routes are not the ones we currently test for
stability

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-26 16:18:07 +02:00
Matthew Farrellee
65e01b5684
feat: together now supports base64 embedding encoding (#3559)
# What does this PR do?

use together's new base64 support
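
for reference, a sketch of decoding a base64-encoded embedding on the client side. it assumes the payload is packed as little-endian float32 values, which is the OpenAI-compatible convention; the `decode_embedding` helper is hypothetical:

```python
import base64
import struct
from typing import List


def decode_embedding(b64: str) -> List[float]:
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip a tiny vector whose values are exactly representable in float32.
encoded = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
print(decode_embedding(encoded))  # [0.5, -1.0, 2.0]
```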

## Test Plan

recordings for: ./scripts/integration-tests.sh --stack-config
server:ci-tests --suite base --setup together --subdirs inference
--pattern openai
2025-09-26 16:05:52 +02:00
Doug Edgar
9c751b6789
feat: use FIPS validated CSPRNG for telemetry (#3554)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 17s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (push) Failing after 18s
Python Package Build Test / build (3.12) (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 53s
Pre-commit / pre-commit (push) Successful in 1m14s
# What does this PR do?
Switches from `random.getrandbits` to `secrets.randbits` in the
telemetry module.
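
A minimal sketch of the change, assuming the telemetry module generates 128-bit trace IDs and 64-bit span IDs (the helper names and bit widths are illustrative assumptions, not taken from the PR):

```python
import secrets


def new_trace_id() -> int:
    """Return a 128-bit ID drawn from the operating system's CSPRNG."""
    # Previously: random.getrandbits(128), which uses the Mersenne Twister.
    return secrets.randbits(128)


def new_span_id() -> int:
    """Return a 64-bit ID drawn from the operating system's CSPRNG."""
    return secrets.randbits(64)


print(f"{new_trace_id():032x} {new_span_id():016x}")
```

`secrets.randbits` has the same call shape as `random.getrandbits`, so the swap is a one-line change per call site.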

Closes #3553 

## Test Plan
Unit tests from scripts/unit-tests.sh were run to verify the tests still
pass.

Signed-off-by: Doug Edgar <dedgar@redhat.com>
2025-09-26 11:17:25 +02:00
Alexey Rybak
28d83faf8a
fix: docs deployment URL (#3556)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 34s
Pre-commit / pre-commit (push) Successful in 1m25s
# What does this PR do?
Fixes Llama Stack docs deployment URL

## Test Plan
```
npm run gen-api-docs all
npm run build
```
successfully builds the documentation
2025-09-25 15:41:12 -07:00
Matthew Farrellee
b67aef2fc4
feat: add static embedding metadata to dynamic model listings for providers using OpenAIMixin (#3547)
# What does this PR do?

- remove auto-download of ollama embedding models
- add embedding model metadata to dynamic listing w/ unit test
- add support and tests for allowed_models
- removed inference provider models.py files where dynamic listing is
enabled
- store embedding metadata in embedding_model_metadata field on
inference providers
- make model_entries optional on ModelRegistryHelper and
LiteLLMOpenAIMixin
- make OpenAIMixin a ModelRegistryHelper
- skip base64 embedding test for remote::ollama, always returns floats
- only use OpenAI client for ollama model listing
- remove unused build_model_entry function
- remove unused get_huggingface_repo function
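
a rough sketch of how `allowed_models` and the static `embedding_model_metadata` could combine during dynamic listing. the `list_models` function, its return shape, and the example model names and metadata values are assumptions for illustration only:

```python
from typing import Any, Dict, Iterable, List, Optional


def list_models(
    available: Iterable[str],
    allowed_models: Optional[List[str]] = None,
    embedding_model_metadata: Optional[Dict[str, Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Filter a provider's dynamic listing and tag embedding models."""
    metadata = embedding_model_metadata or {}
    models = []
    for model_id in available:
        # With no allow-list configured, every listed model is kept.
        if allowed_models is not None and model_id not in allowed_models:
            continue
        if model_id in metadata:
            models.append({"id": model_id, "type": "embedding", "metadata": metadata[model_id]})
        else:
            models.append({"id": model_id, "type": "llm"})
    return models


listing = list_models(
    ["chat-model", "embed-model", "other-model"],  # hypothetical provider listing
    allowed_models=["chat-model", "embed-model"],
    embedding_model_metadata={"embed-model": {"embedding_dimension": 384}},
)
print(listing)
```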


## Test Plan

ci w/ new tests
2025-09-25 17:17:00 -04:00
Matthew Farrellee
a50b63906c
chore: use ollama/all-minilm:l6-v2 for ollama tests (#3537)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 33s
Pre-commit / pre-commit (push) Successful in 1m25s
# What does this PR do?

use ollama embedding models for ollama test, previously using
sentence-transformer

recordings:
- ./scripts/integration-tests.sh --stack-config server:ci-tests --suite
base --setup ollama --inference-mode record
- ./scripts/integration-tests.sh --stack-config server:ci-tests --suite
vision --setup ollama-vision --inference-mode record

## Test Plan

ci w/ added skip base64 embedding test
2025-09-24 19:33:02 -04:00
Alexey Rybak
6101c8e015
docs: fix broken links (#3540)
# What does this PR do?

- Fixes broken links and Docusaurus search

Closes #3518

## Test Plan

The following should produce a clean build with no warnings and search enabled:

```
npm install
npm run gen-api-docs all
npm run build
npm run serve
```

2025-09-24 14:16:31 -07:00
Alexey Rybak
8537ada11b
docs: MDX leftover fixes (#3536)
# What does this PR do?

- Fixes Docusaurus build errors

## Test Plan

- `npm run build` compiles the build properly
- Broken links are expected and will be fixed in a follow-up PR

2025-09-24 14:14:32 -07:00
Alexey Rybak
aebd728c81
docs: docusaurus setup (#3541)
# What does this PR do?

- Docusaurus server setup
- Deprecates Sphinx build pipeline
- Deprecates remaining references to Readthedocs
- MDX compile errors and broken links to be addressed in follow-up PRs

## Test Plan

```
npm install
npm run gen-api-docs all
npm run build
```

2025-09-24 14:11:30 -07:00
Alexey Rybak
610526d6d7
docs: static content migration (#3535)
# What does this PR do?

- Migrates static content from Sphinx to Docusaurus

## Test Plan



2025-09-24 14:08:50 -07:00
Alexey Rybak
c71ce8df61
docs: concepts and building_applications migration (#3534)
# What does this PR do?

- Migrates the remaining documentation sections to the new documentation format

## Test Plan

- Partial migration

2025-09-24 14:05:30 -07:00
Alexey Rybak
05ff4c4420
docs: advanced_apis migration (#3532)
# What does this PR do?

- Migrates the `advanced_apis/` section of the docs to the new format

## Test Plan

- Partial migration

2025-09-24 14:03:41 -07:00
Alexey Rybak
d23865757f
docs: provider and distro codegen migration (#3531)
# What does this PR do?

- Updates provider and distro codegen to handle the new format
- Migrates provider and distro files to the new format

## Test Plan

- Manual testing

2025-09-24 14:01:29 -07:00
Alexey Rybak
45da31801c
fix: update API conformance test to point to new schema location (#3528)
# What does this PR do?

Update file paths in the conformance workflow to reflect the move of the llama-stack-spec files from `docs/_static/` to `docs/static/`. Also update `.gitignore` to exclude the Docusaurus-related directories `docs/.docusaurus/` and `docs/node_modules/`.

## Test Plan

- Run the workflow locally
2025-09-24 13:59:31 -07:00
Alexey Rybak
0a7d1adfee
fix: update OpenAPI generator (#3527)
# What does this PR do?

Updates the OpenAPI generator to use summaries and changes the file generation path.

## Test Plan

- docs/openapi_generator/run_openapi_generator.sh

2025-09-24 13:57:27 -07:00
Alexey Rybak
914c8cb605
fix: fix API docstrings for proper MDX parsing (#3526)
# What does this PR do?

_[Stack 1/10] Docusaurus documentation migration_

Updates the file upload API documentation to use proper OpenAPI format for integer parameters. Replaces `<int>` with `{integer}` in the description of the `expires_after[seconds]` parameter across the HTML spec, YAML spec, and Python implementation.
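
As a rough illustration of the substitution described above (not the code from this PR), the sketch below rewrites an angle-bracket placeholder, which MDX would likely try to parse as a JSX tag, into the brace form. The helper name `make_mdx_safe` and the sample description text are assumptions made up for this example.

```python
import re

# Hypothetical helper, not the project's actual code: matches the literal
# `<int>` placeholder that the PR replaces in parameter descriptions.
ANGLE_PLACEHOLDER = re.compile(r"<int>")


def make_mdx_safe(description: str) -> str:
    """Replace `<int>` placeholders with `{integer}` in a description string."""
    return ANGLE_PLACEHOLDER.sub("{integer}", description)


# Illustrative description text (made up for this example).
print(make_mdx_safe("Must be <int> seconds."))
```

Strings without the placeholder pass through unchanged, so a helper like this could be applied to every description without special-casing.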

## Test Plan

- docs/openapi_generator/run_openapi_generator.sh

2025-09-24 13:55:12 -07:00
ehhuang
48a551ecbc
chore(perf): run guidellm benchmarks (#3421)
# What does this PR do?
- Mostly AI-generated scripts to run guidellm
(https://github.com/vllm-project/guidellm) benchmarks on k8s setup
- Stack is using image built from main on 9/11


## Test Plan
See updated README.md
2025-09-24 10:18:33 -07:00
Nathan Weinberg
2f58d87c22
docs: fix typos in RAG docs (#3530)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-09-23 14:30:24 -07:00
Matthew Farrellee
ce7a3b4dff
feat: update Cerebras inference provider to support dynamic model listing (#3481)
# What does this PR do?

- update Cerebras to use OpenAIMixin
- enable openai completions tests
- enable openai chat completions tests
- disable n > 1 tests
- add recording for --setup cerebras --subdirs inference --pattern
openai


## Test Plan

`./scripts/integration-tests.sh --stack-config server:ci-tests --setup
cerebras --subdirs inference --pattern openai`

```
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity] 
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.053s
PASSED                                                                                            [  2%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=cerebras/llama-3.3-70b-inference:completion:suffix] SKIPPED (Suffix is not supported for the model: cerebras/llama-3.3-70b.)                   [  4%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity] PASSED                                                                                                [  6%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=cerebras/llama-3.3-70b-1] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.)             [  8%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=cerebras/llama-3.3-70b] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.)                 [ 10%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01] PASSED                                                          [ 12%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] PASSED                                                                  [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cere...) [ 17%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-True] PASSED                                                                                                                     [ 19%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-True] PASSED                                                                                                          [ 21%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=cerebras/llama-3.3-70b] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support chat completion calls wit...) [ 23%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                               [ 25%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                            [ 27%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                  [ 29%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                             [ 31%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                         [ 34%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                            [ 36%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                         [ 38%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                          [ 40%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                 [ 42%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                     [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=cerebras/llama-3.3-70b-0] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.)             [ 46%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_02] PASSED                                                          [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] PASSED                                                                  [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cere...) [ 53%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-False] PASSED                                                                                                                    [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-False] PASSED                                                                                                         [ 57%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                          [ 59%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                       [ 61%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                             [ 63%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                        [ 65%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                    [ 68%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                       [ 70%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                    [ 72%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                     [ 74%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                            [ 76%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test)                                [ 78%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01] PASSED                                                     [ 80%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] PASSED                                                             [ 82%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote:...) [ 85%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-True] PASSED                                                                                                                [ 87%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-True] PASSED                                                                                                     [ 89%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_02] PASSED                                                     [ 91%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] PASSED                                                             [ 93%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote:...) [ 95%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-False] PASSED                                                                                                               [ 97%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-False] PASSED                                                                                                    [100%]

=================================================================================================================== slowest 10 durations ====================================================================================================================
0.37s call     tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01]
0.34s call     tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-False]
0.18s call     tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-True]
0.17s setup    tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity]
0.15s call     tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-True]
0.13s call     tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-True]
0.12s call     tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-False]
0.12s call     tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-True]
0.12s call     tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-False]
0.08s call     tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02]
================================================================================================================== short test summary info ==================================================================================================================
SKIPPED [1] tests/integration/inference/test_openai_completion.py:75: Suffix is not supported for the model: cerebras/llama-3.3-70b.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:123: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:103: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:129: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support chat completion calls with base64 encoded files.
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:90: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:112: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:136: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:154: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:175: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:195: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:206: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:217: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:244: embedding_model_id empty - skipping test
SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:278: embedding_model_id empty - skipping test
================================================================================================= 18 passed, 29 skipped, 50 deselected, 4 warnings in 3.02s =================================================================================================
```
2025-09-23 16:26:00 -04:00
Matthew Farrellee
d07ebce4d9
feat: (re-)enable Databricks inference adapter (#3500)
# What does this PR do?

add/enable the Databricks inference adapter

the Databricks inference adapter was broken; closes #3486

- remove deprecated completion / chat_completion endpoints
- enable dynamic model listing w/o refresh, listing is not async
- use SecretStr instead of str for token
- backward incompatible change: for consistency with databricks docs,
env DATABRICKS_URL -> DATABRICKS_HOST and DATABRICKS_API_TOKEN ->
DATABRICKS_TOKEN
- databricks urls are custom per user/org, add special recorder handling
for databricks urls
- add integration test --setup databricks
- enable chat completions tests
- enable embeddings tests
- disable n > 1 tests
- disable embeddings base64 tests
- disable embeddings dimensions tests

note: reasoning models, e.g. gpt oss, fail because databricks has a
custom, incompatible response format
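
Since the env var rename is backward incompatible, existing setups need updating. A minimal sketch of one way to carry old values over, assuming only the two renames listed above; the helper `migrate_databricks_env` is hypothetical and not part of llama-stack:

```python
from collections.abc import Mapping

# Old name -> new name, as listed in this PR description.
RENAMED_VARS = {
    "DATABRICKS_URL": "DATABRICKS_HOST",
    "DATABRICKS_API_TOKEN": "DATABRICKS_TOKEN",
}


def migrate_databricks_env(env: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of env with old Databricks names moved to the new ones.

    A new-style variable that is already set wins over the old one;
    in that case the old variable is left in place untouched.
    """
    migrated = dict(env)
    for old, new in RENAMED_VARS.items():
        if old in migrated and new not in migrated:
            migrated[new] = migrated.pop(old)
    return migrated
```

For example, passing `os.environ` returns a plain dict that can be handed to a subprocess as its `env`, without mutating the current process environment.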

## Test Plan

ci and 

```
./scripts/integration-tests.sh --stack-config server:ci-tests --setup databricks --subdirs inference --pattern openai
```

note: databricks needs to be manually added to the ci-tests distro for
replay testing
2025-09-23 15:37:23 -04:00
ehhuang
9406a998b9
chore: refactor TracingMiddleware (#3520)
# What does this PR do?
Just moving TracingMiddleware to a new file

## Test Plan

CI
2025-09-23 10:14:41 -07:00
Matthew Farrellee
2be869b3ef
fix(dev): fix vllm inference recording (await models.list) (#3524)
# What does this PR do?

fix inference recording for vLLM

closes #3523 

## Test Plan

```
$ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference --inference-mode record --pattern test_text_chat_completion_non_streaming

=== Llama Stack Integration Test Runner ===
Stack Config: server:ci-tests
Setup: vllm
Inference Mode: record
Test Suite: base
Test Subdirs: inference
Test Pattern: test_text_chat_completion_non_streaming

...

=== Applying Setup Environment Variables ===
Setting up environment variables:
export VLLM_URL='http://localhost:8000/v1'

=== Starting Llama Stack Server ===
Waiting for Llama Stack Server to start...
 Llama Stack Server started successfully

=== Running Integration Tests ===
Test subdirs to run: inference
Added test files from inference: 6 files

=== Running all collected tests in a single pytest command ===
Total test files: 6
+ pytest -s -v tests/integration/inference/test_openai_completion.py tests/integration/inference/test_batch_inference.py tests/integration/inference/test_openai_embeddings.py tests/integration/inference/test_text_inference.py tests/integration/inference/test_vision_inference.py tests/integration/inference/test_embedding.py --stack-config=server:ci-tests --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag or test_inference_store_tool_calls ) and test_text_chat_completion_non_streaming' --setup=vllm --color=yes --capture=tee-sys
INFO     2025-09-23 10:35:36,662 tests.integration.conftest:86 tests: Applying setup 'vllm'                                                           
======================================================= test session starts =======================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- .../.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'Linux-6.16.7-200.fc42.x86_64-x86_64-with-glibc2.41', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'html': '4.1.1', 'anyio': '4.9.0', 'timeout': '2.4.0', 'cov': '6.2.1', 'asyncio': '1.1.0', 'nbval': '0.11.0', 'socket': '0.7.0', 'json-report': '1.5.0', 'metadata': '3.1.1'}}
rootdir: ...
configfile: pyproject.toml
plugins: html-4.1.1, anyio-4.9.0, timeout-2.4.0, cov-6.2.1, asyncio-1.1.0, nbval-0.11.0, socket-0.7.0, json-report-1.5.0, metadata-3.1.1
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 97 items / 95 deselected / 2 selected                                                                                   

tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01] 
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.044s
PASSED [ 50%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_02] PASSED [100%]

====================================================== slowest 10 durations =======================================================
1.62s call     tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_02]
0.93s call     tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01]
0.62s setup    tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01]

(3 durations < 0.005s hidden.  Use -vv to show these durations.)
========================================== 2 passed, 95 deselected, 6 warnings in 3.26s ===========================================
+ exit_code=0
+ set +x
 All tests completed successfully
```

```
$ git status
...
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	tests/integration/recordings/responses/032f8c5a1289.json
	tests/integration/recordings/responses/c42baf6a3700.json
	tests/integration/recordings/responses/models-bd032f995f2a-fb68f5a6.json
...
```
2025-09-23 12:56:33 -04:00
Matthew Farrellee
62e0aef7bc
fix: return llama stack model id from embeddings (#3525)
# What does this PR do?

the openai_embeddings method on OpenAIMixin was returning the provider's
model id instead of the llama stack name
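
a minimal sketch of the idea, not the actual OpenAIMixin code: the response type `EmbeddingsResponse` and the helper `with_stack_model_id` are hypothetical names used only for illustration. The point is that the `model` field on the response is overwritten with the id the caller requested.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EmbeddingsResponse:
    """Hypothetical stand-in for an embeddings response."""

    model: str
    data: list[list[float]]


def with_stack_model_id(
    response: EmbeddingsResponse, requested_model: str
) -> EmbeddingsResponse:
    """Return a copy of the response that reports the requested (stack) model id."""
    return replace(response, model=requested_model)


# The provider answers with its own id; the caller asked for the stack id.
provider_response = EmbeddingsResponse(model="text-embedding-3-small", data=[[0.1, 0.2]])
fixed = with_stack_model_id(provider_response, "openai/text-embedding-3-small")
print(fixed.model)
```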

## Test Plan

before -
```
$ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup gpt --subdirs inference --inference-mode live --pattern test_openai_embeddings_single_string
...
FAILED tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=openai/text-embedding-3-small] - AssertionError: assert 'text-embedding-3-small' == 'openai/text-...dding-3-small'
FAILED tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=openai/text-embedding-3-small] - AssertionError: assert 'text-embedding-3-small' == 'openai/text-...dding-3-small'
========================================== 2 failed, 95 deselected, 4 warnings in 3.87s ===========================================
```
after -
```
$ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup gpt --subdirs inference --inference-mode live --pattern test_openai_embeddings_single_string
...
========================================== 2 passed, 95 deselected, 4 warnings in 2.12s ===========================================
```
2025-09-23 12:30:00 -04:00
ehhuang
a7f9ce9a3a
chore: fix build (#3522)
# What does this PR do?
error:
5099094847


## Test Plan
GITHUB_ACTIONS=true BUILD_PLATFORM=linux/amd64 USE_COPY_NOT_MOUNT=true
LLAMA_STACK_DIR=. uv run --with llama-stack llama stack build --distro
starter --image-type container --image-name ehhuang/distribution-starter

succeeds
2025-09-22 22:53:48 -07:00
slekkala1
8d8261961e
chore: Refactor fireworks to use OpenAIMixin (#3480)
# What does this PR do?
Refactor Fireworks to use OpenAIMixin

Closes https://github.com/llamastack/llama-stack/issues/3391
Related to https://github.com/llamastack/llama-stack/issues/3387
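
As a rough illustration of what moving a provider onto a shared mixin can look like (a sketch under stated assumptions, not the real OpenAIMixin interface): the mixin owns the OpenAI-compatible request logic, and the provider adapter only supplies credentials and a base URL. The class names, the `build_request` helper, and the base URL below are all hypothetical or assumed for this example.

```python
class OpenAIMixinSketch:
    """Hypothetical mixin: shared OpenAI-compatible request logic lives here."""

    def get_api_key(self) -> str:
        raise NotImplementedError

    def get_base_url(self) -> str:
        raise NotImplementedError

    def build_request(self, path: str, payload: dict) -> dict:
        """Assemble a request description; a real adapter would send it over HTTP."""
        base = self.get_base_url().rstrip("/")
        return {
            "url": f"{base}/{path.lstrip('/')}",
            "headers": {"Authorization": f"Bearer {self.get_api_key()}"},
            "json": payload,
        }


class FireworksAdapterSketch(OpenAIMixinSketch):
    """Hypothetical adapter: only provider-specific details are defined here."""

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key

    def get_api_key(self) -> str:
        return self._api_key

    def get_base_url(self) -> str:
        # Assumed endpoint, used only for illustration.
        return "https://api.fireworks.ai/inference/v1"


adapter = FireworksAdapterSketch(api_key="test-key")
request = adapter.build_request(
    "/embeddings",
    {"model": "nomic-ai/nomic-embed-text-v1.5", "input": "hello"},
)
print(request["url"])
```

The design point is that a second provider would reuse `build_request` unchanged and override only the two accessor methods.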

## Test Plan
```
(llama-stack) (base) swapna942@swapna942-mac llama-stack % FIREWORKS_API_KEY=**** ./scripts/integration-tests.sh --stack-config server:ci-tests --setup fireworks --subdirs inference --pattern openai

tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] 
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.031s
PASSED [  2%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [  4%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [  6%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [  8%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 10%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 12%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 14%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 17%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 19%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 21%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:completion:sanity] PASSED [ 23%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:completion:suffix] SKIPPED [ 25%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:completion:sanity] PASSED [ 27%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-1] SKIPPED [ 29%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=accounts/fireworks/models/llama-v3p1-8b-instruct] SKIPPED [ 31%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 34%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_01] PASSED [ 36%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_01] PASSED [ 38%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True] PASSED [ 40%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True] PASSED [ 42%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=accounts/fireworks/models/llama-v3p1-8b-instruct] SKIPPED [ 44%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 46%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 48%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 51%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 53%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 55%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 57%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 59%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 61%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 63%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 65%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-0] SKIPPED [ 68%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 70%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_02] PASSED [ 72%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_02] PASSED [ 74%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False] PASSED [ 76%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False] PASSED [ 78%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 80%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_01] PASSED [ 82%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_01] PASSED [ 85%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True] PASSED [ 87%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True] PASSED [ 89%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 91%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_02] PASSED [ 93%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_02] PASSED [ 95%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False] PASSED [ 97%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False] PASSED [100%]

========================================== slowest 10 durations ==========================================
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_02]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=nomic-ai/nomic-embed-text-v1.5]
================= 36 passed, 11 skipped, 50 deselected, 4 warnings in 1429.05s (0:23:49) =================
+ exit_code=0
+ set +x
 All tests completed successfully
```
2025-09-22 13:19:36 -04:00
Kai Wu
e3fd70c321
fix: change ModelRegistryHelper to use ProviderModelEntry instead of hardcoded ModelType.llm (#3451)
# What does this PR do?
Change ModelRegistryHelper to use ProviderModelEntry instead of a
hardcoded ModelType.llm, which fixes issue #3330.
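
A minimal sketch of the difference, assuming each entry carries its own model type: the registry reads the type from the entry rather than assigning `ModelType.llm` to everything. The field names, the `build_model_types` helper, and the `gpt-4o` entry are hypothetical and only mirror the names mentioned above; this is not the actual ModelRegistryHelper code.

```python
from dataclasses import dataclass
from enum import Enum


class ModelType(str, Enum):
    llm = "llm"
    embedding = "embedding"


@dataclass
class ProviderModelEntry:
    """Hypothetical shape: an entry that declares its own model type."""

    provider_model_id: str
    model_type: ModelType = ModelType.llm


def build_model_types(entries: list[ProviderModelEntry]) -> dict[str, ModelType]:
    """Take the type from each entry instead of hardcoding ModelType.llm."""
    return {entry.provider_model_id: entry.model_type for entry in entries}


entries = [
    ProviderModelEntry("gpt-4o"),  # illustrative LLM entry, defaults to llm
    ProviderModelEntry("text-embedding-3-small", ModelType.embedding),
]
model_types = build_model_types(entries)
print(model_types["text-embedding-3-small"].value)
```

With a hardcoded type, the embedding entry would have been registered as an LLM, which is the symptom the test plan below checks for.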

## Test Plan
1. Start the llama-stack server:
```
uv sync --python 3.12
source .venv/bin/activate
uv run llama stack build --distro starter --image-type venv  --run
```
2. Run the following script to test:
```
import os

from llama_stack_client import LlamaStackClient


def test_openai_embedding_type():
    # Connect to the local server; the endpoint and API key come from the environment.
    client = LlamaStackClient(
        base_url=os.environ.get("LLAMA_STACK_ENDPOINT", "http://localhost:8321"),
        provider_data={
            "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
        },
    )
    # The embedding model should be registered with model_type "embedding", not "llm".
    model = client.models.retrieve("openai/text-embedding-3-small")
    print(model)
    assert model.identifier == "openai/text-embedding-3-small"
    assert model.model_type == "embedding"


test_openai_embedding_type()
```
logs:
```
python test_openai.py
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/models/openai/text-embedding-3-small "HTTP/1.1 200 OK"
Model(identifier='openai/text-embedding-3-small', metadata={'embedding_dimension': 1536.0, 'context_length': 8192.0}, api_model_type='embedding', provider_id='openai', type='model', provider_resource_id='text-embedding-3-small', owner=None, source='listed_from_provider', model_type='embedding')
```
2025-09-22 12:55:32 -04:00
dependabot[bot]
a1301911e4
chore(ui-deps): bump jest-environment-jsdom from 29.7.0 to 30.1.2 in /llama_stack/ui (#3509)
Bumps
[jest-environment-jsdom](https://github.com/jestjs/jest/tree/HEAD/packages/jest-environment-jsdom)
from 29.7.0 to 30.1.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/releases">jest-environment-jsdom's
releases</a>.</em></p>
<blockquote>
<h2>30.1.2</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Correct snapshot header regexp to
work with newline across OSes (<a
href="https://redirect.github.com/jestjs/jest/pull/15803">#15803</a>)</li>
</ul>
<h2>30.1.1</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
warning not handling Windows end-of-line sequences (<a
href="https://redirect.github.com/jestjs/jest/pull/15800">#15800</a>)</li>
</ul>
<h2>30.1.0</h2>
<h2>Features</h2>
<ul>
<li><code>[jest-leak-detector]</code> Configurable GC aggressiveness
regarding to V8 heap snapshot generation (<a
href="https://redirect.github.com/jestjs/jest/pull/15793/">#15793</a>)</li>
<li><code>[jest-runtime]</code> Reduce redundant ReferenceError
messages</li>
<li><code>[jest-core]</code> Include test modules that failed to load
when --onlyFailures is active</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot guide link not
getting replaced with fully canonical URL (<a
href="https://redirect.github.com/jestjs/jest/pull/15787">#15787</a>)</li>
<li><code>[jest-circus]</code> Fix <code>it.concurrent</code> not
working with <code>describe.skip</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15765">#15765</a>)</li>
<li><code>[jest-snapshot]</code> Fix mangled inline snapshot updates
when used with Prettier 3 and CRLF line endings</li>
<li><code>[jest-runtime]</code> Importing from
<code>@jest/globals</code> in more than one file no longer breaks
relative paths (<a
href="https://redirect.github.com/jestjs/jest/issues/15772">#15772</a>)</li>
</ul>
<h1>Chore</h1>
<ul>
<li><code>[expect]</code> Update docblock for <code>toContain()</code>
to display info on substring check (<a
href="https://redirect.github.com/jestjs/jest/pull/15789">#15789</a>)</li>
</ul>
<h2>30.0.2</h2>
<h2>What's Changed</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-matcher-utils]</code> Make 'deepCyclicCopyObject' safer
by setting descriptors to a null-prototype object (<a
href="https://redirect.github.com/jestjs/jest/pull/15689">#15689</a>)</li>
<li><code>[jest-util]</code> Make garbage collection protection property
writable (<a
href="https://redirect.github.com/jestjs/jest/pull/15689">#15689</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/jestjs/jest/blob/main/CHANGELOG.md">https://github.com/jestjs/jest/blob/main/CHANGELOG.md</a></p>
<h2>Jest 30.0.1</h2>
<h2>What's Changed</h2>
<h3>Features</h3>
<ul>
<li><code>[jest-resolver]</code> Implement the
<code>defaultAsyncResolver</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15679">#15679</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-resolver]</code> Resolve builtin modules correctly (<a
href="https://redirect.github.com/jestjs/jest/pull/15683">#15683</a>)</li>
<li><code>[jest-environment-node, jest-util]</code> Avoid setting
globals cleanup protection symbol when feature is off (<a
href="https://redirect.github.com/jestjs/jest/pull/15684">#15684</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/blob/main/CHANGELOG.md">jest-environment-jsdom's
changelog</a>.</em></p>
<blockquote>
<h2>30.1.2</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Correct snapshot header regexp to
work with newline across OSes (<a
href="https://redirect.github.com/jestjs/jest/pull/15803">#15803</a>)</li>
</ul>
<h2>30.1.1</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
warning not handling Windows end-of-line sequences (<a
href="https://redirect.github.com/jestjs/jest/pull/15800">#15800</a>)</li>
</ul>
<h2>30.1.0</h2>
<h2>Features</h2>
<ul>
<li><code>[jest-leak-detector]</code> Configurable GC aggressiveness
regarding to V8 heap snapshot generation (<a
href="https://redirect.github.com/jestjs/jest/pull/15793/">#15793</a>)</li>
<li><code>[jest-runtime]</code> Reduce redundant ReferenceError
messages</li>
<li><code>[jest-core]</code> Include test modules that failed to load
when --onlyFailures is active</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
guide link not getting replaced with fully canonical URL (<a
href="https://redirect.github.com/jestjs/jest/pull/15787">#15787</a>)</li>
<li><code>[jest-circus]</code> Fix <code>it.concurrent</code> not
working with <code>describe.skip</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15765">#15765</a>)</li>
<li><code>[jest-snapshot]</code> Fix mangled inline snapshot updates
when used with Prettier 3 and CRLF line endings</li>
<li><code>[jest-runtime]</code> Importing from
<code>@jest/globals</code> in more than one file no longer breaks
relative paths (<a
href="https://redirect.github.com/jestjs/jest/issues/15772">#15772</a>)</li>
</ul>
<h1>Chore</h1>
<ul>
<li><code>[expect]</code> Update docblock for <code>toContain()</code>
to display info on substring check (<a
href="https://redirect.github.com/jestjs/jest/pull/15789">#15789</a>)</li>
</ul>
<h2>30.0.5</h2>
<h3>Features</h3>
<ul>
<li><code>[jest-config]</code> Allow <code>testMatch</code> to take a
string value</li>
<li><code>[jest-worker]</code> Let <code>workerIdleMemoryLimit</code>
accept 0 to always restart worker child processes</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[expect]</code> Fix <code>bigint</code> error (<a
href="https://redirect.github.com/jestjs/jest/pull/15702">#15702</a>)</li>
</ul>
<h2>30.0.4</h2>
<h3>Features</h3>
<ul>
<li><code>[expect]</code> The <code>Inverse</code> type is now exported
(<a
href="https://redirect.github.com/jestjs/jest/pull/15714">#15714</a>)</li>
<li><code>[expect]</code> feat: support <code>async functions</code> in
<code>toBe</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15704">#15704</a>)</li>
</ul>
<h3>Fixes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ebfa31cc97"><code>ebfa31c</code></a>
v30.1.2</li>
<li><a
href="d347c0f3f8"><code>d347c0f</code></a>
v30.1.1</li>
<li><a
href="4d5f41d088"><code>4d5f41d</code></a>
v30.1.0</li>
<li><a
href="22236cf58b"><code>22236cf</code></a>
v30.0.5</li>
<li><a
href="f4296d2bc8"><code>f4296d2</code></a>
v30.0.4</li>
<li><a
href="393acbfac3"><code>393acbf</code></a>
v30.0.2</li>
<li><a
href="5ce865b406"><code>5ce865b</code></a>
v30.0.1</li>
<li><a
href="469f665c2d"><code>469f665</code></a>
v30.0.0</li>
<li><a
href="ce14203d91"><code>ce14203</code></a>
v30.0.0-rc.1</li>
<li><a
href="ac334c0cdf"><code>ac334c0</code></a>
v30.0.0-beta.8</li>
<li>Additional commits viewable in <a
href="https://github.com/jestjs/jest/commits/v30.1.2/packages/jest-environment-jsdom">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=jest-environment-jsdom&package-manager=npm_and_yarn&previous-version=29.7.0&new-version=30.1.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-22 13:57:10 +02:00
dependabot[bot]
7c4a740a08
chore(ui-deps): bump @radix-ui/react-dialog from 1.1.13 to 1.1.15 in /llama_stack/ui (#3510)
Bumps [@radix-ui/react-dialog](https://github.com/radix-ui/primitives)
from 1.1.13 to 1.1.15.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-dialog&package-manager=npm_and_yarn&previous-version=1.1.13&new-version=1.1.15)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-22 13:56:58 +02:00
dependabot[bot]
21f7667bb7
chore(ui-deps): bump remeda from 2.30.0 to 2.32.0 in /llama_stack/ui (#3511)
Bumps [remeda](https://github.com/remeda/remeda) from 2.30.0 to 2.32.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/remeda/remeda/releases">remeda's
releases</a>.</em></p>
<blockquote>
<h2>v2.32.0</h2>
<h1><a
href="https://github.com/remeda/remeda/compare/v2.31.1...v2.32.0">2.32.0</a>
(2025-09-18)</h1>
<h3>Features</h3>
<ul>
<li>toTitleCase (<a
href="https://redirect.github.com/remeda/remeda/issues/1200">#1200</a>)
(<a
href="90866698f7">9086669</a>)</li>
</ul>
<h2>v2.31.1</h2>
<h2><a
href="https://github.com/remeda/remeda/compare/v2.31.0...v2.31.1">2.31.1</a>
(2025-09-09)</h2>
<p><em>This version is identical to <a
href="https://github.com/remeda/remeda/releases/tag/v2.31.0">2.31.0</a>.
We were experimenting with some modernizations of our release pipelines
and needed to generate a release as part of testing those
changes.</em></p>
<h2>v2.31.0</h2>
<h1><a
href="https://github.com/remeda/remeda/compare/v2.30.0...v2.31.0">2.31.0</a>
(2025-09-08)</h1>
<h3>Features</h3>
<ul>
<li><strong>conditional:</strong> remove <code>defaultCase</code> (<a
href="https://redirect.github.com/remeda/remeda/issues/1192">#1192</a>)
(<a
href="ebea7b3bc6">ebea7b3</a>),
closes <a
href="https://redirect.github.com/remeda/remeda/issues/1114">#1114</a></li>
<li><strong>isEmptyish:</strong> a wider variant of <code>isEmpty</code>
that accepts any input. (<a
href="https://redirect.github.com/remeda/remeda/issues/1180">#1180</a>)
(<a
href="025b2ec8d8">025b2ec</a>),
closes <a
href="https://redirect.github.com/remeda/remeda/issues/775">#775</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="90866698f7"><code>9086669</code></a>
feat: toTitleCase (<a
href="https://redirect.github.com/remeda/remeda/issues/1200">#1200</a>)</li>
<li><a
href="6198796b25"><code>6198796</code></a>
docs: even more broken links (<a
href="https://redirect.github.com/remeda/remeda/issues/1202">#1202</a>)</li>
<li><a
href="705c29d48c"><code>705c29d</code></a>
docs: fix broken links in migration articles (<a
href="https://redirect.github.com/remeda/remeda/issues/1201">#1201</a>)</li>
<li><a
href="d24c932fa3"><code>d24c932</code></a>
docs(string): rework all docs in the string category + lodash migration
(<a
href="https://redirect.github.com/remeda/remeda/issues/1199">#1199</a>)</li>
<li><a
href="ae0e1156d6"><code>ae0e115</code></a>
fix(release): revert OIDC release (for now) + jsr provenance (<a
href="https://redirect.github.com/remeda/remeda/issues/1197">#1197</a>)</li>
<li><a
href="6293fc2e95"><code>6293fc2</code></a>
fix(semantic-release): provide a github token in the env (<a
href="https://redirect.github.com/remeda/remeda/issues/1196">#1196</a>)</li>
<li><a
href="53c4e07f14"><code>53c4e07</code></a>
fix(npm): use oidc when publishing to npm (<a
href="https://redirect.github.com/remeda/remeda/issues/1195">#1195</a>)</li>
<li><a
href="3cdc9833ea"><code>3cdc983</code></a>
chore(deps): bump locked versions (<a
href="https://redirect.github.com/remeda/remeda/issues/1194">#1194</a>)</li>
<li><a
href="49a295b58a"><code>49a295b</code></a>
chore(deps): manually bump everything (<a
href="https://redirect.github.com/remeda/remeda/issues/1193">#1193</a>)</li>
<li><a
href="ebea7b3bc6"><code>ebea7b3</code></a>
feat(conditional): remove <code>defaultCase</code> (<a
href="https://redirect.github.com/remeda/remeda/issues/1192">#1192</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/remeda/remeda/compare/v2.30.0...v2.32.0">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~eranhirsch">eranhirsch</a>, a new releaser
for remeda since your current version.</p>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=remeda&package-manager=npm_and_yarn&previous-version=2.30.0&new-version=2.32.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-22 13:56:43 +02:00
dependabot[bot]
6ce2cf3e12
chore(github-deps): bump astral-sh/setup-uv from 6.6.1 to 6.7.0 (#3502)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.6.1 to 6.7.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.7.0 🌈 New inputs <code>restore-cache</code> and
<code>save-cache</code></h2>
<h2>Changes</h2>
<p>This release adds fine-grained control over the caching steps.</p>
<ul>
<li>The input <code>restore-cache</code> (<code>true</code> by default)
can be set to <code>false</code> to skip restoring the cache while still
allowing to save the cache.</li>
<li>The input <code>save-cache</code> (<code>true</code> by default) can
be set to <code>false</code> to skip saving the cache.</li>
</ul>
<p>Skipping cache saving can be useful if you know, that you will never
use this version of the cache again and don't want to waste storage
space:</p>
<pre lang="yaml"><code>- name: Save cache only on main branch
  uses: astral-sh/setup-uv@v6
  with:
    enable-cache: true
    save-cache: ${{ github.ref == 'refs/heads/main' }}
</code></pre>
<h2>🚀 Enhancements</h2>
<ul>
<li>Add inputs restore-cache and save-cache <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/568">#568</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>bump deps <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/569">#569</a>)</li>
<li>Automatically push updated known versions <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/565">#565</a>)</li>
<li>chore: update known versions for 0.8.16/0.8.17 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/562">#562</a>)</li>
<li>chore: update known versions for 0.8.15 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/550">#550</a>)</li>
<li>chore(ci): address CI lint findings <a
href="https://github.com/woodruffw"><code>@​woodruffw</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/545">#545</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump github/codeql-action from 3.29.11 to 3.30.3 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/566">#566</a>)</li>
<li>Bump actions/setup-node from 4.4.0 to 5.0.0 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/551">#551</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b75a909f75"><code>b75a909</code></a>
bump deps (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/569">#569</a>)</li>
<li><a
href="ffff8aa2b5"><code>ffff8aa</code></a>
Bump github/codeql-action from 3.29.11 to 3.30.3 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/566">#566</a>)</li>
<li><a
href="95d0e233fa"><code>95d0e23</code></a>
Bump actions/setup-node from 4.4.0 to 5.0.0 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/551">#551</a>)</li>
<li><a
href="dc724a12b6"><code>dc724a1</code></a>
Add inputs restore-cache and save-cache (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/568">#568</a>)</li>
<li><a
href="f67343ac2e"><code>f67343a</code></a>
Automatically push updated known versions (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/565">#565</a>)</li>
<li><a
href="4dd9f52a47"><code>4dd9f52</code></a>
chore: update known versions for 0.8.16/0.8.17 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/562">#562</a>)</li>
<li><a
href="e1e6fe7910"><code>e1e6fe7</code></a>
chore: update known versions for 0.8.15 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/550">#550</a>)</li>
<li><a
href="b1836110f7"><code>b183611</code></a>
chore(ci): address CI lint findings (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/545">#545</a>)</li>
<li>See full diff in <a
href="557e51de59...b75a909f75">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.6.1&new-version=6.7.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-22 13:54:35 +02:00
Matthew Farrellee
e2e42c8a37
chore: remove duplicate OpenAI and Gemini data validators (#3513)
# What does this PR do?

removes the duplicate OpenAI/GeminiProviderDataValidator

the active ones are in config.py


## Test Plan

ci
2025-09-22 13:53:17 +02:00
Derek Higgins
0e43be36e1
fix: handle missing API keys gracefully in model refresh (#3493)
- Catch Errors from providers without API keys during model refresh
- Log as warning instead of exception to avoid a scary startup
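
The two points above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `refresh_models` and the per-provider `list_models` callables are hypothetical names, and `ValueError` stands in for whatever error a provider raises when its API key is missing.

```python
import logging

logger = logging.getLogger("model_refresh_sketch")


def refresh_models(providers):
    """Collect models from each provider, skipping providers that fail.

    `providers` maps a provider id to a zero-argument callable returning a
    list of model names (hypothetical interface, for illustration only).
    """
    models = {}
    for provider_id, list_models in providers.items():
        try:
            models[provider_id] = list_models()
        except ValueError as exc:
            # One warning line instead of logger.exception(), which would
            # print a full traceback for every unconfigured provider.
            logger.warning("Failed to list models for %s: %s", provider_id, exc)
    return models


def _no_key():
    # Stand-in for a provider that has no API key configured.
    raise ValueError("API key is not set")


result = refresh_models({"ollama": lambda: ["llama3"], "anthropic": _no_key})
```

In this sketch the provider without a key is simply left out of the result, and startup continues.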

Closes: #3492

Error messages are now logged as warnings instead of several tracebacks:
```

INFO     2025-09-19 16:06:55,228 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-09-19 16:06:59,362 llama_stack.providers.utils.inference.openai_mixin:327 providers::utils: Failed to list models for anthropic: API key
         is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"anthropic_api_key": "<API_KEY>"}, 
         or in the provider config.                                                                                                                   
WARNING  2025-09-19 16:06:59,364 llama_stack.providers.utils.inference.openai_mixin:327 providers::utils: Failed to list models for gemini: API key is
         not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in 
         the provider config.                                                                                                                         
WARNING  2025-09-19 16:06:59,367 llama_stack.providers.utils.inference.openai_mixin:327 providers::utils: Failed to list models for groq: API key is  
         not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in   
         the provider config.                                                                                                                         
WARNING  2025-09-19 16:06:59,372 llama_stack.providers.utils.inference.openai_mixin:327 providers::utils: Failed to list models for sambanova: API key
         is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, 
         or in the provider config.                                                                                                                   
INFO     2025-09-19 16:06:59,533 llama_stack.core.utils.config_resolution:45 core: Using file path:                                                   

```

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-09-22 07:31:30 -04:00
Derek Higgins
e3f77c1004
fix: Update inference recorder to handle both Ollama and OpenAI model (#3470)
Some checks failed
Pre-commit / pre-commit (push) Successful in 1m39s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 22s
UI Tests / ui-tests (22) (push) Successful in 57s
- Handle Ollama format where models are nested under
response['body']['models']
- Fall back to OpenAI format where models are directly in
response['body']
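
A minimal sketch of the fallback described above. The helper name `extract_models` is hypothetical, and the sketch assumes that in the OpenAI layout `response['body']` is itself an iterable of models; the recorder's real data structures may differ.

```python
def extract_models(response):
    """Return the model list from a recorded response in either layout."""
    body = response["body"]
    # Ollama layout: models nested under response['body']['models'].
    if isinstance(body, dict) and "models" in body:
        return list(body["models"])
    # OpenAI layout (assumed here to be an iterable of models in the body).
    return list(body)


# Illustrative payloads, not real recordings.
ollama_style = {"body": {"models": [{"name": "llama3"}]}}
openai_style = {"body": [{"id": "gpt-4o"}]}
```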

Closes: #3457

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-09-21 09:32:39 -04:00
Matthew Farrellee
142a38db8b
chore: remove duplicate AnthropicProviderDataValidator (#3512)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m59s
# What does this PR do?

removes the duplicate AnthropicProviderDataValidator

the active one is in config.py

## Test Plan

ci
2025-09-20 16:09:27 -07:00
ehhuang
f44eb935c4
chore: simplify authorized sqlstore (#3496)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 35s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Pre-commit / pre-commit (push) Successful in 1m19s
# What does this PR do?

This PR is generated with AI and reviewed by me.

Refactors the AuthorizedSqlStore class to store the access policy as an
instance variable rather than passing it as a parameter to each method
call. This simplifies the API.
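
The shape of that refactor might look like the sketch below. `AuthorizedStoreSketch`, `fetch_all`, and `owner_only` are hypothetical stand-ins, not the real `AuthorizedSqlStore` API; the only point illustrated is that the policy moves from a per-call parameter to an instance variable.

```python
class AuthorizedStoreSketch:
    """Toy stand-in for a store that filters rows by an access policy."""

    def __init__(self, rows, policy):
        self._rows = rows
        # The policy is fixed at construction time instead of being
        # passed to every method call.
        self._policy = policy

    def fetch_all(self, user):
        # Callers no longer supply a policy argument here.
        return [row for row in self._rows if self._policy(user, row)]


def owner_only(user, row):
    # Example policy: a user may only see rows they own.
    return row["owner"] == user


store = AuthorizedStoreSketch(
    rows=[{"id": 1, "owner": "alice"}, {"id": 2, "owner": "bob"}],
    policy=owner_only,
)
```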

# Test Plan

existing tests
2025-09-19 16:13:56 -07:00
Sébastien Han
d3600b92d1
fix: force milvus-lite installation for inline::milvus (#3488)
# What does this PR do?

pymilvus recently made `milvus-lite` an optional dependency to their
package. If someone wants to use the inline provider we must include the
extra dependency.
For more details see: https://github.com/milvus-io/pymilvus/pull/2976

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-19 16:12:08 -04:00
adam-d-young
9378bdca43
docs: Fix incorrect vector_db_id usage in RAG tutorial (#3444)
Some checks failed
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m58s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
# What does this PR do?
This PR fixes a blocking issue in the detailed RAG tutorial where the
code fails with a 400 Bad Request error.

The root cause is that recent versions of Llama-Stack ignore the
client-generated vector_db_id and assign a new server-side ID. The
tutorial was not updated to reflect this, causing the rag_tool.insert
call to fail.

This change updates the code to capture the authoritative ID from the
.identifier attribute of the register() method's response. This ensures
the tutorial code runs successfully and reflects the current API
behavior.
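
The pattern can be illustrated without a running server. `FakeClient` below is a stub written for this sketch, not the real client library; it only mimics the behavior described above, where the server assigns its own ID and rejects inserts that use any other ID.

```python
from types import SimpleNamespace


class FakeClient:
    """Stub standing in for a real client; the server picks the ID."""

    def __init__(self):
        self._ids = set()

    def register(self, vector_db_id):
        server_id = "vs_server_assigned"  # the client-generated ID is ignored
        self._ids.add(server_id)
        return SimpleNamespace(identifier=server_id)

    def insert(self, documents, vector_db_id):
        if vector_db_id not in self._ids:
            raise ValueError("400 Bad Request: unknown vector_db_id")
        return len(documents)


client = FakeClient()
response = client.register(vector_db_id="my-client-side-id")
# Use the ID from the response, not the one generated on the client.
vector_db_id = response.identifier
inserted = client.insert(["doc one", "doc two"], vector_db_id=vector_db_id)
```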

## Test Plan
The fix can be verified by running the Python code snippet from the
detailed tutorial page.

Run the original code (Before this change):

Result: The script fails with a 400 Bad Request error on the
rag_tool.insert step.

Run the updated code (After this change):

Result: The script runs successfully to completion.

Co-authored-by: Adam Young <adam.young@redhat.com>
2025-09-19 11:41:26 -04:00
ehhuang
4c2fcb6b51
chore: refactor server.main (#3462)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 13s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 10s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 18s
API Conformance Tests / check-schema-compatibility (push) Successful in 22s
UI Tests / ui-tests (22) (push) Successful in 29s
Pre-commit / pre-commit (push) Successful in 1m25s
# What does this PR do?
As shown in #3421, we can scale stack to handle more RPS with k8s
replicas. This PR enables a multi-process stack with uvicorn --workers so
that we can achieve the same scaling without being in k8s.

To achieve that we refactor main to split out the app construction
logic. This method needs to be non-async. We created a new `Stack` class
to house impls and have a `start()` method to be called in lifespan to
start background tasks instead of starting them in the old
`construct_stack`. This way we avoid having to manage an event loop
manually.
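
A framework-free sketch of that split, using only the standard library. The `Stack` class here is a simplified stand-in with placeholder contents, and `create_app` returns a plain dict rather than a real ASGI application; both are assumptions made to keep the example self-contained.

```python
import asyncio
from contextlib import asynccontextmanager


class Stack:
    """Holds implementations; background work starts only in start()."""

    def __init__(self, config):
        self.config = config
        self.impls = {"inference": object()}  # placeholder implementations
        self._task = None

    async def start(self):
        # Needs a running event loop, so it is called from the lifespan hook.
        self._task = asyncio.create_task(asyncio.sleep(3600))

    async def shutdown(self):
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


def create_app(config):
    """Synchronous factory: builds the stack without touching an event loop."""
    stack = Stack(config)

    @asynccontextmanager
    async def lifespan():
        await stack.start()
        try:
            yield stack
        finally:
            await stack.shutdown()

    return {"stack": stack, "lifespan": lifespan}


async def _demo():
    app = create_app({"port": 8321})
    async with app["lifespan"]() as stack:
        # Inside the lifespan the background task exists and is running.
        return stack._task is not None and not stack._task.done()


started = asyncio.run(_demo())
```

Because the factory is synchronous, each worker process can call it on import, and background tasks are created only once that worker's event loop is running.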


## Test Plan
CI

> uv run --with llama-stack python -m llama_stack.core.server.server
benchmarking/k8s-benchmark/stack_run_config.yaml

works.

> LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv
run uvicorn llama_stack.core.server.server:create_app --port 8321
--workers 4

works.
2025-09-18 21:11:13 -07:00
Charlie Doern
8422bd102a
feat: combine ProviderSpec datatypes (#3378)
Some checks failed
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 36s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
Pre-commit / pre-commit (push) Successful in 1m12s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
# What does this PR do?

currently `RemoteProviderSpec` has an `AdapterSpec` embedded in it.
Remove `AdapterSpec`, and put its leftover fields into
`RemoteProviderSpec`.

Additionally, many of the fields were duplicated between
`InlineProviderSpec` and `RemoteProviderSpec`. Move these to
`ProviderSpec` so they are shared.

Fix up the distro codegen to use `RemoteProviderSpec` directly rather
than `remote_provider_spec`, which took an `AdapterSpec` and returned a
full provider spec.
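
The resulting class layout could look roughly like this. The sketch uses plain dataclasses, and the field names are illustrative assumptions rather than the project's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderSpec:
    """Fields shared by inline and remote providers (names are illustrative)."""

    provider_type: str
    pip_packages: list = field(default_factory=list)
    module: str = ""


@dataclass
class InlineProviderSpec(ProviderSpec):
    # Inline-only field (hypothetical).
    container_image: str = ""


@dataclass
class RemoteProviderSpec(ProviderSpec):
    # Formerly carried by the embedded AdapterSpec; now a plain field.
    adapter_type: str = ""


spec = RemoteProviderSpec(
    provider_type="remote::example",
    adapter_type="example",
    pip_packages=["example-sdk"],
)
```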

## Test Plan

existing distro tests should pass.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-18 16:10:00 +02:00
Jiayi Ni
e66103c09d
fix: add missing files provider to NVIDIA distribution (#3479)
# What does this PR do?
The rag-runtime tool requires files API as a dependency, but the NVIDIA
distribution was missing the files provider configuration. Thus, when
running:

```
llama stack build --distro nvidia --image-type venv
```
And then:
```
llama stack run {path_to_distribution_config} --image-type venv
```
It would raise an error:
```
RuntimeError: Failed to resolve 'tool_runtime' provider 'rag-runtime' of type 'inline::rag-runtime': required dependency 'files' is not available. Please add a 'files' provider to your configuration or check if the provider is properly configured.
```

This PR fixes the issue by adding the missing files provider to the
NVIDIA distribution.

## Test Plan
N/A
2025-09-18 13:49:46 +02:00
Matthew Farrellee
ea396a54cd
chore: update the ollama inference impl to use OpenAIMixin for openai-compat functions (#3395)
# What does this PR do?

update Ollama inference provider to use OpenAIMixin for openai-compat
endpoints

## Test Plan

ci
2025-09-18 13:09:57 +02:00
Matthew Farrellee
521865c388
feat: include all models from provider's /v1/models (#3471)
# What does this PR do?

this replaces the static model listing for any provider using
OpenAIMixin

currently -
 - anthropic
 - azure openai
 - gemini
 - groq
 - llama-api
 - nvidia
 - openai
 - sambanova
 - tgi
 - vertexai
 - vllm
 - not changed: together has its own impl

## Test Plan

 - new unit tests
 - manual for llama-api, openai, groq, gemini

```
for provider in llama-openai-compat openai groq gemini; do
   uv run llama stack build --image-type venv --providers inference=remote::$provider --run &
   uv run --with llama-stack-client llama-stack-client models list | grep Total
done
```

results (17 sep 2025):
 - llama-api: 4
 - openai: 86
 - groq: 21
 - gemini: 66


closes #3467
2025-09-18 05:17:11 -04:00
Akram Ben Aissi
4842145202
feat: Add dynamic authentication token forwarding support for vLLM (#3388)
# What does this PR do?


*Add dynamic authentication token forwarding support for vLLM provider*

This enables per-request authentication tokens for vLLM providers,
supporting use cases like RAG operations where different requests may
need different authentication tokens. The implementation follows the
same pattern as other providers like Together AI, Fireworks, and
Passthrough.

- Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:

- Static: VLLM_API_TOKEN env var or config.api_token
- Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token

All existing functionality is preserved while adding new dynamic
capabilities.
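
One way to picture the static/dynamic split is the sketch below. `resolve_vllm_token` is a hypothetical helper, not the provider's actual code, and the precedence shown (per-request header first, then config, then environment) is an assumption.

```python
import json
import os


def resolve_vllm_token(headers, config_api_token=None):
    """Pick the token for one request (hypothetical helper)."""
    raw = headers.get("X-LlamaStack-Provider-Data")
    if raw:
        provider_data = json.loads(raw)
        token = provider_data.get("vllm_api_token")
        if token:
            return token  # dynamic, per-request token
    # Static fallback: config value first, then environment
    # (this ordering is an assumption).
    return config_api_token or os.environ.get("VLLM_API_TOKEN")


dynamic = resolve_vllm_token(
    {"X-LlamaStack-Provider-Data": json.dumps({"vllm_api_token": "per-request"})},
    config_api_token="static",
)
static = resolve_vllm_token({}, config_api_token="static")
```

The sketch treats `headers` as a plain dict; a real server would normally match header names case-insensitively.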



## Test Plan

```
curl -X POST "http://localhost:8000/v1/chat/completions" -H "Authorization: Bearer my-dynamic-token" \
  -H "X-LlamaStack-Provider-Data: {\"vllm_api_token\": \"Bearer my-dynamic-token\", \"vllm_url\": \"http://dynamic-server:8000\"}" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello!"}]}'
  
```

---------

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
2025-09-18 11:13:55 +02:00
Doug Edgar
42c23b45f6
feat: update qdrant hash function from SHA-1 to SHA-256 (#3477)
Some checks failed
Installer CI / smoke-test-on-dev (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Installer CI / lint (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 2s
UI Tests / ui-tests (22) (push) Successful in 29s
Pre-commit / pre-commit (push) Successful in 1m10s
# What does this PR do?
Updates the qdrant provider's convert_id function to use a
FIPS-validated cryptographic hashing function, so that llama-stack is
considered to be `Designed for FIPS`.

The standard library `uuid.uuid5()` function uses SHA-1 under the hood,
which is not FIPS-validated. This commit uses an approach similar to the
one merged in #3423.

Closes #3476.

## Test Plan
Unit tests from scripts/unit-tests.sh were run to verify that the tests
pass.

A small test script can display the data flow:
```python
import hashlib
import uuid

# Input
_id = "chunk_abc123"
print(_id)

# Step 1: Format and encode
hash_input = f"qdrant_id:{_id}".encode()
print(hash_input)
# Result: b'qdrant_id:chunk_abc123'

# Step 2: SHA-256 hash
sha256_hash = hashlib.sha256(hash_input).hexdigest()
print(sha256_hash)
# Result: "184893a6eafeaac487cb9166351e8625b994d50f3456d8bc6cea32a014a27151"

# Step 3: Create UUID from first 32 chars
uuid_string = str(uuid.UUID(sha256_hash[:32]))
print(uuid_string)
# sha256_hash[:32] = "184893a6eafeaac487cb9166351e8625"
# Final result: "184893a6-eafe-aac4-87cb-9166351e8625"
```

Signed-off-by: Doug Edgar <dedgar@redhat.com>
2025-09-17 15:10:10 -07:00
Jash Gulabrai
ac1414b571
fix: Set provider_id in NVIDIA notebook when registering dataset (#3472)
# What does this PR do?
When registering a dataset for NVIDIA, the DatasetsRoutingTable expects
`nvidia` to be passed via the `provider_id`
[here](https://github.com/llamastack/llama-stack/blob/main/llama_stack/core/routing_tables/datasets.py#L61).

This PR fixes a notebook to correctly use `provider_id`.

Closes #3308

## Test Plan
Manually execute the notebook steps to verify the dataset is registered.

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
2025-09-17 11:45:15 -07:00
Alexey Rybak
9fe8097ca4
docs: update documentation links (#3459)
# What does this PR do?
* Updates documentation links from readthedocs to llamastack.github.io

## Test Plan
* Manual testing
2025-09-17 10:37:35 -07:00
Francisco Arceo
9acf49753e
fix: Fixing prompts import warning (#3455)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m17s
# What does this PR do?
Fixes this warning in llama stack build:

```bash
WARNING  2025-09-15 15:29:02,197 llama_stack.core.distribution:149 core: Failed to import module prompts: No module named
         'llama_stack.providers.registry.prompts'"
```

## Test Plan
Test added

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-09-17 10:24:58 +02:00
Derek Higgins
fad4843548
fix: unbound variable PR_HEAD_REPO (#3469)
Add default value for PR_HEAD_REPO to prevent 'unbound variable' error
when no PR exists for a branch.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-09-17 10:18:43 +02:00
Omar Abdelwahab
e0e2b1bd0e
fix: Added a bug fix when registering new models (#3453)
# What does this PR do?

Modified the code in registry.py.

The key changes are:

1.  Removed the `return False` statement
2. Added a warning log message that includes the object type,
identifier, and provider_id for better debugging.
3. The method now continues with the registration process instead of
early returning.
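
A rough sketch of the changed control flow. The `register` function and the dict-based registry are hypothetical, and the sketch assumes the warning fires when an object with the same type and identifier is already registered; the PR description does not state the exact condition.

```python
import logging

logger = logging.getLogger("registry_sketch")


def register(registry, obj):
    """Register `obj`, warning (not bailing out) if its key already exists."""
    key = (obj["type"], obj["identifier"])
    if key in registry:
        # Before the change described above, this branch returned False.
        logger.warning(
            "Object %s:%s already exists (provider_id=%s); continuing registration",
            obj["type"],
            obj["identifier"],
            obj["provider_id"],
        )
    registry[key] = obj
    return True


registry = {}
first = register(registry, {"type": "model", "identifier": "m1", "provider_id": "p1"})
second = register(registry, {"type": "model", "identifier": "m1", "provider_id": "p2"})
```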

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
2025-09-16 19:09:06 -07:00
github-actions[bot]
ececc323d3 build: Bump version to 0.2.22
2025-09-16 19:44:03 +00:00
Matthew Farrellee
49d4a5cc84
feat: add embedding and dynamic model support to Together inference adapter (#3458)
# What does this PR do?

adds embedding and dynamic model support to Together inference adapter

 - updated to use OpenAIMixin
 - workarounds for Together api quirks
 - recordings for together suite when subdirs=inference,pattern=openai

## Test Plan

```
$ TOGETHER_API_KEY=_NONE_ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup together --subdirs inference --pattern openai
...

tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:completion:sanity] 
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.121s
PASSED [  2%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:completion:suffix] SKIPPED [  4%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:completion:sanity] PASSED [  6%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-1] SKIPPED [  8%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free] SKIPPED [ 10%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:non_streaming_01] PASSED [ 12%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_01] PASSED [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_01] SKIPPED [ 17%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-True] PASSED [ 19%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-True] PASSED [ 21%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free] SKIPPED [ 23%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 25%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 27%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 29%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 31%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 34%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 36%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 38%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 40%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 42%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-0] SKIPPED [ 46%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:non_streaming_02] PASSED [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_02] PASSED [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_02] SKIPPED [ 53%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-False] PASSED [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-False] PASSED [ 57%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 59%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 61%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 63%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 65%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 68%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 70%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 72%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] PASSED [ 74%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 76%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=together/togethercomputer/m2-bert-80M-32k-retrieval] SKIPPED [ 78%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:non_streaming_01] PASSED [ 80%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_01] PASSED [ 82%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_01] SKIPPED [ 85%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-True] PASSED [ 87%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-True] PASSED [ 89%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:non_streaming_02] PASSED [ 91%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_02] PASSED [ 93%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-inference:chat_completion:streaming_02] SKIPPED [ 95%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-False] PASSED [ 97%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=together/meta-llama/Llama-3.3-70B-Instruct-Turbo-Free-False] PASSED [100%]

============================================ 30 passed, 17 skipped, 50 deselected, 4 warnings in 21.96s =============================================
```
2025-09-16 11:53:41 -07:00
slekkala1
3defdf7d3a
fix: docker failing to start container[pydantic] (#3460)
# What does this PR do?
Pin pydantic to the latest version, 2.11.9, because we sometimes pick up
an older version, which causes the container to fail to start in GitHub Actions:
1775026312
Closes https://github.com/llamastack/llama-stack/issues/3461

## Test Plan
Tested locally with the following commands to start a container

Build the container:
`llama stack build --distro starter --image-type container`

Start the container:
`docker run -d -p 8321:8321 --name llama-stack-test distribution-starter:0.2.21`

Check health: http://localhost:8321/v1/health
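
The health check can also be scripted. The sketch below is only an illustration: the `is_healthy` helper and its `opener` parameter are hypothetical, and it assumes that an HTTP 200 from `/v1/health` means the container started.

```python
import urllib.request


def is_healthy(base_url="http://localhost:8321", opener=urllib.request.urlopen, timeout=5.0):
    """Return True if GET <base_url>/v1/health answers with HTTP 200.

    `opener` is injectable so the function can be exercised without a server.
    """
    url = base_url.rstrip("/") + "/v1/health"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, so refused
        # connections and HTTP error responses both land here.
        return False
```

Calling `is_healthy()` after `docker run` gives a quick pass/fail signal for the container.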

Couldn't reproduce the failure with the older version (`2.8.2`), but pydantic
`2.11.9` is able to start the container

https://pypi.org/project/pydantic/#history , 2.11.9 is the latest
version
2025-09-16 11:33:43 -07:00
Charlie Doern
6b855af96f
feat: introduce api leveling proposal (#3317)
# What does this PR do?

This document outlines the different API stability levels, how to enforce
them, and next steps.

## Next Steps

Following the adoption of this document, all existing APIs should follow
the enforcement protocol.

relates to #3237

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-16 18:18:36 +02:00
Sébastien Han
65d45c7318
chore: various watsonx fixes (#3428)
# What does this PR do?

* use a logger
* update the distro to add the Files API otherwise it won't start since
it is a dependency of vector
* clarify project_id and api_key requirements
* disable openai compatible calls since the endpoint returns 404
* disable text_inference structured format tests
* fixed openai client initialization

## Test Plan

Execute text_inference:

```
WATSONX_API_KEY=... WATSONX_PROJECT_ID=... python -m llama_stack.core.server.server llama_stack/distributions/watsonx/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -vvvv -ra --text-model watsonx/meta-llama/llama-3-3-70b-instruct tests/integration/inference/test_text_inference.py

============================================= test session starts ==============================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 20 items

tests/integration/inference/test_text_inference.py::test_text_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [  5%]
tests/integration/inference/test_text_inference.py::test_text_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:sanity] PASSED [ 10%]
tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] XFAIL [ 15%]
tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] XFAIL [ 20%]
tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] XFAIL [ 25%]
tests/integration/inference/test_text_inference.py::test_text_completion_structured_output[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:structured_output] SKIPPED structured output) [ 30%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 35%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_01] PASSED [ 40%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 45%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 50%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 55%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling] PASSED [ 60%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:structured_output] SKIPPEDstructured output) [ 65%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling_tools_absent-True] PASSED [ 70%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:text_then_tool] XFAIL [ 75%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 80%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:streaming_02] PASSED [ 85%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_calling_tools_absent-False] PASSED [ 90%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_then_answer] XFAIL [ 95%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:array_parameter] XFAIL [100%]

=========================================== short test summary info ============================================
SKIPPED [2] tests/integration/inference/test_text_inference.py:49: Model watsonx/meta-llama/llama-3-3-70b-instruct hosted by remote::watsonx doesn't support json_schema structured output
XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:stop_sequence] - remote::watsonx doesn't support 'stop' parameter yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_non_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] - remote::watsonx doesn't support log probs yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_completion_log_probs_streaming[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:completion:log_probs] - remote::watsonx doesn't support log probs yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:text_then_tool] - Not tested for non-llama4 models yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:tool_then_answer] - Not tested for non-llama4 models yet
XFAIL tests/integration/inference/test_text_inference.py::test_text_chat_completion_with_multi_turn_tool_calling[txt=watsonx/meta-llama/llama-3-3-70b-instruct-inference:chat_completion:array_parameter] - Not tested for non-llama4 models yet
============================ 12 passed, 2 skipped, 6 xfailed, 14 warnings in 36.88s ============================
```

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-16 13:55:10 +02:00
Matthew Farrellee
f4ab154ade
feat: add dynamic model registration support to TGI inference (#3417)
# What does this PR do?

Adds dynamic model support to TGI.

Adds a new `overwrite_completion_id` feature to OpenAIMixin to deal with TGI
always returning `id=""`.
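
A rough sketch of the idea behind the ID workaround follows. It is not the OpenAIMixin implementation: the `Completion` dataclass, the `apply_completion_id` helper, and the `cmpl-` prefix are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Completion:
    """Hypothetical, trimmed-down response object."""

    id: str
    text: str


def apply_completion_id(completion: Completion, overwrite_completion_id: bool) -> Completion:
    """Replace the provider-supplied id with a generated one when the flag is set."""
    if not overwrite_completion_id:
        return completion
    # The backend returned an unusable id (for example ""), so generate one.
    return replace(completion, id=f"cmpl-{uuid.uuid4().hex}")
```

A provider that always returns an empty id would enable the flag, while other providers leave it off and keep upstream ids untouched.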

## Test Plan

tgi: `docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data
ghcr.io/huggingface/text-generation-inference --model-id
Qwen/Qwen3-0.6B`

stack: `TGI_URL=http://localhost:8080 uv run llama stack build
--image-type venv --distro ci-tests --run`

test: `./scripts/integration-tests.sh --stack-config
http://localhost:8321 --setup tgi --subdirs inference --pattern openai`
2025-09-15 15:52:40 -04:00
IAN MILLER
ab321739f2
feat: create HTTP DELETE API endpoints to unregister ScoringFn and Benchmark resources in Llama Stack (#3371)
# What does this PR do?
This PR provides functionality for users to unregister ScoringFn and
Benchmark resources for `scoring` and `eval` APIs.

Closes #3051 

## Test Plan
Updated integration and unit tests via CI workflow
2025-09-15 12:43:38 -07:00
Matthew Farrellee
01bdcce4d2
chore(recorder): update mocks to be closer to non-mock environment (#3442)
# What does this PR do?

the @required_args decorator in openai-python is masking the async
nature of the {AsyncCompletions,chat.AsyncCompletions}.create method.
see https://github.com/openai/openai-python/issues/996

this means two things -

 0. we cannot use iscoroutine in the recorder to detect async vs non-async
 1. our mocks are inappropriately introducing identifiable async behavior

for (0), we update the iscoroutine check w/ detection of /v1/models,
which is the only non-async function we mock & record.

for (1), we could leave everything as is and assume (0) will catch
errors. to be defensive, we update the unit tests to mock below create
methods, allowing the true openai-python create() methods to be tested.
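
As an illustration of how a decorator can mask the async nature of a method, here is a minimal sketch. `required_args_like` is a hypothetical stand-in, not openai-python's decorator, and `create` is a toy function. It only shows the general effect and does not reproduce the recorder.

```python
import asyncio
import functools
import inspect


def required_args_like(func):
    """Hypothetical decorator: wraps an async function in a plain def."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Returns the coroutine object without awaiting it.
        return func(*args, **kwargs)

    return wrapper


@required_args_like
async def create(model: str) -> str:
    return f"completion from {model}"


# The wrapper is a regular function, so introspection no longer reports async.
looks_async = inspect.iscoroutinefunction(create)

# Calling it still yields a coroutine that has to be awaited.
coro = create(model="demo")
returns_coroutine = inspect.iscoroutine(coro)
result = asyncio.run(coro)
```

Because the wrapper hides the coroutine function from introspection, a mock that replaces `create` with a plain `async def` looks different from the real, decorated method.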
2025-09-15 15:25:53 -04:00
dependabot[bot]
b6cb817897
chore(ui-deps): bump @radix-ui/react-select from 2.2.5 to 2.2.6 in /llama_stack/ui (#3437)
Bumps [@radix-ui/react-select](https://github.com/radix-ui/primitives)
from 2.2.5 to 2.2.6.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 09:46:14 +02:00
dependabot[bot]
36fd97e306
chore(ui-deps): bump next from 15.3.3 to 15.5.3 in /llama_stack/ui (#3438)
Bumps [next](https://github.com/vercel/next.js) from 15.3.3 to 15.5.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.3</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: validation return types of pages API routes (<a
href="https://redirect.github.com/vercel/next.js/issues/83069">#83069</a>)</li>
<li>fix: relative paths in dev in validator.ts (<a
href="https://redirect.github.com/vercel/next.js/issues/83073">#83073</a>)</li>
<li>fix: remove satisfies keyword from type validation to preserve old
TS compatibility (<a
href="https://redirect.github.com/vercel/next.js/issues/83071">#83071</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/bgub"><code>@​bgub</code></a> for helping!</p>
<h2>v15.5.2</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: disable unknownatrules lint rule entirely (<a
href="https://redirect.github.com/vercel/next.js/issues/83059">#83059</a>)</li>
<li>revert: add ?dpl to fonts in /_next/static/media (<a
href="https://redirect.github.com/vercel/next.js/issues/83062">#83062</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/bgub"><code>@​bgub</code></a> and <a
href="https://github.com/ztanner"><code>@​ztanner</code></a> for
helping!</p>
<h2>v15.5.1</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: aliased navigations should apply scroll handling (<a
href="https://redirect.github.com/vercel/next.js/issues/82900">#82900</a>)</li>
<li>Turbopack: fix invalid NFT entry with file behind symlink (<a
href="https://redirect.github.com/vercel/next.js/issues/82887">#82887</a>)</li>
<li>fix: typesafe linking to route handlers and pages API routes (<a
href="https://redirect.github.com/vercel/next.js/issues/82858">#82858</a>)</li>
<li>fix: change &quot;noUnknownAtRules&quot; to &quot;warn&quot; for
Biome (<a
href="https://redirect.github.com/vercel/next.js/issues/82974">#82974</a>)</li>
<li>fix: add path normalization to getRelativePath for Windows (<a
href="https://redirect.github.com/vercel/next.js/issues/82918">#82918</a>)</li>
<li>feat: add typesafety with config.typedRoutes to redirect() and
permanentRedirect() (<a
href="https://redirect.github.com/vercel/next.js/issues/82860">#82860</a>)</li>
<li>fix: avoid importing types that will be unused (<a
href="https://redirect.github.com/vercel/next.js/issues/82856">#82856</a>)</li>
<li>fix: update the config.api.responseLimit type (<a
href="https://redirect.github.com/vercel/next.js/issues/82852">#82852</a>)</li>
<li>fix: update validation return types (<a
href="https://redirect.github.com/vercel/next.js/issues/82854">#82854</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/bgub"><code>@​bgub</code></a>, <a
href="https://github.com/mischnic"><code>@​mischnic</code></a>, and <a
href="https://github.com/ztanner"><code>@​ztanner</code></a> for
helping!</p>
<h2>v15.5.1-canary.39</h2>
<h3>Core Changes</h3>
<ul>
<li>[metadata] change the metadata routes params to promises: <a
href="https://redirect.github.com/vercel/next.js/issues/83560">#83560</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="07d1cbc9c6"><code>07d1cbc</code></a>
v15.5.3</li>
<li><a
href="db56d77595"><code>db56d77</code></a>
[backport] fix: validation return types of pages API routes (<a
href="https://redirect.github.com/vercel/next.js/issues/83069">#83069</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83580">#83580</a>)</li>
<li><a
href="7a806231f8"><code>7a80623</code></a>
[backport] fix: relative paths in dev in validator.ts (<a
href="https://redirect.github.com/vercel/next.js/issues/83073">#83073</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83190">#83190</a>)</li>
<li><a
href="fddaeb85a0"><code>fddaeb8</code></a>
[backport] fix: remove <code>satisfies</code> keyword from type
validation to preserve o...</li>
<li><a
href="497ec6aa08"><code>497ec6a</code></a>
v15.5.2</li>
<li><a
href="bc72f41a2e"><code>bc72f41</code></a>
[backport] revert: add ?dpl to fonts in <code>/_next/static/media</code>
(<a
href="https://redirect.github.com/vercel/next.js/issues/83062">#83062</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83066">#83066</a>)</li>
<li><a
href="c8faf6800b"><code>c8faf68</code></a>
[backport] fix: disable unknownatrules lint rule entirely (<a
href="https://redirect.github.com/vercel/next.js/issues/83059">#83059</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83060">#83060</a>)</li>
<li><a
href="cc68ced552"><code>cc68ced</code></a>
v15.5.1</li>
<li><a
href="1ce9857276"><code>1ce9857</code></a>
[backport] fix: update validation return types (<a
href="https://redirect.github.com/vercel/next.js/issues/82854">#82854</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83027">#83027</a>)</li>
<li><a
href="b93c894717"><code>b93c894</code></a>
[backport] fix: update the config.api.responseLimit type (<a
href="https://redirect.github.com/vercel/next.js/issues/82852">#82852</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83028">#83028</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v15.3.3...v15.5.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.3.3&new-version=15.5.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-15 09:46:05 +02:00
Matthew Farrellee
6787755c0c
chore(recorder): add support for NOT_GIVEN (#3430)
# What does this PR do?

the recorder mocks the openai-python interface. the openai-python
interface allows NOT_GIVEN as an input option. this change properly
handles NOT_GIVEN.
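
a minimal sketch of one way such handling could look, assuming the recorder normalizes request arguments before hashing them. `_NotGiven` and `NOT_GIVEN` below are local stand-ins for the openai-python sentinel, and `normalize_request` / `request_hash` are hypothetical helpers, not the recorder's actual code.

```python
import hashlib
import json


class _NotGiven:
    """Local stand-in for the openai-python NOT_GIVEN sentinel (assumption)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def normalize_request(kwargs: dict) -> dict:
    """Drop arguments the caller left as NOT_GIVEN (hypothetical helper)."""
    return {k: v for k, v in kwargs.items() if not isinstance(v, _NotGiven)}


def request_hash(kwargs: dict) -> str:
    """Hash the normalized request so omitted and NOT_GIVEN arguments match."""
    body = json.dumps(normalize_request(kwargs), sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

with this normalization, a call that passes `temperature=NOT_GIVEN` and a call that omits `temperature` resolve to the same recording key.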


## Test Plan

ci (coverage for chat, completions, embeddings)
2025-09-13 11:11:38 -07:00
Matthew Farrellee
8cf2128b40
chore(tests): always show slowest tests (#3431)
# What does this PR do?

help developers identify slow tests by always passing --durations to
pytest


## Test Plan

n/a
2025-09-13 09:28:04 -07:00
Matthew Farrellee
3de9ad0a87
chore(recorder, tests): add test for openai /v1/models (#3426)
# What does this PR do?

- [x] adds a test for the recorder's handling of /v1/models
- [x] adds a fix for /v1/models handling

## Test Plan

ci
2025-09-12 14:59:56 -07:00
Doug Edgar
f67081d2d6
feat: migrate to FIPS-validated cryptographic algorithms (#3423)
# What does this PR do?
Migrates MD5 and SHA-1 hash algorithms to SHA-256.

In particular, replaces:   
   - MD5 in chunk ID generation.
   - MD5 in file verification.
   - SHA-1 in model identifier digests.

And updates all related test expectations.
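
For illustration, a minimal sketch of SHA-256-based chunk ID generation. The function name `generate_chunk_id`, its parameters, and the UUID-shaped output are assumptions made for this example and may not match the project's code.

```python
import hashlib
import uuid


def generate_chunk_id(document_id: str, chunk_text: str) -> str:
    """Derive a deterministic chunk ID from SHA-256 instead of MD5 (hypothetical helper)."""
    digest = hashlib.sha256(f"{document_id}:{chunk_text}".encode("utf-8")).hexdigest()
    # Use the first 128 bits of the digest to build a UUID-shaped identifier.
    return str(uuid.UUID(hex=digest[:32]))
```

Because the digest changes with the algorithm, any stored IDs or test expectations computed with MD5 no longer match, which is why the test expectations had to be updated.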

Original discussion:
https://github.com/llamastack/llama-stack/discussions/3413

Closes #3424.

## Test Plan
Unit tests run by scripts/unit-tests.sh were updated to match the new hash
output and were run to verify that they pass.

Signed-off-by: Doug Edgar <dedgar@redhat.com>
2025-09-12 11:18:19 +02:00
Akram Ben Aissi
d31e641d69
fix: Improve pre-commit workflow error handling and feedback (#3400)
# What does this PR do?
fix: Improve pre-commit workflow error handling and feedback

- Add explicit step to check pre-commit results and provide clear error
messages
- Improve verification steps with better error messages and file
listings
- Use GitHub Actions annotations (::error:: and ::warning::) for better
visibility
- Maintain continue-on-error for pre-commit step but add proper failure
handling
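
A rough sketch of how a check step could emit such annotations from a script. The helper names `annotation` and `report_precommit` are hypothetical; the `::error::` / `::warning::` strings follow the GitHub Actions workflow command syntax.

```python
from typing import Optional


def annotation(level: str, message: str, file: Optional[str] = None) -> str:
    """Format a GitHub Actions workflow command such as ::error:: or ::warning::."""
    if level not in {"error", "warning", "notice"}:
        raise ValueError(f"unsupported annotation level: {level}")
    params = f" file={file}" if file else ""
    return f"::{level}{params}::{message}"


def report_precommit(returncode: int, changed_files: list) -> int:
    """Print annotations for a failed pre-commit run and return the exit code to use."""
    if returncode == 0:
        return 0
    print(annotation("error", "pre-commit hooks failed; run `pre-commit run --all-files` locally"))
    for path in changed_files:
        print(annotation("warning", "modified by pre-commit", file=path))
    return 1
```

Keeping `continue-on-error` on the pre-commit step and failing in a later step like this lets the workflow surface a clear message before it exits.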

This addresses the issue where pre-commit failures were silent but still
caused workflow failures later, making it difficult to understand what
needed to be fixed.




## Test Plan

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
2025-09-12 11:10:59 +02:00
Charlie Doern
69a52213a1
fix: oasdiff enhancements and stability (#3419)
# What does this PR do?

only run conformance tests when the spec is changed.

Also, cache oasdiff such that it is not installed every time the test is
run

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-11 13:30:09 -07:00
slekkala1
c7ef1f13df
feat: Add langchain llamastack Integration example notebook (#3314)
# What does this PR do?
The notebook was reverted in
https://github.com/llamastack/llama-stack/pull/3259 because it contained
some local paths that I had missed. This PR re-adds it with those paths
corrected.


## Test Plan
Ran the Jupyter notebook
2025-09-11 11:10:41 -07:00
Matthew Farrellee
72387b4bd2
chore(unit tests): remove network use, update async test (#3418)
# What does this PR do?

update the async detection test for vllm

- remove a network access from unit tests
- remove direct logging use

the idea behind the test is to mock inference w/ a sleep, initiate
concurrent inference calls, verify the total execution time is close to
the sleep time. in a non-async env the total time would be closer to
sleep * num concurrent calls.
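
the idea can be sketched as follows. this is a simplified stand-in, not the actual vllm test: `fake_inference`, `SLEEP_SECONDS` and `NUM_CALLS` are names chosen for the example.

```python
import asyncio
import time

SLEEP_SECONDS = 0.1  # simulated inference latency (assumption)
NUM_CALLS = 5


async def fake_inference(prompt: str) -> str:
    """Stand-in for a mocked inference call that only waits."""
    await asyncio.sleep(SLEEP_SECONDS)
    return f"echo: {prompt}"


async def run_concurrently() -> tuple:
    """Start all calls at once and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_inference(f"p{i}") for i in range(NUM_CALLS)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_concurrently())
# Async execution overlaps the sleeps; sequential execution would take
# roughly SLEEP_SECONDS * NUM_CALLS.
assert elapsed < SLEEP_SECONDS * NUM_CALLS
```

no network access is needed, since the only work done is the mocked sleep.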


## Test Plan

ci
2025-09-11 11:45:16 -04:00
Matthew Farrellee
8ef1189be7
chore: update the vLLM inference impl to use OpenAIMixin for openai-compat functions (#3404)
# What does this PR do?

update vLLM inference provider to use OpenAIMixin for openai-compat
functions

inference recordings from Qwen3-0.6B and vLLM 0.8.3 -
```
docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen3-0.6B --enable-auto-tool-choice --tool-call-parser hermes
```

## Test Plan

```
./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference
```
2025-09-11 09:04:38 -04:00
Francisco Arceo
d15368a302
chore: Updating documentation, adding exception handling for Vector Stores in RAG Tool, more tests on migration, and migrate off of inference_api for context_retriever for RAG (#3367)
# What does this PR do?

- Updating documentation on migration from RAG Tool to Vector Stores and
Files APIs
- Adding exception handling for Vector Stores in RAG Tool
- Add more tests on migration from RAG Tool to Vector Stores
- Migrate off of inference_api for context_retriever for RAG


## Test Plan
Integration and unit tests added

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-09-11 14:20:11 +02:00
Sébastien Han
f31bcc11bc
feat: add Azure OpenAI inference provider support (#3396)
# What does this PR do?

Llama-stack now supports a new OpenAI compatible endpoint with Azure
OpenAI. The starter distro has been updated to add the new remote
inference provider.

A few tests have been modified and improved.

## Test Plan

Deploy a model in the Azure portal then:

```
$ AZURE_API_KEY=... AZURE_API_BASE=... uv run llama stack build --image-type venv --providers inference=remote::azure --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model azure/gpt-4.1 tests/integration/inference/test_openai_completion.py
...
```

Results:

```
============================================= test session starts ==============================================
platform darwin -- Python 3.12.8, pytest-8.4.1, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 27 items


tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=azure/gpt-5-mini-inference:completion:sanity]
SKIPPED [ 3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=azure/gpt-5-mini-inference:completion:suffix]
SKIPPED [ 7%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=azure/gpt-5-mini-inference:completion:sanity]
SKIPPED [ 11%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-1]
SKIPPED [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=azure/gpt-5-mini]
SKIPPED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01]
PASSED [ 22%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01]
PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01]
PASSED [ 29%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-True]
PASSED [ 33%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-True]
PASSED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=azure/gpt-5-mini]
SKIPPED [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=azure/gpt-5-mini-0]
SKIPPED [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02]
PASSED [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02]
PASSED [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02]
PASSED [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=azure/gpt-5-mini-False]
PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=azure/gpt-5-mini-False]
PASSED [ 62%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_01]
PASSED [ 66%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01]
PASSED [ 70%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_01]
PASSED [ 74%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-True]
PASSED [ 77%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-True]
PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:non_streaming_02]
PASSED [ 85%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02]
PASSED [ 88%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=azure/gpt-5-mini-inference:chat_completion:streaming_02]
PASSED [ 92%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=azure/gpt-5-mini-False]
PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=azure/gpt-5-mini-False]
PASSED [100%]

=========================================== short test summary info ============================================
SKIPPED [3] tests/integration/inference/test_openai_completion.py:63: Model azure/gpt-5-mini hosted by remote::azure doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:118: Model azure/gpt-5-mini hosted by remote::azure doesn't support vllm extra_body parameters.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:124: Model azure/gpt-5-mini hosted by remote::azure doesn't support chat completion calls with base64 encoded files.
================================== 20 passed, 7 skipped, 2 warnings in 51.77s ==================================
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-11 13:48:38 +02:00
Matthew Farrellee
c2d281e01b
chore(replay): improve replay robustness with un-validated construction (#3414)
# What does this PR do?

some providers do not produce spec compliant outputs. when this happens
the replay infra will fail to construct the proper types and will return
a dict to the client. the client likely does not expect a dict.

this was discovered with tgi, which returns finish_reason="" when valid
values are "stop", "length" or "content_filter"
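
a minimal sketch of the un-validated construction idea, assuming pydantic v2. `Choice` is a simplified stand-in for the real response type and `build_choice` is a hypothetical helper; the replay code itself may be structured differently.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Choice(BaseModel):
    """Simplified stand-in for a chat completion choice (assumption)."""

    index: int
    finish_reason: Literal["stop", "length", "content_filter"]


def build_choice(recorded: dict) -> Choice:
    """Rebuild a recorded response, tolerating values that violate the spec."""
    try:
        return Choice(**recorded)
    except ValidationError:
        # model_construct skips validation, so the client still receives a
        # typed object instead of a plain dict.
        return Choice.model_construct(**recorded)


choice = build_choice({"index": 0, "finish_reason": ""})
```

the trade-off is that the returned object can carry out-of-spec values such as the empty `finish_reason`, but attribute access on it keeps working for the client.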

## Test Plan

ci
2025-09-11 13:48:19 +02:00
Sumanth Kamenani
2838d5a20f
fix: AWS Bedrock inference profile ID conversion for region-specific endpoints (#3386)
Fixes #3370

AWS switched to requiring region-prefixed inference profile IDs instead
of foundation model IDs for on-demand throughput. This was causing
ValidationException errors.

Added auto-detection based on boto3 client region to convert model IDs
like meta.llama3-1-70b-instruct-v1:0 to
us.meta.llama3-1-70b-instruct-v1:0 depending on the detected region.

Also handles edge cases like ARNs, case insensitive regions, and None
regions.
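
A simplified sketch of the conversion described above. The helper name `to_inference_profile_id` and the `REGION_PREFIXES` mapping are assumptions for illustration; the real implementation reads the region from the boto3 client and may cover more cases.

```python
from typing import Optional

# Assumed mapping from the first part of an AWS region name to a profile prefix.
REGION_PREFIXES = {"us": "us", "eu": "eu", "ap": "apac"}


def to_inference_profile_id(model_id: str, region: Optional[str]) -> str:
    """Prefix a foundation model ID with a region group (hypothetical helper)."""
    if model_id.startswith("arn:"):
        return model_id  # ARNs are passed through unchanged
    if not region:
        return model_id  # no region detected, leave the ID alone
    prefix = REGION_PREFIXES.get(region.lower().split("-")[0])
    if prefix is None or model_id.startswith(f"{prefix}."):
        return model_id  # unknown region or already prefixed
    return f"{prefix}.{model_id}"
```

For example, `to_inference_profile_id("meta.llama3-1-70b-instruct-v1:0", "US-EAST-1")` yields `us.meta.llama3-1-70b-instruct-v1:0`, while a `None` region returns the ID untouched.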

Tested with this request.
```json
{
  "model_id": "meta.llama3-1-8b-instruct-v1:0",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "tell me a riddle"
    }
  ],
  "sampling_params": {
     "strategy": {
        "type": "top_p",
        "temperature": 0.7,
        "top_p": 0.9
      },
      "max_tokens": 512
  }
}
```
<img width="1488" height="878" alt="image"
src="https://github.com/user-attachments/assets/0d61beec-3869-4a31-8f37-9f554c280b88"
/>
2025-09-11 11:41:53 +02:00
Sébastien Han
8e05c68d15
chore: remove openai dependency from providers (#3398)
# What does this PR do?

The openai package is already a dependency of the llama-stack project
itself, so let the project dictate which openai version we need and
avoid potential breakage with unsatisfiable dependency resolution.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-11 10:19:59 +02:00
Ashwin Bharambe
0c7f49490c
fix(inference_store): on duplicate chat completion IDs, replace (#3408)
# What does this PR do?

Duplicate chat completion IDs can be generated during tests especially
if they are replaying recorded responses across different tests. No need
to warn or error under those circumstances. In the wild, this is not
likely to happen at all (no evidence) so we aren't really hiding any
problem.
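
A minimal sketch of the replace-on-duplicate behavior, using SQLite's `INSERT OR REPLACE` as a stand-in. The table layout and the `store_chat_completion` helper are assumptions for this example, not the inference store's actual schema or API.

```python
import json
import sqlite3


def store_chat_completion(conn, completion: dict) -> None:
    """Insert a completion, replacing any existing row with the same ID."""
    conn.execute(
        "INSERT OR REPLACE INTO chat_completions (id, payload) VALUES (?, ?)",
        (completion["id"], json.dumps(completion)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_completions (id TEXT PRIMARY KEY, payload TEXT)")
store_chat_completion(conn, {"id": "chatcmpl-1", "model": "a"})
store_chat_completion(conn, {"id": "chatcmpl-1", "model": "b"})  # duplicate ID replaces the row
rows = conn.execute("SELECT payload FROM chat_completions").fetchall()
```

The second write silently wins, so replayed recordings that reuse an ID neither raise nor leave two rows behind.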
2025-09-10 14:34:18 -07:00
ehhuang
c04f1c1e8c
chore: move benchmarking related code (#3406)
# What does this PR do?
- moving things and some formatting changes


## Test Plan
2025-09-10 13:19:44 -07:00
ehhuang
d2f88a10fb
chore: telemetry test (#3405)
# What does this PR do?
- removed fixed-duration sleeps

## Test Plan
2025-09-10 13:19:36 -07:00
dependabot[bot]
d4e45cd5f1
chore(ui-deps): bump tailwindcss from 4.1.6 to 4.1.13 in /llama_stack/ui (#3362)
Bumps
[tailwindcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss)
from 4.1.6 to 4.1.13.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/releases">tailwindcss's
releases</a>.</em></p>
<blockquote>
<h2>v4.1.13</h2>
<h3>Changed</h3>
<ul>
<li>Drop warning from browser build (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18731">#18731</a>)</li>
<li>Drop exact duplicate declarations when emitting CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18809">#18809</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Don't transition <code>visibility</code> when using
<code>transition</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18795">#18795</a>)</li>
<li>Discard matched variants with unknown named values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Discard matched variants with non-string values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Show suggestions for known <code>matchVariant</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18798">#18798</a>)</li>
<li>Replace deprecated <code>clip</code> with <code>clip-path</code> in
<code>sr-only</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18769">#18769</a>)</li>
<li>Hide internal fields from completions in <code>matchUtilities</code>
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18820">#18820</a>)</li>
<li>Ignore <code>.vercel</code> folders by default (can be overridden by
<code>@source …</code> rules) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18855">#18855</a>)</li>
<li>Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18869">#18869</a>)</li>
<li>Do not allow custom variants to start or end with a <code>-</code>
or <code>_</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18867">#18867</a>,
<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18872">#18872</a>)</li>
<li>Upgrade: Migrate <code>aria</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18815">#18815</a>)</li>
<li>Upgrade: Migrate <code>data</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18816">#18816</a>)</li>
<li>Upgrade: Migrate <code>supports</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18817">#18817</a>)</li>
</ul>
<h2>v4.1.12</h2>
<h3>Fixed</h3>
<ul>
<li>Don't consider the global important state in <code>@apply</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18404">#18404</a>)</li>
<li>Add missing suggestions for <code>flex-&lt;number&gt;</code>
utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18642">#18642</a>)</li>
<li>Fix trailing <code>)</code> from interfering with extraction in
Clojure keywords (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Detect classes inside Elixir charlist, word list, and string sigils
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18432">#18432</a>)</li>
<li>Track source locations through <code>@plugin</code> and
<code>@config</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Allow boolean values of <code>process.env.DEBUG</code> in
<code>@tailwindcss/node</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18485">#18485</a>)</li>
<li>Ignore consecutive semicolons in the CSS parser (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18532">#18532</a>)</li>
<li>Center the dropdown icon added to an input with a paired datalist by
default (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18511">#18511</a>)</li>
<li>Extract candidates in Slang templates (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18565">#18565</a>)</li>
<li>Improve error messages when encountering invalid functional utility
names (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18568">#18568</a>)</li>
<li>Discard CSS AST objects with <code>false</code> or
<code>undefined</code> properties (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18571">#18571</a>)</li>
<li>Allow users to disable URL rebasing in
<code>@tailwindcss/postcss</code> via <code>transformAssetUrls:
false</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18321">#18321</a>)</li>
<li>Fix false-positive migrations in <code>addEventListener</code> and
JavaScript variable names (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18718">#18718</a>)</li>
<li>Fix Standalone CLI showing default Bun help when run via symlink on
Windows (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18723">#18723</a>)</li>
<li>Read from <code>--border-color-*</code> theme keys in
<code>divide-*</code> utilities for backwards compatibility (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18704/">#18704</a>)</li>
<li>Don't scan <code>.hdr</code> and <code>.exr</code> files for classes
by default (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18734">#18734</a>)</li>
</ul>
<h2>v4.1.11</h2>
<h3>Fixed</h3>
<ul>
<li>Add heuristic to skip candidate migrations inside
<code>emit(…)</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18330">#18330</a>)</li>
<li>Extract candidates with variants in Clojure/ClojureScript keywords
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18338">#18338</a>)</li>
<li>Document <code>--watch=always</code> in the CLI's usage (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18337">#18337</a>)</li>
<li>Add support for Vite 7 to <code>@tailwindcss/vite</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18384">#18384</a>)</li>
</ul>
<h2>v4.1.10</h2>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md">tailwindcss's
changelog</a>.</em></p>
<blockquote>
<h2>[4.1.13] - 2025-09-03</h2>
<h3>Changed</h3>
<ul>
<li>Drop warning from browser build (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18731">#18731</a>)</li>
<li>Drop exact duplicate declarations when emitting CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18809">#18809</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Don't transition <code>visibility</code> when using
<code>transition</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18795">#18795</a>)</li>
<li>Discard matched variants with unknown named values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Discard matched variants with non-string values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Show suggestions for known <code>matchVariant</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18798">#18798</a>)</li>
<li>Replace deprecated <code>clip</code> with <code>clip-path</code> in
<code>sr-only</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18769">#18769</a>)</li>
<li>Hide internal fields from completions in <code>matchUtilities</code>
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18820">#18820</a>)</li>
<li>Ignore <code>.vercel</code> folders by default (can be overridden by
<code>@source …</code> rules) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18855">#18855</a>)</li>
<li>Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18869">#18869</a>)</li>
<li>Do not allow custom variants to start or end with a <code>-</code>
or <code>_</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18867">#18867</a>,
<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18872">#18872</a>)</li>
<li>Upgrade: Migrate <code>aria</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18815">#18815</a>)</li>
<li>Upgrade: Migrate <code>data</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18816">#18816</a>)</li>
<li>Upgrade: Migrate <code>supports</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18817">#18817</a>)</li>
</ul>
<h2>[4.1.12] - 2025-08-13</h2>
<h3>Fixed</h3>
<ul>
<li>Don't consider the global important state in <code>@apply</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18404">#18404</a>)</li>
<li>Add missing suggestions for <code>flex-&lt;number&gt;</code>
utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18642">#18642</a>)</li>
<li>Fix trailing <code>)</code> from interfering with extraction in
Clojure keywords (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Detect classes inside Elixir charlist, word list, and string sigils
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18432">#18432</a>)</li>
<li>Track source locations through <code>@plugin</code> and
<code>@config</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Allow boolean values of <code>process.env.DEBUG</code> in
<code>@tailwindcss/node</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18485">#18485</a>)</li>
<li>Ignore consecutive semicolons in the CSS parser (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18532">#18532</a>)</li>
<li>Center the dropdown icon added to an input with a paired datalist by
default (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18511">#18511</a>)</li>
<li>Extract candidates in Slang templates (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18565">#18565</a>)</li>
<li>Improve error messages when encountering invalid functional utility
names (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18568">#18568</a>)</li>
<li>Discard CSS AST objects with <code>false</code> or
<code>undefined</code> properties (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18571">#18571</a>)</li>
<li>Allow users to disable URL rebasing in
<code>@tailwindcss/postcss</code> via <code>transformAssetUrls:
false</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18321">#18321</a>)</li>
<li>Fix false-positive migrations in <code>addEventListener</code> and
JavaScript variable names (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18718">#18718</a>)</li>
<li>Fix Standalone CLI showing default Bun help when run via symlink on
Windows (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18723">#18723</a>)</li>
<li>Read from <code>--border-color-*</code> theme keys in
<code>divide-*</code> utilities for backwards compatibility (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18704/">#18704</a>)</li>
<li>Don't scan <code>.hdr</code> and <code>.exr</code> files for classes
by default (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18734">#18734</a>)</li>
</ul>
<h2>[4.1.11] - 2025-06-26</h2>
<h3>Fixed</h3>
<ul>
<li>Add heuristic to skip candidate migrations inside
<code>emit(…)</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18330">#18330</a>)</li>
<li>Extract candidates with variants in Clojure/ClojureScript keywords
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18338">#18338</a>)</li>
<li>Document <code>--watch=always</code> in the CLI's usage (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18337">#18337</a>)</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1334c99db8"><code>1334c99</code></a>
Prepare v4.1.13 release (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18868">#18868</a>)</li>
<li><a
href="65dc530f05"><code>65dc530</code></a>
Do not allow variants to end with <code>-</code> or <code>_</code> (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18872">#18872</a>)</li>
<li><a
href="54c3f308e9"><code>54c3f30</code></a>
Do not allow variants to start with <code>-</code> (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18867">#18867</a>)</li>
<li><a
href="494051ca08"><code>494051c</code></a>
Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18869">#18869</a>)</li>
<li><a
href="c318329a1e"><code>c318329</code></a>
chore: remove redundant words (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18853">#18853</a>)</li>
<li><a
href="ddc84b079b"><code>ddc84b0</code></a>
update test after prettier change</li>
<li><a
href="f1331a857a"><code>f1331a8</code></a>
run prettier</li>
<li><a
href="e5513b6c75"><code>e5513b6</code></a>
Fix missing code block delimiters in comment blocks (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18837">#18837</a>)</li>
<li><a
href="5e2a160d8b"><code>5e2a160</code></a>
Drop exact duplicate declarations from output CSS within a style rule
(<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18809">#18809</a>)</li>
<li><a
href="b1fb02a2d7"><code>b1fb02a</code></a>
Hide internal fields from completions in <code>matchUtilities</code> (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss/issues/18820">#18820</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/tailwindlabs/tailwindcss/commits/v4.1.13/packages/tailwindcss">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tailwindcss&package-manager=npm_and_yarn&previous-version=4.1.6&new-version=4.1.13)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-10 13:18:14 -07:00
dependabot[bot]
438c037b1f
chore(python-deps): bump openai from 1.102.0 to 1.106.1 (#3356)
Bumps [openai](https://github.com/openai/openai-python) from 1.102.0 to
1.106.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/releases">openai's
releases</a>.</em></p>
<blockquote>
<h2>v1.106.1</h2>
<h2>1.106.1 (2025-09-04)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.106.0...v1.106.1">v1.106.0...v1.106.1</a></p>
<h3>Chores</h3>
<ul>
<li><strong>internal:</strong> move mypy configurations to
<code>pyproject.toml</code> file (<a
href="ca413a2774">ca413a2</a>)</li>
</ul>
<h2>v1.106.0</h2>
<h2>1.106.0 (2025-09-04)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.105.0...v1.106.0">v1.105.0...v1.106.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>client:</strong> support callable api_key (<a
href="https://redirect.github.com/openai/openai-python/issues/2588">#2588</a>)
(<a
href="e1bad015b8">e1bad01</a>)</li>
<li>improve future compat with pydantic v3 (<a
href="6645d9317a">6645d93</a>)</li>
</ul>
<h2>v1.105.0</h2>
<h2>1.105.0 (2025-09-03)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.104.2...v1.105.0">v1.104.2...v1.105.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> Add gpt-realtime models (<a
href="8502041480">8502041</a>)</li>
</ul>
<h2>v1.104.2</h2>
<h2>1.104.2 (2025-09-02)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.104.1...v1.104.2">v1.104.1...v1.104.2</a></p>
<h3>Bug Fixes</h3>
<ul>
<li><strong>types:</strong> add aliases back for web search tool types
(<a
href="2521cd8445">2521cd8</a>)</li>
</ul>
<h2>v1.104.1</h2>
<h2>1.104.1 (2025-09-02)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.104.0...v1.104.1">v1.104.0...v1.104.1</a></p>
<h3>Chores</h3>
<ul>
<li><strong>api:</strong> manual updates for ResponseInputAudio (<a
href="0db5061966">0db5061</a>)</li>
</ul>
<h2>v1.104.0</h2>
<h2>1.104.0 (2025-09-02)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.103.0...v1.104.0">v1.103.0...v1.104.0</a></p>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/blob/main/CHANGELOG.md">openai's
changelog</a>.</em></p>
<blockquote>
<h2>1.106.1 (2025-09-04)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.106.0...v1.106.1">v1.106.0...v1.106.1</a></p>
<h3>Chores</h3>
<ul>
<li><strong>internal:</strong> move mypy configurations to
<code>pyproject.toml</code> file (<a
href="ca413a2774">ca413a2</a>)</li>
</ul>
<h2>1.106.0 (2025-09-04)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.105.0...v1.106.0">v1.105.0...v1.106.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>client:</strong> support callable api_key (<a
href="https://redirect.github.com/openai/openai-python/issues/2588">#2588</a>)
(<a
href="e1bad015b8">e1bad01</a>)</li>
<li>improve future compat with pydantic v3 (<a
href="6645d9317a">6645d93</a>)</li>
</ul>
<h2>1.105.0 (2025-09-03)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.104.2...v1.105.0">v1.104.2...v1.105.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> Add gpt-realtime models (<a
href="8502041480">8502041</a>)</li>
</ul>
<h2>1.104.2 (2025-09-02)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.104.1...v1.104.2">v1.104.1...v1.104.2</a></p>
<h3>Bug Fixes</h3>
<ul>
<li><strong>types:</strong> add aliases back for web search tool types
(<a
href="2521cd8445">2521cd8</a>)</li>
</ul>
<h2>1.104.1 (2025-09-02)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.104.0...v1.104.1">v1.104.0...v1.104.1</a></p>
<h3>Chores</h3>
<ul>
<li><strong>api:</strong> manual updates for ResponseInputAudio (<a
href="0db5061966">0db5061</a>)</li>
</ul>
<h2>1.104.0 (2025-09-02)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.103.0...v1.104.0">v1.103.0...v1.104.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>types:</strong> replace List[str] with SequenceNotStr in
params (<a
href="bc00bda880">bc00bda</a>)</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2adf111129"><code>2adf111</code></a>
release: 1.106.1</li>
<li><a
href="c4f9d0b997"><code>c4f9d0b</code></a>
chore(internal): move mypy configurations to <code>pyproject.toml</code>
file</li>
<li><a
href="2de8d7cde5"><code>2de8d7c</code></a>
release: 1.106.0</li>
<li><a
href="2cf4ed5072"><code>2cf4ed5</code></a>
feat: improve future compat with pydantic v3</li>
<li><a
href="25d16be18b"><code>25d16be</code></a>
feat(client): support callable api_key (<a
href="https://redirect.github.com/openai/openai-python/issues/2588">#2588</a>)</li>
<li><a
href="8672413735"><code>8672413</code></a>
release: 1.105.0</li>
<li><a
href="2c60d78b37"><code>2c60d78</code></a>
feat(api): Add gpt-realtime models</li>
<li><a
href="a52463c932"><code>a52463c</code></a>
release: 1.104.2</li>
<li><a
href="5a6931dafd"><code>5a6931d</code></a>
fix(types): add aliases back for web search tool types</li>
<li><a
href="fb152d967e"><code>fb152d9</code></a>
release: 1.104.1</li>
<li>Additional commits viewable in <a
href="https://github.com/openai/openai-python/compare/v1.102.0...v1.106.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=openai&package-manager=uv&previous-version=1.102.0&new-version=1.106.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-10 13:17:43 -07:00
dependabot[bot]
369083c069
chore(python-deps): bump locust from 2.39.1 to 2.40.1 (#3358)
Bumps [locust](https://github.com/locustio/locust) from 2.39.1 to
2.40.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/locustio/locust/releases">locust's
releases</a>.</em></p>
<blockquote>
<h2>2.40.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Pytest plugin: Delay imports to avoid monkey patching until someone
uses the fixtures by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3204">locustio/locust#3204</a></li>
<li>Move pytest plugin to its own directory, to prevent accidental
import by <a href="https://github.com/cyberw"><code>@​cyberw</code></a>
in <a
href="https://redirect.github.com/locustio/locust/pull/3205">locustio/locust#3205</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/locustio/locust/compare/2.40.0...2.40.1">https://github.com/locustio/locust/compare/2.40.0...2.40.1</a></p>
<h2>2.40.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Refactor FastHttpSession to be more like HttpSession by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3198">locustio/locust#3198</a></li>
<li>Update Dockerfile base to Python 3.13 by <a
href="https://github.com/adaamz"><code>@​adaamz</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3193">locustio/locust#3193</a></li>
<li>Avoid exception in HttpUser if requests has lost track of the
request it made by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3201">locustio/locust#3201</a></li>
<li>Support pytests as locustfiles by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3200">locustio/locust#3200</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/adaamz"><code>@​adaamz</code></a> made
their first contribution in <a
href="https://redirect.github.com/locustio/locust/pull/3193">locustio/locust#3193</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/locustio/locust/compare/2.39.1...2.40.0">https://github.com/locustio/locust/compare/2.39.1...2.40.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/locustio/locust/blob/master/CHANGELOG.md">locust's
changelog</a>.</em></p>
<blockquote>
<h1>Detailed changelog</h1>
<p>The most important changes can also be found in <a
href="https://docs.locust.io/en/latest/changelog.html">the
documentation</a>.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5df19da06a"><code>5df19da</code></a>
Merge pull request <a
href="https://redirect.github.com/locustio/locust/issues/3205">#3205</a>
from locustio/move-pytest-plugin-to-own-directory</li>
<li><a
href="d41141bedd"><code>d41141b</code></a>
Move pytest plugin to its own directory, to prevent accidental import of
locu...</li>
<li><a
href="6422848afd"><code>6422848</code></a>
mention that only one locustfile can be distributed</li>
<li><a
href="aa3da739fe"><code>aa3da73</code></a>
Merge pull request <a
href="https://redirect.github.com/locustio/locust/issues/3204">#3204</a>
from locustio/delay-imports-in-pytest-plugin-to-avoi...</li>
<li><a
href="12050dedfd"><code>12050de</code></a>
Pytest plugin: Delay imports to avoid monkey patching until someone
actually ...</li>
<li><a
href="488d1f8491"><code>488d1f8</code></a>
docs</li>
<li><a
href="439b7ab91b"><code>439b7ab</code></a>
docs fix</li>
<li><a
href="fcd76a8ac3"><code>fcd76a8</code></a>
docs: rephrase</li>
<li><a
href="70c7e9b2d8"><code>70c7e9b</code></a>
docs: move pytest further up</li>
<li><a
href="06dbf98013"><code>06dbf98</code></a>
docs: fix link</li>
<li>Additional commits viewable in <a
href="https://github.com/locustio/locust/compare/2.39.1...2.40.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=locust&package-manager=uv&previous-version=2.39.1&new-version=2.40.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-10 13:17:28 -07:00
dependabot[bot]
a844c4f6e1
chore(python-deps): bump pytest from 8.4.1 to 8.4.2 (#3359)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.4.1 to
8.4.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pytest-dev/pytest/releases">pytest's
releases</a>.</em></p>
<blockquote>
<h2>8.4.2</h2>
<h1>pytest 8.4.2 (2025-09-03)</h1>
<h2>Bug fixes</h2>
<ul>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13478">#13478</a>:
Fixed a crash when using
<code>console_output_style</code> with <code>times</code> and a module is
skipped.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13530">#13530</a>:
Fixed a crash when using <code>pytest.approx</code> and
<code>decimal.Decimal</code> instances with the
<code>decimal.FloatOperation</code> trap set.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13549">#13549</a>:
No longer evaluate type annotations in Python <code>3.14</code> when
inspecting function signatures.</p>
<p>This prevents crashes during module collection when modules do not
explicitly use <code>from __future__ import annotations</code> and
import types for annotations within a <code>if TYPE_CHECKING:</code>
block.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13559">#13559</a>:
Added missing <code>int</code> and <code>float</code> variants to the
<code>Literal</code> type annotation of the <code>type</code>
parameter in <code>pytest.Parser.addini</code>.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13563">#13563</a>:
<code>pytest.approx</code> now
only imports <code>numpy</code> if NumPy is already in
<code>sys.modules</code>. This fixes unconditional import behavior
introduced in 8.4.0.</p>
</li>
</ul>
<h2>Improved documentation</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13577">#13577</a>:
Clarify that <code>pytest_generate_tests</code> is discovered in test
modules/classes; other hooks must be in <code>conftest.py</code> or
plugins.</li>
</ul>
<h2>Contributor-facing changes</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13480">#13480</a>:
Self-testing: fixed a few test failures when run with
<code>-Wdefault</code> or a similar override.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13547">#13547</a>:
Self-testing: corrected expected message for
<code>test_doctest_unexpected_exception</code> in Python
<code>3.14</code>.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13684">#13684</a>:
Make pytest's own testsuite insensitive to the presence of the
<code>CI</code> environment variable -- by
<code>ogrisel</code>.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="bfae4224fd"><code>bfae422</code></a>
Prepare release version 8.4.2</li>
<li><a
href="89905381a1"><code>8990538</code></a>
Fix passenv CI in tox ini and make tests insensitive to the presence of
the C...</li>
<li><a
href="ca676bfe00"><code>ca676bf</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13687">#13687</a>
from pytest-dev/patchback/backports/8.4.x/e63f6e51c...</li>
<li><a
href="975a60a63c"><code>975a60a</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13686">#13686</a>
from pytest-dev/patchback/backports/8.4.x/12bde8af6...</li>
<li><a
href="7723ce84b8"><code>7723ce8</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13683">#13683</a>
from even-even/fix_Exeption_to_Exception_in_errorMe...</li>
<li><a
href="b7f05680d1"><code>b7f0568</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13685">#13685</a>
from CoretexShadow/fix/docs-pytest-generate-tests</li>
<li><a
href="2c94c4a694"><code>2c94c4a</code></a>
add missing colon (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13640">#13640</a>)
(<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13641">#13641</a>)</li>
<li><a
href="c3d7684bc0"><code>c3d7684</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13606">#13606</a>
from pytest-dev/patchback/backports/8.4.x/5f9938563...</li>
<li><a
href="dc6e3be2dd"><code>dc6e3be</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13605">#13605</a>
from The-Compiler/training-update-2025-07</li>
<li><a
href="f87289c36c"><code>f87289c</code></a>
Fix crash with <code>times</code> output style and skipped module (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13573">#13573</a>)
(<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13579">#13579</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/pytest-dev/pytest/compare/8.4.1...8.4.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pytest&package-manager=uv&previous-version=8.4.1&new-version=8.4.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-10 13:17:02 -07:00
Alexey Rybak
7394828c7a
docs: horizontal nav bar (#3407)
# What does this PR do?
* Adds a horizontal nav bar for easy access to the API reference and the
Llama Stack Github repo

<img width="2696" height="520" alt="image"
src="https://github.com/user-attachments/assets/82daffe1-c206-4e20-b95b-1e090011eecc"
/>

## Test Plan
* Built the docs and ran the local HTML server to verify changes
2025-09-10 12:43:36 -07:00
ehhuang
e980436a2e
chore: introduce write queue for inference_store (#3383)
# What does this PR do?
Adds a write worker queue for writes to inference store. This avoids
overwhelming request processing with slow inference writes.
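
As a rough illustration of the idea (not the code from this PR), a write queue can be sketched with `asyncio.Queue`: request handlers enqueue and return, while a few background workers drain the queue. The class name `QueuedWriter`, the `write_fn` callback, and the worker and queue sizes are all assumptions made for this sketch.

```python
import asyncio


class QueuedWriter:
    """Buffers writes in an asyncio.Queue and drains them in background workers.

    Hypothetical sketch: `write_fn` stands in for a slow store write.
    """

    def __init__(self, write_fn, num_workers=2, max_queue_size=1000):
        self._write_fn = write_fn
        self._num_workers = num_workers
        self._max_queue_size = max_queue_size
        self._queue = None
        self._workers = []

    async def start(self):
        # Create the queue inside the running loop, then spawn the workers.
        self._queue = asyncio.Queue(maxsize=self._max_queue_size)
        self._workers = [
            asyncio.create_task(self._worker()) for _ in range(self._num_workers)
        ]

    async def enqueue(self, item):
        # Returns as soon as the item is queued; waits only if the queue is full.
        await self._queue.put(item)

    async def _worker(self):
        while True:
            item = await self._queue.get()
            try:
                await self._write_fn(item)
            except Exception as exc:
                # A failed write must not kill the worker.
                print(f"write failed: {exc!r}")
            finally:
                self._queue.task_done()

    async def shutdown(self):
        # Wait for queued writes to finish, then stop the workers.
        await self._queue.join()
        for worker in self._workers:
            worker.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)
```

The bounded queue is a deliberate choice in this sketch: if writes fall far behind, `enqueue` applies backpressure instead of letting memory grow without limit.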

## Test Plan

Benchmark:
```
cd /docs/source/distributions/k8s-benchmark
# start mock server
python openai-mock-server.py --port 8000
# start stack server
LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml
# run benchmark script
uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct
```
## RPS from 21 -> 57
2025-09-10 11:57:42 -07:00
Derek Higgins
e6edc1f934
fix: unbound variable error in schedule-record-workflow.sh (#3401)
- Initialize INPUTS variable to prevent 'unbound variable' error

Fixes:
./scripts/github/schedule-record-workflow.sh: line 246: INPUTS: unbound variable
2025-09-10 11:54:10 -07:00
Francisco Arceo
a6b1588dc6
revert: Fireworks chat completion broken due to telemetry (#3402)
Reverts llamastack/llama-stack#3392
2025-09-10 11:53:38 -07:00
ehhuang
f6bf36343d
chore: logging perf improvements (#3393)
# What does this PR do?
- Use BackgroundLogger when logging metric events.
- Reuse event loop in BackgroundLogger
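
The second point can be illustrated with a generic pattern, which may differ from what `BackgroundLogger` actually does: keep one long-lived event loop in a daemon thread and submit coroutines to it, rather than creating a loop per call. `BackgroundRunner` and its methods are hypothetical names for this sketch.

```python
import asyncio
import threading


class BackgroundRunner:
    """Runs coroutines on a single long-lived event loop in a daemon thread."""

    def __init__(self):
        # One loop for the lifetime of the runner, instead of one per call.
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def submit(self, coro):
        # Thread-safe hand-off; returns a concurrent.futures.Future.
        return asyncio.run_coroutine_threadsafe(coro, self._loop)

    def close(self):
        # Stop the loop from its own thread, then clean up.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```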

## Test Plan
```
cd /docs/source/distributions/k8s-benchmark
# start mock server
python openai-mock-server.py --port 8000
# start stack server
LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml
# run benchmark script
uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct
```
### RPS from 57 -> 62
2025-09-10 11:52:23 -07:00
slekkala1
935b8e28de
fix: Fireworks chat completion broken due to telemetry (#3392)
# What does this PR do?
Fix fireworks chat completion broken due to telemetry expecting
response.usage
 Closes https://github.com/llamastack/llama-stack/issues/3391
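
A defensive way to read usage data, sketched here under the assumption that telemetry only needs token counts, is to treat `usage` as optional. The helper name `extract_token_usage` is hypothetical and not taken from the fix itself.

```python
from types import SimpleNamespace


def extract_token_usage(response):
    """Return token counts, or None when the provider omitted `usage`."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None
    return {
        "prompt_tokens": getattr(usage, "prompt_tokens", 0) or 0,
        "completion_tokens": getattr(usage, "completion_tokens", 0) or 0,
        "total_tokens": getattr(usage, "total_tokens", 0) or 0,
    }


# Stand-in response objects; real responses come from the provider client.
with_usage = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=3, completion_tokens=9, total_tokens=12)
)
without_usage = SimpleNamespace(id="chatcmpl-example")
```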

## Test Plan
1. `uv run --with llama-stack llama stack build --distro starter --image-type venv --run`
2. Send a chat completion request:

```
curl -X POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
```
```
{"id":"chatcmpl-ee922a08-0df0-4974-b0d3-b322113e8bc0","choices":[{"message":{"role":"assistant","content":"Hello! How can I assist you today?","name":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","created":1757456375,"model":"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct"}%   
```

Without the fix, the request fails as described in
https://github.com/llamastack/llama-stack/issues/3391

Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-09-10 08:48:01 -07:00
Sébastien Han
c86e45496e
ci: Re-enable pre-commit to fail (#3399)
If pre-commit fails, the workflow must fail.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-09-10 10:00:46 -04:00
Matthew Farrellee
0e27016cf2
chore: update the vertexai inference impl to use openai-python for openai-compat functions (#3377)
# What does this PR do?

Update the VertexAI inference provider to use openai-python for
openai-compat functions.

## Test Plan

```
$ VERTEX_AI_PROJECT=... uv run llama stack build --image-type venv --providers inference=remote::vertexai --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model vertexai/vertex_ai/gemini-2.5-flash tests/integration/inference/test_openai_completion.py
...
```

I don't have an account to test this. `get_api_key` may also need to be
updated per
https://cloud.google.com/vertex-ai/generative-ai/docs/start/openai

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-09-10 15:39:29 +02:00
Akram Ben Aissi
c836fa29e3
fix: pre-commit issues: non executable shebang file and removal of @pytest.mark.asyncio decorator (#3397)
# What does this PR do?
Fixes two pre-commit issues: a non-executable file that contains a
shebang, and a leftover `@pytest.mark.asyncio` decorator.

## Test Plan
2025-09-10 15:27:35 +02:00
Akram Ben Aissi
1671431310
fix: Add missing files_api parameter to MemoryToolRuntimeImpl test (#3394)
# What does this PR do?
The test_query_adds_vector_db_id_to_chunk_metadata test was failing
because MemoryToolRuntimeImpl.__init__() now requires a files_api
parameter.

Fixes failing unit tests for Python 3.12 and 3.13.

## Test Plan
2025-09-10 06:55:57 -04:00
Cesare Pompeiano
1c23aeb937
feat: Add vector_db_id to chunk metadata (#3304)
# What does this PR do?

When running RAG in a multi vector DB setting, it can be difficult to
trace where retrieved chunks originate from. This PR adds the
`vector_db_id` into each chunk’s metadata, making it easier to
understand which database a given chunk came from. This is helpful for
debugging and for analyzing retrieval behavior of multiple DBs.

Relevant code:

```python
for vector_db_id, result in zip(vector_db_ids, results):
    for chunk, score in zip(result.chunks, result.scores):
        if not hasattr(chunk, "metadata") or chunk.metadata is None:
            chunk.metadata = {}
        chunk.metadata["vector_db_id"] = vector_db_id

        chunks.append(chunk)
        scores.append(score)
```
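
A self-contained variant of the snippet above can show how the tag might be used downstream, for example to count retrieved chunks per database. `Chunk`, `QueryResult`, `tag_chunks`, and `count_by_db` are hypothetical stand-ins for this sketch, not the Llama Stack types.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:  # hypothetical stand-in for the real chunk type
    content: str
    metadata: Optional[dict] = None


@dataclass
class QueryResult:  # hypothetical stand-in for a per-DB query result
    chunks: list
    scores: list


def tag_chunks(vector_db_ids, results):
    """Flatten per-DB results, recording each chunk's source DB in its metadata."""
    chunks, scores = [], []
    for vector_db_id, result in zip(vector_db_ids, results):
        for chunk, score in zip(result.chunks, result.scores):
            if chunk.metadata is None:
                chunk.metadata = {}
            chunk.metadata["vector_db_id"] = vector_db_id
            chunks.append(chunk)
            scores.append(score)
    return chunks, scores


def count_by_db(chunks):
    """Count how many retrieved chunks came from each vector DB."""
    counts = defaultdict(int)
    for chunk in chunks:
        counts[chunk.metadata["vector_db_id"]] += 1
    return dict(counts)
```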

## Test Plan

* Ran Llama Stack in debug mode.
* Verified that `vector_db_id` was added to each chunk’s metadata.
* Confirmed that the metadata was printed in the console when using the
RAG tool.

---------

Co-authored-by: are-ces <cpompeia@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-09-10 11:19:21 +02:00
Ashwin Bharambe
81ad240faa fix(k8s): unwedge run.yaml to add files
2025-09-09 23:02:26 -07:00
Matthew Farrellee
dd1f946b3e
feat: include a default inference store during llama stack build (#3373)
# What does this PR do?

Enables chat completion storage when using `llama stack build --providers`,
which backs the following endpoints:
- GET /v1/chat/completions
- GET /v1/chat/completions/{id}

todo: llama stack build and distro codegen should use the same code
paths

## Test Plan

ci
2025-09-09 15:54:58 -07:00
ehhuang
9d3a234bf3
chore: remove unused variable (#3389)
# What does this PR do?


## Test Plan
2025-09-09 15:51:20 -07:00
Ashwin Bharambe
a8aa815b6a
feat(tests): migrate to global "setups" system for test configuration (#3390)
This PR refactors the integration test system to use global "setups",
which provide better separation of concerns:

**suites = what to test, setups = how to configure.**

NOTE: if you have naming suggestions, please provide feedback

Changes:
- New `tests/integration/setups.py` with global, reusable configurations
(ollama, vllm, gpt, claude)
- Modified `scripts/integration-tests.sh` options to match the
underlying pytest options
- Updated documentation to reflect the new global setup system

The main benefit is that setups can be reused across multiple suites
(e.g., use "gpt" with any suite), even though some are specifically
tailored to one suite (e.g., "ollama-vision" for the vision suite). It is
now easier to add new configurations without modifying existing suites.

Usage examples:
- `pytest tests/integration --suite=responses --setup=gpt`
- `pytest tests/integration --suite=vision` (auto-selects the
"ollama-vision" setup)
- `pytest tests/integration --suite=base --setup=vllm`
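
The split between suites and setups can be sketched as two small registries plus a resolver. This is only an illustration of the idea; the `Setup` and `Suite` classes, the `resolve` helper, and every environment variable, path, and default shown below are assumptions, not the contents of `tests/integration/setups.py`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Setup:
    """How to configure a run (hypothetical fields)."""
    name: str
    env: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Suite:
    """What to test (hypothetical fields)."""
    name: str
    roots: tuple
    default_setup: Optional[str] = None


# Placeholder values for illustration only.
SETUPS = {
    "ollama": Setup("ollama", env={"EXAMPLE_PROVIDER": "ollama"}),
    "ollama-vision": Setup("ollama-vision", env={"EXAMPLE_PROVIDER": "ollama"}),
    "vllm": Setup("vllm", env={"EXAMPLE_PROVIDER": "vllm"}),
    "gpt": Setup("gpt", env={"EXAMPLE_PROVIDER": "openai"}),
}

SUITES = {
    "base": Suite("base", roots=("tests/integration",), default_setup="ollama"),
    "vision": Suite("vision", roots=("tests/integration",), default_setup="ollama-vision"),
    "responses": Suite("responses", roots=("tests/integration",), default_setup="gpt"),
}


def resolve(suite_name, setup_name=None):
    """Pick the explicit setup if given, else the suite's default."""
    suite = SUITES[suite_name]
    chosen = setup_name or suite.default_setup
    if chosen is None or chosen not in SETUPS:
        raise ValueError(f"unknown setup {chosen!r} for suite {suite_name!r}")
    return suite, SETUPS[chosen]
```

Because setups live in their own registry, adding a new one is a single entry in `SETUPS` and no suite definition has to change.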
2025-09-09 15:50:56 -07:00
github-actions[bot]
28696c3f30 build: Bump version to 0.2.21
2025-09-08 22:30:03 +00:00
Ashwin Bharambe
30468d0c43
fix(deps): bump datasets versions for all providers (#3382)
Not doing so results in errors of the kind you see in:
4989026435
2025-09-08 15:13:42 -07:00
slekkala1
c9268a7a8c
fix: pre-commit failing (#3381)
# What does this PR do?
Fix failing pre-commit,
https://github.com/llamastack/llama-stack/actions/workflows/pre-commit.yml


## Test Plan
CI
2025-09-08 14:46:46 -07:00
Swapna Lekkala
09141361fb fix: use dataset version 4.0.0 or above 2025-09-08 13:22:43 -07:00
Derek Higgins
ef02b9ea10
fix: environment variable typo in inference recorder error message (#3374)
The error message was referencing LLAMA_STACK_INFERENCE_MODE instead of
the correct LLAMA_STACK_TEST_INFERENCE_MODE environment variable.
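
One way to keep an error message from drifting away from the variable it describes is to build both from the same constant. The sketch below assumes the modes `live`, `record`, and `replay` and a default of `live`; the function name and those values are assumptions, not taken from the inference recorder.

```python
import os

ENV_VAR = "LLAMA_STACK_TEST_INFERENCE_MODE"
VALID_MODES = ("live", "record", "replay")  # assumed set of modes


def get_inference_mode(environ=None):
    """Read the inference mode, naming the variable from ENV_VAR in any error."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, "live").lower()  # default is an assumption
    if raw not in VALID_MODES:
        raise ValueError(
            f"Invalid {ENV_VAR}={raw!r}; expected one of: {', '.join(VALID_MODES)}"
        )
    return raw
```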
2025-09-08 17:51:38 +02:00
Francisco Arceo
ad6ea7fb91
feat: Adding OpenAI Prompts API (#3319)
# What does this PR do?
This PR adds support for OpenAI Prompts API.

Note, OpenAI does not explicitly expose the Prompts API but instead
makes it available in the Responses API and in the [Prompts
Dashboard](https://platform.openai.com/docs/guides/prompting#create-a-prompt).

I have added the following APIs:
- CREATE
- GET
- LIST
- UPDATE
- Set Default Version

The Set Default Version API is made available only in the Prompts
Dashboard and configures which prompt version is returned in the GET
(the latest version is the default).
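
The versioning behavior described above can be sketched with an in-memory store: each update creates a new version, GET returns the latest version unless a default has been pinned. `InMemoryPromptStore`, its method names, and the ID format are assumptions for illustration and do not reflect the API added in this PR.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Prompt:
    prompt_id: str
    version: int
    text: str
    variables: list = field(default_factory=list)


class InMemoryPromptStore:
    """Hypothetical store showing create/get/list/update/set-default semantics."""

    def __init__(self):
        self._versions = {}  # prompt_id -> {version: Prompt}
        self._default = {}   # prompt_id -> pinned default version

    def create(self, text, variables=None):
        # ID shape is only an imitation of the example ID in this description.
        prompt_id = f"pmpt_{secrets.token_hex(24)}"
        prompt = Prompt(prompt_id, 1, text, list(variables or []))
        self._versions[prompt_id] = {1: prompt}
        return prompt

    def update(self, prompt_id, text, variables=None):
        # Updates never overwrite; they append a new version.
        versions = self._versions[prompt_id]
        prompt = Prompt(prompt_id, max(versions) + 1, text, list(variables or []))
        versions[prompt.version] = prompt
        return prompt

    def get(self, prompt_id, version=None):
        versions = self._versions[prompt_id]
        if version is None:
            # Pinned default if set, otherwise the latest version.
            version = self._default.get(prompt_id, max(versions))
        return versions[version]

    def list_prompts(self):
        return [self.get(prompt_id) for prompt_id in self._versions]

    def set_default_version(self, prompt_id, version):
        if version not in self._versions[prompt_id]:
            raise KeyError(f"prompt {prompt_id} has no version {version}")
        self._default[prompt_id] = version
```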

Overall, the expected functionality in Responses will look like this:

```python
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
  prompt={
    "id": "pmpt_68b0c29740048196bd3a6e6ac3c4d0e20ed9a13f0d15bf5e",
    "version": "2",
    "variables": {
        "city": "San Francisco",
        "age": 30,
    }
  }
)
```

### Resolves https://github.com/llamastack/llama-stack/issues/3276


## Test Plan
Unit tests added. Integration tests can be added after client
generation.

## Next Steps
1. Update the Responses API to support the Prompts API
2. Enhance the UI to implement the Prompts Dashboard
3. Add a cache for lower latency

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-09-08 11:05:13 -04:00
Mohammad Daoud Farooqi
9618adba89
docs: add MongoDB to external provider list (#3369)
The MongoDB integration (vector search, full-text search, and hybrid
search) has now been added as an external provider offering for Llama
Stack: https://github.com/mongodb-partners/mongodb-llama-stack
2025-09-08 14:09:13 +02:00
Akram Ben Aissi
072dca0609
feat: Add Kubernetes auth provider to use SelfSubjectReview and kubernetes api server (#2559)
# What does this PR do?
Add Kubernetes authentication provider support
- Add KubernetesAuthProvider class for token validation using Kubernetes
SelfSubjectReview API
- Add KubernetesAuthProviderConfig with configurable API server URL, TLS
settings, and claims mapping
- Implement authentication via POST requests to
/apis/authentication.k8s.io/v1/selfsubjectreviews endpoint
- Add support for parsing Kubernetes SelfSubjectReview response format
to extract user information
- Add KUBERNETES provider type to AuthProviderType enum
- Update create_auth_provider factory function to handle 'kubernetes'
provider type
- Add comprehensive unit tests for KubernetesAuthProvider functionality
- Add documentation with configuration examples and usage instructions

The provider validates tokens by sending SelfSubjectReview requests to
the Kubernetes API server and extracts user information from the
userInfo structure in the response.
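
The request and response handling can be sketched with the standard library alone. This is an illustration under stated assumptions, not the `KubernetesAuthProvider` code: the helper names, the bearer-token header, and the returned field names (`principal`, `uid`, `groups`) are assumptions; the endpoint path and the `status.userInfo` structure follow the description above.

```python
import json
import urllib.request

SELF_SUBJECT_REVIEW_PATH = "/apis/authentication.k8s.io/v1/selfsubjectreviews"


def build_review_request(api_server_url, token):
    """Build (but do not send) the POST that asks the API server who the token is."""
    body = json.dumps(
        {"apiVersion": "authentication.k8s.io/v1", "kind": "SelfSubjectReview"}
    ).encode("utf-8")
    return urllib.request.Request(
        api_server_url.rstrip("/") + SELF_SUBJECT_REVIEW_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def parse_user_info(review):
    """Extract user details from a decoded SelfSubjectReview response."""
    user_info = review.get("status", {}).get("userInfo")
    if not user_info or not user_info.get("username"):
        raise ValueError("response has no status.userInfo.username")
    return {
        "principal": user_info["username"],
        "uid": user_info.get("uid"),
        "groups": list(user_info.get("groups", [])),
    }
```

Sending the request (for example with `urllib.request.urlopen` and an SSL context that matches the configured TLS settings) is left out so the sketch stays runnable without a cluster.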


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
What this verifies:
- Authentication header validation
- Token validation with Kubernetes SelfSubjectReview and the Kubernetes
API server endpoint
- Error handling for invalid tokens and HTTP errors
- Request payload structure and headers

```
python -m pytest tests/unit/server/test_auth.py -k "kubernetes" -v
```

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
2025-09-08 11:25:10 +02:00
dependabot[bot]
44e1a40595
chore(github-deps): bump actions/checkout from 4.1.7 to 5.0.0 (#3357)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.7
to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2238">actions/checkout#2238</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v5.0.0">https://github.com/actions/checkout/compare/v4...v5.0.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
<li>Prepare release v4.3.0 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2237">actions/checkout#2237</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/motss"><code>@​motss</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li><a href="https://github.com/mouismail"><code>@​mouismail</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v4.3.0">https://github.com/actions/checkout/compare/v4...v4.3.0</a></p>
<h2>v4.2.2</h2>
<h2>What's Changed</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4.2.1...v4.2.2">https://github.com/actions/checkout/compare/v4.2.1...v4.2.2</a></p>
<h2>v4.2.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/Jcambass"><code>@​Jcambass</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1919">actions/checkout#1919</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4.2.0...v4.2.1">https://github.com/actions/checkout/compare/v4.2.0...v4.2.1</a></p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
<li>README: Suggest <code>user.email</code> to be
<code>41898282+github-actions[bot]@users.noreply.github.com</code> by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1707">actions/checkout#1707</a></li>
</ul>
<h2>v4.1.4</h2>
<ul>
<li>Disable <code>extensions.worktreeConfig</code> when disabling
<code>sparse-checkout</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1692">actions/checkout#1692</a></li>
<li>Add dependabot config by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1688">actions/checkout#1688</a></li>
<li>Bump the minor-actions-dependencies group with 2 updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1693">actions/checkout#1693</a></li>
<li>Bump word-wrap from 1.2.3 to 1.2.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1643">actions/checkout#1643</a></li>
</ul>
<h2>v4.1.3</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="08c6903cd8"><code>08c6903</code></a>
Prepare v5.0.0 release (<a
href="https://redirect.github.com/actions/checkout/issues/2238">#2238</a>)</li>
<li><a
href="9f265659d3"><code>9f26565</code></a>
Update actions checkout to use node 24 (<a
href="https://redirect.github.com/actions/checkout/issues/2226">#2226</a>)</li>
<li><a
href="08eba0b27e"><code>08eba0b</code></a>
Prepare release v4.3.0 (<a
href="https://redirect.github.com/actions/checkout/issues/2237">#2237</a>)</li>
<li><a
href="631c7dc4f8"><code>631c7dc</code></a>
Update package dependencies (<a
href="https://redirect.github.com/actions/checkout/issues/2236">#2236</a>)</li>
<li><a
href="8edcb1bdb4"><code>8edcb1b</code></a>
Update CODEOWNERS for actions (<a
href="https://redirect.github.com/actions/checkout/issues/2224">#2224</a>)</li>
<li><a
href="09d2acae67"><code>09d2aca</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/2194">#2194</a>)</li>
<li><a
href="85e6279cec"><code>85e6279</code></a>
Adjust positioning of user email note and permissions heading (<a
href="https://redirect.github.com/actions/checkout/issues/2044">#2044</a>)</li>
<li><a
href="009b9ae9e4"><code>009b9ae</code></a>
Documentation update - add recommended permissions to Readme (<a
href="https://redirect.github.com/actions/checkout/issues/2043">#2043</a>)</li>
<li><a
href="cbb722410c"><code>cbb7224</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1977">#1977</a>)</li>
<li><a
href="3b9b8c884f"><code>3b9b8c8</code></a>
docs: update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1971">#1971</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/checkout/compare/v4.1.7...08c6903cd8c0fde910a37f88322edcfb5dd907a8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=4.1.7&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 10:07:03 +02:00
dependabot[bot]
d458817af5
chore(github-deps): bump actions/setup-python from 5.6.0 to 6.0.0 (#3354)
Bumps [actions/setup-python](https://github.com/actions/setup-python)
from 5.6.0 to 6.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-python/releases">actions/setup-python's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<h3>Breaking Changes</h3>
<ul>
<li>Upgrade to node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/1164">actions/setup-python#1164</a></li>
</ul>
<p>Make sure your runner is on version v2.327.1 or later to ensure
compatibility with this release. <a
href="https://github.com/actions/runner/releases/tag/v2.327.1">See
Release Notes</a></p>
<h3>Enhancements:</h3>
<ul>
<li>Add support for <code>pip-version</code> by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-python/pull/1129">actions/setup-python#1129</a></li>
<li>Enhance reading from .python-version by <a
href="https://github.com/krystof-k"><code>@​krystof-k</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/787">actions/setup-python#787</a></li>
<li>Add version parsing from Pipfile by <a
href="https://github.com/aradkdj"><code>@​aradkdj</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/1067">actions/setup-python#1067</a></li>
</ul>
<h3>Bug fixes:</h3>
<ul>
<li>Clarify pythonLocation behaviour for PyPy and GraalPy in environment
variables by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-python/pull/1183">actions/setup-python#1183</a></li>
<li>Change missing cache directory error to warning by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-python/pull/1182">actions/setup-python#1182</a></li>
<li>Add Architecture-Specific PATH Management for Python with --user
Flag on Windows by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-python/pull/1122">actions/setup-python#1122</a></li>
<li>Include python version in PyPy python-version output by <a
href="https://github.com/cdce8p"><code>@​cdce8p</code></a> in <a
href="https://redirect.github.com/actions/setup-python/pull/1110">actions/setup-python#1110</a></li>
<li>Update docs: clarification on pip authentication with setup-python
by <a
href="https://github.com/priya-kinthali"><code>@​priya-kinthali</code></a>
in <a
href="https://redirect.github.com/actions/setup-python/pull/1156">actions/setup-python#1156</a></li>
</ul>
<h3>Dependency updates:</h3>
<ul>
<li>Upgrade idna from 2.9 to 3.7 in /<strong>tests</strong>/data by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-python/pull/843">actions/setup-python#843</a></li>
<li>Upgrade form-data to fix critical vulnerabilities <a
href="https://redirect.github.com/actions/setup-python/issues/182">#182</a>
&amp; <a
href="https://redirect.github.com/actions/setup-python/issues/183">#183</a>
by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-python/pull/1163">actions/setup-python#1163</a></li>
<li>Upgrade setuptools to 78.1.1 to fix path traversal vulnerability in
PackageIndex.download by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-python/pull/1165">actions/setup-python#1165</a></li>
<li>Upgrade actions/checkout from 4 to 5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-python/pull/1181">actions/setup-python#1181</a></li>
<li>Upgrade <code>@​actions/tool-cache</code> from 2.0.1 to 2.0.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-python/pull/1095">actions/setup-python#1095</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/krystof-k"><code>@​krystof-k</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-python/pull/787">actions/setup-python#787</a></li>
<li><a href="https://github.com/cdce8p"><code>@​cdce8p</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/setup-python/pull/1110">actions/setup-python#1110</a></li>
<li><a href="https://github.com/aradkdj"><code>@​aradkdj</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/setup-python/pull/1067">actions/setup-python#1067</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-python/compare/v5...v6.0.0">https://github.com/actions/setup-python/compare/v5...v6.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e797f83bcb"><code>e797f83</code></a>
Upgrade to node 24 (<a
href="https://redirect.github.com/actions/setup-python/issues/1164">#1164</a>)</li>
<li><a
href="3d1e2d2ca0"><code>3d1e2d2</code></a>
Revert &quot;Enhance cache-dependency-path handling to support files
outside the w...</li>
<li><a
href="65b071217a"><code>65b0712</code></a>
Clarify pythonLocation behavior for PyPy and GraalPy in environment
variables...</li>
<li><a
href="5b668cf765"><code>5b668cf</code></a>
Bump actions/checkout from 4 to 5 (<a
href="https://redirect.github.com/actions/setup-python/issues/1181">#1181</a>)</li>
<li><a
href="f62a0e252f"><code>f62a0e2</code></a>
Change missing cache directory error to warning (<a
href="https://redirect.github.com/actions/setup-python/issues/1182">#1182</a>)</li>
<li><a
href="9322b3ca74"><code>9322b3c</code></a>
Upgrade setuptools to 78.1.1 to fix path traversal vulnerability in
PackageIn...</li>
<li><a
href="fbeb884f69"><code>fbeb884</code></a>
Bump form-data to fix critical vulnerabilities <a
href="https://redirect.github.com/actions/setup-python/issues/182">#182</a>
&amp; <a
href="https://redirect.github.com/actions/setup-python/issues/183">#183</a>
(<a
href="https://redirect.github.com/actions/setup-python/issues/1163">#1163</a>)</li>
<li><a
href="03bb6152f4"><code>03bb615</code></a>
Bump idna from 2.9 to 3.7 in /<strong>tests</strong>/data (<a
href="https://redirect.github.com/actions/setup-python/issues/843">#843</a>)</li>
<li><a
href="36da51d563"><code>36da51d</code></a>
Add version parsing from Pipfile (<a
href="https://redirect.github.com/actions/setup-python/issues/1067">#1067</a>)</li>
<li><a
href="3c6f142cc0"><code>3c6f142</code></a>
update documentation (<a
href="https://redirect.github.com/actions/setup-python/issues/1156">#1156</a>)</li>
<li>Additional commits viewable in <a
href="a26af69be9...e797f83bcb">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-python&package-manager=github_actions&previous-version=5.6.0&new-version=6.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 10:05:34 +02:00
dependabot[bot]
dfa13d68f1
chore(github-deps): bump actions/setup-node from 4.4.0 to 5.0.0 (#3353)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from
4.4.0 to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-node/releases">actions/setup-node's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<h3>Breaking Changes</h3>
<ul>
<li>Enhance caching in setup-node with automatic package manager
detection by <a
href="https://github.com/priya-kinthali"><code>@​priya-kinthali</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1348">actions/setup-node#1348</a></li>
</ul>
<p>This update introduces automatic caching when a valid
<code>packageManager</code> field is present in your
<code>package.json</code>. This aims to improve workflow performance and
make dependency management more seamless. To disable this automatic
caching, set <code>package-manager-cache: false</code>:</p>
<pre lang="yaml"><code>steps:
- uses: actions/checkout@v5
- uses: actions/setup-node@v5
  with:
    package-manager-cache: false
</code></pre>
<ul>
<li>Upgrade action to use node24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1325">actions/setup-node#1325</a></li>
</ul>
<p>Make sure your runner is on version v2.327.1 or later to ensure
compatibility with this release. <a
href="https://github.com/actions/runner/releases/tag/v2.327.1">See
Release Notes</a></p>
<h3>Dependency Upgrades</h3>
<ul>
<li>Upgrade <code>@​octokit/request-error</code> and
<code>@​actions/github</code> by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1227">actions/setup-node#1227</a></li>
<li>Upgrade uuid from 9.0.1 to 11.1.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1273">actions/setup-node#1273</a></li>
<li>Upgrade undici from 5.28.5 to 5.29.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1295">actions/setup-node#1295</a></li>
<li>Upgrade form-data to bring in fix for critical vulnerability by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/setup-node/pull/1332">actions/setup-node#1332</a></li>
<li>Upgrade actions/checkout from 4 to 5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1345">actions/setup-node#1345</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/priya-kinthali"><code>@​priya-kinthali</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1348">actions/setup-node#1348</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1325">actions/setup-node#1325</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v5.0.0">https://github.com/actions/setup-node/compare/v4...v5.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a0853c2454"><code>a0853c2</code></a>
Bump actions/checkout from 4 to 5 (<a
href="https://redirect.github.com/actions/setup-node/issues/1345">#1345</a>)</li>
<li><a
href="b7234cc9fe"><code>b7234cc</code></a>
Upgrade action to use node24 (<a
href="https://redirect.github.com/actions/setup-node/issues/1325">#1325</a>)</li>
<li><a
href="d7a11313b5"><code>d7a1131</code></a>
Enhance caching in setup-node with automatic package manager detection
(<a
href="https://redirect.github.com/actions/setup-node/issues/1348">#1348</a>)</li>
<li><a
href="5e2628c959"><code>5e2628c</code></a>
Bumps form-data (<a
href="https://redirect.github.com/actions/setup-node/issues/1332">#1332</a>)</li>
<li><a
href="65beceff8e"><code>65becef</code></a>
Bump undici from 5.28.5 to 5.29.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1295">#1295</a>)</li>
<li><a
href="7e24a656e1"><code>7e24a65</code></a>
Bump uuid from 9.0.1 to 11.1.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1273">#1273</a>)</li>
<li><a
href="08f58d1471"><code>08f58d1</code></a>
Bump <code>@​octokit/request-error</code> and
<code>@​actions/github</code> (<a
href="https://redirect.github.com/actions/setup-node/issues/1227">#1227</a>)</li>
<li>See full diff in <a
href="49933ea528...a0853c2454">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-node&package-manager=github_actions&previous-version=4.4.0&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 10:05:00 +02:00
dependabot[bot]
58c61d85c8
chore(github-deps): bump actions/stale from 9.1.0 to 10.0.0 (#3352)
Bumps [actions/stale](https://github.com/actions/stale) from 9.1.0 to
10.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/stale/releases">actions/stale's
releases</a>.</em></p>
<blockquote>
<h2>v10.0.0</h2>
<h2>What's Changed</h2>
<h3>Breaking Changes</h3>
<ul>
<li>Upgrade to node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/1279">actions/stale#1279</a>.
Make sure your runner is on version v2.327.1 or later to ensure
compatibility with this release. <a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></li>
</ul>
<h3>Enhancement</h3>
<ul>
<li>Introducing sort-by option by <a
href="https://github.com/suyashgaonkar"><code>@​suyashgaonkar</code></a>
in <a
href="https://redirect.github.com/actions/stale/pull/1254">actions/stale#1254</a></li>
</ul>
<h3>Dependency Upgrades</h3>
<ul>
<li>Upgrade actions/publish-immutable-action from 0.0.3 to 0.0.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/stale/pull/1186">actions/stale#1186</a></li>
<li>Upgrade undici from 5.28.4 to 5.28.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/stale/pull/1201">actions/stale#1201</a></li>
<li>Upgrade <code>@​action/cache</code> from 4.0.0 to 4.0.2 by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/stale/pull/1226">actions/stale#1226</a></li>
<li>Upgrade <code>@​action/cache</code> from 4.0.2 to 4.0.3 by <a
href="https://github.com/suyashgaonkar"><code>@​suyashgaonkar</code></a>
in <a
href="https://redirect.github.com/actions/stale/pull/1233">actions/stale#1233</a></li>
<li>Upgrade undici from 5.28.5 to 5.29.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/stale/pull/1251">actions/stale#1251</a></li>
<li>Upgrade form-data to bring in fix for critical vulnerability by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/stale/pull/1277">actions/stale#1277</a></li>
</ul>
<h3>Documentation changes</h3>
<ul>
<li>Changelog update for recent releases by <a
href="https://github.com/suyashgaonkar"><code>@​suyashgaonkar</code></a>
in <a
href="https://redirect.github.com/actions/stale/pull/1224">actions/stale#1224</a></li>
<li>Permissions update in Readme by <a
href="https://github.com/ghadimir"><code>@​ghadimir</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/1248">actions/stale#1248</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/suyashgaonkar"><code>@​suyashgaonkar</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1224">actions/stale#1224</a></li>
<li><a href="https://github.com/GhadimiR"><code>@​GhadimiR</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1248">actions/stale#1248</a></li>
<li><a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1277">actions/stale#1277</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1279">actions/stale#1279</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/stale/compare/v9...v10.0.0">https://github.com/actions/stale/compare/v9...v10.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3a9db7e6a4"><code>3a9db7e</code></a>
Upgrade to node 24 (<a
href="https://redirect.github.com/actions/stale/issues/1279">#1279</a>)</li>
<li><a
href="8f717f0dfc"><code>8f717f0</code></a>
Bumps form-data (<a
href="https://redirect.github.com/actions/stale/issues/1277">#1277</a>)</li>
<li><a
href="a92fd57ffe"><code>a92fd57</code></a>
build(deps): bump undici from 5.28.5 to 5.29.0 (<a
href="https://redirect.github.com/actions/stale/issues/1251">#1251</a>)</li>
<li><a
href="128b2c81d0"><code>128b2c8</code></a>
Introducing sort-by option (<a
href="https://redirect.github.com/actions/stale/issues/1254">#1254</a>)</li>
<li><a
href="f78de9780e"><code>f78de97</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/stale/issues/1248">#1248</a>)</li>
<li><a
href="816d9db1ab"><code>816d9db</code></a>
Upgrade <code>@​action/cache</code> from 4.0.2 to 4.0.3 (<a
href="https://redirect.github.com/actions/stale/issues/1233">#1233</a>)</li>
<li><a
href="ba23c1cb02"><code>ba23c1c</code></a>
upgrade actions/cache from 4.0.0 to 4.0.2 (<a
href="https://redirect.github.com/actions/stale/issues/1226">#1226</a>)</li>
<li><a
href="a65e88a9b9"><code>a65e88a</code></a>
build(deps): bump undici from 5.28.4 to 5.28.5 (<a
href="https://redirect.github.com/actions/stale/issues/1201">#1201</a>)</li>
<li><a
href="d4df79c591"><code>d4df79c</code></a>
Updates to CHANGELOG.MD for recent releases (<a
href="https://redirect.github.com/actions/stale/issues/1224">#1224</a>)</li>
<li><a
href="ee7ef89499"><code>ee7ef89</code></a>
build(deps): bump actions/publish-immutable-action from 0.0.3 to 0.0.4
(<a
href="https://redirect.github.com/actions/stale/issues/1186">#1186</a>)</li>
<li>See full diff in <a
href="5bef64f19d...3a9db7e6a4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/stale&package-manager=github_actions&previous-version=9.1.0&new-version=10.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 10:04:41 +02:00
Yuan Tang
2f91344c1f
docs: Update changelog (#3343)
This updates the changelog doc to include the latest updates.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-09-08 10:01:41 +02:00
dependabot[bot]
51012a82a3
chore(github-deps): bump astral-sh/setup-uv from 6.6.0 to 6.6.1 (#3355)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.6.0 to 6.6.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.6.1 🌈 Fix exclusions in cache-dependency-glob</h2>
<h2>Changes</h2>
<p>Exclusions with a leading <code>!</code> in the <a
href="https://github.com/astral-sh/setup-uv?tab=readme-ov-file#cache-dependency-glob">cache-dependency-glob</a>
did not work; this release fixes them. Thank you <a
href="https://github.com/KnisterPeter"><code>@​KnisterPeter</code></a>
for raising this!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Fix exclusions in cache-dependency-glob <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/546">#546</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>Bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/547">#547</a>)</li>
<li>chore: update known versions for 0.8.14 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/543">#543</a>)</li>
<li>chore: update known versions for 0.8.13 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/536">#536</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="557e51de59"><code>557e51d</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/547">#547</a>)</li>
<li><a
href="1b46e13ec8"><code>1b46e13</code></a>
Fix exclusions in cache-dependency-glob (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/546">#546</a>)</li>
<li><a
href="26cf676705"><code>26cf676</code></a>
chore: update known versions for 0.8.14 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/543">#543</a>)</li>
<li><a
href="4e1e303f7d"><code>4e1e303</code></a>
chore: update known versions for 0.8.13 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/536">#536</a>)</li>
<li>See full diff in <a
href="4959332f0f...557e51de59">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.6.0&new-version=6.6.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 10:00:41 +02:00
dependabot[bot]
e1b81ce1fc
chore(ui-deps): bump @radix-ui/react-dropdown-menu from 2.1.14 to 2.1.16 in /llama_stack/ui (#3361)
Bumps
[@radix-ui/react-dropdown-menu](https://github.com/radix-ui/primitives)
from 2.1.14 to 2.1.16.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-dropdown-menu&package-manager=npm_and_yarn&previous-version=2.1.14&new-version=2.1.16)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:59:44 +02:00
dependabot[bot]
e508aef320
chore(ui-deps): bump lucide-react from 0.510.0 to 0.542.0 in /llama_stack/ui (#3363)
Bumps
[lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react)
from 0.510.0 to 0.542.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/lucide-icons/lucide/releases">lucide-react's
releases</a>.</em></p>
<blockquote>
<h2>Version 0.542.0</h2>
<h2>What's Changed</h2>
<ul>
<li>feat(docs): add MDN Web Docs &amp; Nuxt to showcase by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3590">lucide-icons/lucide#3590</a></li>
<li>feat(icons): added <code>list-chevrons-down-up</code> icon by <a
href="https://github.com/juliankellydesign"><code>@​juliankellydesign</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3492">lucide-icons/lucide#3492</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/juliankellydesign"><code>@​juliankellydesign</code></a>
made their first contribution in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3492">lucide-icons/lucide#3492</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.541.0...0.542.0">https://github.com/lucide-icons/lucide/compare/0.541.0...0.542.0</a></p>
<h2>Version 0.541.0</h2>
<h2>What's Changed</h2>
<ul>
<li>feat(packages/lucide): added support for providing a custom root
element by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3543">lucide-icons/lucide#3543</a></li>
<li>fix(icons): optimized <code>chrome</code> icon &amp; renamed to
<code>chromium</code> by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3572">lucide-icons/lucide#3572</a></li>
<li>fix(icons): changed <code>wallpaper</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3566">lucide-icons/lucide#3566</a></li>
<li>fix(icons): optimized <code>cog</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3548">lucide-icons/lucide#3548</a></li>
<li>fix(icons): changed <code>building</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3510">lucide-icons/lucide#3510</a></li>
<li>feat(dpi-preview): add previous version for easier comparison by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3532">lucide-icons/lucide#3532</a></li>
<li>feat(icons): added 'panel-dashed' variants + update tags on existing
icons by <a
href="https://github.com/irvineacosta"><code>@​irvineacosta</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3500">lucide-icons/lucide#3500</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.540.0...0.541.0">https://github.com/lucide-icons/lucide/compare/0.540.0...0.541.0</a></p>
<h2>Version 0.540.0</h2>
<h2>What's Changed</h2>
<ul>
<li>fix(license): add full text of Feather license by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3530">lucide-icons/lucide#3530</a></li>
<li>fix(icons): changed <code>umbrella</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3490">lucide-icons/lucide#3490</a></li>
<li>docs(site): added official statement on brand logos in Lucide by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3541">lucide-icons/lucide#3541</a></li>
<li>fix(icons): changed <code>camera</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3539">lucide-icons/lucide#3539</a></li>
<li>feat(icons): added <code>rose</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/1972">lucide-icons/lucide#1972</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.539.0...0.540.0">https://github.com/lucide-icons/lucide/compare/0.539.0...0.540.0</a></p>
<h2>Version 0.539.0</h2>
<h2>What's Changed</h2>
<ul>
<li>feat(icons): added <code>brick-wall-shield</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3476">lucide-icons/lucide#3476</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.538.0...0.539.0">https://github.com/lucide-icons/lucide/compare/0.538.0...0.539.0</a></p>
<h2>Version 0.538.0</h2>
<h2>What's Changed</h2>
<ul>
<li>fix(icons): changed <code>apple</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3505">lucide-icons/lucide#3505</a></li>
<li>fix(icons): changed <code>store</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3501">lucide-icons/lucide#3501</a></li>
<li>fix(icons): changed <code>mic-off</code> icon by <a
href="https://github.com/lieonlion"><code>@​lieonlion</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/2823">lucide-icons/lucide#2823</a></li>
<li>chore(deps): bump astro from 5.5.2 to 5.12.8 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3523">lucide-icons/lucide#3523</a></li>
<li>fix(icons): deprecate rail-symbol by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/2862">lucide-icons/lucide#2862</a></li>
<li>feat(icons): added <code>kayak</code> icon by <a
href="https://github.com/jpjacobpadilla"><code>@​jpjacobpadilla</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3054">lucide-icons/lucide#3054</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e71198d9b3"><code>e71198d</code></a>
chore: icon alias improvements (<a
href="https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react/issues/2861">#2861</a>)</li>
<li><a
href="3e644fda2d"><code>3e644fd</code></a>
chore(scripts): Refactor scripts to typescript (<a
href="https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react/issues/3316">#3316</a>)</li>
<li><a
href="19fa01b5fc"><code>19fa01b</code></a>
build(deps-dev): bump vite from 6.3.2 to 6.3.4 (<a
href="https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react/issues/3181">#3181</a>)</li>
<li>See full diff in <a
href="https://github.com/lucide-icons/lucide/commits/0.542.0/packages/lucide-react">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=lucide-react&package-manager=npm_and_yarn&previous-version=0.510.0&new-version=0.542.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:59:24 +02:00
dependabot[bot]
91c7c4570e
chore(ui-deps): bump sonner from 2.0.6 to 2.0.7 in /llama_stack/ui (#3364)
Bumps [sonner](https://github.com/emilkowalski/sonner) from 2.0.6 to
2.0.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/emilkowalski/sonner/releases">sonner's
releases</a>.</em></p>
<blockquote>
<h2>v2.0.7</h2>
<p>Sonner now supports multiple <code>&lt;Toaster /&gt;</code>
components, see more <a
href="https://sonner.emilkowal.ski/toaster#multiple-toasters">here</a>.</p>
<h2>What's Changed</h2>
<ul>
<li>feat: add testId prop for individual toast components by <a
href="https://github.com/b-like-bahar"><code>@​b-like-bahar</code></a>
in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/660">emilkowalski/sonner#660</a></li>
<li>feat(toaster): add support for multiple toasters with unique
identifiers by <a
href="https://github.com/taroj1205"><code>@​taroj1205</code></a> in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/665">emilkowalski/sonner#665</a></li>
<li>fix: tests by <a
href="https://github.com/emilkowalski"><code>@​emilkowalski</code></a>
in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/677">emilkowalski/sonner#677</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/b-like-bahar"><code>@​b-like-bahar</code></a>
made their first contribution in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/660">emilkowalski/sonner#660</a></li>
<li><a href="https://github.com/taroj1205"><code>@​taroj1205</code></a>
made their first contribution in <a
href="https://redirect.github.com/emilkowalski/sonner/pull/665">emilkowalski/sonner#665</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/emilkowalski/sonner/compare/v2.0.6...v2.0.7">https://github.com/emilkowalski/sonner/compare/v2.0.6...v2.0.7</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3ba7aa17ab"><code>3ba7aa1</code></a>
v2.0.7</li>
<li><a
href="0604827063"><code>0604827</code></a>
fix: tests (<a
href="https://redirect.github.com/emilkowalski/sonner/issues/677">#677</a>)</li>
<li><a
href="c50fe92dfb"><code>c50fe92</code></a>
fix tests</li>
<li><a
href="0600a5cb40"><code>0600a5c</code></a>
feat(toaster): add support for multiple toasters with unique identifiers
(<a
href="https://redirect.github.com/emilkowalski/sonner/issues/665">#665</a>)</li>
<li><a
href="c14bf44a03"><code>c14bf44</code></a>
feat: add testId prop for individual toast components (<a
href="https://redirect.github.com/emilkowalski/sonner/issues/660">#660</a>)</li>
<li>See full diff in <a
href="https://github.com/emilkowalski/sonner/compare/v2.0.6...v2.0.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=sonner&package-manager=npm_and_yarn&previous-version=2.0.6&new-version=2.0.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:59:02 +02:00
dependabot[bot]
fe134d90e5
chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui (#3360)
Bumps
[react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom)
and
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom).
These dependencies needed to be updated together.
Updates `react-dom` from 19.1.0 to 19.1.1
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/releases">react-dom's
releases</a>.</em></p>
<blockquote>
<h2>19.1.1 (July 28, 2025)</h2>
<h3>React</h3>
<ul>
<li>Fixed Owner Stacks to work with ES2015 function.name semantics (<a
href="https://redirect.github.com/facebook/react/pull/33680">#33680</a>
by <a href="https://github.com/hoxyq"><code>@​hoxyq</code></a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/blob/main/CHANGELOG.md">react-dom's
changelog</a>.</em></p>
<blockquote>
<h2>19.1.1 (July 28, 2025)</h2>
<h3>React</h3>
<ul>
<li>Fixed Owner Stacks to work with ES2015 function.name semantics (<a
href="https://redirect.github.com/facebook/react/pull/33680">#33680</a>
by <a href="https://github.com/hoxyq"><code>@​hoxyq</code></a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="87e33ca2b7"><code>87e33ca</code></a>
Set release versions to 19.1.1</li>
<li><a
href="b793948e15"><code>b793948</code></a>
Bump next prerelease version numbers (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/32782">#32782</a>)</li>
<li>See full diff in <a
href="https://github.com/facebook/react/commits/v19.1.1/packages/react-dom">compare
view</a></li>
</ul>
</details>
<br />

Updates `@types/react-dom` from 19.1.5 to 19.1.9
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:58:45 +02:00
Matthew Farrellee
6a35bd7bb6
chore: update the anthropic inference impl to use openai-python for openai-compat functions (#3366)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 38s
Pre-commit / pre-commit (push) Successful in 1m13s
# What does this PR do?

update the Anthropic inference provider to use openai-python for the
openai-compat endpoints

## Test Plan

ci

Co-authored-by: raghotham <rsm@meta.com>
2025-09-07 14:00:42 -07:00
Matthew Farrellee
78cab5331a
chore(groq test): skip completions tests for groq, api is not supported server-side (#3347)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 37s
Pre-commit / pre-commit (push) Successful in 1m16s
# What does this PR do?

skip /v1/completions tests on groq, endpoint is not supported

Co-authored-by: raghotham <rsm@meta.com>
2025-09-06 16:21:55 -07:00
Matthew Farrellee
d23607483f
chore: update the groq inference impl to use openai-python for openai-compat functions (#3348)
# What does this PR do?

update Groq inference provider to use OpenAIMixin for openai-compat
endpoints

changes on api.groq.com -
- json_schema is now supported for specific models, see
https://console.groq.com/docs/structured-outputs#supported-models
- response_format with streaming is now supported for models that
support response_format
- groq no longer returns a 400 error if tools are provided and
tool_choice is not "required"
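
A minimal sketch of the request shape the `json_schema` change makes usable, assuming the standard OpenAI-compatible chat completions format. The helper `build_structured_request`, the model name, and the schema are hypothetical placeholders; consult the supported-models page linked above before picking a real model.

```python
import json


def build_structured_request(model: str, prompt: str, schema: dict, stream: bool = False) -> dict:
    """Build chat-completions kwargs that ask for JSON matching `schema`."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            # "answer" is an arbitrary label for the schema (hypothetical).
            "json_schema": {"name": "answer", "schema": schema},
        },
        # Streaming together with response_format is one of the changes listed above.
        "stream": stream,
    }


schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
request = build_structured_request("example-model", "Name one city in France.", schema, stream=True)
print(json.dumps(request, indent=2))

# To send it with openai-python (needs network access and GROQ_API_KEY):
#   import os
#   from openai import OpenAI
#   client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.environ["GROQ_API_KEY"])
#   response = client.chat.completions.create(**request)
```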


## Test Plan

```
$ GROQ_API_KEY=... uv run llama stack build --image-type venv --providers inference=remote::groq --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model groq/llama-3.3-70b-versatile tests/integration/inference/test_openai_completion.py -k 'not store'
...
SKIPPED [3] tests/integration/inference/test_openai_completion.py:44: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:94: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:73: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:100: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support chat completion calls with base64 encoded files.
======================= 8 passed, 11 skipped, 8 deselected, 2 warnings in 5.13s ========================
```

---------

Co-authored-by: raghotham <rsm@meta.com>
2025-09-06 15:36:27 -07:00
Charlie Doern
ecd9d8dc1a
test: introduce api conformance test (#3257)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m48s
# What does this PR do?

This test runs on each PR and uses a new conformance workflow to compare
the OpenAPI spec on the base (main) branch to the one on this PR. If one
of our "stable" APIs changes, the test will fail.

The workflow uses `oasdiff` to identify breaking changes for the paths
whose compatibility we want to preserve.

Specifically, it runs `oasdiff breaking` with `--match-path`, which
checks for breaking changes only on the specified paths.
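
To illustrate the idea of checking only selected paths, here is a simplified Python sketch. It is not how `oasdiff` works internally and covers only two kinds of breaking change (a removed path and a removed operation); the function name `breaking_path_changes` and the sample specs are hypothetical.

```python
import re


def breaking_path_changes(base_spec: dict, pr_spec: dict, match_path: str) -> list[str]:
    """Report paths or operations present in base_spec but missing from pr_spec.

    Only paths matching the `match_path` regex are checked, mirroring the
    intent of restricting the comparison to "stable" APIs.
    """
    pattern = re.compile(match_path)
    base_paths = base_spec.get("paths", {})
    pr_paths = pr_spec.get("paths", {})
    problems = []
    for path, base_ops in base_paths.items():
        if not pattern.search(path):
            continue  # not a path we guarantee compatibility for
        if path not in pr_paths:
            problems.append(f"removed path: {path}")
            continue
        for method in base_ops:
            if method not in pr_paths[path]:
                problems.append(f"removed operation: {method.upper()} {path}")
    return problems


# Hypothetical specs: the PR drops POST /v1/models and an unstable path.
base = {"paths": {"/v1/models": {"get": {}, "post": {}}, "/v1alpha/agents": {"get": {}}}}
pr = {"paths": {"/v1/models": {"get": {}}}}

problems = breaking_path_changes(base, pr, r"^/v1/")
print(problems)
```

The removal of `/v1alpha/agents` is not reported because the path does not match the regex, which is the behaviour a path filter is meant to give.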

As a follow-up, we can add an optional way to allow these changes when
they are properly documented (for example, in a changelog), or when a
label on the PR overrides the failing test.

related to #3237


## Test Plan

conformance test should pass given there are no changes

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-06 12:40:33 -07:00
Matthew Farrellee
9252d9fc01
chore(groq test): skip with_n tests for groq, it is not supported server-side (#3346)
# What does this PR do?

skip the with_n test for groq, because it isn't supported by the
provider's service

see
https://console.groq.com/docs/openai#currently-unsupported-openai-features

Co-authored-by: raghotham <rsm@meta.com>
2025-09-06 12:35:30 -07:00
Matthew Farrellee
bf02cd846f
chore: update the sambanova inference impl to use openai-python for openai-compat functions (#3345)
# What does this PR do?

update SambaNova inference provider to use OpenAIMixin for openai-compat
endpoints

## Test Plan

```
$ SAMBANOVA_API_KEY=... uv run llama stack build --image-type venv --providers inference=remote::sambanova --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model sambanova/Meta-Llama-3.3-70B-Instruct tests/integration/inference -k 'not store'
...
FAILED tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=sambanova/Meta-Llama-3.3-70B-Instruct-inference:chat_completion:tool_calling_tools_absent-True] - AttributeError: 'NoneType' object has no attribute 'delta'
FAILED tests/integration/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[txt=sambanova/Meta-Llama-3.3-70B-Instruct-inference:chat_completion:tool_calling_tools_absent-False] - llama_stack_client.InternalServerError: Error code: 500 - {'detail': 'Internal server error: An une...
=========== 2 failed, 16 passed, 68 skipped, 8 deselected, 3 xfailed, 13 warnings in 15.85s ============
```

the two failures also exist before this change. they are part of the
deprecated inference.chat_completion tests that flow through litellm.
they can be resolved later.
2025-09-06 12:25:13 -07:00
Matthew Farrellee
4c28544c04
chore(gemini, tests): add skips for n and completions, gemini api does not support them (#3350)
# What does this PR do?

the gemini api endpoints do not support the n param or completions


## Test Plan

ci
2025-09-06 12:22:44 -07:00
Matthew Farrellee
d6c3b36390
chore: update the gemini inference impl to use openai-python for openai-compat functions (#3351)
# What does this PR do?

update the Gemini inference provider to use openai-python for the
openai-compat endpoints

partially addresses #3349, does not address /inference/completion or
/inference/chat-completion

## Test Plan

ci
2025-09-06 12:22:20 -07:00
Francisco Arceo
7cd1c2c238
feat: Updating Rag Tool to use Files API and Vector Stores API (#3344)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 15s
Python Package Build Test / build (3.13) (push) Failing after 19s
Test External API and Providers / test-external (venv) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 22s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Unit Tests / unit-tests (3.13) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (push) Failing after 23s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m32s
2025-09-06 07:26:34 -06:00
Ashwin Bharambe
47b640370e
feat(tests): introduce a test "suite" concept to encompass dirs, options (#3339)
Our integration tests need to be 'grouped' because each group often
needs a specific set of models it works with. We separated the vision
tests for this reason, and we have a separate set of tests which test the
"Responses" API.

This PR makes this system a bit more official so it is very easy to
target these groups and apply all testing infrastructure towards all the
groups (for example, record-replay) uniformly.

There are three suites declared:
- base
- vision
- responses

Note that our CI currently runs the "base" and "vision" suites.

You can use the `--suite` option when running pytest (or any of the
testing scripts or workflows.) For example:
```
OLLAMA_URL=http://localhost:11434 \
  pytest -s -v tests/integration/ --stack-config starter --suite vision
```
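
As a rough illustration of the "suite" idea, a suite can be modelled as a named bundle of test directories plus default options. This is a minimal sketch, not the PR's implementation: the `Suite` class, `SUITES` registry, `resolve_suite` helper, directory paths and model names are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Suite:
    """Hypothetical description of one suite: test roots plus default options."""

    name: str
    roots: tuple[str, ...]
    default_options: dict[str, str] = field(default_factory=dict)


# Illustrative registry; paths and model names are placeholders, not the real ones.
SUITES = {
    "base": Suite("base", ("tests/integration/inference",), {"text_model": "example-text-model"}),
    "vision": Suite("vision", ("tests/integration/vision",), {"vision_model": "example-vision-model"}),
    "responses": Suite("responses", ("tests/integration/responses",), {"text_model": "example-text-model"}),
}


def resolve_suite(name: str) -> Suite:
    """Look up a suite by name and fail loudly on unknown names."""
    try:
        return SUITES[name]
    except KeyError:
        raise ValueError(f"unknown suite {name!r}; expected one of {sorted(SUITES)}") from None
```

A `--suite vision` option could then expand to the suite's roots and defaults in one place, so record-replay and other tooling treat every group the same way.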
2025-09-05 13:58:49 -07:00
Matthew Farrellee
0c2757a05b
chore(sambanova test): skip with_n tests for sambanova, it is not implemented server-side (#3342)
# What does this PR do?

skip a test that cannot pass for sambanova

see
https://docs-legacy.sambanova.ai/sambastudio/latest/open-ai-api.html\#_example_requests_using_openai_client

## Test Plan

ci
2025-09-05 12:00:09 -07:00
Matthew Farrellee
df1526991f
feat(batches, completions): add /v1/completions support to /v1/batches (#3309)
# What does this PR do?

add support for /v1/completions to the /v1/batches api


## Test Plan

ci
2025-09-05 11:59:57 -07:00
Francisco Arceo
e2fe39aee1
feat!: Migrate Vector DB IDs to Vector Store IDs (breaking change) (#3253)
# What does this PR do?
This change migrates the VectorDB id generation to Vector Stores.

This is a breaking change for **_some users_** that may have application
code using the `vector_db_id` parameter in the request of the VectorDB
protocol instead of the `VectorDB.identifier` in the response.

By default we will now create a Vector Store every time we register a
VectorDB. The caveat with this approach is that this maps the
`vector_db_id` → `vector_store.name`. This is a reasonable tradeoff to
transition users towards OpenAI Vector Stores.

As an added benefit, registering VectorDBs will result in them appearing
in the VectorStores admin UI.

### Why?
This PR makes the `POST` API call to `/v1/vector-dbs` swap the
`vector_db_id` parameter in the **request body** into the VectorStore's
name field and sets the `vector_db_id` to the generated vector store id
(e.g., `vs_038247dd-4bbb-4dbb-a6be-d5ecfd46cfdb`).

That means that users would have to do something like the following in
their application code:

```python
res = client.vector_dbs.register(
    vector_db_id='my-vector-db-id', 
    embedding_model='ollama/all-minilm:l6-v2', 
    embedding_dimension=384,
)
vector_db_id = res.identifier
```

The rest of their code would then behave as before, including `VectorIO`'s
insert protocol using `vector_db_id` in the request.

An alternative implementation would be to just delete the `vector_db_id`
parameter in `VectorDB` but the end result would still require users
having to write `vector_db_id = res.identifier` since
`VectorStores.create()` generates the ID for you.

So this approach felt like the easiest way to migrate users towards
VectorStores (subsequent PRs will be added to trigger `files.create()`
and `vector_stores.files.create()`).
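
The mapping described above can be sketched with a toy stand-in for the server side. This is an assumption-laden illustration only: `register_vector_db` and `RegisteredVectorDB` are hypothetical names, and the real server generates the ID inside the Vector Stores implementation.

```python
import uuid
from dataclasses import dataclass


@dataclass
class RegisteredVectorDB:
    identifier: str  # generated vector store ID, e.g. "vs_<uuid>"
    name: str        # the vector_db_id the caller passed in the request


def register_vector_db(vector_db_id: str) -> RegisteredVectorDB:
    """Toy model of the behaviour: the requested ID becomes the name, and a new ID is generated."""
    return RegisteredVectorDB(identifier=f"vs_{uuid.uuid4()}", name=vector_db_id)


res = register_vector_db("my-vector-db-id")
vector_db_id = res.identifier  # use this, not "my-vector-db-id", in later calls
```

Code that keeps using the string it passed in, rather than `res.identifier`, is the code this change breaks.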

## Test Plan
Unit tests and integration tests have been added.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-09-05 15:40:34 +02:00
Derek Higgins
64b2977162
fix: Fix locations of distribution runtime directories (#3336)
The defaults were mixed up

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-09-05 14:09:36 +02:00
Sumanth Kamenani
0b00c68d59
fix: use lambda pattern for bedrock config env vars (#3307)
# What does this PR do?

Improved bedrock provider config to read from environment variables like
AWS_ACCESS_KEY_ID. Updated all
fields to use default_factory with lambda patterns like the nvidia
provider does.

  Now the environment variables work as documented.
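
  To illustrate the `default_factory` lambda pattern in general terms, the sketch below uses standard-library dataclasses in place of the provider's config classes. `EXAMPLE_ACCESS_KEY_ID`, `EagerConfig` and `LazyConfig` are hypothetical names, and the PR does not state that the original bug was exactly the eager evaluation shown here.

```python
import os
from dataclasses import dataclass, field
from typing import Optional

# Start from a known state so the example is deterministic.
os.environ.pop("EXAMPLE_ACCESS_KEY_ID", None)


@dataclass
class EagerConfig:
    # Evaluated once, when the class body runs; later environment changes are missed.
    access_key: Optional[str] = os.getenv("EXAMPLE_ACCESS_KEY_ID")


@dataclass
class LazyConfig:
    # Evaluated every time an instance is created.
    access_key: Optional[str] = field(default_factory=lambda: os.getenv("EXAMPLE_ACCESS_KEY_ID"))


# The variable is set only after the classes have been defined.
os.environ["EXAMPLE_ACCESS_KEY_ID"] = "set-after-import"

eager = EagerConfig()
lazy = LazyConfig()
```

  Here `lazy.access_key` picks up the value while `eager.access_key` stays `None`, which is why a factory is the safer default for environment-backed fields.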

  Closes #3305

  ## Test Plan

  Ran the new bedrock config tests:

  ```bash
  python -m pytest tests/unit/providers/inference/bedrock/test_config.py -v
  ```

  Verified existing provider tests still work:

  ```bash
  python -m pytest tests/unit/providers/test_configs.py -v
  ```
2025-09-05 10:45:11 +02:00
ehhuang
3a7ac4227d
chore: unbreak inference store test (#3340)
# What does this PR do?
The inference store writes were moved to asyncio.create_task and are
no longer awaited
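
A minimal sketch of why a test can break after such a change, assuming a hypothetical `store_completion` coroutine standing in for the real store write: the record is not visible until the background task has had a chance to run.

```python
import asyncio

stored: list[str] = []


async def store_completion(record: str) -> None:
    """Hypothetical slow write to the inference store."""
    await asyncio.sleep(0.01)
    stored.append(record)


async def handle_request(record: str) -> asyncio.Task:
    # Schedule the write without awaiting it, so the response is not delayed.
    return asyncio.create_task(store_completion(record))


async def main() -> tuple[bool, bool]:
    task = await handle_request("chat-1")
    visible_immediately = "chat-1" in stored  # the write has not run yet
    await task  # a test has to wait (or poll) before asserting
    visible_after_wait = "chat-1" in stored
    return visible_immediately, visible_after_wait


result = asyncio.run(main())
```

A test that reads the store right after the response therefore needs to wait or retry before asserting.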

## Test Plan

❯ OLLAMA_URL=http://localhost:11434 LLAMA_STACK_CONFIG=server:starter uv
run --with pytest-repeat pytest tests/integration/inference
--text-model="ollama/llama3.2:3b-instruct-fp16" -vvs -k
"test_inference_store_tool_calls and 3b-instruct-fp16-True" --count=10
Uninstalled 2 packages in 102ms
Installed 2 packages in 138ms
INFO 2025-09-04 14:10:17,775 tests.integration.conftest:66 tests:
Setting DISABLE_CODE_SANDBOX=1 for macOS

==========================================================================================================
test session starts
===========================================================================================================
platform darwin -- Python 3.12.3, pytest-8.4.1, pluggy-1.6.0 --
/Users/erichuang/.cache/uv/builds-v0/.tmpSGMlgt/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.3', 'Platform':
'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1',
'pluggy': '1.6.0'}, 'Plugins': {'repeat': '0.9.4', 'anyio': '4.9.0',
'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report':
'1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1',
'nbval': '0.11.0'}}
rootdir: /Users/erichuang/projects/llama-stack-git
configfile: pyproject.toml
plugins: repeat-0.9.4, anyio-4.9.0, html-4.1.1, socket-0.7.0,
asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1,
cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None,
asyncio_default_test_loop_scope=function
collected 970 items / 950 deselected / 20 selected


tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-1-10]
instantiating llama_stack_client
Starting llama stack server with config 'starter' on port 8321...
Waiting for server at http://localhost:8321... (0.0s elapsed)
Waiting for server at http://localhost:8321... (0.5s elapsed)
Waiting for server at http://localhost:8321... (5.1s elapsed)
Waiting for server at http://localhost:8321... (5.6s elapsed)
Waiting for server at http://localhost:8321... (10.1s elapsed)
Waiting for server at http://localhost:8321... (10.6s elapsed)
Waiting for server at http://localhost:8321... (15.2s elapsed)
Waiting for server at http://localhost:8321... (15.7s elapsed)
Server is ready at http://localhost:8321
llama_stack_client instantiated in 20.583s
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-2-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-3-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-4-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-5-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-6-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-7-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-8-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-9-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=ollama/llama3.2:3b-instruct-fp16-True-10-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-1-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-2-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-3-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-4-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-5-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-6-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-7-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-8-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-9-10]
PASSED

tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=ollama/llama3.2:3b-instruct-fp16-True-10-10]
PASSED
Terminating llama stack server process...
Terminating process 53307 and its group...
Server process and children terminated gracefully
2025-09-04 15:13:31 -07:00
Sumanth Kamenani
55a8c5f439
fix: show descriptive MCP server connection errors instead of generic 500s (#3256)
# What does this PR do?

Fixes error handling when MCP server connections fail. Instead of
returning generic 500 errors, now provides descriptive error messages
with proper HTTP status codes.

Closes #3107

## Test Plan

Before fix:
curl -X GET "http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server"
Returns: {"detail": "Internal server error: An unexpected error occurred."} (500)

After fix:
curl -X GET "http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server"
Returns: {"error": {"detail": "Failed to connect to MCP server at http://localhost:9999/sse: Connection refused"}} (502)

Tests:
- Added unit test for ConnectionError → 502 translation
- Manually tested with unreachable MCP servers (connection refused)
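
The ConnectionError → 502 translation can be sketched as a small mapping function. This is an illustration only: `translate_exception` is a hypothetical helper, not the server's actual code, and the response bodies simply mirror the two shown in the test plan.

```python
from http import HTTPStatus


def translate_exception(exc: Exception) -> tuple[int, dict]:
    """Map an exception to (status code, JSON body)."""
    if isinstance(exc, ConnectionError):
        # Upstream server unreachable: report a gateway error with the cause.
        return HTTPStatus.BAD_GATEWAY.value, {"error": {"detail": str(exc)}}
    # Anything else keeps the generic 500 response.
    return (
        HTTPStatus.INTERNAL_SERVER_ERROR.value,
        {"detail": "Internal server error: An unexpected error occurred."},
    )


status, body = translate_exception(
    ConnectionError("Failed to connect to MCP server at http://localhost:9999/sse: Connection refused")
)
```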
2025-09-04 13:25:02 -07:00
slekkala1
561d2fc6b8
fix: Move to older version for docker container failure [fireworks-ai] (#3338)
# What does this PR do?
Noticed that the tests at
https://github.com/llamastack/llama-stack-ops/actions/workflows/test-maybe-cut.yaml
are still failing randomly.

This was fixed earlier with version 0.18.0 of fireworks in
https://github.com/llamastack/llama-stack/pull/3267, but the local testing
may have inadvertently picked a lower version with `<=`, which I assumed
picks the latest version.
This time I tested with `==` to find the version where it broke, and am
pinning (with `<=`) to the version where it was passing.


## Test Plan
Tested locally with the following commands to start a container.

Build the container:
`llama stack build --distro starter --image-type container`
Start the container:
`docker run -d -p 8321:8321 --name llama-stack-test distribution-starter:0.2.20`
Check health: `http://localhost:8321/v1/health`
The above steps fail without the fix.

Tested with `==` to ensure the same version is picked in local testing
instead of anything lower.

Following here for the fix from `fireworks-ai`
1410674695

https://github.com/llamastack/llama-stack/issues/3273
2025-09-04 11:47:46 -07:00
ehhuang
bcc7f2c7d0
chore: async inference store write (#3318)
# What does this PR do?


## Test Plan
```
cd /docs/source/distributions/k8s-benchmark
# start mock server
python openai-mock-server.py --port 8000
# start stack server
uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml
# run benchmark script
uv run python3 benchmark.py --duration 30 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct
```
Before:

============================================================
BENCHMARK RESULTS
============================================================
Total time: 30.00s
Concurrent users: 50
Total requests: 1267
Successful requests: 1267
Failed requests: 0
Success rate: 100.0%
Requests per second: 42.23


After:

============================================================
BENCHMARK RESULTS
============================================================
Total time: 30.00s
Concurrent users: 50
Total requests: 1449
Successful requests: 1449
Failed requests: 0
Success rate: 100.0%
Requests per second: 48.30
2025-09-04 11:37:46 -07:00
Derek Higgins
5bbca56cfc
fix: Make SentenceTransformer embedding operations non-blocking (#3335)
- Wrap model loading with asyncio.to_thread() to prevent blocking during
model download/initialization
- Wrap encoding operations with asyncio.to_thread() to run in background
thread
- Convert _load_sentence_transformer_model() to async method

This ensures the async event loop remains responsive during embedding
operations.
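
The effect of wrapping a blocking call in `asyncio.to_thread()` can be sketched as follows. `blocking_encode` is a hypothetical stand-in for a blocking call such as a model's encode step; the heartbeat task exists only to show that the event loop keeps running in the meantime.

```python
import asyncio
import time


def blocking_encode(texts: list[str]) -> list[int]:
    """Stand-in for a blocking call; returns fake one-number 'embeddings'."""
    time.sleep(0.05)
    return [len(t) for t in texts]


async def embed(texts: list[str]) -> list[int]:
    # Run the blocking function in a worker thread so the loop stays responsive.
    return await asyncio.to_thread(blocking_encode, texts)


async def main() -> tuple[list[int], int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    vectors = await embed(["hello", "hi"])
    hb.cancel()
    return vectors, ticks


vectors, ticks = asyncio.run(main())
```

If `blocking_encode` were called directly inside `embed`, the heartbeat would not tick at all while the encode was running.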

Closes: #3332

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-09-04 13:58:41 -04:00
IAN MILLER
85f33762d7
refactor(server): remove hardcoded 409 and 404 status codes in server.py using httpx constants (#3333)
# What does this PR do?
This PR eliminates the hardcoded status codes `409` CONFLICT and `404`
NOT_FOUND in `server.py` by using `httpx` built-in constants. The
implementation follows the existing structure to improve
readability, extensibility and developer experience. The same approach
was already implemented in #3131
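
As a rough sketch of the idea, the example below maps exceptions to named status constants instead of bare integers. To stay self-contained it uses the standard library's `http.HTTPStatus` rather than `httpx`, which exposes similarly named constants under `httpx.codes`; the exception classes and `status_for` are hypothetical.

```python
from http import HTTPStatus


class ResourceNotFound(Exception):
    """Hypothetical error for a missing resource."""


class ResourceConflict(Exception):
    """Hypothetical error for a conflicting resource."""


def status_for(exc: Exception) -> HTTPStatus:
    """Pick a status by name rather than by a hardcoded number."""
    if isinstance(exc, ResourceConflict):
        return HTTPStatus.CONFLICT
    if isinstance(exc, ResourceNotFound):
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.INTERNAL_SERVER_ERROR
```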


## Test Plan
`./scripts/unit-tests.sh`
2025-09-04 18:15:13 +02:00
Derek Higgins
64d2306dd5
fix: distro-codegen pre-commit hook file pattern (#3337)
Update the file pattern from 'llama_stack/templates' to
'llama_stack/distributions' to properly trigger the Distribution
Template Codegen hook when distribution files change.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-09-04 17:56:32 +02:00
ehhuang
5d52e0d2c5
chore: handle missing finish_reason (#3328)
# What does this PR do?
Sometimes a stream has no chunks with finish_reason (e.g. a canceled
stream), which throws a pydantic error because OpenAIChoice.finish_reason
is typed as str
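
One way to tolerate a stream that ends without a `finish_reason` is to fall back to a default when assembling the final choice. This is a sketch under assumptions: the `Chunk` class and `final_finish_reason` helper are hypothetical, and the `"stop"` default is chosen for illustration, not taken from the PR.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Chunk:
    text: str
    finish_reason: Optional[str] = None


def final_finish_reason(chunks: Iterable[Chunk], default: str = "stop") -> str:
    """Return the last finish_reason seen, or a default if the stream had none."""
    reason: Optional[str] = None
    for chunk in chunks:
        if chunk.finish_reason is not None:
            reason = chunk.finish_reason
    return reason if reason is not None else default
```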

## Test Plan
observe no more such error when benchmarking
2025-09-04 13:23:18 +02:00
Ashwin Bharambe
02f6e0f531
fix(tests): set inference mode to be replay by default (#3326)
`construct_stack()` relies on the environment variable to know when to
set up the patching infrastructure.


c3d3a0b833/llama_stack/core/stack.py (L314)
2025-09-03 15:57:17 -07:00
Ashwin Bharambe
c3d3a0b833
feat(tests): auto-merge all model list responses and unify recordings (#3320)
One needed to specify record-replay related environment variables for
running integration tests. We could not use defaults because integration
tests could be run against Ollama instances which could be running
different models. For example, text vs vision tests needed separate
instances of Ollama because a single instance typically cannot serve
both of these models if you assume the standard CI worker configuration
on GitHub. As a result, `client.list()` as returned by the Ollama client
would be different between these runs and we'd end up overwriting
responses.

This PR "solves" it by adding a small amount of complexity -- we store
model list responses specially, keyed by the hashes of the models they
return. At replay time, we merge all of them and pretend that we have
the union of all models available.
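
The record-and-merge idea can be sketched as below. The function names, the hashing scheme and the vision model name are assumptions made for illustration; the actual recording format in the repository may differ.

```python
import hashlib
import json


def models_key(models: list[str]) -> str:
    """Stable hash of the set of model names in one recorded response."""
    canonical = json.dumps(sorted(models))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]


def record(store: dict[str, list[str]], models: list[str]) -> None:
    # Responses listing the same models share a key, so they do not pile up.
    store[models_key(models)] = sorted(models)


def replay(store: dict[str, list[str]]) -> list[str]:
    """Merge every recorded list and pretend the union is available."""
    merged: set[str] = set()
    for models in store.values():
        merged.update(models)
    return sorted(merged)


recordings: dict[str, list[str]] = {}
record(recordings, ["llama3.2:3b-instruct-fp16", "all-minilm:l6-v2"])  # text run
record(recordings, ["example-vision-model"])                            # vision run (placeholder name)
record(recordings, ["all-minilm:l6-v2", "llama3.2:3b-instruct-fp16"])  # same set, same key
```

Because each distinct model set gets its own key, a vision run no longer overwrites the text run's recording.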

## Test Plan

Re-recorded all the tests using `scripts/integration-tests.sh
--inference-mode record`, including the vision tests.
2025-09-03 11:33:03 -07:00
ehhuang
d948e63340
chore: Improve error message for missing provider dependencies (#3315)
Generated with CC:

Replace cryptic KeyError with clear, actionable error message that
shows:
- Which API the failing provider belongs to
- The provider ID and type that's failing
- Which dependency is missing
- Clear instructions on how to fix the issue


## Test plan
Use a run config with Agents API and no safety provider

Before: KeyError: <Api.safety: 'safety'>
After: Failed to resolve 'agents' provider 'meta-reference' of type
'inline::meta-reference': required dependency 'safety' is not available.
Please add a 'safety' provider to your configuration or check if the
provider is properly configured.
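
A minimal sketch of the pattern, assuming a hypothetical `resolve_deps` helper: check for the dependency explicitly and raise an error that names the API, provider and missing dependency, instead of letting a dictionary lookup raise `KeyError`.

```python
def resolve_deps(
    api: str,
    provider_id: str,
    provider_type: str,
    required: list[str],
    available: dict[str, object],
) -> dict[str, object]:
    """Collect required dependencies, failing with an actionable message."""
    deps: dict[str, object] = {}
    for dep in required:
        if dep not in available:
            raise RuntimeError(
                f"Failed to resolve '{api}' provider '{provider_id}' of type '{provider_type}': "
                f"required dependency '{dep}' is not available. "
                f"Please add a '{dep}' provider to your configuration or check if the "
                f"provider is properly configured."
            )
        deps[dep] = available[dep]
    return deps
```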
2025-09-03 16:11:59 +02:00
Cesare Pompeiano
ccaf6aaa51
chore(python-deps): replace ibm_watson_machine_learning with ibm_watsonx_ai (#3302)
# What does this PR do?

This PR updates the Watsonx provider dependencies from
`ibm_watson_machine_learning` to `ibm_watsonx_ai`.

The old package `ibm_watson_machine_learning` is in **deprecation mode**
([PyPI link](https://pypi.org/project/ibm-watson-machine-learning/))
and relies on older versions of dependencies such as `pandas`. Updating
to `ibm_watsonx_ai` ensures compatibility with current dependency
versions and ongoing support.

## Test Plan

I verified the update by running an inference using a model provided by
Watsonx. The model ran successfully, confirming that the new dependency
works as expected.

Co-authored-by: are-ces <cpompeia@redhat.com>
2025-09-03 11:33:35 +02:00
Varsha
c59d8c5047
fix: Fix mock vector DB schema in Qdrant tests (#3295)
# What does this PR do?
Fix: https://github.com/llamastack/llama-stack/issues/3293

## Test Plan
```
===================================================== test session starts =====================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-14.7.7-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'xdist': '3.8.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, xdist-3.8.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0
asyncio: mode=Mode.AUTO
collected 3 items                                                                                                             

tests/unit/providers/vector_io/test_qdrant.py::test_qdrant_adapter_returns_expected_chunks[2-2] PASSED                  [ 33%]
tests/unit/providers/vector_io/test_qdrant.py::test_qdrant_adapter_returns_expected_chunks[100-60] PASSED               [ 66%]
tests/unit/providers/vector_io/test_qdrant.py::test_qdrant_register_and_unregister_vector_db PASSED                     [100%]
```

Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-09-03 09:59:16 +02:00
IAN MILLER
faf891b40c
refactor: use generic WeightedInMemoryAggregator for hybrid search in SQLiteVecIndex (#3303)
# What does this PR do?
The purpose of this PR is to refactor `SQLiteVecIndex` to eliminate
redundant code and simplify it by using the generic
`WeightedInMemoryAggregator`, which can be used for any vector db
provider. This pattern is already implemented for `PGVectorIndex` in
#3064
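
For readers unfamiliar with weighted score aggregation in hybrid search, here is a generic sketch. It is not the API of `WeightedInMemoryAggregator`: the function names, the min-max normalisation and the `alpha` weighting are assumptions chosen to illustrate the general technique.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalise scores to [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_combine(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.5,
) -> dict[str, float]:
    """Blend the two score sets: alpha weights vector scores, 1 - alpha keyword scores."""
    v, k = normalize(vector_scores), normalize(keyword_scores)
    return {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }


combined = weighted_combine({"a": 0.9, "b": 0.1}, {"b": 3.0, "c": 1.0}, alpha=0.7)
ranking = sorted(combined, key=combined.get, reverse=True)
```

Keeping this logic in one provider-independent helper is what lets several index implementations share it.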

CC: @franciscojavierarceo 


## Test Plan
1. `./scripts/unit-tests.sh`
2. Integration tests in CI Workflow
2025-09-02 10:38:35 -07:00
dependabot[bot]
5c873d53db
chore(python-deps): bump pymilvus from 2.6.0 to 2.6.1 (#3285)
Bumps [pymilvus](https://github.com/milvus-io/pymilvus) from 2.6.0 to
2.6.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/milvus-io/pymilvus/releases">pymilvus's
releases</a>.</em></p>
<blockquote>
<h2>PyMilvus v2.6.1 Release Notes</h2>
<h2>What's Changed</h2>
<ul>
<li>Avoid describe_collection when query by ids by <a
href="https://github.com/yhmo"><code>@​yhmo</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2930">milvus-io/pymilvus#2930</a></li>
<li>bulkImport add objectUrls/token paramster &amp; add example use by
<a
href="https://github.com/lentitude2tk"><code>@​lentitude2tk</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2934">milvus-io/pymilvus#2934</a></li>
<li>support stageManager &amp; stageFileManager by <a
href="https://github.com/lentitude2tk"><code>@​lentitude2tk</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2935">milvus-io/pymilvus#2935</a></li>
<li>fix: Fix the existing version fmt by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2960">milvus-io/pymilvus#2960</a></li>
<li>enhance: Add unixmsec in every RPC call by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2961">milvus-io/pymilvus#2961</a></li>
<li>enhance: Multiple cherry picks from master branch by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2962">milvus-io/pymilvus#2962</a></li>
<li>fix: Passing unknown req.is_refresh to wait by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2964">milvus-io/pymilvus#2964</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/milvus-io/pymilvus/compare/v2.6.0...v2.6.1">https://github.com/milvus-io/pymilvus/compare/v2.6.0...v2.6.1</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0237c9f1bd"><code>0237c9f</code></a>
fix: [2.6]Passing unknown req.is_refresh to wait (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2964">#2964</a>)</li>
<li><a
href="a083622d8f"><code>a083622</code></a>
enhance: Multiple cherry picks from master branch (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2962">#2962</a>)</li>
<li><a
href="87e3c5acc1"><code>87e3c5a</code></a>
enhance: Add unixmsec in every RPC call (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2961">#2961</a>)</li>
<li><a
href="98077a27c9"><code>98077a2</code></a>
fix: [2.6]Fix the existing version fmt (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2960">#2960</a>)</li>
<li><a
href="80e2e09323"><code>80e2e09</code></a>
feat: Add partial update support for upsert operations (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2938">#2938</a>)
(<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2940">#2940</a>)</li>
<li><a
href="0210ee92e6"><code>0210ee9</code></a>
[cherry-pick] support stageManager &amp; stageFileManager (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2935">#2935</a>)</li>
<li><a
href="00fb8e6f23"><code>00fb8e6</code></a>
[cherry-pick] bulkImport add objectUrls/token paramster &amp; add
example use (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2">#2</a>...</li>
<li><a
href="442ef15806"><code>442ef15</code></a>
Avoid describe_collection when query by ids (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2930">#2930</a>)</li>
<li><a
href="e704dd29b5"><code>e704dd2</code></a>
fix: Correct github actions on branch 2.6 (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2926">#2926</a>)</li>
<li>See full diff in <a
href="https://github.com/milvus-io/pymilvus/compare/v2.6.0...v2.6.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymilvus&package-manager=uv&previous-version=2.6.0&new-version=2.6.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-01 20:24:22 -04:00
IAN MILLER
4a59961a6c
refactor: remove lama-api-client from pyproject.toml (#3299)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Vector IO Integration Tests / test-matrix (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Failing after 0s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 1s
Test Llama Stack Build / build (push) Has been skipped
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Unit Tests / unit-tests (3.12) (push) Failing after 1s
Test External API and Providers / test-external (venv) (push) Failing after 1s
Unit Tests / unit-tests (3.13) (push) Failing after 1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1s
UI Tests / ui-tests (22) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Has started running
Test Llama Stack Build / build-single-provider (push) Has started running
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 8s
# What does this PR do?
This PR removes the `lama-api-client` dependency from `pyproject.toml`
because it is not used in the Llama Stack codebase.

## Test Plan
`./scripts/unit-tests.sh`
2025-09-01 16:50:50 +02:00
dependabot[bot]
9625ac6d02
chore(python-deps): bump locust from 2.39.0 to 2.39.1 (#3284)
Bumps [locust](https://github.com/locustio/locust) from 2.39.0 to
2.39.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/locustio/locust/releases">locust's
releases</a>.</em></p>
<blockquote>
<h2>2.39.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Avoid broken gevent version for now by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3196">locustio/locust#3196</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/JumboBear"><code>@​JumboBear</code></a>
made their first contribution in <a
href="https://redirect.github.com/locustio/locust/pull/3195">locustio/locust#3195</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/locustio/locust/compare/2.39.0...2.39.1">https://github.com/locustio/locust/compare/2.39.0...2.39.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/locustio/locust/blob/master/CHANGELOG.md">locust's
changelog</a>.</em></p>
<blockquote>
<h1>Detailed changelog</h1>
<p>The most important changes can also be found in <a
href="https://docs.locust.io/en/latest/changelog.html">the
documentation</a>.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="934c5c33e4"><code>934c5c3</code></a>
changelog</li>
<li><a
href="9350084ec0"><code>9350084</code></a>
disable macos build for now</li>
<li><a
href="705e2f658b"><code>705e2f6</code></a>
Disable another unit test on macos because of annoying behavior on GH
(really...</li>
<li><a
href="d888b9db2b"><code>d888b9d</code></a>
Disable another unit test on macos because of annoying behavior on
GH</li>
<li><a
href="45bc4d84fd"><code>45bc4d8</code></a>
Disable annoying test case on macos for now. Only has issues on GH. <a
href="https://github.com/amadeupp"><code>@​amadeupp</code></a>...</li>
<li><a
href="9d7710a2da"><code>9d7710a</code></a>
unit tests: give extra time for testing on macOS</li>
<li><a
href="fcbc740e04"><code>fcbc740</code></a>
Avoid broken gevent version for now (<a
href="https://redirect.github.com/locustio/locust/issues/3196">#3196</a>)</li>
<li><a
href="cd1f600d44"><code>cd1f600</code></a>
mypy</li>
<li><a
href="0cf52dc990"><code>0cf52dc</code></a>
Autogen changelog for 2.39.0</li>
<li><a
href="094395e024"><code>094395e</code></a>
Merge pull request <a
href="https://redirect.github.com/locustio/locust/issues/3195">#3195</a>
from JumboBear/pyproject</li>
<li>Additional commits viewable in <a
href="https://github.com/locustio/locust/compare/2.39.0...2.39.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=locust&package-manager=uv&previous-version=2.39.0&new-version=2.39.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-01 16:49:09 +02:00
dependabot[bot]
9e5ef1af3c
chore(ui-deps): bump @radix-ui/react-tooltip from 1.2.6 to 1.2.8 in /llama_stack/ui (#3287)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 2s
UI Tests / ui-tests (22) (push) Failing after 0s
Unit Tests / unit-tests (3.12) (push) Failing after 1s
Unit Tests / unit-tests (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 11s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 20s
Test External API and Providers / test-external (venv) (push) Failing after 19s
Bumps [@radix-ui/react-tooltip](https://github.com/radix-ui/primitives)
from 1.2.6 to 1.2.8.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-tooltip&package-manager=npm_and_yarn&previous-version=1.2.6&new-version=1.2.8)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-01 10:18:57 +02:00
dependabot[bot]
4499559ed1
chore(ui-deps): bump prettier from 3.5.3 to 3.6.2 in /llama_stack/ui (#3289)
Bumps [prettier](https://github.com/prettier/prettier) from 3.5.3 to
3.6.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/prettier/prettier/releases">prettier's
releases</a>.</em></p>
<blockquote>
<h2>3.6.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Add missing blank line around code block by <a
href="https://github.com/fisker"><code>@​fisker</code></a> in <a
href="https://redirect.github.com/prettier/prettier/pull/17675">prettier/prettier#17675</a></li>
</ul>
<p>🔗 <a
href="https://github.com/prettier/prettier/blob/main/CHANGELOG.md#362">Changelog</a></p>
<h2>3.6.1</h2>
<ul>
<li>Fix &quot;Warning: File descriptor 39 closed but not opened in
unmanaged mode&quot; error when running
<code>--experimental-cli</code></li>
</ul>
<p>🔗 <a
href="https://github.com/prettier/prettier/blob/main/CHANGELOG.md#361">Changelog</a></p>
<h2>3.6.0</h2>
<p><a
href="https://github.com/prettier/prettier/compare/3.5.3...3.6.0">diff</a></p>
<p>🔗 <a href="https://prettier.io/blog/2025/06/23/3.6.0">Release note
&quot;Prettier 3.6: Experimental fast CLI and new OXC and Hermes
plugins!&quot;</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/prettier/prettier/blob/main/CHANGELOG.md">prettier's
changelog</a>.</em></p>
<blockquote>
<h1>3.6.2</h1>
<p><a
href="https://github.com/prettier/prettier/compare/3.6.1...3.6.2">diff</a></p>
<h4>Markdown: Add missing blank line around code block (<a
href="https://redirect.github.com/prettier/prettier/pull/17675">#17675</a>
by <a href="https://github.com/fisker"><code>@​fisker</code></a>)</h4>
<pre lang="md"><code>&lt;!-- Input --&gt;
1. Some text, and code block below, with newline after code block
<pre lang="yaml"><code>---
foo: bar
</code></pre>
<ol>
<li>Another</li>
<li>List</li>
</ol>
<p>&lt;!-- Prettier 3.6.1 --&gt;</p>
<ol>
<li>
<p>Some text, and code block below, with newline after code block</p>
<pre lang="yaml"><code>---
foo: bar
</code></pre>
<ol>
<li>Another</li>
<li>List</li>
</ol>
</li>
</ol>
<p>&lt;!-- Prettier 3.6.2 --&gt;</p>
<ol>
<li>
<p>Some text, and code block below, with newline after code block</p>
<pre lang="yaml"><code>---
foo: bar
</code></pre>
<ol>
<li>Another</li>
<li>List<br />
</code></pre></li>
</ol>
</li>
</ol>
<h1>3.6.1</h1>
<p><a
href="https://github.com/prettier/prettier/compare/3.6.0...3.6.1">diff</a></p>
<h4>TypeScript: Allow const without initializer (<a
href="https://redirect.github.com/prettier/prettier/pull/17650">#17650</a>,
<a
href="https://redirect.github.com/prettier/prettier/pull/17654">#17654</a>
by <a href="https://github.com/fisker"><code>@​fisker</code></a>)</h4>
<pre lang="jsx"><code>// Input
&lt;/tr&gt;&lt;/table&gt; 
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7a8b05f415"><code>7a8b05f</code></a>
Release 3.6.2</li>
<li><a
href="46526b49b6"><code>46526b4</code></a>
Add missing blank line around code block (<a
href="https://redirect.github.com/prettier/prettier/issues/17675">#17675</a>)</li>
<li><a
href="a04ec1196f"><code>a04ec11</code></a>
chore(deps): update babel to v7.27.7 (<a
href="https://redirect.github.com/prettier/prettier/issues/17684">#17684</a>)</li>
<li><a
href="32be5b6b44"><code>32be5b6</code></a>
chore(deps): update dependency flow-parser to v0.274.1 (<a
href="https://redirect.github.com/prettier/prettier/issues/17676">#17676</a>)</li>
<li><a
href="b55e777924"><code>b55e777</code></a>
Update docs about &quot;TypeScript Configuration Files&quot; (<a
href="https://redirect.github.com/prettier/prettier/issues/17677">#17677</a>)</li>
<li><a
href="b197c99224"><code>b197c99</code></a>
chore(deps): update dependency <code>@​vitejs/plugin-react</code> to
v4.6.0 (<a
href="https://redirect.github.com/prettier/prettier/issues/17674">#17674</a>)</li>
<li><a
href="1185f8370a"><code>1185f83</code></a>
chore(deps): update dependency <code>@​angular/compiler</code> to
v20.0.5 (<a
href="https://redirect.github.com/prettier/prettier/issues/17680">#17680</a>)</li>
<li><a
href="aa1316fa60"><code>aa1316f</code></a>
chore(deps): update dependency browserslist to v4.25.1 (<a
href="https://redirect.github.com/prettier/prettier/issues/17671">#17671</a>)</li>
<li><a
href="c468d33f16"><code>c468d33</code></a>
chore(deps): update dependency oxc-parser to v0.75.0 (<a
href="https://redirect.github.com/prettier/prettier/issues/17672">#17672</a>)</li>
<li><a
href="3f46d91bdb"><code>3f46d91</code></a>
chore(deps): update dependency vite to v7 (<a
href="https://redirect.github.com/prettier/prettier/issues/17673">#17673</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/prettier/prettier/compare/3.5.3...3.6.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=prettier&package-manager=npm_and_yarn&previous-version=3.5.3&new-version=3.6.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-01 10:18:40 +02:00
dependabot[bot]
7cc059fe41
chore(ui-deps): bump eslint-config-next from 15.3.2 to 15.5.2 in /llama_stack/ui (#3288)
Bumps
[eslint-config-next](https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next)
from 15.3.2 to 15.5.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">eslint-config-next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.2</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: disable unknownatrules lint rule entirely (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83059">#83059</a>)</li>
<li>revert: add ?dpl to fonts in /_next/static/media (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83062">#83062</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/bgub"><code>@​bgub</code></a> and <a
href="https://github.com/ztanner"><code>@​ztanner</code></a> for
helping!</p>
<h2>v15.5.1</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: aliased navigations should apply scroll handling (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82900">#82900</a>)</li>
<li>Turbopack: fix invalid NFT entry with file behind symlink (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82887">#82887</a>)</li>
<li>fix: typesafe linking to route handlers and pages API routes (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82858">#82858</a>)</li>
<li>fix: change &quot;noUnknownAtRules&quot; to &quot;warn&quot; for
Biome (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82974">#82974</a>)</li>
<li>fix: add path normalization to getRelativePath for Windows (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82918">#82918</a>)</li>
<li>feat: add typesafety with config.typedRoutes to redirect() and
permanentRedirect() (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82860">#82860</a>)</li>
<li>fix: avoid importing types that will be unused (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82856">#82856</a>)</li>
<li>fix: update the config.api.responseLimit type (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82852">#82852</a>)</li>
<li>fix: update validation return types (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82854">#82854</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/bgub"><code>@​bgub</code></a>, <a
href="https://github.com/mischnic"><code>@​mischnic</code></a>, and <a
href="https://github.com/ztanner"><code>@​ztanner</code></a> for
helping!</p>
<h2>v15.5.1-canary.20</h2>
<h3>Misc Changes</h3>
<ul>
<li>Turbopack: hide blocking spans in trace server: <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83167">#83167</a></li>
<li>Update Rspack production test manifest: <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83207">#83207</a></li>
<li>[create-next-app] Generate route types after setup: <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82956">#82956</a></li>
<li>Update Rspack development test manifest: <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83208">#83208</a></li>
<li>docs: fix snippets in getting started: <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83228">#83228</a></li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/sokra"><code>@​sokra</code></a>, <a
href="https://github.com/vercel-release-bot"><code>@​vercel-release-bot</code></a>,
<a href="https://github.com/bgub"><code>@​bgub</code></a>, and <a
href="https://github.com/icyJoseph"><code>@​icyJoseph</code></a> for
helping!</p>
<h2>v15.5.1-canary.19</h2>
<h3>Core Changes</h3>
<ul>
<li>[sourcemaps] Always check for vendor chunks regardless of Node.js
version: <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83114">#83114</a></li>
<li>Turbopack: Remove undocumented legacy syntax for built-in conditions
(e.g. foreign, browser): <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83068">#83068</a></li>
<li>[metadata] update metadata routes cache headers: <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83215">#83215</a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="497ec6aa08"><code>497ec6a</code></a>
v15.5.2</li>
<li><a
href="cc68ced552"><code>cc68ced</code></a>
v15.5.1</li>
<li><a
href="7e08c8223d"><code>7e08c82</code></a>
v15.5.0</li>
<li><a
href="8f6d345d2d"><code>8f6d345</code></a>
v15.4.2-canary.56</li>
<li><a
href="e3e21977ed"><code>e3e2197</code></a>
v15.4.2-canary.55</li>
<li><a
href="a745826b2c"><code>a745826</code></a>
v15.4.2-canary.54</li>
<li><a
href="bec38efdb6"><code>bec38ef</code></a>
v15.4.2-canary.53</li>
<li><a
href="97dbf5f2e1"><code>97dbf5f</code></a>
v15.4.2-canary.52</li>
<li><a
href="9934b3788a"><code>9934b37</code></a>
v15.4.2-canary.51</li>
<li><a
href="df9f3ba484"><code>df9f3ba</code></a>
v15.4.2-canary.50</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/commits/v15.5.2/packages/eslint-config-next">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=eslint-config-next&package-manager=npm_and_yarn&previous-version=15.3.2&new-version=15.5.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-01 10:18:15 +02:00
dependabot[bot]
26b4340de3
chore(ui-deps): bump @types/node from 20.17.47 to 24.3.0 in /llama_stack/ui (#3290)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 0s
Test External API and Providers / test-external (venv) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Unit Tests / unit-tests (3.12) (push) Failing after 1s
UI Tests / ui-tests (22) (push) Failing after 1s
Unit Tests / unit-tests (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Bumps
[@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node)
from 20.17.47 to 24.3.0.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/node&package-manager=npm_and_yarn&previous-version=20.17.47&new-version=24.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-31 17:47:31 -07:00
dependabot[bot]
a4a89745b6
chore(ui-deps): bump framer-motion from 11.18.2 to 12.23.12 in /llama_stack/ui (#3291)
Bumps [framer-motion](https://github.com/motiondivision/motion) from
11.18.2 to 12.23.12.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/motiondivision/motion/blob/main/CHANGELOG.md">framer-motion's
changelog</a>.</em></p>
<blockquote>
<h2>[12.23.12] 2025-07-29</h2>
<h3>Added</h3>
<ul>
<li>Exporting internal APIs for use in view animations.</li>
</ul>
<h2>[12.23.11] 2025-07-28</h2>
<h3>Added</h3>
<ul>
<li>Children of variants with <code>delayChildren: stagger()</code> will
now be staggered correctly alongside their newly-entering siblings.</li>
</ul>
<h2>[12.23.10] 2025-07-28</h2>
<h3>Fixed</h3>
<ul>
<li>Fixed shared layout animation in situations where no
<code>motion</code> components have re-rendered between shared element
switching.</li>
</ul>
<h2>[12.23.9] 2025-07-24</h2>
<h3>Changed</h3>
<ul>
<li>Removing redundant <code>renderRequest</code>
<code>MotionValue</code> lifecycle.</li>
</ul>
<h2>[12.23.8] 2025-07-24</h2>
<h3>Fixed</h3>
<ul>
<li>Ensuring that when an animation is skipped via <code>duration =
0</code> that we also set <code>type = &quot;keyframes&quot;</code> so
that <code>duration</code> takes effect.</li>
</ul>
<h2>[12.23.7] 2025-07-23</h2>
<h3>Fixed</h3>
<ul>
<li><code>springValue</code> cleanup.</li>
<li>Removed additional <code>removeNode</code> from
<code>AnimatePresence</code> when using <code>popLayout</code>.</li>
</ul>
<h2>[12.23.6] 2025-07-11</h2>
<h3>Changed</h3>
<ul>
<li>Added explainer for reduced motion warning.</li>
<li>Refactored <code>motion</code> component creation to remove
indirection.</li>
</ul>
<h2>[12.23.5] 2025-07-11</h2>
<h3>Fixed</h3>
<ul>
<li>Fix animation timings within dynamically-generated popups.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e0f7e07570"><code>e0f7e07</code></a>
v12.23.12</li>
<li><a
href="994515fef3"><code>994515f</code></a>
Updating changelog</li>
<li><a
href="95d82ff919"><code>95d82ff</code></a>
Merge pull request <a
href="https://redirect.github.com/motiondivision/motion/issues/3338">#3338</a>
from motiondivision/feature/next-page-transitions</li>
<li><a
href="58b2e8cde4"><code>58b2e8c</code></a>
Exporting APIs for view transitions</li>
<li><a
href="b6f2132fb6"><code>b6f2132</code></a>
Update README.md</li>
<li><a
href="38298c41fc"><code>38298c4</code></a>
Update README.md</li>
<li><a
href="76396b0187"><code>76396b0</code></a>
Update README.md</li>
<li><a
href="b273d064a3"><code>b273d06</code></a>
Update README.md</li>
<li><a
href="c0bd6effa9"><code>c0bd6ef</code></a>
v12.23.11</li>
<li><a
href="e9b52af3e2"><code>e9b52af</code></a>
Updating changelog</li>
<li>Additional commits viewable in <a
href="https://github.com/motiondivision/motion/compare/v11.18.2...v12.23.12">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=framer-motion&package-manager=npm_and_yarn&previous-version=11.18.2&new-version=12.23.12)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-31 17:46:12 -07:00
Matthew Farrellee
478b4ff1e6
chore(migrate apis): move VectorDBWithIndex from embeddings to openai_embeddings (#3294)
# What does this PR do?

migrates VectorDBWithIndex to use openai_embeddings

part of #2365 

## Test Plan

existing unit tests
2025-08-31 14:48:35 -07:00
Jiayi Ni
b12cd528ef
docs: add VLM NIM example (#3277)
2025-08-29 16:23:52 -07:00
Matthew Farrellee
3370d8e557
feat(files, s3, expiration): add expires_after support to S3 files provider (#3283) 2025-08-29 16:17:24 -07:00
github-actions[bot]
78a78264a7 build: Bump version to 0.2.20 2025-08-29 21:17:47 +00:00
slekkala1
efdb5558b8
fix: Remove bfcl scoring function as not supported (#3281)
# What does this PR do?
The BFCL scoring function is not supported, so this PR removes it.

It also includes minor fixes, because `llama stack run` is broken for
open-benchmark, which blocked test plan verification:
1. Correct the model paths for supported models.
2. Fix a crash caused by the logger assuming `provider_id` exists,
although `DatasetInput` has no such attribute:
``` 
File "/Users/swapna942/llama-stack/llama_stack/core/stack.py", line 332, in construct_stack
    await register_resources(run_config, impls)
  File "/Users/swapna942/llama-stack/llama_stack/core/stack.py", line 108, in register_resources
    logger.debug(f"registering {rsrc.capitalize()} {obj} for provider {obj.provider_id}")
                                                                       ^^^^^^^^^^^^^^^
  File "/Users/swapna942/llama-stack/.venv/lib/python3.13/site-packages/pydantic/main.py", line 991, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'DatasetInput' object has no attribute 'provider_id'
```
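
One way to avoid this kind of `AttributeError` is to look the attribute up defensively before formatting the log message. The following is a minimal sketch, not the actual fix in this PR: `describe_registration` is a hypothetical helper, and the `DatasetInput` dataclass is a stand-in for the real Pydantic model.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class DatasetInput:
    """Stand-in for the real model; note there is no provider_id field."""

    dataset_id: str


def describe_registration(rsrc: str, obj: object) -> str:
    """Build the debug message without assuming obj has provider_id."""
    # getattr with a default never raises, unlike obj.provider_id.
    provider_id = getattr(obj, "provider_id", None)
    suffix = f" for provider {provider_id}" if provider_id else ""
    return f"registering {rsrc.capitalize()} {obj}{suffix}"


message = describe_registration("dataset", DatasetInput(dataset_id="d1"))
logger.debug(message)
```

Objects that do carry a `provider_id` still get it in the message; objects without one are logged without the provider suffix.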

## Test Plan
Running `llama stack build --distro open-benchmark --image-type venv` and then starting the server succeeds.


Issue Link: https://github.com/llamastack/llama-stack/issues/3282
2025-08-29 11:03:52 -07:00
IAN MILLER
3130ca0a78
feat: implement keyword, vector and hybrid search inside vector stores for PGVector provider (#3064)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR implements
`openai/v1/vector_stores/{vector_store_id}/search` for the PGVector
provider. That involves implementing vector similarity search, keyword
search, and hybrid search for `PGVectorIndex`.
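
Hybrid search has to merge the keyword ranking and the vector ranking into one result list. The PR description does not say which fusion method is used, so the sketch below assumes reciprocal rank fusion (RRF), a common choice; the function name, the chunk IDs, and the constant `k = 60` are illustrative only.

```python
def reciprocal_rank_fusion(
    vector_ranked: list[str], keyword_ranked: list[str], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse two ranked lists of chunk IDs; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranked, keyword_ranked):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# "c2" appears in both rankings, so it ends up first in the fused list.
fused = reciprocal_rank_fusion(["c1", "c2", "c3"], ["c2", "c4"])
```

RRF only needs ranks, not raw scores, which sidesteps the problem that keyword scores and vector distances live on different scales.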

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3006 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Run unit tests:
` ./scripts/unit-tests.sh `

Run integration tests for openai vector stores:
1. Export env vars:
```
export ENABLE_PGVECTOR=true
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
export PGVECTOR_DB=llamastack
export PGVECTOR_USER=llamastack
export PGVECTOR_PASSWORD=llamastack
```

2. Create DB:
```
psql -h localhost -U postgres -c "CREATE ROLE llamastack LOGIN PASSWORD 'llamastack';"
psql -h localhost -U postgres -c "CREATE DATABASE llamastack OWNER llamastack;"
psql -h localhost -U llamastack -d llamastack -c "CREATE EXTENSION IF NOT EXISTS vector;"
```

3. Install sentence-transformers:
` uv pip install sentence-transformers  `

4. Run:
```
uv run --group test pytest -s -v --stack-config="inference=inline::sentence-transformers,vector_io=remote::pgvector" --embedding-model sentence-transformers/all-MiniLM-L6-v2 tests/integration/vector_io/test_openai_vector_stores.py
```
Inspect PGVector vector stores (optional):
```
psql llamastack                                                                                                         
psql (14.18 (Homebrew))
Type "help" for help.

llamastack=# \z
                                                    Access privileges
 Schema |                         Name                         | Type  | Access privileges | Column privileges | Policies 
--------+------------------------------------------------------+-------+-------------------+-------------------+----------
 public | llamastack_kvstore                                   | table |                   |                   | 
 public | metadata_store                                       | table |                   |                   | 
 public | vector_store_pgvector_main                           | table |                   |                   | 
 public | vector_store_vs_1dfbc061_1f4d_4497_9165_ecba2622ba3a | table |                   |                   | 
 public | vector_store_vs_2085a9fb_1822_4e42_a277_c6a685843fa7 | table |                   |                   | 
 public | vector_store_vs_2b3dae46_38be_462a_afd6_37ee5fe661b1 | table |                   |                   | 
 public | vector_store_vs_2f438de6_f606_4561_9d50_ef9160eb9060 | table |                   |                   | 
 public | vector_store_vs_3eeca564_2580_4c68_bfea_83dc57e31214 | table |                   |                   | 
 public | vector_store_vs_53942163_05f3_40e0_83c0_0997c64613da | table |                   |                   | 
 public | vector_store_vs_545bac75_8950_4ff1_b084_e221192d4709 | table |                   |                   | 
 public | vector_store_vs_688a37d8_35b2_4298_a035_bfedf5b21f86 | table |                   |                   | 
 public | vector_store_vs_70624d9a_f6ac_4c42_b8ab_0649473c6600 | table |                   |                   | 
 public | vector_store_vs_73fc1dd2_e942_4972_afb1_1e177b591ac2 | table |                   |                   | 
 public | vector_store_vs_9d464949_d51f_49db_9f87_e033b8b84ac9 | table |                   |                   | 
 public | vector_store_vs_a1e4d724_5162_4d6d_a6c0_bdafaf6b76ec | table |                   |                   | 
 public | vector_store_vs_a328fb1b_1a21_480f_9624_ffaa60fb6672 | table |                   |                   | 
 public | vector_store_vs_a8981bf0_2e66_4445_a267_a8fff442db53 | table |                   |                   | 
 public | vector_store_vs_ccd4b6a4_1efd_4984_ad03_e7ff8eadb296 | table |                   |                   | 
 public | vector_store_vs_cd6420a4_a1fc_4cec_948c_1413a26281c9 | table |                   |                   | 
 public | vector_store_vs_cd709284_e5cf_4a88_aba5_dc76a35364bd | table |                   |                   | 
 public | vector_store_vs_d7a4548e_fbc1_44d7_b2ec_b664417f2a46 | table |                   |                   | 
 public | vector_store_vs_e7f73231_414c_4523_886c_d1174eee836e | table |                   |                   | 
 public | vector_store_vs_ffd53588_819f_47e8_bb9d_954af6f7833d | table |                   |                   | 
(23 rows)

llamastack=# 
```

Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-29 16:30:12 +02:00
Matthew Farrellee
e96e3c4da4
feat(s3 auth): add authorization support for s3 files provider (#3265)
# What does this PR do?

adds support for authorized users to the s3 files provider

## Test Plan

existing and new unit tests
2025-08-29 16:14:00 +02:00
Matthew Farrellee
ed418653ec
chore(dev): add inequality support to sqlstore where clause (#3272)
# What does this PR do?

add the ability to use inequalities in the where clause of the sqlstore.

this is infrastructure for files expiration.
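
To illustrate what inequality support in a where clause can look like, here is a minimal sketch that translates a dictionary into a parameterized SQL condition. The operator keys, the `build_where` helper, and the `files` table are assumptions made for this example; the actual sqlstore syntax may differ.

```python
import sqlite3

# Hypothetical operator keys for this sketch; the real sqlstore syntax may differ.
_OPERATORS = {"==": "=", ">": ">", "<": "<", ">=": ">=", "<=": "<="}


def build_where(where: dict) -> tuple[str, list]:
    """Turn {"col": value} or {"col": {">": value}} into a parameterized clause."""
    clauses, params = [], []
    for column, condition in where.items():
        if not column.isidentifier():
            raise ValueError(f"invalid column name: {column!r}")
        # A bare value is shorthand for equality.
        if not isinstance(condition, dict):
            condition = {"==": condition}
        for op, value in condition.items():
            if op not in _OPERATORS:
                raise ValueError(f"unsupported operator: {op!r}")
            clauses.append(f"{column} {_OPERATORS[op]} ?")
            params.append(value)
    return " AND ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id TEXT, expires_at INTEGER)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [("a", 100), ("b", 200), ("c", 300)])

# Select files that have expired as of t=200 (illustrative schema).
clause, params = build_where({"expires_at": {"<=": 200}})
expired = [
    row[0]
    for row in conn.execute(f"SELECT id FROM files WHERE {clause} ORDER BY id", params)
]
```

Values are always passed as bound parameters, and operators come from a fixed allow-list, so user-supplied conditions cannot inject SQL.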

## Test Plan

unit tests
2025-08-28 14:49:36 -07:00
slekkala1
30117dea22
fix: docker failing to start container [fireworks-ai] (#3267)
# What does this PR do?
1725364988 
Fixes the OpenAI package incompatibility introduced by the new
`fireworks-ai==0.19.18` -> `reward-kit` dependency by pinning
`fireworks-ai` to an older version that doesn't pull in `reward-kit`.

## Test Plan
Tested locally with the following commands to start a container:
1. Build the container: `llama stack build --distro starter --image-type container`
2. Start the container: `docker run -d -p 8321:8321 --name llama-stack-test distribution-starter:0.2.19`
3. Check health at http://localhost:8321/v1/health

The steps above fail without the fix.
2025-08-28 13:20:36 -07:00
Omer Tuchfeld
52106d95d3
fix(env): env var replacement preserve types (#3270)
# What does this PR do?

During env var replacement, we're implicitly converting all config values
to their apparent types (e.g., "true" to True, "123" to 123). This is
arguably useful for values produced by env var substitution, since those
are always strings, but we should definitely avoid touching config values
that have explicit types and are not involved in env var substitution.
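
The intended behavior can be sketched as follows: only strings that actually contain a placeholder are substituted and coerced, while everything else keeps its explicit type. This is an illustrative sketch rather than the code in this PR; the `${env.NAME:=default}` placeholder pattern, `_coerce`, and `replace_env_vars` are assumptions made for the example.

```python
import os
import re

# Placeholder syntax assumed for this sketch: ${env.NAME} or ${env.NAME:=default}
_ENV_PATTERN = re.compile(r"\$\{env\.([A-Z0-9_]+)(?::=([^}]*))?\}")


def _coerce(text: str):
    """Convert a substituted string to its apparent type (bool, int, float)."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def replace_env_vars(config, env=None):
    """Recursively substitute env vars, coercing only values that were substituted."""
    env = os.environ if env is None else env
    if isinstance(config, dict):
        return {k: replace_env_vars(v, env) for k, v in config.items()}
    if isinstance(config, list):
        return [replace_env_vars(v, env) for v in config]
    if isinstance(config, str) and _ENV_PATTERN.search(config):
        substituted = _ENV_PATTERN.sub(
            lambda m: env.get(m.group(1), m.group(2) or ""), config
        )
        return _coerce(substituted)
    # Strings without placeholders and non-string values keep their explicit type.
    return config


result = replace_env_vars(
    {"port": "${env.PORT}", "name": "123", "debug": "${env.DEBUG:=false}", "n": 5},
    env={"PORT": "8321"},
)
```

Here `port` becomes the integer 8321 and `debug` becomes `False`, while the explicitly quoted `"123"` stays a string.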

## Test Plan

Unit tests
2025-08-28 17:07:18 +02:00
Francisco Arceo
75fad445a6
feat(UI): Implementing File Upload and VectorDB Creation/Configuration in Playground (#3266)
2025-08-28 05:03:31 -06:00
Kelly Brown
1a9fa3c0b8
docs: Contributor guidelines for creating Internal or External providers (#3111)
**Description:** 
Adding information and guidelines on when contributors should create an
in-tree vs out-of-tree provider.


I'm still learning a bit about this subject, so I'm very open to feedback
on this PR.

Will also add this section to the API Providers section of the docs.
2025-08-28 12:26:47 +02:00
raghotham
d73955a41e
chore: remove absolute paths (#3263)
# What does this PR do?
Found these issues while moving to GitHub Pages.


## Test Plan
`uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all`
2025-08-27 12:04:25 -07:00
Charlie Doern
cec00c5476
docs: fix post_training docs (#3262)
# What does this PR do?

The post-training docs are missing references to the more in-depth
`huggingface.md` and `torchtune.md`, which explain how to actually use
the providers.

These files show up in search, though.

Add references to these files into the `inline_..md` files currently
pointed to by `index.md`.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-26 18:21:15 -07:00
github-actions[bot]
963305c84d build: Bump version to 0.2.19 2025-08-26 22:02:47 +00:00
Ashwin Bharambe
9fa69b0337
feat(distro): no huggingface provider for starter (#3258)
The `trl` dependency brings in `accelerate` which brings in nvidia
dependencies for torch. We cannot have that in the starter distro. As
such, no CPU-only post-training for the huggingface provider.
2025-08-26 14:06:36 -07:00
Matthew Farrellee
00bd9a61ed
chore: Add example notebook for Langchain + LLAMAStack integration (#3228) (#3259) 2025-08-26 12:58:44 -07:00
slekkala1
2666029427
feat: Add example notebook for Langchain + LLAMAStack integration (#3228)
# What does this PR do?
Add LLAMAStack + Langchain integration example notebook

## Test Plan
Ran in Jupyter notebook, works end to end.

(Used Claude mainly for documentation and coding/debugging help)
2025-08-26 11:34:08 -07:00
Derek Higgins
7ca8233889
feat(testing): remove SQLite dependency from inference recorder (#3254)
Recording files use a predictable naming format, making the SQLite index
redundant. The binary SQLite file was causing frequent git conflicts.
Simplify by calculating file paths directly from request hashes.
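
The idea of deriving a recording's location from the request itself can be sketched as below. This is a rough illustration under stated assumptions: the normalization fields, the `responses` subdirectory, and the 12-character hash prefix are hypothetical and not taken from the recorder's actual code.

```python
import hashlib
import json
from pathlib import Path


def request_hash(method: str, url: str, body: dict) -> str:
    """Hash a normalized request so identical requests map to the same key."""
    normalized = json.dumps(
        {"method": method.upper(), "url": url, "body": body}, sort_keys=True
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def recording_path(storage_dir: Path, method: str, url: str, body: dict) -> Path:
    """Derive the recording file path directly from the hash; no index lookup."""
    return storage_dir / "responses" / f"{request_hash(method, url, body)[:12]}.json"


# Key order and method case do not change the resulting path.
path_a = recording_path(Path("recordings"), "post", "/v1/chat", {"a": 1, "b": 2})
path_b = recording_path(Path("recordings"), "POST", "/v1/chat", {"b": 2, "a": 1})
```

Because the path is a pure function of the request, replay needs no shared binary index, and each recording is a separate text file that merges cleanly in git.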

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-08-26 09:17:00 -07:00
dependabot[bot]
1eb1ac0f41
chore(ui-deps): bump @testing-library/jest-dom from 6.6.3 to 6.8.0 in /llama_stack/ui (#3243)
Bumps
[@testing-library/jest-dom](https://github.com/testing-library/jest-dom)
from 6.6.3 to 6.8.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/testing-library/jest-dom/releases"><code>@​testing-library/jest-dom</code>'s
releases</a>.</em></p>
<blockquote>
<h2>v6.8.0</h2>
<h1><a
href="https://github.com/testing-library/jest-dom/compare/v6.7.0...v6.8.0">6.8.0</a>
(2025-08-20)</h1>
<h3>Features</h3>
<ul>
<li>add toBePartiallyPressed matcher (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/203">#203</a>)
(<a
href="https://redirect.github.com/testing-library/jest-dom/issues/692">#692</a>)
(<a
href="779b7125d3">779b712</a>)</li>
</ul>
<h2>v6.7.0</h2>
<h1><a
href="https://github.com/testing-library/jest-dom/compare/v6.6.4...v6.7.0">6.7.0</a>
(2025-08-13)</h1>
<h3>Features</h3>
<ul>
<li>add toBePressed matcher (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/203">#203</a>)
(<a
href="https://redirect.github.com/testing-library/jest-dom/issues/658">#658</a>)
(<a
href="cfdf8ae370">cfdf8ae</a>)</li>
</ul>
<h2>v6.6.4</h2>
<h2><a
href="https://github.com/testing-library/jest-dom/compare/v6.6.3...v6.6.4">6.6.4</a>
(2025-07-26)</h2>
<h3>Performance Improvements</h3>
<ul>
<li>replace chalk with picocolors (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/659">#659</a>)
(<a
href="707e6471ae">707e647</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="779b7125d3"><code>779b712</code></a>
feat: add toBePartiallyPressed matcher (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/203">#203</a>)
(<a
href="https://redirect.github.com/testing-library/jest-dom/issues/692">#692</a>)</li>
<li><a
href="e15f7893cd"><code>e15f789</code></a>
docs: add kretajak as a contributor for code, and test (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/691">#691</a>)</li>
<li><a
href="cfdf8ae370"><code>cfdf8ae</code></a>
feat: add toBePressed matcher (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/203">#203</a>)
(<a
href="https://redirect.github.com/testing-library/jest-dom/issues/658">#658</a>)</li>
<li><a
href="f00d94d3d1"><code>f00d94d</code></a>
chore: add <code>dependebot.yml</code> (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/456">#456</a>)</li>
<li><a
href="476c30b43f"><code>476c30b</code></a>
refactor: drop <code>lodash</code> entirely (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/676">#676</a>)</li>
<li><a
href="fafd8caa9f"><code>fafd8ca</code></a>
chore: add tests for Node 22 &amp; 24 (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/678">#678</a>)</li>
<li><a
href="d9babb1961"><code>d9babb1</code></a>
docs: fix typo (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/667">#667</a>)</li>
<li><a
href="f0f31bbd87"><code>f0f31bb</code></a>
docs: adopt the new build-badge URL (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/497">#497</a>)</li>
<li><a
href="707e6471ae"><code>707e647</code></a>
perf: replace chalk with picocolors (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/659">#659</a>)</li>
<li><a
href="918b6fbcde"><code>918b6fb</code></a>
docs: add InfiniteXyy as a contributor for code, and bug (<a
href="https://redirect.github.com/testing-library/jest-dom/issues/650">#650</a>)</li>
<li>See full diff in <a
href="https://github.com/testing-library/jest-dom/compare/v6.6.3...v6.8.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@testing-library/jest-dom&package-manager=npm_and_yarn&previous-version=6.6.3&new-version=6.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-26 15:38:46 +02:00
dependabot[bot]
eed25fc6e4
chore(github-deps): bump astral-sh/setup-uv from 6.5.0 to 6.6.0 (#3247)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.5.0 to 6.6.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4959332f0f"><code>4959332</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/532">#532</a>)</li>
<li><a
href="adeb28643f"><code>adeb286</code></a>
Add support for .tools-versions (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/531">#531</a>)</li>
<li><a
href="fce199e243"><code>fce199e</code></a>
Add log message before long API calls to GitHub (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/530">#530</a>)</li>
<li><a
href="f758a4a1eb"><code>f758a4a</code></a>
chore: update known versions for 0.8.12 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/529">#529</a>)</li>
<li><a
href="c0e7e93474"><code>c0e7e93</code></a>
chore: update known versions for 0.8.11 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/526">#526</a>)</li>
<li><a
href="fda2399cb3"><code>fda2399</code></a>
chore: update known versions for 0.8.10 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/525">#525</a>)</li>
<li>See full diff in <a
href="d9e0f98d3f...4959332f0f">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.5.0&new-version=6.6.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-25 17:34:38 +02:00
dependabot[bot]
3d68ca05e1
chore(github-deps): bump amannn/action-semantic-pull-request from 6.1.0 to 6.1.1 (#3248)
Bumps
[amannn/action-semantic-pull-request](https://github.com/amannn/action-semantic-pull-request)
from 6.1.0 to 6.1.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/amannn/action-semantic-pull-request/releases">amannn/action-semantic-pull-request's
releases</a>.</em></p>
<blockquote>
<h2>v6.1.1</h2>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.1.0...v6.1.1">6.1.1</a>
(2025-08-22)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Parse <code>headerPatternCorrespondence</code> properly (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/295">#295</a>)
(<a
href="800da4c97f">800da4c</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/amannn/action-semantic-pull-request/blob/main/CHANGELOG.md">amannn/action-semantic-pull-request's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.1.0...v6.1.1">6.1.1</a>
(2025-08-22)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Parse <code>headerPatternCorrespondence</code> properly (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/295">#295</a>)
(<a
href="800da4c97f">800da4c</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.1...v6.1.0">6.1.0</a>
(2025-08-19)</h2>
<h3>Features</h3>
<ul>
<li>Support providing regexps for types (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/292">#292</a>)
(<a
href="a30288bf13">a30288b</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Remove trailing whitespace from &quot;unknown release type&quot;
error message (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/291">#291</a>)
(<a
href="afa4edb1c4">afa4edb</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.0...v6.0.1">6.0.1</a>
(2025-08-13)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Actually execute action (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/289">#289</a>)
(<a
href="58e4ab40f5">58e4ab4</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.3...v6.0.0">6.0.0</a>
(2025-08-13)</h2>
<h3>⚠ BREAKING CHANGES</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)</li>
</ul>
<h3>Features</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)
(<a
href="bc0c9a79ab">bc0c9a7</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.2...v5.5.3">5.5.3</a>
(2024-06-28)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump <code>braces</code> dependency (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/269">#269</a>
by <a href="https://github.com/EelcoLos"><code>@​EelcoLos</code></a>)
(<a
href="2d952a1bf9">2d952a1</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.1...v5.5.2">5.5.2</a>
(2024-04-24)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump tar from 6.1.11 to 6.2.1 (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/262">#262</a>
by <a href="https://github.com/EelcoLos"><code>@​EelcoLos</code></a>)
(<a
href="9a90d5a5ac">9a90d5a</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.0...v5.5.1">5.5.1</a>
(2024-04-24)</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="48f256284b"><code>48f2562</code></a>
chore: Release 6.1.1 [skip ci]</li>
<li><a
href="800da4c97f"><code>800da4c</code></a>
fix: Parse <code>headerPatternCorrespondence</code> properly (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/295">#295</a>)</li>
<li><a
href="677b89571e"><code>677b895</code></a>
test: Fix broken test</li>
<li><a
href="24e6f016c1"><code>24e6f01</code></a>
ci: Fix permissions for tagger</li>
<li>See full diff in <a
href="7f33ba7922...48f256284b">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=amannn/action-semantic-pull-request&package-manager=github_actions&previous-version=6.1.0&new-version=6.1.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-25 17:34:17 +02:00
dependabot[bot]
fc466cb4a4
chore(ui-deps): bump eslint-plugin-prettier from 5.4.0 to 5.5.4 in /llama_stack/ui (#3241)
Bumps
[eslint-plugin-prettier](https://github.com/prettier/eslint-plugin-prettier)
from 5.4.0 to 5.5.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/prettier/eslint-plugin-prettier/releases">eslint-plugin-prettier's
releases</a>.</em></p>
<blockquote>
<h2>v5.5.4</h2>
<h3>Patch Changes</h3>
<ul>
<li>
<p><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/755">#755</a>
<a
href="723f7a803f"><code>723f7a8</code></a>
Thanks <a href="https://github.com/kbrilla"><code>@​kbrilla</code></a>!
- fix: add 'oxc', 'oxc-ts' and 'hermes' parsers to
<code>parserBlocklist</code></p>
</li>
<li>
<p><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/751">#751</a>
<a
href="cf52b306a5"><code>cf52b30</code></a>
Thanks <a
href="https://github.com/andreww2012"><code>@​andreww2012</code></a>! -
fix: disallow extra properties in rule options</p>
</li>
</ul>
<h2>v5.5.3</h2>
<p>republish the latest version</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/prettier/eslint-plugin-prettier/compare/v5.5.2...v5.5.3">https://github.com/prettier/eslint-plugin-prettier/compare/v5.5.2...v5.5.3</a></p>
<h2>v5.5.2</h2>
<p>republish the latest version</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/prettier/eslint-plugin-prettier/compare/v5.5.1...v5.5.2">https://github.com/prettier/eslint-plugin-prettier/compare/v5.5.1...v5.5.2</a></p>
<h2>v5.5.1</h2>
<h3>Patch Changes</h3>
<ul>
<li><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/748">#748</a>
<a
href="bfd1e9547d"><code>bfd1e95</code></a>
Thanks <a href="https://github.com/JounQin"><code>@​JounQin</code></a>!
- fix: use <code>prettierRcOptions</code> directly for prettier
3.6+</li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/prettier/eslint-plugin-prettier/compare/v5.5.0...v5.5.1">https://github.com/prettier/eslint-plugin-prettier/compare/v5.5.0...v5.5.1</a></p>
<h2>v5.5.0</h2>
<h3>Minor Changes</h3>
<ul>
<li><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/743">#743</a>
<a
href="92f2c9c8f0"><code>92f2c9c</code></a>
Thanks <a
href="https://github.com/dotcarmen"><code>@​dotcarmen</code></a>! -
feat: support non-js languages like <code>css</code> for
<code>@eslint/css</code> and <code>json</code> for
<code>@eslint/json</code></li>
</ul>
<h3>New Contributors</h3>
<ul>
<li><a href="https://github.com/dotcarmen"><code>@​dotcarmen</code></a>
made their first contribution in <a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/743">prettier/eslint-plugin-prettier#743</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/prettier/eslint-plugin-prettier/compare/v5.4.1...v5.5.0">https://github.com/prettier/eslint-plugin-prettier/compare/v5.4.1...v5.5.0</a></p>
<h2>v5.4.1</h2>
<h3>Patch Changes</h3>
<ul>
<li><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/740">#740</a>
<a
href="c21521ffbe"><code>c21521f</code></a>
Thanks <a href="https://github.com/JounQin"><code>@​JounQin</code></a>!
- fix(deps): bump <code>synckit</code> to v0.11.7 to fix potential
<code>TypeError: Cannot read properties of undefined (reading
'message')</code> error</li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/prettier/eslint-plugin-prettier/compare/v5.4.0...v5.4.1">https://github.com/prettier/eslint-plugin-prettier/compare/v5.4.0...v5.4.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/prettier/eslint-plugin-prettier/blob/main/CHANGELOG.md">eslint-plugin-prettier's
changelog</a>.</em></p>
<blockquote>
<h2>5.5.4</h2>
<h3>Patch Changes</h3>
<ul>
<li>
<p><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/755">#755</a>
<a
href="723f7a803f"><code>723f7a8</code></a>
Thanks <a href="https://github.com/kbrilla"><code>@​kbrilla</code></a>!
- fix: add 'oxc', 'oxc-ts' and 'hermes' parsers to
<code>parserBlocklist</code></p>
</li>
<li>
<p><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/751">#751</a>
<a
href="cf52b306a5"><code>cf52b30</code></a>
Thanks <a
href="https://github.com/andreww2012"><code>@​andreww2012</code></a>! -
fix: disallow extra properties in rule options</p>
</li>
</ul>
<h2>5.5.1</h2>
<h3>Patch Changes</h3>
<ul>
<li><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/748">#748</a>
<a
href="bfd1e9547d"><code>bfd1e95</code></a>
Thanks <a href="https://github.com/JounQin"><code>@​JounQin</code></a>!
- fix: use <code>prettierRcOptions</code> directly for prettier
3.6+</li>
</ul>
<h2>5.5.0</h2>
<h3>Minor Changes</h3>
<ul>
<li><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/743">#743</a>
<a
href="92f2c9c8f0"><code>92f2c9c</code></a>
Thanks <a
href="https://github.com/dotcarmen"><code>@​dotcarmen</code></a>! -
feat: support non-js languages like <code>css</code> for
<code>@eslint/css</code> and <code>json</code> for
<code>@eslint/json</code></li>
</ul>
<h2>5.4.1</h2>
<h3>Patch Changes</h3>
<ul>
<li><a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/pull/740">#740</a>
<a
href="c21521ffbe"><code>c21521f</code></a>
Thanks <a href="https://github.com/JounQin"><code>@​JounQin</code></a>!
- fix(deps): bump <code>synckit</code> to v0.11.7 to fix potential
<code>TypeError: Cannot read properties of undefined (reading
'message')</code> error</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e2c31d20f3"><code>e2c31d2</code></a>
chore: release eslint-plugin-prettier (<a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/issues/756">#756</a>)</li>
<li><a
href="98a8bfd269"><code>98a8bfd</code></a>
chore(deps): update all dependencies (<a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/issues/750">#750</a>)</li>
<li><a
href="cf52b306a5"><code>cf52b30</code></a>
fix: disallow extra properties in rule options (<a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/issues/751">#751</a>)</li>
<li><a
href="723f7a803f"><code>723f7a8</code></a>
fix: add 'oxc', 'oxc-ts' and 'hermes' parsers to
<code>parserBlocklist</code> (<a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/issues/755">#755</a>)</li>
<li><a
href="cdfcefde25"><code>cdfcefd</code></a>
fix: release a new latest version</li>
<li><a
href="d8c303ede5"><code>d8c303e</code></a>
fix: release a new latest version</li>
<li><a
href="3e87f2e73d"><code>3e87f2e</code></a>
chore: release eslint-plugin-prettier (<a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/issues/749">#749</a>)</li>
<li><a
href="bfd1e9547d"><code>bfd1e95</code></a>
fix: use <code>prettierRcOptions</code> directly for prettier 3.6+ (<a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/issues/748">#748</a>)</li>
<li><a
href="9c4b792de1"><code>9c4b792</code></a>
chore: release eslint-plugin-prettier (<a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/issues/744">#744</a>)</li>
<li><a
href="78e41ec2f0"><code>78e41ec</code></a>
chore(deps): update all dependencies (<a
href="https://redirect.github.com/prettier/eslint-plugin-prettier/issues/745">#745</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/prettier/eslint-plugin-prettier/compare/v5.4.0...v5.5.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=eslint-plugin-prettier&package-manager=npm_and_yarn&previous-version=5.4.0&new-version=5.5.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-25 17:34:00 +02:00
dependabot[bot]
83dbc93e3f
chore(ui-deps): bump @testing-library/dom from 10.4.0 to 10.4.1 in /llama_stack/ui (#3244)
Bumps
[@testing-library/dom](https://github.com/testing-library/dom-testing-library)
from 10.4.0 to 10.4.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/testing-library/dom-testing-library/releases"><code>@​testing-library/dom</code>'s
releases</a>.</em></p>
<blockquote>
<h2>v10.4.1</h2>
<h2><a
href="https://github.com/testing-library/dom-testing-library/compare/v10.4.0...v10.4.1">10.4.1</a>
(2025-07-27)</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>deps:</strong> replace chalk with picocolors (<a
href="https://redirect.github.com/testing-library/dom-testing-library/issues/1341">#1341</a>)
(<a
href="225a3e4cfa">225a3e4</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="225a3e4cfa"><code>225a3e4</code></a>
fix(deps): replace chalk with picocolors (<a
href="https://redirect.github.com/testing-library/dom-testing-library/issues/1341">#1341</a>)</li>
<li>See full diff in <a
href="https://github.com/testing-library/dom-testing-library/compare/v10.4.0...v10.4.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@testing-library/dom&package-manager=npm_and_yarn&previous-version=10.4.0&new-version=10.4.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-25 17:33:02 +02:00
dependabot[bot]
dc07575ecd
chore(ui-deps): bump remeda from 2.26.1 to 2.30.0 in /llama_stack/ui (#3242)
Bumps [remeda](https://github.com/remeda/remeda) from 2.26.1 to 2.30.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/remeda/remeda/releases">remeda's
releases</a>.</em></p>
<blockquote>
<h2>v2.30.0</h2>
<h1><a
href="https://github.com/remeda/remeda/compare/v2.29.0...v2.30.0">2.30.0</a>
(2025-08-07)</h1>
<h3>Features</h3>
<ul>
<li><strong>isFunction:</strong> stricter <code>Function</code> type (<a
href="https://redirect.github.com/remeda/remeda/issues/1161">#1161</a>)
(<a
href="729ead3f45">729ead3</a>),
closes <a
href="https://redirect.github.com/remeda/remeda/issues/778">#778</a></li>
</ul>
<h2>v2.29.0</h2>
<h1><a
href="https://github.com/remeda/remeda/compare/v2.28.0...v2.29.0">2.29.0</a>
(2025-08-07)</h1>
<h3>Features</h3>
<ul>
<li>migrate build from tsup to tsdown (<a
href="https://redirect.github.com/remeda/remeda/issues/1172">#1172</a>)
(<a
href="56913804ce">5691380</a>),
closes <a
href="https://redirect.github.com/remeda/remeda/issues/1050">#1050</a></li>
</ul>
<h2>v2.28.0</h2>
<h1><a
href="https://github.com/remeda/remeda/compare/v2.27.2...v2.28.0">2.28.0</a>
(2025-08-03)</h1>
<h3>Features</h3>
<ul>
<li><strong>defaultTo:</strong> introduce <code>defaultTo</code> (<a
href="https://redirect.github.com/remeda/remeda/issues/1159">#1159</a>)
(<a
href="92449ef03c">92449ef</a>),
closes <a
href="https://redirect.github.com/remeda/remeda/issues/1158">#1158</a></li>
</ul>
<h2>v2.27.2</h2>
<h2><a
href="https://github.com/remeda/remeda/compare/v2.27.1...v2.27.2">2.27.2</a>
(2025-08-01)</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>const:</strong> prefer narrow typing for literals (<a
href="https://redirect.github.com/remeda/remeda/issues/1160">#1160</a>)
(<a
href="4c5bc73956">4c5bc73</a>),
closes <a
href="https://redirect.github.com/remeda/remeda/issues/823">#823</a></li>
</ul>
<h2>v2.27.1</h2>
<h2><a
href="https://github.com/remeda/remeda/compare/v2.27.0...v2.27.1">2.27.1</a>
(2025-08-01)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>prevent redundant type computation paths (<a
href="https://redirect.github.com/remeda/remeda/issues/1163">#1163</a>)
(<a
href="7c37e395db">7c37e39</a>)</li>
<li><strong>sample:</strong> revamp typing (<a
href="https://redirect.github.com/remeda/remeda/issues/1162">#1162</a>)
(<a
href="55e5c8c692">55e5c8c</a>),
closes <a
href="https://redirect.github.com/remeda/remeda/issues/323">#323</a></li>
</ul>
<h2>v2.27.0</h2>
<h1><a
href="https://github.com/remeda/remeda/compare/v2.26.1...v2.27.0">2.27.0</a>
(2025-07-28)</h1>
<h3>Features</h3>
<ul>
<li><strong>prop:</strong> allow deep paths (<a
href="https://redirect.github.com/remeda/remeda/issues/1158">#1158</a>)
(<a
href="cb7d61194e">cb7d611</a>),
closes <a
href="https://redirect.github.com/remeda/remeda/issues/830">#830</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="729ead3f45"><code>729ead3</code></a>
feat(isFunction): stricter <code>Function</code> type (<a
href="https://redirect.github.com/remeda/remeda/issues/1161">#1161</a>)</li>
<li><a
href="56913804ce"><code>5691380</code></a>
feat: migrate build from tsup to tsdown (<a
href="https://redirect.github.com/remeda/remeda/issues/1172">#1172</a>)</li>
<li><a
href="e8706536af"><code>e870653</code></a>
chore: manual version bumps (<a
href="https://redirect.github.com/remeda/remeda/issues/1173">#1173</a>)</li>
<li><a
href="6bd6f984b4"><code>6bd6f98</code></a>
chore(deps-dev): bump eslint-plugin-jsdoc from 51.3.3 to 52.0.2 (<a
href="https://redirect.github.com/remeda/remeda/issues/1170">#1170</a>)</li>
<li><a
href="92449ef03c"><code>92449ef</code></a>
feat(defaultTo): introduce <code>defaultTo</code> (<a
href="https://redirect.github.com/remeda/remeda/issues/1159">#1159</a>)</li>
<li><a
href="20293262df"><code>2029326</code></a>
chore(deps-dev): bump eslint-plugin-unicorn from 59.0.1 to 60.0.0 (<a
href="https://redirect.github.com/remeda/remeda/issues/1169">#1169</a>)</li>
<li><a
href="4c5bc73956"><code>4c5bc73</code></a>
fix(const): prefer narrow typing for literals (<a
href="https://redirect.github.com/remeda/remeda/issues/1160">#1160</a>)</li>
<li><a
href="7c37e395db"><code>7c37e39</code></a>
fix: prevent redundant type computation paths (<a
href="https://redirect.github.com/remeda/remeda/issues/1163">#1163</a>)</li>
<li><a
href="55e5c8c692"><code>55e5c8c</code></a>
fix(sample): revamp typing (<a
href="https://redirect.github.com/remeda/remeda/issues/1162">#1162</a>)</li>
<li><a
href="e4559240e2"><code>e455924</code></a>
chore(deps): bump the minor group with 9 updates (<a
href="https://redirect.github.com/remeda/remeda/issues/1168">#1168</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/remeda/remeda/compare/v2.26.1...v2.30.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=remeda&package-manager=npm_and_yarn&previous-version=2.26.1&new-version=2.30.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-25 17:32:41 +02:00
dependabot[bot]
ade0766e28
chore(github-deps): bump actions/setup-node from 4.1.0 to 4.4.0 (#3246)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from
4.1.0 to 4.4.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-node/releases">actions/setup-node's
releases</a>.</em></p>
<blockquote>
<h2>v4.4.0</h2>
<h2>What's Changed</h2>
<h3>Bug fixes:</h3>
<ul>
<li>Make eslint-compact matcher compatible with Stylelint by <a
href="https://github.com/FloEdelmann"><code>@​FloEdelmann</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/98">actions/setup-node#98</a></li>
<li>Add support for indented eslint output by <a
href="https://github.com/fregante"><code>@​fregante</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1245">actions/setup-node#1245</a></li>
</ul>
<h3>Enhancement:</h3>
<ul>
<li>Support private mirrors by <a
href="https://github.com/marco-ippolito"><code>@​marco-ippolito</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1240">actions/setup-node#1240</a></li>
</ul>
<h3>Dependency update:</h3>
<ul>
<li>Upgrade <code>@​action/cache</code> from 4.0.2 to 4.0.3 by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1262">actions/setup-node#1262</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/FloEdelmann"><code>@​FloEdelmann</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/98">actions/setup-node#98</a></li>
<li><a href="https://github.com/fregante"><code>@​fregante</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1245">actions/setup-node#1245</a></li>
<li><a
href="https://github.com/marco-ippolito"><code>@​marco-ippolito</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1240">actions/setup-node#1240</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.4.0">https://github.com/actions/setup-node/compare/v4...v4.4.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<h3>Dependency updates</h3>
<ul>
<li>Upgrade <code>@​actions/glob</code> from 0.4.0 to 0.5.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1200">actions/setup-node#1200</a></li>
<li>Upgrade <code>@​action/cache</code> from 4.0.0 to 4.0.2 by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/setup-node/pull/1251">actions/setup-node#1251</a></li>
<li>Upgrade <code>@​vercel/ncc</code> from 0.38.1 to 0.38.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1203">actions/setup-node#1203</a></li>
<li>Upgrade <code>@​actions/tool-cache</code> from 2.0.1 to 2.0.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1220">actions/setup-node#1220</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1251">actions/setup-node#1251</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.3.0">https://github.com/actions/setup-node/compare/v4...v4.3.0</a></p>
<h2>v4.2.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Enhance workflows and upgrade publish-actions from 0.2.2 to 0.3.0 by
<a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1174">actions/setup-node#1174</a></li>
<li>Add recommended permissions section to readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1193">actions/setup-node#1193</a></li>
<li>Configure Dependabot settings by <a
href="https://github.com/HarithaVattikuti"><code>@​HarithaVattikuti</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1192">actions/setup-node#1192</a></li>
<li>Upgrade <code>@actions/cache</code> to <code>^4.0.0</code> by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1191">actions/setup-node#1191</a></li>
<li>Upgrade pnpm/action-setup from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1194">actions/setup-node#1194</a></li>
<li>Upgrade actions/publish-immutable-action from 0.0.3 to 0.0.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1195">actions/setup-node#1195</a></li>
<li>Upgrade semver from 7.6.0 to 7.6.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1196">actions/setup-node#1196</a></li>
<li>Upgrade <code>@​types/jest</code> from 29.5.12 to 29.5.14 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1201">actions/setup-node#1201</a></li>
<li>Upgrade undici from 5.28.4 to 5.28.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1205">actions/setup-node#1205</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1193">actions/setup-node#1193</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.2.0">https://github.com/actions/setup-node/compare/v4...v4.2.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="49933ea528"><code>49933ea</code></a>
Bump <code>@​action/cache</code> from 4.0.2 to 4.0.3 (<a
href="https://redirect.github.com/actions/setup-node/issues/1262">#1262</a>)</li>
<li><a
href="e3ce749e20"><code>e3ce749</code></a>
feat: support private mirrors (<a
href="https://redirect.github.com/actions/setup-node/issues/1240">#1240</a>)</li>
<li><a
href="40337cb8f7"><code>40337cb</code></a>
Add support for indented eslint output (<a
href="https://redirect.github.com/actions/setup-node/issues/1245">#1245</a>)</li>
<li><a
href="1ccdddc9b8"><code>1ccdddc</code></a>
Make eslint-compact matcher compatible with Stylelint (<a
href="https://redirect.github.com/actions/setup-node/issues/98">#98</a>)</li>
<li><a
href="cdca7365b2"><code>cdca736</code></a>
Bump <code>@​actions/tool-cache</code> from 2.0.1 to 2.0.2 (<a
href="https://redirect.github.com/actions/setup-node/issues/1220">#1220</a>)</li>
<li><a
href="22c0e7494f"><code>22c0e74</code></a>
Bump <code>@​vercel/ncc</code> from 0.38.1 to 0.38.3 (<a
href="https://redirect.github.com/actions/setup-node/issues/1203">#1203</a>)</li>
<li><a
href="a7c2d9473e"><code>a7c2d94</code></a>
actions/cache upgrade (<a
href="https://redirect.github.com/actions/setup-node/issues/1251">#1251</a>)</li>
<li><a
href="802632921f"><code>8026329</code></a>
Bump <code>@​actions/glob</code> from 0.4.0 to 0.5.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1200">#1200</a>)</li>
<li><a
href="1d0ff469b7"><code>1d0ff46</code></a>
Bump undici from 5.28.4 to 5.28.5 (<a
href="https://redirect.github.com/actions/setup-node/issues/1205">#1205</a>)</li>
<li><a
href="574f09a9fa"><code>574f09a</code></a>
Bump <code>@​types/jest</code> from 29.5.12 to 29.5.14 (<a
href="https://redirect.github.com/actions/setup-node/issues/1201">#1201</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/setup-node/compare/v4.1.0...49933ea5288caeca8642d1e84afbd3f7d6820020">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-node&package-manager=github_actions&previous-version=4.1.0&new-version=4.4.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-25 17:32:13 +02:00
Matthew Farrellee
cffc4edf47
feat: Add optional idempotency support to batches API (#3171)
Implements optional idempotency for batch creation using `idem_tok`
parameter:

* **Core idempotency**: Same token + parameters returns existing batch
* **Conflict detection**: Same token + different parameters raises HTTP
409 ConflictError
* **Metadata order independence**: Different key ordering doesn't affect
idempotency
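
A minimal sketch of the three behaviours listed above, assuming an in-memory store. `InMemoryBatches`, `_fingerprint`, the local `ConflictError`, and the `input_file_id`/`endpoint` parameters are hypothetical names for illustration, not the reference provider's code; only `idem_tok` and `create_batch()` come from this PR.

```python
import hashlib
import json
import uuid


class ConflictError(Exception):
    """Hypothetical stand-in for an error that a server would map to HTTP 409."""


class InMemoryBatches:
    """Toy batch store that honours an optional idempotency token."""

    def __init__(self):
        # idem_tok -> (parameter fingerprint, stored batch)
        self._by_token = {}

    @staticmethod
    def _fingerprint(params):
        # sort_keys=True makes the fingerprint independent of metadata key order.
        canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def create_batch(self, input_file_id, endpoint, metadata=None, idem_tok=None):
        params = {
            "input_file_id": input_file_id,
            "endpoint": endpoint,
            "metadata": metadata or {},
        }
        fingerprint = self._fingerprint(params)

        if idem_tok is not None and idem_tok in self._by_token:
            stored_fingerprint, batch = self._by_token[idem_tok]
            if stored_fingerprint != fingerprint:
                # Same token, different parameters: refuse instead of guessing.
                raise ConflictError(f"idem_tok {idem_tok!r} was used with different parameters")
            # Same token, same parameters: return the existing batch.
            return batch

        batch = {"id": f"batch_{uuid.uuid4().hex}", **params}
        if idem_tok is not None:
            self._by_token[idem_tok] = (fingerprint, batch)
        return batch
```

Calls without a token always create a new batch, so existing clients would see no change in behaviour under this sketch.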

**API changes:**
- Add optional `idem_tok` parameter to `create_batch()` method
- Enhanced API documentation with idempotency extensions

**Implementation:**
- Reference provider supports idempotent batch creation
- ConflictError for proper HTTP 409 status code mapping
- Comprehensive parameter validation

**Testing:**
- Unit tests: focused tests covering core scenarios with parametrized
conflict detection
- Integration tests: tests validating real OpenAI client behavior

This enables client-side retry safety and prevents duplicate batch
creation when using the same idempotency token, following REST API

closes #3144
2025-08-22 15:50:40 -07:00
Ashwin Bharambe
7519b73fcc
feat(distro): fork off a starter-gpu distribution (#3240)
The starter distribution added post-training, which added torch
dependencies, which in turn pull in all the NVIDIA CUDA libraries. This
made our starter container very big. We have worked hard to keep the
starter container small so it serves its purpose as a starter. This PR
tries to get it back to its previous size by forking off duplicate "-gpu"
providers for post-training. These forked providers are then used for a
new `starter-gpu` distribution, which can pull in all dependencies.
2025-08-22 15:47:15 -07:00
Charlie Doern
3b9278f254
feat: implement query_metrics (#3074)
# What does this PR do?

query_metrics currently has no implementation, meaning once a metric is
emitted there is no way in llama stack to query it from the store.

implement query_metrics for the meta_reference provider which follows a
similar style to `query_traces`, using the trace_store to format an SQL
query and execute it

in this case the parameters for the query are `metric.METRIC_NAME,
start_time, and end_time` and any other matchers if they are provided.

this required client side changes since the client had no
`query_metrics` or any associated resources, so any tests here will fail
but I will provide manual execution logs for the new tests I am adding

order the metrics by timestamp.
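
a rough sketch of the kind of query described above, assuming a hypothetical `metrics(name, timestamp, value, unit)` table in SQLite. the real trace_store schema is not shown in this PR, so the table, the column names and `query_metric_points` are illustrative only, and label matchers are left out.

```python
import sqlite3


def query_metric_points(conn, metric_name, start_time, end_time=None):
    """Return (timestamp, value, unit) rows for one metric, oldest first."""
    sql = "SELECT timestamp, value, unit FROM metrics WHERE name = ? AND timestamp >= ?"
    params = [metric_name, start_time]
    if end_time is not None:
        # end_time is optional; without it the query is open-ended.
        sql += " AND timestamp <= ?"
        params.append(end_time)
    sql += " ORDER BY timestamp"
    # Values are bound as parameters rather than formatted into the SQL string.
    return conn.execute(sql, params).fetchall()
```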

Additionally add `unit` to the `MetricDataPoint` class since this adds
much more context to the metric being queried.


depends on
https://github.com/llamastack/llama-stack-client-python/pull/260

## Test Plan

```
import time
import uuid


def create_http_client():
    from llama_stack_client import LlamaStackClient

    return LlamaStackClient(base_url="http://localhost:8321")


client = create_http_client()

response = client.telemetry.query_metrics(metric_name="total_tokens", start_time=0)
print(response)
```

```
╰─ python3.12 ~/telemetry.py
INFO:httpx:HTTP Request: POST http://localhost:8322/v1/telemetry/metrics/total_tokens "HTTP/1.1 200 OK"
[TelemetryQueryMetricsResponse(data=None, metric='total_tokens', labels=[], values=[{'timestamp': 1753999514, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999816, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999881, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999956, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754000200, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754000419, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000714, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000876, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000908, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754001309, 'value': 584.0, 'unit': 'tokens'}, {'timestamp': 1754001311, 'value': 138.0, 'unit': 'tokens'}, {'timestamp': 1754001316, 'value': 349.0, 'unit': 'tokens'}, {'timestamp': 1754001318, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001320, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001341, 'value': 923.0, 'unit': 'tokens'}, {'timestamp': 1754001350, 'value': 354.0, 'unit': 'tokens'}, {'timestamp': 1754001462, 'value': 417.0, 'unit': 'tokens'}, {'timestamp': 1754001464, 'value': 158.0, 'unit': 'tokens'}, {'timestamp': 1754001475, 'value': 697.0, 'unit': 'tokens'}, {'timestamp': 1754001477, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001479, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001489, 'value': 298.0, 'unit': 'tokens'}, {'timestamp': 1754001541, 'value': 615.0, 'unit': 'tokens'}, {'timestamp': 1754001543, 'value': 119.0, 'unit': 'tokens'}, {'timestamp': 1754001548, 'value': 310.0, 'unit': 'tokens'}, {'timestamp': 1754001549, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001551, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001568, 'value': 714.0, 'unit': 'tokens'}, {'timestamp': 1754001800, 'value': 437.0, 'unit': 'tokens'}, {'timestamp': 1754001802, 'value': 200.0, 'unit': 'tokens'}, {'timestamp': 1754001806, 'value': 262.0, 'unit': 'tokens'}, {'timestamp': 1754001808, 
'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001810, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001816, 'value': 82.0, 'unit': 'tokens'}, {'timestamp': 1754001923, 'value': 61.0, 'unit': 'tokens'}, {'timestamp': 1754001929, 'value': 391.0, 'unit': 'tokens'}, {'timestamp': 1754001939, 'value': 598.0, 'unit': 'tokens'}, {'timestamp': 1754001941, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001942, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001952, 'value': 252.0, 'unit': 'tokens'}, {'timestamp': 1754002053, 'value': 251.0, 'unit': 'tokens'}, {'timestamp': 1754002059, 'value': 375.0, 'unit': 'tokens'}, {'timestamp': 1754002062, 'value': 244.0, 'unit': 'tokens'}, {'timestamp': 1754002064, 'value': 111.0, 'unit': 'tokens'}, {'timestamp': 1754002065, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754002083, 'value': 719.0, 'unit': 'tokens'}, {'timestamp': 1754002302, 'value': 279.0, 'unit': 'tokens'}, {'timestamp': 1754002306, 'value': 218.0, 'unit': 'tokens'}, {'timestamp': 1754002308, 'value': 198.0, 'unit': 'tokens'}, {'timestamp': 1754002309, 'value': 69.0, 'unit': 'tokens'}, {'timestamp': 1754002311, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754002324, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754003161, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003161, 'value': 69.0, 'unit': 'tokens'}, {'timestamp': 1754003169, 'value': 499.0, 'unit': 'tokens'}, {'timestamp': 1754003171, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003173, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003185, 'value': 422.0, 'unit': 'tokens'}, {'timestamp': 1754003448, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003453, 'value': 422.0, 'unit': 'tokens'}, {'timestamp': 1754003589, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003609, 'value': 279.0, 'unit': 'tokens'}, {'timestamp': 1754003614, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754003706, 'value': 303.0, 'unit': 'tokens'}, {'timestamp': 
1754003706, 'value': 51.0, 'unit': 'tokens'}, {'timestamp': 1754003713, 'value': 426.0, 'unit': 'tokens'}, {'timestamp': 1754003714, 'value': 70.0, 'unit': 'tokens'}, {'timestamp': 1754003715, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003724, 'value': 225.0, 'unit': 'tokens'}, {'timestamp': 1754004226, 'value': 516.0, 'unit': 'tokens'}, {'timestamp': 1754004228, 'value': 127.0, 'unit': 'tokens'}, {'timestamp': 1754004232, 'value': 281.0, 'unit': 'tokens'}, {'timestamp': 1754004234, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004236, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004244, 'value': 206.0, 'unit': 'tokens'}, {'timestamp': 1754004683, 'value': 338.0, 'unit': 'tokens'}, {'timestamp': 1754004690, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754004692, 'value': 124.0, 'unit': 'tokens'}, {'timestamp': 1754004692, 'value': 65.0, 'unit': 'tokens'}, {'timestamp': 1754004694, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004703, 'value': 211.0, 'unit': 'tokens'}, {'timestamp': 1754004743, 'value': 338.0, 'unit': 'tokens'}, {'timestamp': 1754004749, 'value': 211.0, 'unit': 'tokens'}, {'timestamp': 1754005566, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754006101, 'value': 159.0, 'unit': 'tokens'}, {'timestamp': 1754006105, 'value': 272.0, 'unit': 'tokens'}, {'timestamp': 1754006109, 'value': 308.0, 'unit': 'tokens'}, {'timestamp': 1754006110, 'value': 61.0, 'unit': 'tokens'}, {'timestamp': 1754006112, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754006130, 'value': 705.0, 'unit': 'tokens'}, {'timestamp': 1754051825, 'value': 454.0, 'unit': 'tokens'}, {'timestamp': 1754051827, 'value': 152.0, 'unit': 'tokens'}, {'timestamp': 1754051834, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754051835, 'value': 55.0, 'unit': 'tokens'}, {'timestamp': 1754051837, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754051845, 'value': 102.0, 'unit': 'tokens'}, {'timestamp': 1754099929, 'value': 36.0, 'unit': 'tokens'}, 
{'timestamp': 1754510050, 'value': 598.0, 'unit': 'tokens'}, {'timestamp': 1754510052, 'value': 160.0, 'unit': 'tokens'}, {'timestamp': 1754510064, 'value': 725.0, 'unit': 'tokens'}, {'timestamp': 1754510065, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754510067, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754510083, 'value': 535.0, 'unit': 'tokens'}, {'timestamp': 1754596582, 'value': 36.0, 'unit': 'tokens'}])]
```

adding tests for each currently documented metric in llama stack using
this new function. attached is also some manual testing


integration tests passing locally with replay mode and the linked
client changes:
<img width="1907" height="529" alt="Screenshot 2025-08-08 at 2 49 14 PM"
src="https://github.com/user-attachments/assets/d482ab06-dcff-4f0c-a1f1-f870670ee9bc"
/>

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-22 14:19:24 -07:00
Matthew Farrellee
3d119a86d4
chore: indicate to mypy that InferenceProvider.batch_completion/batch_chat_completion is concrete (#3239)
# What does this PR do?

closes https://github.com/llamastack/llama-stack/issues/3236

mypy considered our default implementations (raise NotImplementedError)
to be trivial. the result was we implemented the same stubs in
providers.

this change puts enough into the default impls so mypy considers them
non-trivial. this allows us to remove the duplicate implementations.
2025-08-22 14:17:30 -07:00
Matthew Farrellee
2ee898cc4c
chore: indicate to mypy that InferenceProvider.rerank is concrete (#3238) 2025-08-22 12:02:13 -07:00
grs
da73f1a180
fix: ensure assistant message is followed by tool call message as expected by openai (#3224)
# What does this PR do?

As described in #3134 a langchain example works against openai's
responses impl, but not against llama stack's. This turned out to be due
to the order of the inputs. The langchain example has the two function
call outputs first, followed by each call result in turn. This seems to
be valid as it is accepted by openai's impl. However in llama stack,
these inputs are converted to chat completion inputs and the resulting
order for that api is not accepted by openai.

This PR fixes the issue by ensuring that the converted chat completions
inputs are in the expected order.
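
A minimal sketch of such a reordering, assuming messages are plain dicts in the chat completions shape (`role`, `tool_calls`, `tool_call_id`). `order_tool_messages` is a hypothetical helper written for illustration, not the function changed in this PR.

```python
def order_tool_messages(messages):
    """Place each tool result directly after the assistant message that requested it."""
    # Index tool results by the call they answer.
    results = {m["tool_call_id"]: m for m in messages if m.get("role") == "tool"}
    # Collect every call id issued by an assistant message.
    called = {
        call["id"]
        for m in messages
        if m.get("role") == "assistant"
        for call in m.get("tool_calls", [])
    }

    ordered = []
    for message in messages:
        if message.get("role") == "tool":
            # Results with a matching call are emitted next to that call below;
            # results without one keep their original position.
            if message["tool_call_id"] not in called:
                ordered.append(message)
            continue
        ordered.append(message)
        if message.get("role") == "assistant":
            for call in message.get("tool_calls", []):
                if call["id"] in results:
                    ordered.append(results[call["id"]])
    return ordered
```

With two calls followed by two results as input, the output alternates call, result, call, result.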

Closes #3134 

## Test Plan
Added unit and integration tests. Verified this fixes original issue as
reported.

---------

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-08-22 10:42:03 -07:00
Francisco Arceo
b0797e4982
chore: Add UI linter back (#3230)
# What does this PR do?

1. Adds `scripts/run-ui-linter.sh`
- Light script that checks whether `node_modules`,`eslint`, and
`prettier` exist before running linter
- When I introduced [the linter for the
UI](https://github.com/llamastack/llama-stack/pull/3156/files#diff-63a9c44a44acf85fea213a857769990937107cf072831e1a26808cfde9d096b9)
it forced the UI linter on all users, the small `node_modules` check
means that only users that have installed the UI locally (since
`node_modules` is in the gitignore) will actually end up having this
run. Additionally this does not do any install and just runs the
existing linter/prettier as requested by @mattf
2. Updates `.github/workflows/pre-commit.yml` to run CI again
- When I introduced the UI linter in the CI [in this
PR](https://github.com/llamastack/llama-stack/pull/3191) a failure
occurred because dependabot needed to be updated to also bump the
`package-lock.json` which was done [in this
PR](https://github.com/llamastack/llama-stack/pull/3212). All of this to
say, we shouldn't observe failures from dependabot again.
3. Updates `.pre-commit-config.yaml`
    - Calls `scripts/run-ui-linter.sh`

## AI Assistance Notice
I used Copilot minimally. 

## Test Plan
As
[requested](https://github.com/llamastack/llama-stack/pull/3207#discussion_r2288004872)
by @mattf I ran this after removing all of my `node_modules` and the
linter passed.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-22 10:54:36 -04:00
Matthew Farrellee
f520e244d9
feat: Add S3 Files Provider (#3202)
Implements a complete S3-based file storage provider for Llama Stack
with:
    
    Core Implementation:
    - S3FilesImpl class with full OpenAI Files API compatibility
    - Support for file upload, download, listing, deletion operations
    - Sqlite-based metadata storage for fast queries and API compliance
    - Configurable S3 endpoints (AWS, MinIO, LocalStack support)
    
    Key Features:
    - Automatic S3 bucket creation and management
    - Metadata persistence
    - Proper error handling for S3 connectivity and permissions
    
    Dependencies:
    - Adds boto3 for AWS S3 integration
    - Adds moto[s3] for testing infrastructure
    
    Testing:
    
Unit: `./scripts/unit-tests.sh tests/unit/files
tests/unit/providers/files`
    
     Integration:
    
Start MinIO: `podman run --rm -it -p 9000:9000 minio/minio server /data`
    
Start stack w/ S3 provider: `S3_ENDPOINT_URL=http://localhost:9000
AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin
S3_BUCKET_NAME=llama-stack-files uv run llama stack build --image-type
venv --providers files=remote::s3 --run`
    
Run integration tests: `./scripts/integration-tests.sh --stack-config
http://localhost:8321 --provider ollama --test-subdirs files`
2025-08-22 10:38:59 -04:00
ehhuang
c5e2e269e2
feat(api): introduce /rerank (#2940)
# What does this PR do?
Context: https://github.com/meta-llama/llama-stack/issues/2937

The API design is inspired by existing offerings, but not exactly the
same:
* `top_n` as the parameter to control number of results, instead of
`top_k`, since `n` is the conventional name for a result count
* `truncation` bool instead of `max_token_per_doc`, since we should just
handle the truncation automatically depending on model capability,
instead of user setting the context length manually.
* `data` field in the response, to be consistent with other OpenAI APIs
(though they don't have a rerank API). Also, it is one less name to
learn in the API.
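
A toy sketch of the request and response shape described above. Only `top_n`, `truncation` and the `data` field come from this PR; `rerank`, `RerankData`, `index`, `relevance_score`, `max_words` and the word-overlap scoring are assumptions made for illustration, with `max_words` standing in for a model's context limit.

```python
from dataclasses import dataclass


@dataclass
class RerankData:
    index: int  # position of the item in the input list
    relevance_score: float


@dataclass
class RerankResponse:
    data: list[RerankData]


def rerank(query, items, top_n=None, truncation=True, max_words=64):
    """Score items by word overlap with the query and return the best first."""
    query_words = set(query.lower().split())
    scored = []
    for index, item in enumerate(items):
        words = item.lower().split()
        if len(words) > max_words:
            if not truncation:
                raise ValueError(f"item {index} exceeds {max_words} words")
            # With truncation on, over-long items are cut instead of rejected.
            words = words[:max_words]
        overlap = len(query_words & set(words))
        score = overlap / len(query_words) if query_words else 0.0
        scored.append(RerankData(index=index, relevance_score=score))
    # sort() is stable, so ties keep their input order.
    scored.sort(key=lambda d: d.relevance_score, reverse=True)
    return RerankResponse(data=scored if top_n is None else scored[:top_n])
```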

## Test Plan

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-21 18:23:16 -07:00
Francisco Arceo
d78ac434bd
feat(UI): Adding a session manager (#3203)
# What does this PR do?

- Introduces the Agent Session creation for the Playground and allows
users to set tools
- note tools are actually not usable yet and this is marked explicitly
- this also caches sessions locally for faster loading on the UI and
deletes them appropriately
   - allows users to easily create new sessions as well
- Moved Model Configuration settings and "System Message" / Prompt to
the left component
- Added new logo and favicon
- Added new typing animation when LLM is generating

### Create New Session
<img width="1916" height="1393" alt="Screenshot 2025-08-21 at 4 18
08 PM"
src="https://github.com/user-attachments/assets/52c70ae3-a33e-4338-8522-8184c692c320"
/>


### List of Sessions
<img width="1920" height="1391" alt="Screenshot 2025-08-21 at 4 18
56 PM"
src="https://github.com/user-attachments/assets/ed78c3c6-08ec-486c-8bad-9b7382c11360"
/>

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
Unit tests added

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-21 21:11:03 -04:00
Mustafa Elbehery
c3b2b06974
refactor(logging): rename llama_stack logger categories (#3065)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR renames categories of llama_stack loggers.

This PR aligns logging categories as per the package name, as well as
reviews from initial
https://github.com/meta-llama/llama-stack/pull/2868. This is a follow up
to #3061.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

Replaces https://github.com/meta-llama/llama-stack/pull/2868
Part of https://github.com/meta-llama/llama-stack/issues/2865

cc @leseb @rhuss

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-21 17:31:04 -07:00
Ashwin Bharambe
4434fcc2c3 fix(ci): small fixes to the provider build workflow 2025-08-21 16:37:11 -07:00
Jiayi Ni
deffaa9e4e
fix: fix the error type in embedding test case (#3197)
# What does this PR do?
Currently the embedding integration test cases fail due to a
misalignment in the error type. This PR fixes the embedding integration
test by fixing the error type.

## Test Plan

```
pytest -s -v tests/integration/inference/test_embedding.py --stack-config="inference=nvidia" --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2" --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-08-21 16:19:51 -07:00
Ashwin Bharambe
864610ca5c fix(ci): make all CI workflows have the correct concurrency defn 2025-08-21 16:05:25 -07:00
Jiayi Ni
b72169ca47
docs: update the docs for NVIDIA Inference provider (#3227)
# What does this PR do?
- Update and fix the documentation for the NVIDIA Inference provider.
- Add a `run_moderation` placeholder for the safety API that raises
`NotImplementedError`. Without it, initializing the NVIDIA inference
client raises an error.
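
A minimal sketch of the placeholder pattern described above. It assumes the required interface is an abstract base class, which is only one possible reason a missing method would break initialization; `SafetyInterfaceSketch` and `AdapterSketch` are hypothetical names, not llama-stack classes.

```python
import asyncio
from abc import ABC, abstractmethod


class SafetyInterfaceSketch(ABC):
    """Hypothetical contract that requires run_moderation (assumption)."""

    @abstractmethod
    async def run_moderation(self, text: str) -> dict:
        ...


class AdapterSketch(SafetyInterfaceSketch):
    async def run_moderation(self, text: str) -> dict:
        # Placeholder: the method exists, so the class can be instantiated,
        # but calling it fails loudly instead of silently doing nothing.
        raise NotImplementedError("run_moderation is not implemented for this provider")


# Construction succeeds because the abstract method is overridden.
adapter = AdapterSketch()


def call_moderation() -> str:
    """Call the placeholder and report what happened."""
    try:
        asyncio.run(adapter.run_moderation("hello"))
    except NotImplementedError as exc:
        return str(exc)
    return "no error"
```

Without the override, `AdapterSketch()` would raise `TypeError` at construction time in this sketch, which is the kind of initialization failure a placeholder avoids.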

## Test Plan
N/A
2025-08-21 15:59:39 -07:00
Mustafa Elbehery
1790fc0f25
feat: Remove initialize() Method from LlamaStackAsLibrary (#2979)
# What does this PR do?
This PR removes the need to call `initialize()` on `LlamaStackAsLibrary`.

Previously, the user had to invoke `client.initialize()` explicitly.
To improve the developer experience and avoid runtime errors, this PR
initializes `LlamaStackAsLibrary` implicitly, so the client is ready to
use without an explicit call.
It also prevents repeated initialization of the same client while
maintaining backward compatibility.

This PR does the following:

- Automatic initialization: the constructor calls `initialize_impl()`
automatically.
- The client is fully initialized after `__init__` completes.
- Prevents repeated initialization after the client has been
successfully initialized.
- The `initialize()` method still exists but is now a no-op.
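
The behaviour listed above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `LibraryClientSketch`, `_initialize_impl`, and `init_calls` are hypothetical names, and the real client's setup is more involved than this synchronous stand-in.

```python
class LibraryClientSketch:
    """Hypothetical stand-in for a library client (not LlamaStackAsLibrary)."""

    def __init__(self) -> None:
        self._initialized = False
        self.init_calls = 0  # counts how often the setup work actually ran
        self._initialize_impl()  # implicit initialization in the constructor

    def _initialize_impl(self) -> None:
        # Guard: skip the work if this client is already initialized.
        if self._initialized:
            return
        self.init_calls += 1
        self._initialized = True

    def initialize(self) -> bool:
        # Kept for backward compatibility; a no-op once the client is ready.
        self._initialize_impl()
        return self._initialized


client = LibraryClientSketch()
# Older call sites that still call initialize() keep working.
client.initialize()
client.initialize()
```

Keeping `initialize()` as a harmless no-op means existing callers do not need to change, while new callers can use the client straight after construction.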

fixes https://github.com/meta-llama/llama-stack/issues/2946

---------

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-21 15:59:04 -07:00
Sumanth Kamenani
ac25e35124
feat: Add CORS configuration support for server (#3201)
Adds flexible CORS (Cross-Origin Resource Sharing) configuration support
to the FastAPI server, with both a local development mode and an
explicit configuration mode:

- **Local development mode**: `cors: true` enables localhost-only access
with the regex pattern `https?://localhost:\d+`
- **Explicit configuration mode**: specific origins configuration with
credential support and validation
- Prevents insecure combinations (wildcards with credentials)
- FastAPI CORSMiddleware integration via `model_dump()`

Addresses the need for configurable CORS policies to support web
frontends and cross-origin API access while maintaining security.
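
A rough sketch of the two modes and the wildcard-with-credentials check, using only the standard library. `CorsSketch`, `from_config`, and `origin_allowed` are hypothetical names for illustration; the PR's actual config model and its field names may differ, and the origin check here only approximates what a CORS middleware does.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Union

# Regex quoted in the description above for local development mode.
LOCALHOST_ORIGIN = r"https?://localhost:\d+"


@dataclass
class CorsSketch:
    """Hypothetical CORS settings (assumption, not the PR's model)."""

    allow_origins: list = field(default_factory=list)
    allow_origin_regex: Optional[str] = None
    allow_credentials: bool = False

    def __post_init__(self) -> None:
        # Reject the insecure combination: any origin plus credentials.
        if self.allow_credentials and "*" in self.allow_origins:
            raise ValueError("wildcard origins cannot be combined with credentials")


def from_config(cors: Union[bool, dict]) -> Optional[CorsSketch]:
    """Map `cors: true` to localhost-only, a mapping to explicit settings."""
    if cors is True:
        return CorsSketch(allow_origin_regex=LOCALHOST_ORIGIN)
    if cors is False:
        return None
    return CorsSketch(**cors)


def origin_allowed(cfg: CorsSketch, origin: str) -> bool:
    """Return True if the origin matches the regex or the explicit list."""
    if cfg.allow_origin_regex and re.fullmatch(cfg.allow_origin_regex, origin):
        return True
    return "*" in cfg.allow_origins or origin in cfg.allow_origins


local = from_config(True)
explicit = from_config({"allow_origins": ["https://app.example.com"],
                        "allow_credentials": True})
```

Matching the whole origin string, rather than a prefix, is what keeps a value such as `http://localhost:8321.evil.com` from passing the localhost pattern.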

Closes #2119

## Test Plan

1. Ran unit tests.
2. Manual tests: FastAPI middleware integration with actual HTTP
requests
    - Local development mode localhost access validation
    - Explicit configuration mode origins validation
    - Preflight OPTIONS request handling

Some screenshots of manual tests.
<img width="1920" height="927" alt="image"
src="https://github.com/user-attachments/assets/79322338-40c7-45c9-a9ea-e3e8d8e2f849"
/>

<img width="1911" height="1037" alt="image"
src="https://github.com/user-attachments/assets/1683524e-b0c9-48c9-a0a5-782e949cde01"
/>

cc: @leseb @rhuss @franciscojavierarceo
2025-08-21 14:23:27 -07:00
dependabot[bot]
58e164b8bc
chore(github-deps): bump astral-sh/setup-uv from 6.4.3 to 6.5.0 (#3179)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.4.3 to 6.5.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.5.0 🌈 Better error messages, bug fixes and copilot agent
settings</h2>
<h2>Changes</h2>
<p>This release brings better error messages in case the GitHub API is
impacted, fixes a few bugs and allows to disable <a
href="https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md">problem
matchers</a> for better use in Copilot Agent workspaces.</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Improve error messages on GitHub API errors <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/518">#518</a>)</li>
<li>Ignore backslashes and whitespace in requirements <a
href="https://github.com/axm2"><code>@​axm2</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/501">#501</a>)</li>
</ul>
<h2>🚀 Enhancements</h2>
<ul>
<li>Add input add-problem-matchers <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/517">#517</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known versions for 0.8.9 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/512">#512</a>)</li>
<li>chore: update known versions for 0.8.6-0.8.8 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/510">#510</a>)</li>
<li>chore: update known versions for 0.8.5 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/509">#509</a>)</li>
<li>chore: update known versions for 0.8.4 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/505">#505</a>)</li>
<li>chore: update known versions for 0.8.3 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/502">#502</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>add note on caching to read disable-cache-pruning <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/506">#506</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump actions/checkout from 4 to 5 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/514">#514</a>)</li>
<li>bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/516">#516</a>)</li>
<li>Bump biome to v2 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/515">#515</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d9e0f98d3f"><code>d9e0f98</code></a>
Improve error messages on GitHub API errors (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/518">#518</a>)</li>
<li><a
href="e5d42a2b46"><code>e5d42a2</code></a>
Add input add-problem-matchers (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/517">#517</a>)</li>
<li><a
href="d664c2a1d1"><code>d664c2a</code></a>
Bump actions/checkout from 4 to 5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/514">#514</a>)</li>
<li><a
href="c35b8eac36"><code>c35b8ea</code></a>
bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/516">#516</a>)</li>
<li><a
href="4109b4033f"><code>4109b40</code></a>
Bump biome to v2 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/515">#515</a>)</li>
<li><a
href="1463845d3c"><code>1463845</code></a>
chore: update known versions for 0.8.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/512">#512</a>)</li>
<li><a
href="ad5ded2d63"><code>ad5ded2</code></a>
chore: update known versions for 0.8.6-0.8.8 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/510">#510</a>)</li>
<li><a
href="142240426d"><code>1422404</code></a>
chore: update known versions for 0.8.5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/509">#509</a>)</li>
<li><a
href="632449003a"><code>6324490</code></a>
add note on caching to read disable-cache-pruning (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/506">#506</a>)</li>
<li><a
href="2a967c9b97"><code>2a967c9</code></a>
chore: update known versions for 0.8.4 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/505">#505</a>)</li>
<li>Additional commits viewable in <a
href="e92bafb625...d9e0f98d3f">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.4.3&new-version=6.5.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:51:53 -07:00
dependabot[bot]
6a719716f2
chore(github-deps): bump actions/checkout from 4.2.2 to 5.0.0 (#3178)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2
to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2238">actions/checkout#2238</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v5.0.0">https://github.com/actions/checkout/compare/v4...v5.0.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
<li>Prepare release v4.3.0 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2237">actions/checkout#2237</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/motss"><code>@​motss</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li><a href="https://github.com/mouismail"><code>@​mouismail</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v4.3.0">https://github.com/actions/checkout/compare/v4...v4.3.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
<li>README: Suggest <code>user.email</code> to be
<code>41898282+github-actions[bot]@users.noreply.github.com</code> by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1707">actions/checkout#1707</a></li>
</ul>
<h2>v4.1.4</h2>
<ul>
<li>Disable <code>extensions.worktreeConfig</code> when disabling
<code>sparse-checkout</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1692">actions/checkout#1692</a></li>
<li>Add dependabot config by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1688">actions/checkout#1688</a></li>
<li>Bump the minor-actions-dependencies group with 2 updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1693">actions/checkout#1693</a></li>
<li>Bump word-wrap from 1.2.3 to 1.2.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1643">actions/checkout#1643</a></li>
</ul>
<h2>v4.1.3</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="08c6903cd8"><code>08c6903</code></a>
Prepare v5.0.0 release (<a
href="https://redirect.github.com/actions/checkout/issues/2238">#2238</a>)</li>
<li><a
href="9f265659d3"><code>9f26565</code></a>
Update actions checkout to use node 24 (<a
href="https://redirect.github.com/actions/checkout/issues/2226">#2226</a>)</li>
<li><a
href="08eba0b27e"><code>08eba0b</code></a>
Prepare release v4.3.0 (<a
href="https://redirect.github.com/actions/checkout/issues/2237">#2237</a>)</li>
<li><a
href="631c7dc4f8"><code>631c7dc</code></a>
Update package dependencies (<a
href="https://redirect.github.com/actions/checkout/issues/2236">#2236</a>)</li>
<li><a
href="8edcb1bdb4"><code>8edcb1b</code></a>
Update CODEOWNERS for actions (<a
href="https://redirect.github.com/actions/checkout/issues/2224">#2224</a>)</li>
<li><a
href="09d2acae67"><code>09d2aca</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/2194">#2194</a>)</li>
<li><a
href="85e6279cec"><code>85e6279</code></a>
Adjust positioning of user email note and permissions heading (<a
href="https://redirect.github.com/actions/checkout/issues/2044">#2044</a>)</li>
<li><a
href="009b9ae9e4"><code>009b9ae</code></a>
Documentation update - add recommended permissions to Readme (<a
href="https://redirect.github.com/actions/checkout/issues/2043">#2043</a>)</li>
<li><a
href="cbb722410c"><code>cbb7224</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1977">#1977</a>)</li>
<li><a
href="3b9b8c884f"><code>3b9b8c8</code></a>
docs: update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1971">#1971</a>)</li>
<li>See full diff in <a
href="11bd71901b...08c6903cd8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=4.2.2&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:51:40 -07:00
dependabot[bot]
bd1a794add
chore(python-deps): bump llama-api-client from 0.1.2 to 0.2.0 (#3173)
Bumps [llama-api-client](https://github.com/meta-llama/llama-api-python)
from 0.1.2 to 0.2.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/meta-llama/llama-api-python/releases">llama-api-client's
releases</a>.</em></p>
<blockquote>
<h2>v0.2.0</h2>
<h2>0.2.0 (2025-08-07)</h2>
<p>Full Changelog: <a
href="https://github.com/meta-llama/llama-api-python/compare/v0.1.2...v0.2.0">v0.1.2...v0.2.0</a></p>
<h3>Features</h3>
<ul>
<li>clean up environment call outs (<a
href="4afbd01ed7">4afbd01</a>)</li>
<li><strong>client:</strong> support file upload requests (<a
href="ec42e80b62">ec42e80</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>api:</strong> remove chat completion request model (<a
href="94c4e9fd50">94c4e9f</a>)</li>
<li><strong>client:</strong> don't send Content-Type header on GET
requests (<a
href="efec88aa51">efec88a</a>)</li>
<li><strong>parsing:</strong> correctly handle nested discriminated
unions (<a
href="b6276863be">b627686</a>)</li>
<li><strong>parsing:</strong> ignore empty metadata (<a
href="d6ee85101e">d6ee851</a>)</li>
<li><strong>parsing:</strong> parse extra field types (<a
href="f03ca22860">f03ca22</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>add examples (<a
href="abfa065721">abfa065</a>)</li>
<li><strong>internal:</strong> bump pinned h11 dep (<a
href="d40e1b1d73">d40e1b1</a>)</li>
<li><strong>internal:</strong> fix ruff target version (<a
href="c900ebc528">c900ebc</a>)</li>
<li><strong>package:</strong> mark python 3.13 as supported (<a
href="ef5bc36693">ef5bc36</a>)</li>
<li><strong>project:</strong> add settings file for vscode (<a
href="e3103801d6">e310380</a>)</li>
<li><strong>readme:</strong> fix version rendering on pypi (<a
href="786f9fbdb7">786f9fb</a>)</li>
<li>sync repo (<a
href="7e697f6550">7e697f6</a>)</li>
<li>update SDK settings (<a
href="de22c0ece7">de22c0e</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>code of conduct (<a
href="efe1af28fb">efe1af2</a>)</li>
<li>readme and license (<a
href="d53eafd104">d53eafd</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7a8c5838af"><code>7a8c583</code></a>
release: 0.2.0</li>
<li><a
href="4f1a04e5c1"><code>4f1a04e</code></a>
chore(internal): fix ruff target version</li>
<li><a
href="06485e995a"><code>06485e9</code></a>
feat(client): support file upload requests</li>
<li><a
href="131b474ad1"><code>131b474</code></a>
chore(project): add settings file for vscode</li>
<li><a
href="ef4cee6d8b"><code>ef4cee6</code></a>
fix(parsing): parse extra field types</li>
<li><a
href="fcbc699718"><code>fcbc699</code></a>
fix(parsing): ignore empty metadata</li>
<li><a
href="b6656cd0b8"><code>b6656cd</code></a>
fix(api): remove chat completion request model</li>
<li><a
href="0deda5590c"><code>0deda55</code></a>
feat: clean up environment call outs</li>
<li><a
href="ecf91026ac"><code>ecf9102</code></a>
fix(client): don't send Content-Type header on GET requests</li>
<li><a
href="0ac6285cbe"><code>0ac6285</code></a>
chore(readme): fix version rendering on pypi</li>
<li>Additional commits viewable in <a
href="https://github.com/meta-llama/llama-api-python/compare/v0.1.2...v0.2.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=llama-api-client&package-manager=uv&previous-version=0.1.2&new-version=0.2.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:50:34 -07:00
dependabot[bot]
886af85e0c
chore(github-deps): bump amannn/action-semantic-pull-request from 5.5.3 to 6.1.0 (#3215)
Bumps
[amannn/action-semantic-pull-request](https://github.com/amannn/action-semantic-pull-request)
from 5.5.3 to 6.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/amannn/action-semantic-pull-request/releases">amannn/action-semantic-pull-request's
releases</a>.</em></p>
<blockquote>
<h2>v6.1.0</h2>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.1...v6.1.0">6.1.0</a>
(2025-08-19)</h2>
<h3>Features</h3>
<ul>
<li>Support providing regexps for types (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/292">#292</a>)
(<a
href="a30288bf13">a30288b</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Remove trailing whitespace from &quot;unknown release type&quot;
error message (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/291">#291</a>)
(<a
href="afa4edb1c4">afa4edb</a>)</li>
</ul>
<h2>v6.0.1</h2>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.0...v6.0.1">6.0.1</a>
(2025-08-13)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Actually execute action (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/289">#289</a>)
(<a
href="58e4ab40f5">58e4ab4</a>)</li>
</ul>
<h2>v6.0.0</h2>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.3...v6.0.0">6.0.0</a>
(2025-08-13)</h2>
<h3>⚠ BREAKING CHANGES</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)</li>
</ul>
<h3>Features</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)
(<a
href="bc0c9a79ab">bc0c9a7</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/amannn/action-semantic-pull-request/blob/main/CHANGELOG.md">amannn/action-semantic-pull-request's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.1...v6.1.0">6.1.0</a>
(2025-08-19)</h2>
<h3>Features</h3>
<ul>
<li>Support providing regexps for types (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/292">#292</a>)
(<a
href="a30288bf13">a30288b</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Remove trailing whitespace from &quot;unknown release type&quot;
error message (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/291">#291</a>)
(<a
href="afa4edb1c4">afa4edb</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.0...v6.0.1">6.0.1</a>
(2025-08-13)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Actually execute action (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/289">#289</a>)
(<a
href="58e4ab40f5">58e4ab4</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.3...v6.0.0">6.0.0</a>
(2025-08-13)</h2>
<h3>⚠ BREAKING CHANGES</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)</li>
</ul>
<h3>Features</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)
(<a
href="bc0c9a79ab">bc0c9a7</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.2...v5.5.3">5.5.3</a>
(2024-06-28)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump <code>braces</code> dependency (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/269">#269</a>
by <a href="https://github.com/EelcoLos"><code>@​EelcoLos</code></a>)
(<a
href="2d952a1bf9">2d952a1</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.1...v5.5.2">5.5.2</a>
(2024-04-24)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump tar from 6.1.11 to 6.2.1 (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/262">#262</a>
by <a href="https://github.com/EelcoLos"><code>@​EelcoLos</code></a>)
(<a
href="9a90d5a5ac">9a90d5a</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.0...v5.5.1">5.5.1</a>
(2024-04-24)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump ip from 2.0.0 to 2.0.1 (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/263">#263</a>
by <a href="https://github.com/EelcoLos"><code>@​EelcoLos</code></a>)
(<a
href="5e7e9acca3">5e7e9ac</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.4.0...v5.5.0">5.5.0</a>
(2024-04-23)</h2>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7f33ba7922"><code>7f33ba7</code></a>
chore: Release 6.1.0 [skip ci]</li>
<li><a
href="afa4edb1c4"><code>afa4edb</code></a>
fix: Remove trailing whitespace from &quot;unknown release type&quot;
error message (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/291">#291</a>)</li>
<li><a
href="a30288bf13"><code>a30288b</code></a>
feat: Support providing regexps for types (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/292">#292</a>)</li>
<li><a
href="a46a7c8dc4"><code>a46a7c8</code></a>
build: Move Vitest to <code>devDependencies</code> (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/290">#290</a>)</li>
<li><a
href="fdd4d3ddf6"><code>fdd4d3d</code></a>
chore: Release 6.0.1 [skip ci]</li>
<li><a
href="58e4ab40f5"><code>58e4ab4</code></a>
fix: Actually execute action (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/289">#289</a>)</li>
<li><a
href="04a8d177d9"><code>04a8d17</code></a>
chore: Release 6.0.0 [skip ci]</li>
<li><a
href="bc0c9a79ab"><code>bc0c9a7</code></a>
feat!: Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)</li>
<li><a
href="631ffdc028"><code>631ffdc</code></a>
build(deps): bump the github-action-workflows group with 2 updates (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/286">#286</a>)</li>
<li><a
href="c1807ceb58"><code>c1807ce</code></a>
build: configure Dependabot (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/231">#231</a>)</li>
<li>Additional commits viewable in <a
href="0723387faa...7f33ba7922">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=amannn/action-semantic-pull-request&package-manager=github_actions&previous-version=5.5.3&new-version=6.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:50:00 -07:00
dependabot[bot]
2fa189fe04
chore(github-deps): bump actions/setup-node from 4.1.0 to 4.4.0 (#3214)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from
4.1.0 to 4.4.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-node/releases">actions/setup-node's
releases</a>.</em></p>
<blockquote>
<h2>v4.4.0</h2>
<h2>What's Changed</h2>
<h3>Bug fixes:</h3>
<ul>
<li>Make eslint-compact matcher compatible with Stylelint by <a
href="https://github.com/FloEdelmann"><code>@​FloEdelmann</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/98">actions/setup-node#98</a></li>
<li>Add support for indented eslint output by <a
href="https://github.com/fregante"><code>@​fregante</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1245">actions/setup-node#1245</a></li>
</ul>
<h3>Enhancement:</h3>
<ul>
<li>Support private mirrors by <a
href="https://github.com/marco-ippolito"><code>@​marco-ippolito</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1240">actions/setup-node#1240</a></li>
</ul>
<h3>Dependency update:</h3>
<ul>
<li>Upgrade <code>@​action/cache</code> from 4.0.2 to 4.0.3 by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1262">actions/setup-node#1262</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/FloEdelmann"><code>@​FloEdelmann</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/98">actions/setup-node#98</a></li>
<li><a href="https://github.com/fregante"><code>@​fregante</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1245">actions/setup-node#1245</a></li>
<li><a
href="https://github.com/marco-ippolito"><code>@​marco-ippolito</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1240">actions/setup-node#1240</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.4.0">https://github.com/actions/setup-node/compare/v4...v4.4.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<h3>Dependency updates</h3>
<ul>
<li>Upgrade <code>@​actions/glob</code> from 0.4.0 to 0.5.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1200">actions/setup-node#1200</a></li>
<li>Upgrade <code>@​action/cache</code> from 4.0.0 to 4.0.2 by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/setup-node/pull/1251">actions/setup-node#1251</a></li>
<li>Upgrade <code>@​vercel/ncc</code> from 0.38.1 to 0.38.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1203">actions/setup-node#1203</a></li>
<li>Upgrade <code>@​actions/tool-cache</code> from 2.0.1 to 2.0.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1220">actions/setup-node#1220</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1251">actions/setup-node#1251</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.3.0">https://github.com/actions/setup-node/compare/v4...v4.3.0</a></p>
<h2>v4.2.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Enhance workflows and upgrade publish-actions from 0.2.2 to 0.3.0 by
<a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1174">actions/setup-node#1174</a></li>
<li>Add recommended permissions section to readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1193">actions/setup-node#1193</a></li>
<li>Configure Dependabot settings by <a
href="https://github.com/HarithaVattikuti"><code>@​HarithaVattikuti</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1192">actions/setup-node#1192</a></li>
<li>Upgrade <code>@actions/cache</code> to <code>^4.0.0</code> by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1191">actions/setup-node#1191</a></li>
<li>Upgrade pnpm/action-setup from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1194">actions/setup-node#1194</a></li>
<li>Upgrade actions/publish-immutable-action from 0.0.3 to 0.0.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1195">actions/setup-node#1195</a></li>
<li>Upgrade semver from 7.6.0 to 7.6.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1196">actions/setup-node#1196</a></li>
<li>Upgrade <code>@​types/jest</code> from 29.5.12 to 29.5.14 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1201">actions/setup-node#1201</a></li>
<li>Upgrade undici from 5.28.4 to 5.28.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1205">actions/setup-node#1205</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1193">actions/setup-node#1193</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.2.0">https://github.com/actions/setup-node/compare/v4...v4.2.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="49933ea528"><code>49933ea</code></a>
Bump <code>@​action/cache</code> from 4.0.2 to 4.0.3 (<a
href="https://redirect.github.com/actions/setup-node/issues/1262">#1262</a>)</li>
<li><a
href="e3ce749e20"><code>e3ce749</code></a>
feat: support private mirrors (<a
href="https://redirect.github.com/actions/setup-node/issues/1240">#1240</a>)</li>
<li><a
href="40337cb8f7"><code>40337cb</code></a>
Add support for indented eslint output (<a
href="https://redirect.github.com/actions/setup-node/issues/1245">#1245</a>)</li>
<li><a
href="1ccdddc9b8"><code>1ccdddc</code></a>
Make eslint-compact matcher compatible with Stylelint (<a
href="https://redirect.github.com/actions/setup-node/issues/98">#98</a>)</li>
<li><a
href="cdca7365b2"><code>cdca736</code></a>
Bump <code>@​actions/tool-cache</code> from 2.0.1 to 2.0.2 (<a
href="https://redirect.github.com/actions/setup-node/issues/1220">#1220</a>)</li>
<li><a
href="22c0e7494f"><code>22c0e74</code></a>
Bump <code>@​vercel/ncc</code> from 0.38.1 to 0.38.3 (<a
href="https://redirect.github.com/actions/setup-node/issues/1203">#1203</a>)</li>
<li><a
href="a7c2d9473e"><code>a7c2d94</code></a>
actions/cache upgrade (<a
href="https://redirect.github.com/actions/setup-node/issues/1251">#1251</a>)</li>
<li><a
href="802632921f"><code>8026329</code></a>
Bump <code>@​actions/glob</code> from 0.4.0 to 0.5.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1200">#1200</a>)</li>
<li><a
href="1d0ff469b7"><code>1d0ff46</code></a>
Bump undici from 5.28.4 to 5.28.5 (<a
href="https://redirect.github.com/actions/setup-node/issues/1205">#1205</a>)</li>
<li><a
href="574f09a9fa"><code>574f09a</code></a>
Bump <code>@​types/jest</code> from 29.5.12 to 29.5.14 (<a
href="https://redirect.github.com/actions/setup-node/issues/1201">#1201</a>)</li>
<li>Additional commits viewable in <a
href="39370e3970...49933ea528">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-node&package-manager=github_actions&previous-version=4.1.0&new-version=4.4.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:49:43 -07:00
dependabot[bot]
2cc0051ae5
chore(ui-deps): bump typescript from 5.8.3 to 5.9.2 in /llama_stack/ui (#3216)
Bumps [typescript](https://github.com/microsoft/TypeScript) from 5.8.3
to 5.9.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/microsoft/TypeScript/releases">typescript's
releases</a>.</em></p>
<blockquote>
<h2>TypeScript 5.9</h2>
<p>For release notes, check out the <a
href="https://devblogs.microsoft.com/typescript/announcing-typescript-5-9/">release
announcement</a>.</p>
<ul>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.0%22+is%3Aclosed+">fixed
issues query for TypeScript 5.9.0 (Beta)</a>.</li>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.1%22+is%3Aclosed+">fixed
issues query for TypeScript 5.9.1 (RC)</a>.</li>
<li><em>No specific changes for TypeScript 5.9.2 (Stable)</em></li>
</ul>
<p>Downloads are available on:</p>
<ul>
<li><a href="https://www.npmjs.com/package/typescript">npm</a></li>
</ul>
<h2>TypeScript 5.9 RC</h2>
<p>For release notes, check out the <a
href="https://devblogs.microsoft.com/typescript/announcing-typescript-5-9-rc/">release
announcement</a>.</p>
<ul>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.0%22+is%3Aclosed+">fixed
issues query for TypeScript 5.9.0 (Beta)</a>.</li>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.1%22+is%3Aclosed+">fixed
issues query for TypeScript 5.9.1 (RC)</a>.</li>
</ul>
<p>Downloads are available on:</p>
<ul>
<li><a href="https://www.npmjs.com/package/typescript">npm</a></li>
</ul>
<h2>TypeScript 5.9 Beta</h2>
<p>For release notes, check out the <a
href="https://devblogs.microsoft.com/typescript/announcing-typescript-5-9-beta/">release
announcement</a>.</p>
<ul>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.0%22+is%3Aclosed+">fixed
issues query for TypeScript 5.9.0 (Beta)</a>.</li>
</ul>
<p>Downloads are available on:</p>
<ul>
<li><a href="https://www.npmjs.com/package/typescript">npm</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="be86783155"><code>be86783</code></a>
Give more specific errors for <code>verbatimModuleSyntax</code> (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/62113">#62113</a>)</li>
<li><a
href="22ef57786f"><code>22ef577</code></a>
LEGO: Pull request from
lego/hb_5378966c-b857-470a-8675-daebef4a6da1_20250714...</li>
<li><a
href="d5a414cd1d"><code>d5a414c</code></a>
Don't use <code>noErrorTruncation</code> when printing types with
<code>maximumLength</code> set (#...</li>
<li><a
href="f14b5c8a2f"><code>f14b5c8</code></a>
Remove unused and confusing dom.iterable.d.ts file (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/62037">#62037</a>)</li>
<li><a
href="2778e84ed8"><code>2778e84</code></a>
Restore AbortSignal.abort (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/62086">#62086</a>)</li>
<li><a
href="65cb4bd2d5"><code>65cb4bd</code></a>
LEGO: Pull request from
lego/hb_5378966c-b857-470a-8675-daebef4a6da1_20250710...</li>
<li><a
href="9e20e032ef"><code>9e20e03</code></a>
Clear out checker-level stacks on pop (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/62016">#62016</a>)</li>
<li><a
href="87740bc7fe"><code>87740bc</code></a>
Fix for Issue 61081 (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/61221">#61221</a>)</li>
<li><a
href="833a8d492c"><code>833a8d4</code></a>
Fix Symbol completion priority and cursor positioning (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/61945">#61945</a>)</li>
<li><a
href="0018c9ff12"><code>0018c9f</code></a>
LEGO: Pull request from
lego/hb_5378966c-b857-470a-8675-daebef4a6da1_20250702...</li>
<li>Additional commits viewable in <a
href="https://github.com/microsoft/TypeScript/compare/v5.8.3...v5.9.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=typescript&package-manager=npm_and_yarn&previous-version=5.8.3&new-version=5.9.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:49:28 -07:00
dependabot[bot]
bf3b201d61
chore(python-deps): bump chromadb from 1.0.16 to 1.0.20 (#3217)
Bumps [chromadb](https://github.com/chroma-core/chroma) from 1.0.16 to
1.0.20.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/chroma-core/chroma/releases">chromadb's
releases</a>.</em></p>
<blockquote>
<h2>1.0.20</h2>
<p>Version: <code>1.0.20</code>
Git ref: <code>refs/tags/1.0.20</code>
Build Date: <code>2025-08-18T17:04</code>
PIP Package: <code>chroma-1.0.20.tar.gz</code>
Github Container Registry Image: <code>:1.0.20</code>
DockerHub Image: <code>:1.0.20</code></p>
<h2>What's Changed</h2>
<ul>
<li>[RELEASE] 1.0.20 by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5303">chroma-core/chroma#5303</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/chroma-core/chroma/compare/1.0.19...1.0.20">https://github.com/chroma-core/chroma/compare/1.0.19...1.0.20</a></p>
<h2>1.0.18</h2>
<p>Version: <code>1.0.18</code>
Git ref: <code>refs/tags/1.0.18</code>
Build Date: <code>2025-08-18T08:09</code>
PIP Package: <code>chroma-1.0.18.tar.gz</code>
Github Container Registry Image: <code>:1.0.18</code>
DockerHub Image: <code>:1.0.18</code></p>
<h2>What's Changed</h2>
<ul>
<li>[CHORE]: Added short descriptions to CLI commands by <a
href="https://github.com/tazarov"><code>@​tazarov</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5217">chroma-core/chroma#5217</a></li>
<li>[ENH] Use AVX in distance calculations by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5258">chroma-core/chroma#5258</a></li>
<li>[ENH] Auto-set tenant, scoped database in python CloudClient by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5026">chroma-core/chroma#5026</a></li>
<li>[PERF]: Modify get_range to return an iterator by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5256">chroma-core/chroma#5256</a></li>
<li>[BUG] Mark dirty on rollback of cursor to guarantee compaction picks
it up. by <a href="https://github.com/rescrv"><code>@​rescrv</code></a>
in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5265">chroma-core/chroma#5265</a></li>
<li>[ENH]: add metric for component queue depth &amp; change dispatcher
queue depth metric buckets by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5261">chroma-core/chroma#5261</a></li>
<li>[ENH]: add garbage collection CLI for manual garbage collection by
<a href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5250">chroma-core/chroma#5250</a></li>
<li>[DOC] Clean up DEVELOP.md by <a
href="https://github.com/kylediaz"><code>@​kylediaz</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5270">chroma-core/chroma#5270</a></li>
<li>[ENH]: Further optimize query on getCollections when databases pkey
is fully specified by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5268">chroma-core/chroma#5268</a></li>
<li>[ENH] Update Rust to allow build with AVX when flag is set by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5269">chroma-core/chroma#5269</a></li>
<li>[ENH]: Fix test_add flake by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5272">chroma-core/chroma#5272</a></li>
<li>[BUG]: Revert &quot;[ENH]: Further optimize query on getCollections
when databases pkey is fully specified (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5268">#5268</a>)&quot;
by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5273">chroma-core/chroma#5273</a></li>
<li>[BLD] Add maturin to dev dependencies by <a
href="https://github.com/kylediaz"><code>@​kylediaz</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5271">chroma-core/chroma#5271</a></li>
<li>[ENH]: Optimize GetCollections and remove usage of raw gorm by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5274">chroma-core/chroma#5274</a></li>
<li>[ENH]: add config param to garbage collector to control how many
collections are fetched from SysDb by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5275">chroma-core/chroma#5275</a></li>
<li>[ENH] Reject version files without paths. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5267">chroma-core/chroma#5267</a></li>
<li>[ENH] Enable getting a collection by CRN by <a
href="https://github.com/drewkim"><code>@​drewkim</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5244">chroma-core/chroma#5244</a></li>
<li>[BUG] CompactionError did not proxy should_trace_error by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5282">chroma-core/chroma#5282</a></li>
<li>[BUG] Resolve deadlock in system crate? by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5283">chroma-core/chroma#5283</a></li>
<li>[ENH] Complete the NAC metrics for the write half. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5278">chroma-core/chroma#5278</a></li>
<li>[BUG]: fix missing node in constructed version graph for garbage
collection by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5284">chroma-core/chroma#5284</a></li>
<li>[BUG] Fix test flake from 5283. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5287">chroma-core/chroma#5287</a></li>
<li>[BUG]: Don't GC hnsw if it is empty by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5295">chroma-core/chroma#5295</a></li>
<li>[ENH] Sync before flushing by <a
href="https://github.com/HammadB"><code>@​HammadB</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5296">chroma-core/chroma#5296</a></li>
<li>[DOC] update quota limits by <a
href="https://github.com/philipithomas"><code>@​philipithomas</code></a>
in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5297">chroma-core/chroma#5297</a></li>
<li>[BUG] Fix CLI copy offset by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5288">chroma-core/chroma#5288</a></li>
<li>[ENH] Add support for default space in create coll config by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5293">chroma-core/chroma#5293</a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b6b059dfd7"><code>b6b059d</code></a>
[RELEASE] 1.0.20 (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5303">#5303</a>)</li>
<li><a
href="1993cd4a51"><code>1993cd4</code></a>
[RELEASE] CLI 1.1.8, Python 1.0.19, JS 3.0.14 (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5302">#5302</a>)</li>
<li><a
href="19600af279"><code>19600af</code></a>
[BUG] Fix CLI copy arg number types (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5301">#5301</a>)</li>
<li><a
href="d3602cd776"><code>d3602cd</code></a>
[CHORE] Update JS binding deps in the client (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5300">#5300</a>)</li>
<li><a
href="2570b471ed"><code>2570b47</code></a>
[RELEASE] CLI 1.1.7, Python 1.0.18, JS 3.0.13 (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5299">#5299</a>)</li>
<li><a
href="51a7d1625b"><code>51a7d16</code></a>
[ENH] Add support for default space in create coll config (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5293">#5293</a>)</li>
<li><a
href="163133aacc"><code>163133a</code></a>
[BUG] Fix CLI copy offset (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5288">#5288</a>)</li>
<li><a
href="2f06586503"><code>2f06586</code></a>
[DOC] update quota limits (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5297">#5297</a>)</li>
<li><a
href="983728076d"><code>9837280</code></a>
[ENH] Sync before flushing (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5296">#5296</a>)</li>
<li><a
href="649e14c530"><code>649e14c</code></a>
[BUG]: Don't GC hnsw if it is empty (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5295">#5295</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/chroma-core/chroma/compare/1.0.16...1.0.20">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=chromadb&package-manager=uv&previous-version=1.0.16&new-version=1.0.20)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:49:11 -07:00
dependabot[bot]
620212e920
chore(ui-deps): bump @radix-ui/react-collapsible from 1.1.11 to 1.1.12 in /llama_stack/ui (#3218)
Bumps
[@radix-ui/react-collapsible](https://github.com/radix-ui/primitives)
from 1.1.11 to 1.1.12.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-collapsible&package-manager=npm_and_yarn&previous-version=1.1.11&new-version=1.1.12)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:48:53 -07:00
dependabot[bot]
65d09c442d
chore(ui-deps): bump eslint-config-prettier from 10.1.5 to 10.1.8 in /llama_stack/ui (#3220)
Bumps
[eslint-config-prettier](https://github.com/prettier/eslint-config-prettier)
from 10.1.5 to 10.1.8.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/prettier/eslint-config-prettier/releases">eslint-config-prettier's
releases</a>.</em></p>
<blockquote>
<h2>v10.1.8</h2>
<p>republish latest version</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/prettier/eslint-config-prettier/compare/v10.1.5...v10.1.8">https://github.com/prettier/eslint-config-prettier/compare/v10.1.5...v10.1.8</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/prettier/eslint-config-prettier/blob/main/CHANGELOG.md">eslint-config-prettier's
changelog</a>.</em></p>
<blockquote>
<h1>eslint-config-prettier</h1>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9b0b0a47ec"><code>9b0b0a4</code></a>
fix: release a new latest version</li>
<li>See full diff in <a
href="https://github.com/prettier/eslint-config-prettier/compare/v10.1.5...v10.1.8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=eslint-config-prettier&package-manager=npm_and_yarn&previous-version=10.1.5&new-version=10.1.8)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:48:35 -07:00
dependabot[bot]
90b7c2317e
chore(ui-deps): bump @radix-ui/react-separator from 1.1.6 to 1.1.7 in /llama_stack/ui (#3222)
Bumps
[@radix-ui/react-separator](https://github.com/radix-ui/primitives) from
1.1.6 to 1.1.7.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-separator&package-manager=npm_and_yarn&previous-version=1.1.6&new-version=1.1.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:48:20 -07:00
dependabot[bot]
0473a32619
chore(ui-deps): bump tailwind-merge from 3.3.0 to 3.3.1 in /llama_stack/ui (#3223)
Bumps [tailwind-merge](https://github.com/dcastil/tailwind-merge) from
3.3.0 to 3.3.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dcastil/tailwind-merge/releases">tailwind-merge's
releases</a>.</em></p>
<blockquote>
<h2>v3.3.1</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Fix arbitrary value using <code>color-mix()</code> not being
detected as color by <a
href="https://github.com/dcastil"><code>@​dcastil</code></a> in <a
href="https://redirect.github.com/dcastil/tailwind-merge/pull/591">dcastil/tailwind-merge#591</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/dcastil/tailwind-merge/compare/v3.3.0...v3.3.1">https://github.com/dcastil/tailwind-merge/compare/v3.3.0...v3.3.1</a></p>
<p>Thanks to <a
href="https://github.com/brandonmcconnell"><code>@​brandonmcconnell</code></a>,
<a href="https://github.com/manavm1990"><code>@​manavm1990</code></a>,
<a href="https://github.com/langy"><code>@​langy</code></a>, <a
href="https://github.com/roboflow"><code>@​roboflow</code></a>, <a
href="https://github.com/syntaxfm"><code>@​syntaxfm</code></a>, <a
href="https://github.com/getsentry"><code>@​getsentry</code></a>, <a
href="https://github.com/codecov"><code>@​codecov</code></a>, <a
href="https://github.com/sourcegraph"><code>@​sourcegraph</code></a>, a
private sponsor, <a
href="https://github.com/block"><code>@​block</code></a> and <a
href="https://github.com/shawt3000"><code>@​shawt3000</code></a> for
sponsoring tailwind-merge! ❤️</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="40d8feed6a"><code>40d8fee</code></a>
v3.3.1</li>
<li><a
href="429ea54ac8"><code>429ea54</code></a>
add changelog for v3.3.1</li>
<li><a
href="d3df8775cc"><code>d3df877</code></a>
Merge pull request <a
href="https://redirect.github.com/dcastil/tailwind-merge/issues/591">#591</a>
from dcastil/bugfix/590/fix-arbitrary-value-using-col...</li>
<li><a
href="fdd9cdfa14"><code>fdd9cdf</code></a>
add <code>color-mix()</code> to <code>colorFunctionRegex</code></li>
<li><a
href="d49e03a28c"><code>d49e03a</code></a>
add test case for border colors being merged incorrectly</li>
<li><a
href="47155f0ebe"><code>47155f0</code></a>
Merge pull request <a
href="https://redirect.github.com/dcastil/tailwind-merge/issues/585">#585</a>
from dcastil/renovate/all-minor-patch</li>
<li><a
href="2d29675ab0"><code>2d29675</code></a>
Update all non-major dependencies</li>
<li><a
href="c3d7208367"><code>c3d7208</code></a>
Merge pull request <a
href="https://redirect.github.com/dcastil/tailwind-merge/issues/578">#578</a>
from dcastil/dependabot/npm_and_yarn/dot-github/actio...</li>
<li><a
href="527214bf13"><code>527214b</code></a>
Bump undici from 5.28.5 to 5.29.0 in
/.github/actions/metrics-report</li>
<li>See full diff in <a
href="https://github.com/dcastil/tailwind-merge/compare/v3.3.0...v3.3.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tailwind-merge&package-manager=npm_and_yarn&previous-version=3.3.0&new-version=3.3.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:48:05 -07:00
dependabot[bot]
09bee51d6b
chore(python-deps): bump locust from 2.38.0 to 2.39.0 (#3221)
Bumps [locust](https://github.com/locustio/locust) from 2.38.0 to
2.39.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/locustio/locust/releases">locust's
releases</a>.</em></p>
<blockquote>
<h2>2.39.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add MilvusUser and example by <a
href="https://github.com/zhuwenxing"><code>@​zhuwenxing</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3168">locustio/locust#3168</a></li>
<li>Add SocketIOUser by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3189">locustio/locust#3189</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/zhuwenxing"><code>@​zhuwenxing</code></a> made
their first contribution in <a
href="https://redirect.github.com/locustio/locust/pull/3168">locustio/locust#3168</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/locustio/locust/compare/2.38.1...2.39.0">https://github.com/locustio/locust/compare/2.38.1...2.39.0</a></p>
<h2>2.38.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix test flakiness and update error message by <a
href="https://github.com/amadeuppereira"><code>@​amadeuppereira</code></a>
in <a
href="https://redirect.github.com/locustio/locust/pull/3187">locustio/locust#3187</a></li>
<li>FastHttpUser: Don't send zstd in Accept-Encoding header by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3188">locustio/locust#3188</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/locustio/locust/compare/2.38.0...2.38.1">https://github.com/locustio/locust/compare/2.38.0...2.38.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/locustio/locust/blob/master/CHANGELOG.md">locust's
changelog</a>.</em></p>
<blockquote>
<h1>Detailed changelog</h1>
<p>The most important changes can also be found in <a
href="https://docs.locust.io/en/latest/changelog.html">the
documentation</a>.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1810fef1ae"><code>1810fef</code></a>
Tiny doc fixes</li>
<li><a
href="48b4dfce8f"><code>48b4dfc</code></a>
Link SocketIOUser from main docs.</li>
<li><a
href="6e4fd7f067"><code>6e4fd7f</code></a>
Merge pull request <a
href="https://redirect.github.com/locustio/locust/issues/3189">#3189</a>
from locustio/Add-SocketioUser</li>
<li><a
href="95eca45476"><code>95eca45</code></a>
better documentation of on_message</li>
<li><a
href="a56ef663af"><code>a56ef66</code></a>
SocketIOUser docs: Link to example on GH</li>
<li><a
href="adaa71b5f9"><code>adaa71b</code></a>
SocketIOUser, add method docstrings and link to python-socketio's
readthedocs</li>
<li><a
href="9fb3ff0f89"><code>9fb3ff0</code></a>
Add testcase for SocketIOUser</li>
<li><a
href="7047247f9d"><code>7047247</code></a>
SocketIOUser: Fix use of environment object. Remove SocketIOClient.</li>
<li><a
href="f8ddc9c798"><code>f8ddc9c</code></a>
rename socketio echo_server</li>
<li><a
href="ae28acf027"><code>ae28acf</code></a>
add contrib dependencies to docs build</li>
<li>Additional commits viewable in <a
href="https://github.com/locustio/locust/compare/2.38.0...2.39.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=locust&package-manager=uv&previous-version=2.38.0&new-version=2.39.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:47:46 -07:00
dependabot[bot]
eff97f122b
chore(python-deps): bump weaviate-client from 4.16.5 to 4.16.9 (#3219)
Bumps
[weaviate-client](https://github.com/weaviate/weaviate-python-client)
from 4.16.5 to 4.16.9.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/releases">weaviate-client's
releases</a>.</em></p>
<blockquote>
<h2>v4.16.9</h2>
<h2>What's Changed</h2>
<ul>
<li>Deprecate broken method by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1795">weaviate/weaviate-python-client#1795</a></li>
<li>Improve user create docstring by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1796">weaviate/weaviate-python-client#1796</a></li>
<li>Fixup dependencies for package test by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1791">weaviate/weaviate-python-client#1791</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.8...v4.16.9">https://github.com/weaviate/weaviate-python-client/compare/v4.16.8...v4.16.9</a></p>
<h2>v4.16.8</h2>
<h2>What's Changed</h2>
<ul>
<li>Add backup list endpoint by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1785">weaviate/weaviate-python-client#1785</a></li>
<li>Attempt further fix of protobuf runtime stub incompatibilities by <a
href="https://github.com/tsmith023"><code>@​tsmith023</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1788">weaviate/weaviate-python-client#1788</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.7...v4.16.8">https://github.com/weaviate/weaviate-python-client/compare/v4.16.7...v4.16.8</a></p>
<h2>v4.16.6</h2>
<h2>What's Changed</h2>
<ul>
<li>rq: Add bits to the update method by <a
href="https://github.com/rlmanrique"><code>@​rlmanrique</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1766">weaviate/weaviate-python-client#1766</a></li>
<li>Deprecate contextionary, add model2vec and dimension parameter for
transformers by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1773">weaviate/weaviate-python-client#1773</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.5...v4.16.6">https://github.com/weaviate/weaviate-python-client/compare/v4.16.5...v4.16.6</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/blob/main/docs/changelog.rst">weaviate-client's
changelog</a>.</em></p>
<blockquote>
<h2>Version 4.16.9</h2>
<p>This patch version includes:</p>
<ul>
<li>Explicitly depend on protobuf package</li>
</ul>
<h2>Version 4.16.8</h2>
<p>This patch version includes:</p>
<ul>
<li>Further attempted fixes for <code>protobuf</code> compatibility issues</li>
<li>Introduction of the <code>backups.list()</code> method</li>
</ul>
<h2>Version 4.16.7</h2>
<p>This patch version includes:</p>
<ul>
<li>Fixes compatibility issues between the built gRPC stubs and differing
protobuf versions depending on the version of <code>grpcio</code> used
to build the stubs</li>
<li>Add <code>text2vec-model2vec</code> module to
<code>Configure.NamedVectors</code></li>
<li>Deprecated <code>min_occurrences</code> in <code>Metrics.text</code>
in favour of <code>limit</code></li>
</ul>
<h2>Version 4.16.6</h2>
<p>This patch version includes:</p>
<ul>
<li>Add <code>dimensions</code> property to
<code>text2vec-transformers</code> vectorizers in
<code>Configure.Vectors</code></li>
<li>Add <code>text2vec-model2vec</code> vectorizer in
<code>Configure.Vectors</code></li>
<li>Deprecate <code>text2vec-contextionary</code> vectorizer</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c69cfa124e"><code>c69cfa1</code></a>
Fixup dependencies for package test (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1791">#1791</a>)</li>
<li><a
href="334380b6d4"><code>334380b</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1796">#1796</a>
from weaviate/docstring_user_create</li>
<li><a
href="c7b8c75893"><code>c7b8c75</code></a>
Improve user create docstring</li>
<li><a
href="93c865a23e"><code>93c865a</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1795">#1795</a>
from weaviate/deprecate_broken_method</li>
<li><a
href="ba05f5f1ad"><code>ba05f5f</code></a>
Deprecate broken method</li>
<li><a
href="4bef4b8210"><code>4bef4b8</code></a>
Update changelog (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1789">#1789</a>)</li>
<li><a
href="c370bf5fa2"><code>c370bf5</code></a>
Attempt further fix of protobuf runtime stub incompatibilities (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1788">#1788</a>)</li>
<li><a
href="98db3b1187"><code>98db3b1</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1785">#1785</a>
from weaviate/add_list_response</li>
<li><a
href="ebf2b30252"><code>ebf2b30</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1782">#1782</a>
from weaviate/dependabot/pip/ruff-0.12.8</li>
<li><a
href="88ad1c113b"><code>88ad1c1</code></a>
Fix version in CI</li>
<li>Additional commits viewable in <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.5...v4.16.9">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:47:33 -07:00
Ashwin Bharambe
f328ff6e98 fix(ci): dependabot update had a bug 2025-08-20 16:34:50 -07:00
Francisco Arceo
49060c3020
chore: Update dependabot to capture package-lock.json (#3212)
# What does this PR do?
This should fix dependabot based on this thread:
https://stackoverflow.com/questions/60201543/dependabot-only-updates-lock-file



Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-20 15:05:12 -07:00
grs
14082b22af
fix: handle mcp tool calls in previous response correctly (#3155)
# What does this PR do?

Handles MCP tool calls in a previous response

Closes #3105

## Test Plan
Made call to create response with tool call, then made second call with
the first linked through previous_response_id. Did not get error.

Also added unit test.

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-08-20 14:12:15 -07:00
Omer Tuchfeld
00a67da449
fix: Use pool_pre_ping=True in SQLAlchemy engine creation (#3208)
# What does this PR do?

We noticed that when llama-stack is running for a long time, we would
run into database errors when trying to run messages through the agent
(which we configured to persist against postgres), seemingly due to the
database connections being stale or disconnected. This commit adds
`pool_pre_ping=True` to the SQLAlchemy engine creation to help mitigate
this issue by checking the connection before using it, and
re-establishing it if necessary.

More information in:


https://docs.sqlalchemy.org/en/20/core/pooling.html#dealing-with-disconnects

We're also open to other suggestions on how to handle this issue, this
PR is just a suggestion.
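
The pre-ping idea described above can be illustrated without SQLAlchemy. The following is a minimal sketch using only the standard library's `sqlite3`; the `PrePingPool` class and its attributes are hypothetical names invented for illustration and are not SQLAlchemy's implementation. In SQLAlchemy itself the behaviour is enabled by passing `pool_pre_ping=True` to `create_engine`.

```python
import sqlite3
from typing import Optional


class PrePingPool:
    """Hypothetical single-connection pool that pings before every checkout."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._conn: Optional[sqlite3.Connection] = None
        self.reconnects = 0  # how many times a connection was (re)established

    def _is_alive(self) -> bool:
        # The "pre-ping": run a trivial statement and see whether it succeeds.
        if self._conn is None:
            return False
        try:
            self._conn.execute("SELECT 1")
            return True
        except sqlite3.Error:
            return False

    def checkout(self) -> sqlite3.Connection:
        # Re-establish the connection when the ping fails, then hand it out.
        if not self._is_alive():
            self._conn = sqlite3.connect(self._path)
            self.reconnects += 1
        return self._conn


pool = PrePingPool(":memory:")
conn = pool.checkout()
conn.close()  # simulate a connection that went stale while idle
fresh = pool.checkout()  # the ping fails, so the pool reconnects
value = fresh.execute("SELECT 41 + 1").fetchone()[0]
```

The cost of this approach is one extra round trip per checkout, which is usually small compared with a failed request on a dead connection.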

## Test Plan

We have not tested it yet (we're in the process of doing that) and we're
hoping it's going to resolve our issue.
2025-08-20 13:52:05 -07:00
Francisco Arceo
e195ee3091
fix: Fix broken package-lock.json (#3209)
# What does this PR do?
Fix broken `package-lock.json` not caught by [github bot in this
commit](7f0b2a8764).


Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-20 13:11:44 -07:00
Matthew Farrellee
c2c859a6b0
chore(files tests): update files integration tests and fix inline::localfs (#3195)
- update files=inline::localfs to raise ResourceNotFoundError instead of
ValueError
- only skip tests when no files provider is available
- directly use openai_client and llama_stack_client where appropriate
- check for correct behavior of non-existent file
- xfail the isolation test, no implementation supports it
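
The first bullet (raising a dedicated not-found error rather than `ValueError`) can be sketched as follows. This is an illustration only: `ResourceNotFoundError` is redefined locally as a stand-in, and `InMemoryFiles` is a hypothetical store, not the `inline::localfs` provider.

```python
class ResourceNotFoundError(Exception):
    """Local stand-in for the project's error type (assumption for this sketch)."""

    def __init__(self, resource_id: str, resource_type: str) -> None:
        super().__init__(f"{resource_type} '{resource_id}' not found")
        self.resource_id = resource_id


class InMemoryFiles:
    """Hypothetical file store used only to show the error behaviour."""

    def __init__(self) -> None:
        self._files = {}

    def upload(self, file_id: str, content: bytes) -> None:
        self._files[file_id] = content

    def retrieve(self, file_id: str) -> bytes:
        try:
            return self._files[file_id]
        except KeyError:
            # A specific error type lets callers map this case to a 404-style
            # response instead of treating it as a generic bad input.
            raise ResourceNotFoundError(file_id, "File") from None


store = InMemoryFiles()
store.upload("file-1", b"hello")
```

A test for the non-existent file case can then assert on the specific exception type rather than on a message string.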

test plan -

```
$ uv run ./scripts/integration-tests.sh --stack-config server:ci-tests --provider ollama --test-subdirs files
...

tests/integration/files/test_files.py::test_openai_client_basic_operations PASSED               [ 25%]
tests/integration/files/test_files.py::test_files_authentication_isolation XFAIL                [ 50%]
tests/integration/files/test_files.py::test_files_authentication_shared_attributes PASSED       [ 75%]
tests/integration/files/test_files.py::test_files_authentication_anonymous_access PASSED        [100%]

==================================== 3 passed, 1 xfailed in 1.03s =====================================
```

previously -

```
$ uv run llama stack build --image-type venv --providers files=inline::localfs --run &
...
$ ./scripts/integration-tests.sh --stack-config http://localhost:8321 --provider ollama --test-subdirs files
...

tests/integration/files/test_files.py::test_openai_client_basic_operations[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] PASSED [ 12%]
tests/integration/files/test_files.py::test_files_authentication_isolation[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 25%]
tests/integration/files/test_files.py::test_files_authentication_shared_attributes[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 37%]
tests/integration/files/test_files.py::test_files_authentication_anonymous_access[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 50%]
tests/integration/files/test_files.py::test_openai_client_basic_operations[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] PASSED [ 62%]
tests/integration/files/test_files.py::test_files_authentication_isolation[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 75%]
tests/integration/files/test_files.py::test_files_authentication_shared_attributes[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 87%]
tests/integration/files/test_files.py::test_files_authentication_anonymous_access[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [100%]

========================================================= 2 passed, 6 skipped in 1.31s ==========================================================
```
2025-08-20 14:22:40 -04:00
Jiayi Ni
55e9959f62
fix: fix ``openai_embeddings`` for asymmetric embedding NIMs (#3205)
# What does this PR do?
NVIDIA asymmetric embedding models (e.g.,
`nvidia/llama-3.2-nv-embedqa-1b-v2`) require an `input_type` parameter
that is not present in the standard OpenAI embeddings API. This PR sets
`input_type="query"` as the default and updates the documentation to suggest
using the `embedding` API for passage embeddings.
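
As a rough sketch of the defaulting behaviour, the helper below builds a request payload and fills in `input_type` when the caller does not supply one. The function name and payload layout are assumptions made for illustration, not the provider's actual code.

```python
from typing import Any, Dict, List

# Values accepted by asymmetric embedding models in this sketch.
ALLOWED_INPUT_TYPES = {"query", "passage"}


def build_embedding_request(
    model: str, texts: List[str], input_type: str = "query"
) -> Dict[str, Any]:
    """Hypothetical helper: OpenAI-style payload plus the extra `input_type`."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unsupported input_type: {input_type!r}")
    return {"model": model, "input": list(texts), "input_type": input_type}


payload = build_embedding_request(
    "nvidia/llama-3.2-nv-embedqa-1b-v2", ["what is a vector store?"]
)
```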

Resolves #2892 

## Test Plan
```
pytest -s -v tests/integration/inference/test_openai_embeddings.py   --stack-config="inference=nvidia"   --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2"   --env NVIDIA_API_KEY={nvidia_api_key}   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-08-20 08:06:25 -04:00
Mustafa Elbehery
3f8df167f3
chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061)
# What does this PR do?

This PR adds a step in pre-commit to enforce using `llama_stack` logger.

Currently, various parts of the codebase use different loggers. Since a
custom `llama_stack` logger already exists and is used in the codebase, it
is better to standardize on it.
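
One way such a check could work is sketched below using the standard `ast` module. This is an assumption about the approach, not the hook added by this PR; the function name is hypothetical, and the import path shown in the second sample is assumed and only appears inside a string.

```python
import ast
from typing import List


def find_stdlib_logger_calls(source: str) -> List[int]:
    """Return line numbers of `logging.getLogger(...)` calls in `source`."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "getLogger"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "logging"
        ):
            lines.append(node.lineno)
    return sorted(lines)


flagged = find_stdlib_logger_calls(
    "import logging\nlogger = logging.getLogger(__name__)\n"
)
# Assumed import path for the project logger; parsed here, never executed.
clean = find_stdlib_logger_calls(
    "from llama_stack.log import get_logger\n"
    "logger = get_logger(name=__name__, category='core')\n"
)
```

A pre-commit hook built this way would exit non-zero whenever the returned list is non-empty for any staged file.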

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-08-20 07:15:35 -04:00
Matthew Farrellee
5f151ddf45
fix: disable ui-prettier & ui-eslint (#3207) 2025-08-20 06:42:43 -04:00
Francisco Arceo
5f6d5072b6
chore: Faster npm pre-commit (#3206)
# What does this PR do?
Adds npm to pre-commit.yml installation and caches ui
Removes node installation during pre-commit.


Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-19 16:38:38 -07:00
github-actions[bot]
7f0b2a8764 build: Bump version to 0.2.18 2025-08-19 22:38:23 +00:00
Matthew Farrellee
e7a812f5de
chore: Fixup main pre commit (#3204) 2025-08-19 14:52:38 -04:00
Varsha
8cc4925f7d
chore: Enable keyword search for Milvus inline (#3073)
# What does this PR do?
With https://github.com/milvus-io/milvus-lite/pull/294 - Milvus Lite
supports keyword search using BM25. While introducing keyword search we
had explicitly disabled it for inline milvus. This PR removes the need
for the check, and enables `inline::milvus` for tests.


## Test Plan
Run llama stack with `inline::milvus` enabled:

```
pytest tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes --stack-config=http://localhost:8321 --embedding-model=all-MiniLM-L6-v2 -v
```

```
INFO     2025-08-07 17:06:20,932 tests.integration.conftest:64 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS                                        
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0
asyncio: mode=Mode.AUTO
collected 3 items                                                                                                                                                                                          

tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-vector] PASSED                                                   [ 33%]
tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-keyword] PASSED                                                  [ 66%]
tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-hybrid] PASSED                                                   [100%]

============================================================================================ 3 passed in 4.75s =============================================================================================
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-19 13:01:23 -04:00
Ashwin Bharambe
eb07a0f86a
fix(ci, tests): ensure uv environments in CI are kosher, record tests (#3193)
I started this PR trying to unbreak a newly broken test
`test_agent_name`. This test was broken all along but did not show up
because during testing we were pulling the "non-updated" llama stack
client. See this comment:
https://github.com/llamastack/llama-stack/pull/3119#discussion_r2270988205

While fixing this, I encountered a large amount of badness in our CI
workflow definitions.

- We weren't passing `LLAMA_STACK_DIR` or `LLAMA_STACK_CLIENT_DIR`
overrides to `llama stack build` at all in some cases.
- Even when we did, we used `uv run` liberally. The first thing `uv run`
does is "sync" the project environment. This means it will undo
any mutations we have made ourselves. But we make many mutations
to these environments in our CI runners, the most important of which is
`llama stack build`, where we install distro dependencies. As a
result, when you tried to run the integration tests, you would see old,
strange versions.


## Test Plan

Re-record using:

```
sh scripts/integration-tests.sh --stack-config ci-tests \
  --provider ollama --test-pattern test_agent_name --inference-mode record
```

Then re-run with `--inference-mode replay`. But: 

Eventually, this test turned out to be quite flaky for telemetry
reasons. I haven't investigated it for now and just disabled it sadly
since we have a release to push out.
2025-08-18 17:02:24 -07:00
Francisco Arceo
ac78e9f66a
chore: Adding UI unit tests in CI (#3191)
2025-08-18 16:48:21 -06:00
Ashwin Bharambe
89661b984c
revert: "feat(cli): make venv the default image type" (#3196)
Reverts llamastack/llama-stack#3187
2025-08-18 15:31:01 -07:00
Ashwin Bharambe
2e7ca07423
feat(cli): make venv the default image type (#3187)
We have removed conda now so we can make `venv` the default. Just doing
`llama stack build --distro starter` is now enough for the most part.
2025-08-18 14:58:23 -07:00
slekkala1
7519ab4024
feat: Code scanner Provider impl for moderations api (#3100)
# What does this PR do?
Add CodeScanner implementations

## Test Plan
`SAFETY_MODEL=CodeScanner LLAMA_STACK_CONFIG=starter uv run pytest -v
tests/integration/safety/test_safety.py
--text-model=llama3.2:3b-instruct-fp16
--embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`

This PR needs to land after
https://github.com/meta-llama/llama-stack/pull/3098
2025-08-18 14:15:40 -07:00
Ashwin Bharambe
27d6becfd0
fix(misc): pin openai dependency to < 1.100.0 (#3192)
This OpenAI client release
0843a11164
ends up breaking litellm
169a17400f/litellm/types/llms/openai.py (L40)

Update the dependency pin. Also make the imports a bit more defensive
anyhow if something else during `llama stack build` ends up moving
openai to a previous version.
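
To make the effect of the pin concrete, here is a small sketch that checks an installed version against the `< 1.100.0` bound and tolerates a missing package. The helper names are hypothetical and this is not the code changed in this PR; it only shows why the comparison has to be numeric.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    # Keep the first three numeric components; suffixes such as "rc1" are ignored.
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def satisfies_pin(installed: str, upper_bound: str = "1.100.0") -> bool:
    # Numeric comparison: as plain strings, "1.99.9" would sort after "1.100.0".
    return version_tuple(installed) < version_tuple(upper_bound)


def installed_version(package: str) -> Optional[str]:
    # Defensive lookup: report None instead of raising when the package is absent.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, `satisfies_pin("1.99.9")` is true while `satisfies_pin("1.100.0")` is false.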

## Test Plan

Run pre-release script integration tests.
2025-08-18 12:20:50 -07:00
IAN MILLER
f8398d25ff
fix: kill build_conda_env.sh (#3190)
# What does this PR do?
I noticed that
[build_conda_env.sh](https://github.com/llamastack/llama-stack/blob/main/llama_stack/core/build_conda_env.sh)
still exists in the main branch. We need to remove it to be consistent with
[#2969](https://github.com/llamastack/llama-stack/pull/2969)

2025-08-18 12:17:44 -07:00
Maor Friedman
739b18edf8
feat: add support for postgres ssl mode and root cert (#3182)
this PR adds support for configuring `sslmode` and `sslrootcert` when
initiating the psycopg2 connection.

closes #3181
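
A minimal sketch of how the two options might be threaded through to the connection call. The helper and its parameter names are hypothetical; the only assumption about psycopg2 is that `psycopg2.connect` accepts libpq keywords such as `sslmode` and `sslrootcert`.

```python
from typing import Any, Dict, Optional


def build_pg_connect_kwargs(
    host: str,
    port: int,
    dbname: str,
    user: str,
    password: str,
    ssl_mode: Optional[str] = None,
    ca_cert_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: only pass SSL options that were configured."""
    kwargs: Dict[str, Any] = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
    }
    if ssl_mode is not None:
        kwargs["sslmode"] = ssl_mode
    if ca_cert_path is not None:
        kwargs["sslrootcert"] = ca_cert_path
    return kwargs


kwargs = build_pg_connect_kwargs(
    "localhost", 5432, "llamastack", "llamastack", "secret",
    ssl_mode="verify-full", ca_cert_path="/etc/ssl/certs/pg-root.crt",
)
# With psycopg2 installed, the result would be used as: psycopg2.connect(**kwargs)
```

Leaving both options unset keeps the driver's defaults, so existing configurations behave as before.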
2025-08-18 10:24:24 -07:00
Francisco Arceo
fa431e15e0
chore: Update TRIAGERS.md (#3186)
# What does this PR do?
Update triagers to current state

2025-08-18 10:23:51 -07:00
Charlie Doern
4ae39b94ff
fix: remove category prints (#3189)
# What does this PR do?

Commands whose output matters, such as `llama stack build
--print-deps-only` (soon to be `llama stack show`), print some log.py
`cprint` lines on _every_ execution of the CLI.

for example:

<img width="912" height="331" alt="Screenshot 2025-08-18 at 1 16 30 PM"
src="https://github.com/user-attachments/assets/e5bf18fb-74a1-438c-861a-8a26eea7d014"
/>

the yellow text is likely unnecessary.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-18 10:23:23 -07:00
Ashwin Bharambe
f4cecaade9
chore(ci): dont run llama stack server always (#3188)
Sometimes the server has already been started (e.g., via docker). Just a
convenience here so we can reuse this script more.
2025-08-18 10:11:55 -07:00
Francisco Arceo
a8091d0c6a
chore: Update benchmarking location in contributing docs (#3180)
# What does this PR do?
Small docs change as requested in
https://github.com/llamastack/llama-stack/pull/3160#pullrequestreview-3125038932


2025-08-18 08:04:21 -04:00
Ashwin Bharambe
5e7c2250be
test(recording): add a script to schedule recording workflow (#3170)
See comment here:
https://github.com/llamastack/llama-stack/pull/3162#issuecomment-3192859097
-- TL;DR it is quite complex to invoke the recording workflow correctly
for an end developer writing tests. This script simplifies the work.

No more manual GitHub UI navigation!

## Script Functionality

  - Auto-detects your current branch and associated PR
  - Finds the right repository context (works from forks!)
  - Runs the workflow where it can actually commit back
  - Validates prerequisites and provides helpful error messages

## How to Use

First, make sure you are on the branch that introduced the new test you want
recorded. **Make sure you have pushed this branch to the remote; the easiest
way is to create a PR.**

```
  # Record tests for current branch
  ./scripts/github/schedule-record-workflow.sh

  # Record specific test subdirectories
  ./scripts/github/schedule-record-workflow.sh --test-subdirs "agents,inference"

  # Record with vision tests enabled
  ./scripts/github/schedule-record-workflow.sh --run-vision-tests

  # Record tests matching a pattern
  ./scripts/github/schedule-record-workflow.sh --test-pattern "test_streaming"
```

## Test Plan

Ran `./scripts/github/schedule-record-workflow.sh -s inference -k
tool_choice` which started
4820409329
which successfully committed recorded outputs.
2025-08-15 16:54:34 -07:00
Matthew Farrellee
914c7be288
feat: add batches API with OpenAI compatibility (with inference replay) (#3162)
Add complete batches API implementation with protocol, providers, and
tests:

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
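
The two concurrency options can be pictured as nested semaphores: one bounding how many batches run at once, and one per batch bounding its in-flight requests. The sketch below is an assumption about the general technique, not the `ReferenceBatchesImpl` code; all function names are hypothetical and the "request" is a stand-in that upper-cases a string.

```python
import asyncio


async def process_batches(batches, max_concurrent_batches, max_concurrent_requests_per_batch):
    batch_slots = asyncio.Semaphore(max_concurrent_batches)
    active_batches = 0
    peak_batches = 0  # highest number of batches observed running at once

    async def handle_request(request, request_slots):
        async with request_slots:
            await asyncio.sleep(0)  # stand-in for an inference call
            return request.upper()

    async def handle_batch(requests):
        nonlocal active_batches, peak_batches
        async with batch_slots:
            active_batches += 1
            peak_batches = max(peak_batches, active_batches)
            # Each batch gets its own limit on in-flight requests.
            request_slots = asyncio.Semaphore(max_concurrent_requests_per_batch)
            try:
                return await asyncio.gather(
                    *(handle_request(r, request_slots) for r in requests)
                )
            finally:
                active_batches -= 1

    results = await asyncio.gather(*(handle_batch(b) for b in batches))
    return results, peak_batches


results, peak = asyncio.run(
    process_batches([["a", "b"], ["c"], ["d", "e", "f"]], 2, 2)
)
```

`asyncio.gather` preserves input order, so results line up with the submitted requests regardless of completion order.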

Test with -

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

addresses #3066

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-15 15:34:15 -07:00
Ashwin Bharambe
f4ccdee200 fix(ci): skip batches directory for library client testing 2025-08-15 15:30:03 -07:00
Ashwin Bharambe
0e8bb94bf3
feat(ci): make recording workflow simpler, more parameterizable (#3169)
# What does this PR do?

Recording tests has become a nightmare. This is the first part of making
that process simpler by making it _less_ automatic. I tried to be too
clever earlier.

It simplifies the record-integration-tests workflow to use workflow
dispatch inputs instead of PR labels. No more opaque stuff. Just go to
the GitHub UI and run the workflow with inputs. I will soon add a helper
script for this also.

Other things to aid re-running just the small set of things you need to
re-record:
- Replaces the `test-types` JSON array parameter with a more intuitive
`test-subdirs` comma-separated list. The whole JSON array crap was for
matrix.
- Adds a new `test-pattern` parameter to allow filtering tests using
pytest's `-k` option
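
To illustrate how the two inputs could translate into a pytest invocation, here is a small sketch. The helper name and the `tests/integration` root are assumptions for illustration; the workflow's actual script may differ.

```python
from typing import List


def build_pytest_args(
    test_subdirs: str, test_pattern: str = "", root: str = "tests/integration"
) -> List[str]:
    """Hypothetical helper: expand a comma-separated list and an optional -k filter."""
    subdirs = [s.strip() for s in test_subdirs.split(",") if s.strip()]
    # With no subdirectories given, fall back to the whole test root.
    args = [f"{root}/{s}" for s in subdirs] or [root]
    if test_pattern:
        args += ["-k", test_pattern]
    return args


args = build_pytest_args("agents, inference", "test_streaming")
```

A comma-separated string is easier to type into a workflow dispatch form than a JSON array, and it needs no quoting.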


## Test Plan

Note that this PR is in a fork not the source repository.

- Replay tests on this PR are green
- Manually
[ran](1699856292)
the replay workflow with a test-subdir and test-pattern filter, worked
- Manually
[ran](4819508034)
the **record** workflow with a simple pattern, it has worked and updated
_this_ PR.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-15 14:47:20 -07:00
Ashwin Bharambe
a6e2c18909
Revert "refactor(agents): migrate to OpenAI chat completions API" (#3167)
Reverts llamastack/llama-stack#3097

It has broken agents tests.
2025-08-15 12:01:07 -07:00
ehhuang
2c06b24c77
test: benchmark scripts (#3160)
# What does this PR do?
1. Add our own benchmark script to replace Locust, which does not
measure streaming latency well
2. Simplify k8s deployment
3. Add a simple profile script for locally running server
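
For context, measuring streaming latency means timing the first chunk separately from the full response. A minimal sketch, assuming the response is exposed as a plain Python iterable of chunks (`measure_stream`, `percentile` and `fake_stream` are illustrative names, not the benchmark script's API):

```python
import math
import time
from typing import Iterable, Iterator, NamedTuple


class StreamTiming(NamedTuple):
    ttft: float   # seconds until the first chunk arrived
    total: float  # seconds until the stream was exhausted
    chunks: int   # number of chunks received


def measure_stream(chunks: Iterable[str]) -> StreamTiming:
    """Consume a stream, timing the first chunk and the whole response."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    # An empty stream has no first chunk; fall back to the total time.
    return StreamTiming(ttft if ttft is not None else total, total, count)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fake_stream(n_chunks: int, delay: float) -> Iterator[str]:
    """Stand-in for a streaming response; sleeps before each chunk."""
    for i in range(n_chunks):
        time.sleep(delay)
        yield f"chunk-{i}"


timings = [measure_stream(fake_stream(3, 0.01)) for _ in range(5)]
print("P50 TTFT:", percentile([t.ttft for t in timings], 50))
```

The percentile here uses the nearest-rank method; the actual script may interpolate differently.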

## Test Plan
❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10

============================================================
BENCHMARK RESULTS
============================================================
Total time: 180.00s
Concurrent users: 10
Total requests: 1636
Successful requests: 1636
Failed requests: 0
Success rate: 100.0%
Requests per second: 9.09

Response Time Statistics:
  Mean: 1.095s
  Median: 1.721s
  Min: 0.136s
  Max: 3.218s
  Std Dev: 0.762s

Percentiles:
  P50: 1.721s
  P90: 1.751s
  P95: 1.756s
  P99: 1.796s

Time to First Token (TTFT) Statistics:
  Mean: 0.037s
  Median: 0.037s
  Min: 0.023s
  Max: 0.211s
  Std Dev: 0.011s

TTFT Percentiles:
  P50: 0.037s
  P90: 0.040s
  P95: 0.044s
  P99: 0.055s

Streaming Statistics:
  Mean chunks per response: 64.0
  Total chunks received: 104775
2025-08-15 11:24:29 -07:00
dependabot[bot]
2114214fe3
chore(python-deps): bump huggingface-hub from 0.34.3 to 0.34.4 (#3084)
Bumps [huggingface-hub](https://github.com/huggingface/huggingface_hub)
from 0.34.3 to 0.34.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/huggingface_hub/releases">huggingface-hub's
releases</a>.</em></p>
<blockquote>
<h2>[v0.34.4] Support Image to Video inference + QoL in jobs API, auth
and utilities</h2>
<p>Biggest update is the support of Image-To-Video task with inference
provider Fal AI</p>
<ul>
<li>[Inference] Support image to video task <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3289">#3289</a>
by <a
href="https://github.com/hanouticelina"><code>@​hanouticelina</code></a></li>
</ul>
<pre lang="py"><code>&gt;&gt;&gt; from huggingface_hub import
InferenceClient
&gt;&gt;&gt; client = InferenceClient()
&gt;&gt;&gt; video = client.image_to_video(&quot;cat.jpg&quot;,
model=&quot;Wan-AI/Wan2.2-I2V-A14B&quot;, prompt=&quot;turn the cat into
a tiger&quot;)
&gt;&gt;&gt; with open(&quot;tiger.mp4&quot;, &quot;wb&quot;) as f:
 ...     f.write(video)
</code></pre>
<p>And some quality of life improvements:</p>
<ul>
<li>Add type to job owner <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3291">#3291</a>
by <a href="https://github.com/drbh"><code>@​drbh</code></a></li>
<li>Include HF_HUB_DISABLE_XET in the environment dump <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3290">#3290</a>
by <a
href="https://github.com/hanouticelina"><code>@​hanouticelina</code></a></li>
<li>Whoami: custom message only on unauthorized <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3288">#3288</a>
by <a href="https://github.com/Wauplin"><code>@​Wauplin</code></a></li>
<li>Add validation warnings for repository limits in upload_large_folder
<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3280">#3280</a>
by <a
href="https://github.com/davanstrien"><code>@​davanstrien</code></a></li>
<li>Add timeout info to Jobs guide docs <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3281">#3281</a>
by <a
href="https://github.com/davanstrien"><code>@​davanstrien</code></a></li>
<li>[Jobs] Use current or stored token in a Job secrets <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3272">#3272</a>
by <a href="https://github.com/lhoestq"><code>@​lhoestq</code></a></li>
<li>Fix bash history expansion in hf jobs example <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3277">#3277</a>
by <a
href="https://github.com/nyuuzyou"><code>@​nyuuzyou</code></a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/huggingface/huggingface_hub/compare/v0.34.3...v0.34.4">https://github.com/huggingface/huggingface_hub/compare/v0.34.3...v0.34.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="84a92a92c2"><code>84a92a9</code></a>
Release: v0.34.4</li>
<li><a
href="6196ac2cbc"><code>6196ac2</code></a>
Add type to job owner (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3291">#3291</a>)</li>
<li><a
href="4f6975f697"><code>4f6975f</code></a>
Include <code>HF_HUB_DISABLE_XET</code> in the environment dump (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3290">#3290</a>)</li>
<li><a
href="3720a5096f"><code>3720a50</code></a>
[Inference] Support image to video task (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3289">#3289</a>)</li>
<li><a
href="bb5e4c7a2c"><code>bb5e4c7</code></a>
Whoami: custom message only on unauthorized (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3288">#3288</a>)</li>
<li><a
href="a725256f31"><code>a725256</code></a>
Add validation warnings for repository limits in upload_large_folder (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3280">#3280</a>)</li>
<li><a
href="a181b0f088"><code>a181b0f</code></a>
Add timeout info to Jobs guide docs (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3281">#3281</a>)</li>
<li><a
href="4d38925c8d"><code>4d38925</code></a>
[Jobs] Use current or stored token in a Job secrets (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3272">#3272</a>)</li>
<li><a
href="1580ce18c7"><code>1580ce1</code></a>
Fix bash history expansion in hf jobs example (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3277">#3277</a>)</li>
<li>See full diff in <a
href="https://github.com/huggingface/huggingface_hub/compare/v0.34.3...v0.34.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=huggingface-hub&package-manager=uv&previous-version=0.34.3&new-version=0.34.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-15 10:55:43 -07:00
dependabot[bot]
a275282685
chore(python-deps): bump pymilvus from 2.5.14 to 2.6.0 (#3086)
Bumps [pymilvus](https://github.com/milvus-io/pymilvus) from 2.5.14 to
2.6.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/milvus-io/pymilvus/releases">pymilvus's
releases</a>.</em></p>
<blockquote>
<h2>PyMilvus v2.6.0 Release Notes</h2>
<h2>New Features</h2>
<ol>
<li>Add APIs in MilvusClient</li>
</ol>
<ul>
<li>enhance: add describe and alter database in MilvusClient by <a
href="https://github.com/smellthemoon"><code>@​smellthemoon</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2433">milvus-io/pymilvus#2433</a></li>
<li>enhance: support milvus-client iterator by <a
href="https://github.com/MrPresent-Han"><code>@​MrPresent-Han</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2461">milvus-io/pymilvus#2461</a></li>
<li>enhance: Enable resource group api in milvus client by <a
href="https://github.com/weiliu1031"><code>@​weiliu1031</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2513">milvus-io/pymilvus#2513</a></li>
<li>enhance: add release_collection, drop_index, create_partition,
drop_partition, load_partition and release_partition by <a
href="https://github.com/brcarry"><code>@​brcarry</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2525">milvus-io/pymilvus#2525</a></li>
<li>enhance: enable describe_replica api in milvus client by <a
href="https://github.com/weiliu1031"><code>@​weiliu1031</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2541">milvus-io/pymilvus#2541</a></li>
<li>enhance: support recalls for milvus_client by <a
href="https://github.com/chasingegg"><code>@​chasingegg</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2552">milvus-io/pymilvus#2552</a></li>
<li>enhance: add use_database by <a
href="https://github.com/czs007"><code>@​czs007</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2491">milvus-io/pymilvus#2491</a></li>
</ul>
<ol start="2">
<li>Add AsyncMilvusClient</li>
</ol>
<ul>
<li>[FEAT] Asyncio support by <a
href="https://github.com/brcarry"><code>@​brcarry</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2411">milvus-io/pymilvus#2411</a></li>
<li>Add async DDL funcs &amp; DDL examples by <a
href="https://github.com/Shawnzheng011019"><code>@​Shawnzheng011019</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2852">milvus-io/pymilvus#2852</a></li>
</ul>
<ol start="3">
<li>Other features</li>
</ol>
<ul>
<li>enhance: support Int8Vector by <a
href="https://github.com/cydrain"><code>@​cydrain</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2611">milvus-io/pymilvus#2611</a></li>
<li>feat: support recalls field in SearchResult by <a
href="https://github.com/chasingegg"><code>@​chasingegg</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2390">milvus-io/pymilvus#2390</a></li>
<li>enhance: Support Python3.13 and upgrade grpcio range by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2684">milvus-io/pymilvus#2684</a></li>
<li>enhance: support run analyzer return detail token by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2679">milvus-io/pymilvus#2679</a></li>
<li>enhance: Add force_drop parameter to drop_role method for role
deletion by <a href="https://github.com/SimFG"><code>@​SimFG</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2705">milvus-io/pymilvus#2705</a></li>
<li>enhance: add property func for AnalyzeToken by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2704">milvus-io/pymilvus#2704</a></li>
<li>enhance: grant/revoke v2 optional db and collection params by <a
href="https://github.com/shaoting-huang"><code>@​shaoting-huang</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2386">milvus-io/pymilvus#2386</a></li>
<li>extend unlimted offset for query iterator(<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2418">#2418</a>)
by <a
href="https://github.com/MrPresent-Han"><code>@​MrPresent-Han</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2419">milvus-io/pymilvus#2419</a></li>
<li>enhance: alterindex &amp; altercollection supports altering
properties by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2406">milvus-io/pymilvus#2406</a></li>
<li>enhance: alterdatabase support delete property by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2435">milvus-io/pymilvus#2435</a></li>
<li>enhance: support hints param by <a
href="https://github.com/chasingegg"><code>@​chasingegg</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2408">milvus-io/pymilvus#2408</a></li>
<li>enhance: create database support properties by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2448">milvus-io/pymilvus#2448</a></li>
<li>enhance: Add <code>db_name</code> parameter at
<code>bulk_import</code> by <a
href="https://github.com/counter2015"><code>@​counter2015</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2446">milvus-io/pymilvus#2446</a></li>
<li>enhance: add search iterator v2 by <a
href="https://github.com/PwzXxm"><code>@​PwzXxm</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2395">milvus-io/pymilvus#2395</a></li>
<li>enhance: simplify the structure of search_params by <a
href="https://github.com/smellthemoon"><code>@​smellthemoon</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2507">milvus-io/pymilvus#2507</a></li>
<li>enhance: Remove long deprecated Milvus class by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2544">milvus-io/pymilvus#2544</a></li>
<li>enhance: Use new model pkg by <a
href="https://github.com/junjiejiangjjj"><code>@​junjiejiangjjj</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2595">milvus-io/pymilvus#2595</a></li>
<li>enhance: Add schema update time verification to insert and upsert to
use cache by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2551">milvus-io/pymilvus#2551</a></li>
<li>enhance: describecollection output add created_timestamp by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2618">milvus-io/pymilvus#2618</a></li>
<li>feat: add external filter func for search iterator v2 by <a
href="https://github.com/PwzXxm"><code>@​PwzXxm</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2639">milvus-io/pymilvus#2639</a></li>
<li>enhance: support run analyzer by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2622">milvus-io/pymilvus#2622</a></li>
<li>weighted reranker to allow skip score normalization by <a
href="https://github.com/zhengbuqian"><code>@​zhengbuqian</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2708">milvus-io/pymilvus#2708</a></li>
<li>enhance: Support AddCollectionField API by <a
href="https://github.com/congqixia"><code>@​congqixia</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2722">milvus-io/pymilvus#2722</a></li>
<li>Add 1-Way and 2-Way TLS Support to Bulk Import Functions by <a
href="https://github.com/abd-770"><code>@​abd-770</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2672">milvus-io/pymilvus#2672</a></li>
<li>enhance: Use SearchResult in MilvusClient by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2735">milvus-io/pymilvus#2735</a></li>
<li>Support rerank by <a
href="https://github.com/junjiejiangjjj"><code>@​junjiejiangjjj</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2729">milvus-io/pymilvus#2729</a></li>
<li>feat: suppoprt multi analyzer params by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2747">milvus-io/pymilvus#2747</a></li>
<li>Add funciton checker by <a
href="https://github.com/junjiejiangjjj"><code>@​junjiejiangjjj</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2760">milvus-io/pymilvus#2760</a></li>
<li>enhance: Support run analyzer by collection and field by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2822">milvus-io/pymilvus#2822</a></li>
<li>feat: support load collection/partition with priority(<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2835">#2835</a>)
by <a
href="https://github.com/MrPresent-Han"><code>@​MrPresent-Han</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2836">milvus-io/pymilvus#2836</a></li>
<li>enhance: optimize perf for large topk(<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2848">#2848</a>)
by <a
href="https://github.com/MrPresent-Han"><code>@​MrPresent-Han</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2849">milvus-io/pymilvus#2849</a></li>
<li>enhance: Add usage guide to manage MilvusClient by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2907">milvus-io/pymilvus#2907</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1e56ce7d31"><code>1e56ce7</code></a>
enhance: Update milvus-proto and readme (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2921">#2921</a>)</li>
<li><a
href="75052b1b7c"><code>75052b1</code></a>
enhance: Add usage guide to manage MilvusClient (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2907">#2907</a>)</li>
<li><a
href="9f44053086"><code>9f44053</code></a>
add example code for language identifier and multi analyzer (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2919">#2919</a>)</li>
<li><a
href="058836de26"><code>058836d</code></a>
fix: Return new pk value for upsert when autoid=true (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2914">#2914</a>)</li>
<li><a
href="bbc6777565"><code>bbc6777</code></a>
[cherry-pick] Compatible with the default behavior of free on the cloud
(<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2913">#2913</a>)</li>
<li><a
href="45080c39c5"><code>45080c3</code></a>
fix: Aviod coping functions when init CollectionSchema (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2902">#2902</a>)</li>
<li><a
href="52b8461c5b"><code>52b8461</code></a>
[cherry-pick] bulk_import add stageName/dataPaths parameter (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2905">#2905</a>)</li>
<li><a
href="a8c3120622"><code>a8c3120</code></a>
[cherry-pick] support stage (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2895">#2895</a>)</li>
<li><a
href="3653effa88"><code>3653eff</code></a>
fix: Tidy alias configs when connect fails (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2900">#2900</a>)</li>
<li><a
href="728791a7de"><code>728791a</code></a>
enhance: Store alias before wait for ready (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2894">#2894</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/milvus-io/pymilvus/compare/v2.5.14...v2.6.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymilvus&package-manager=uv&previous-version=2.5.14&new-version=2.6.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-15 10:54:09 -07:00
Aakanksha Duggal
e743d3fdf6
refactor(agents): migrate to OpenAI chat completions API (#3097)
Replace chat_completion calls with openai_chat_completion to eliminate
dependency on legacy inference APIs.

# What does this PR do?
 Closes #3067

## Test Plan
2025-08-15 10:51:41 -07:00
ashwinb
f66ae3b3b1
docs(tests): Add a bunch of documentation for our testing systems (#3139)
# What does this PR do?

Creates a structured testing documentation section with multiple detailed pages:

- Testing overview explaining the record-replay architecture
- Integration testing guide with practical usage examples
- Record-replay system technical documentation
- Guide for writing effective tests
- Troubleshooting guide for common testing issues

Hopefully this makes things a bit easier.
2025-08-15 17:45:30 +00:00
Ashwin Bharambe
81ecaf6221
fix(ci): make the Vector IO CI follow the same pattern as others (#3164)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / discover-tests (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 8s
Python Package Build Test / build (3.12) (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
Python Package Build Test / build (3.13) (push) Failing after 13s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Pre-commit / pre-commit (push) Successful in 1m19s
# What does this PR do?
Updates the integration-vector-io-tests workflow to run daily tests on
Python 3.13 while limiting regular PR tests to Python 3.12 only.

The PR also improves the concurrency configuration to prevent workflow
conflicts between main branch runs and PR runs.

## Test Plan


[![testinprod](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/WjlTemxb6oA4PgZFmj08/2645295d-f421-49ae-8f3f-f4672d8204e2/testinprod.jpeg)](https://app.graphite.dev/settings/meme-library?org=llamastack)
2025-08-14 21:06:08 -07:00
ashwinb
01b2afd4b5
fix(tests): record missing tests for test_responses_store (#3163)
# What does this PR do?

Updates test recordings.

## Test Plan

Started ollama serving the 3.2:3b model. Then ran the server:

```
LLAMA_STACK_TEST_INFERENCE_MODE=record \
  LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings/ \
  SQLITE_STORE_DIR=$(mktemp -d) \
  OLLAMA_URL=http://localhost:11434 \
  llama stack build --template starter --image-type venv --run
```

Then ran the tests which needed recording:

```
pytest -sv tests/integration/agents/test_openai_responses.py \
  --stack-config=server:starter \
   --text-model ollama/llama3.2:3b-instruct-fp16 -k test_responses_store
```

Then, restarted the server with `LLAMA_STACK_TEST_INFERENCE_MODE=replay`, re-ran the tests and verified they passed.
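
The record-then-replay flow above can be pictured with a generic sketch. This is not the llama-stack implementation: the hashing scheme, file layout and the `call_with_recording` helper are assumptions made for illustration; only the `record` and `replay` mode names come from the steps above.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def _key(request: dict) -> str:
    # Stable hash so that identical requests map to the same recording file.
    return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()


def call_with_recording(request: dict, backend: Callable[[dict], dict],
                        recording_dir: Path, mode: str) -> dict:
    """Hypothetical wrapper: record live responses, or replay stored ones."""
    path = recording_dir / f"{_key(request)}.json"
    if mode == "replay":
        if not path.exists():
            raise FileNotFoundError(f"no recording for request: {request}")
        return json.loads(path.read_text())
    response = backend(request)
    if mode == "record":
        path.write_text(json.dumps(response))
    return response


def offline_backend(_request: dict) -> dict:
    # Simulates the model server being unavailable during replay.
    raise RuntimeError("backend should not be called in replay mode")


with tempfile.TemporaryDirectory() as tmp:
    request = {"model": "example-model", "prompt": "hi"}
    recorded = call_with_recording(request, lambda r: {"text": "hello"}, Path(tmp), "record")
    replayed = call_with_recording(request, offline_backend, Path(tmp), "replay")
    print(replayed == recorded)
```

Replay raising on a missing recording is the design choice that surfaces tests which still need to be recorded.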
2025-08-15 03:52:45 +00:00
ashwinb
8ed69978f9
refactor(tests): make the responses tests nicer (#3161)
# What does this PR do?

A _bunch_ of cleanup for the Responses tests.

- Got rid of the YAML test cases in favor of simple pydantic models
- Split the large monolithic test file into multiple focused test files:
   - `test_basic_responses.py` for basic and image response tests
   - `test_tool_responses.py` for tool-related tests
   - `test_file_search.py` for file search specific tests
- Added a `StreamingValidator` helper class to standardize streaming response validation

## Test Plan

Run the tests:

```
pytest -s -v tests/integration/non_ci/responses/ \
   --stack-config=starter \
   --text-model openai/gpt-4o \
   --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
    -k "client_with_models"
```
2025-08-15 00:05:36 +00:00
ashwinb
ba664474de
feat(responses): add mcp list tool streaming event (#3159)
# What does this PR do?

Adds proper streaming events for MCP tool listing (`mcp_list_tools.in_progress` and `mcp_list_tools.completed`). Also refactors things a bit more.
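
A test can assert that the two events arrive as a pair and in order. A minimal sketch, assuming the test has already collected the event type strings from the stream (the helper name and the placeholder `other.event` type are hypothetical; only the two `mcp_list_tools.*` names come from this PR):

```python
IN_PROGRESS = "mcp_list_tools.in_progress"
COMPLETED = "mcp_list_tools.completed"


def mcp_list_tools_events_are_paired(event_types: list[str]) -> bool:
    """True if each in_progress event is closed by a completed event, in order."""
    listing_open = False
    seen_any = False
    for event_type in event_types:
        if event_type == IN_PROGRESS:
            if listing_open:
                return False  # a second listing started before the first finished
            listing_open = True
            seen_any = True
        elif event_type == COMPLETED:
            if not listing_open:
                return False  # completed without a preceding in_progress
            listing_open = False
    # At least one pair must exist and none may be left open.
    return seen_any and not listing_open


stream = ["other.event", IN_PROGRESS, COMPLETED, "other.event"]
print(mcp_list_tools_events_are_paired(stream))
```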

## Test Plan

Verified that existing integration tests pass with the refactored code. The test `test_response_streaming_multi_turn_tool_execution` has been updated to check for the new MCP list tools streaming events.
2025-08-15 00:05:36 +00:00
ashwinb
9324e902f1
refactor(responses): move stuff into some utils and add unit tests (#3158)
# What does this PR do?
Refactors the OpenAI response conversion utilities by moving helper functions from `openai_responses.py` to `utils.py`. Adds unit tests.
2025-08-15 00:05:36 +00:00
ashwinb
47d5af703c
chore(responses): Refactor Responses Impl to be civilized (#3138)
# What does this PR do?
Refactors the OpenAI responses implementation by extracting streaming and tool execution logic into separate modules. This improves code organization by:

1. Creating a new `StreamingResponseOrchestrator` class in `streaming.py` to handle the streaming response generation logic
2. Moving tool execution functionality to a dedicated `ToolExecutor` class in `tool_executor.py`

## Test Plan

Existing tests
2025-08-15 00:05:35 +00:00
Francisco Arceo
e69acbafbf
feat(UI): Adding linter and prettier for UI (#3156) 2025-08-14 15:58:43 -06:00
Ashwin Bharambe
61582f327c
fix(ci): update triggers for the workflows (#3152)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / discover-tests (push) Successful in 8s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Python Package Build Test / build (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 20s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 12s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Update ReadTheDocs / update-readthedocs (push) Failing after 13s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 21s
Test External API and Providers / test-external (venv) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s
Pre-commit / pre-commit (push) Successful in 1m39s
2025-08-14 10:27:25 -07:00
Derek Higgins
c15cc7ed77
fix: use ChatCompletionMessageFunctionToolCall (#3142)
The OpenAI compatibility layer was incorrectly importing
ChatCompletionMessageToolCallParam instead of the
ChatCompletionMessageFunctionToolCall class. This caused "Cannot
instantiate typing.Union" errors when processing agent requests with
tool calls.
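
The failure mode is easy to reproduce in isolation: a `typing.Union` alias describes a set of types and cannot be called like a class. In the sketch below, `FunctionToolCall` and `CustomToolCall` are stand-in classes invented for the example, not the OpenAI SDK types.

```python
from typing import Union


class FunctionToolCall:
    def __init__(self, name: str) -> None:
        self.name = name


class CustomToolCall:
    def __init__(self, name: str) -> None:
        self.name = name


# A Union alias means "one of these types"; it is not itself a class.
AnyToolCall = Union[FunctionToolCall, CustomToolCall]


def try_instantiate(factory) -> str:
    """Call the factory and report either the resulting type or the TypeError."""
    try:
        obj = factory(name="get_weather")
    except TypeError as exc:
        return f"TypeError: {exc}"
    return type(obj).__name__


print(try_instantiate(AnyToolCall))       # fails: the alias cannot be instantiated
print(try_instantiate(FunctionToolCall))  # works: a concrete class
```

Importing the concrete class, as this fix does, avoids the error; the exact wording of the `TypeError` message may differ between Python versions.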

Closes: #3141

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-08-14 10:27:00 -07:00
Ashwin Bharambe
ee7631b6cf
Revert "feat: add batches API with OpenAI compatibility" (#3149)
Reverts llamastack/llama-stack#3088

The PR broke integration tests.
2025-08-14 10:08:54 -07:00
Matthew Farrellee
de692162af
feat: add batches API with OpenAI compatibility (#3088)
Add complete batches API implementation with protocol, providers, and
tests:

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
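
How the two concurrency options could interact can be sketched with nested semaphores. This is an illustration only, not the reference provider's code: the constants mirror the option names, while `process_batch` and the sleep standing in for an inference call are assumptions.

```python
import asyncio

MAX_CONCURRENT_BATCHES = 2
MAX_CONCURRENT_REQUESTS_PER_BATCH = 3


async def process_batch(batch_id, requests, batch_slots, stats):
    """Process one batch while honoring both concurrency limits."""
    async with batch_slots:  # at most MAX_CONCURRENT_BATCHES batches run at once
        request_slots = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS_PER_BATCH)

        async def run_one(request):
            async with request_slots:  # per-batch request limit
                stats["active"] += 1
                stats["peak"] = max(stats["peak"], stats["active"])
                await asyncio.sleep(0.01)  # placeholder for the inference call
                stats["active"] -= 1
                return f"{batch_id}:{request}:ok"

        return await asyncio.gather(*(run_one(r) for r in requests))


async def main():
    batch_slots = asyncio.Semaphore(MAX_CONCURRENT_BATCHES)
    stats = {"active": 0, "peak": 0}
    batches = {f"batch_{i}": list(range(5)) for i in range(4)}
    results = await asyncio.gather(
        *(process_batch(bid, reqs, batch_slots, stats) for bid, reqs in batches.items())
    )
    return results, stats["peak"]


results, peak = asyncio.run(main())
print(f"processed {sum(len(r) for r in results)} requests, peak concurrency {peak}")
```

With these limits, the number of in-flight requests never exceeds the product of the two options.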

Test with -

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

addresses #3066
2025-08-14 09:42:02 -04:00
ehhuang
46ff302d87
chore: Remove Trendshift badge from README (#3137)
## Summary
- The badge links to a scammy-looking website with ads.

## Test plan
2025-08-13 18:38:34 -07:00
Ashwin Bharambe
e1e161553c
feat(responses): add MCP argument streaming and content part events (#3136)
# What does this PR do?

Adds content part streaming events to the OpenAI-compatible Responses API to support more granular streaming of response content. This introduces:

1. New schema types for content parts: `OpenAIResponseContentPart` with variants for text output and refusals

2. New streaming event types:
   - `OpenAIResponseObjectStreamResponseContentPartAdded` for when content parts begin
   - `OpenAIResponseObjectStreamResponseContentPartDone` for when content parts complete

3. Implementation in the reference provider to emit these events during streaming responses. Also emits MCP arguments just like function call ones.
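
The event ordering described above can be sketched as follows. This is a simplified illustration using plain dicts rather than the `OpenAIResponseObjectStream*` types; the helper name `stream_text_part` and the exact field set are assumptions.

```python
def stream_text_part(item_id, chunks):
    """Yield simplified streaming events for a single text content part."""
    seq = 0

    def event(event_type, **fields):
        nonlocal seq
        seq += 1
        return {"type": event_type, "item_id": item_id, "sequence_number": seq, **fields}

    # The part is announced before any text arrives.
    yield event("response.content_part.added", part={"type": "output_text", "text": ""})
    text = ""
    for chunk in chunks:
        text += chunk
        yield event("response.output_text.delta", delta=chunk)
    # The done event carries the fully accumulated part.
    yield event("response.content_part.done", part={"type": "output_text", "text": text})


events = list(stream_text_part("msg_1", ["Hel", "lo"]))
for e in events:
    print(e["sequence_number"], e["type"])
```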


## Test Plan

Updated existing streaming tests to verify content part events are properly emitted
2025-08-13 16:34:26 -07:00
Ashwin Bharambe
8638537d14
feat(responses): stream progress of tool calls (#3135)
# What does this PR do?
Enhances tool execution streaming by adding support for real-time progress events during tool calls. This implementation adds streaming events for MCP and web search tools, including in-progress, searching, completed, and failed states. 

The refactored `_execute_tool_call` method now returns an async iterator that yields streaming events throughout the tool execution lifecycle.
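
A rough sketch of the async-iterator shape, assuming an MCP-style tool. The function below is a hypothetical stand-in for `_execute_tool_call`, and the `output` and `error` fields are illustrative extras rather than part of the actual event schema:

```python
import asyncio


async def execute_tool_call(item_id, output_index, tool, arguments):
    """Yield progress events while the tool runs, then one terminal event."""
    seq = 0

    def event(event_type, **extra):
        nonlocal seq
        seq += 1
        return {
            "type": event_type,
            "item_id": item_id,
            "output_index": output_index,
            "sequence_number": seq,
            **extra,
        }

    yield event("response.mcp_call.in_progress")
    try:
        result = await tool(**arguments)
    except Exception as exc:
        yield event("response.mcp_call.failed", error=str(exc))
        return
    yield event("response.mcp_call.completed", output=result)


async def add(a, b):
    """Toy tool used only for this example."""
    return a + b


async def collect(agen):
    """Drain an async iterator into a list."""
    return [e async for e in agen]


events = asyncio.run(collect(execute_tool_call("mcp_1", 0, add, {"a": 1, "b": 2})))
for e in events:
    print(e["sequence_number"], e["type"])
```

Because the caller consumes events as they are yielded, progress can be forwarded to the client before the tool finishes.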

## Test Plan
Updated the integration test `test_response_streaming_multi_turn_tool_execution` to verify the presence and structure of new streaming events, including:
- Checking for MCP in-progress and completed events
- Verifying that progress events contain required fields (item_id, output_index, sequence_number)
- Ensuring completed events have the necessary sequence_number field
2025-08-13 16:31:25 -07:00
Ashwin Bharambe
5b312a80b9
feat(responses): improve streaming for function calls (#3124)
Emit streaming events for function calls

## Test Plan

Improved the test case
2025-08-13 11:23:27 -07:00
ehhuang
d6ae54723d
chore: setup for performance benchmarking (#3096)
# What does this PR do?
1. Add a simple mock openai-compat server that serves chat/completions
2. Add a benchmark server in EKS that includes the mock inference server
3. Add a locust (https://locust.io/) file for load testing
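
For illustration, a mock chat/completions endpoint can be sketched with only the standard library. This is not the server added in this PR; the handler name, the canned reply, and the response fields shown are assumptions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MockChatHandler(BaseHTTPRequestHandler):
    """Answer every chat completion request with a fixed canned message."""

    def do_POST(self):
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({
            "id": "chatcmpl-mock",
            "object": "chat.completion",
            "model": request.get("model", "mock-model"),
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": "mock response"},
                "finish_reason": "stop",
            }],
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # stay quiet under load


def start_mock_server(port=0):
    """Start the mock server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MockChatHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Returning a fixed reply removes model latency from the measurement, so a load test mostly exercises the stack in front of the mock.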

## Test Plan
bash apply.sh
kubectl port-forward service/locust-web-ui 8089:8089
Go to localhost:8089 to start a load test

<img width="1392" height="334" alt="image"
src="https://github.com/user-attachments/assets/d6aa3deb-583a-42ed-889b-751262b8e91c"
/>
<img width="1362" height="881" alt="image"
src="https://github.com/user-attachments/assets/6a28b9b4-05e6-44e2-b504-07e60c12d35e"
/>
2025-08-13 10:58:22 -07:00
ehhuang
2f51273215
fix: huge speed boost (#3132)
# What does this PR do?
make llama stack fast again


## Test Plan
2025-08-13 09:51:35 -07:00
slekkala1
25e0553eed
chore: Change moderations api response to Provider returned categories (#3098)
# What does this PR do?
To comply with the model policies for Llama, return the categories
exactly as the provider returns them. As a result, the moderations API
response loses OpenAI compatibility.


## Test Plan
`SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest
-v tests/integration/safety/test_safety.py
--text-model=llama3.2:3b-instruct-fp16
--embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`
2025-08-13 09:47:35 -07:00
Ashwin Bharambe
a9081d87b9
feat(ci): update Recording workflow trigger and concurrency group
2025-08-13 09:36:13 -07:00
IAN MILLER
0950168f26
refactor: replace hardcoded status codes by httpx.codes (#3131)
# What does this PR do?
The purpose of this PR is to eliminate hardcoded status codes in the
server's responses and replace them with `httpx.codes` constants, for
better consistency across the project and improved code
readability.


## Test Plan
Run `./scripts/unit-tests.sh`
2025-08-13 08:43:41 -07:00
Kelly Brown
0cbd93c5cc
docs: Update blocks formatting in docs/source files (#3120)
**Description:** 
The standard markdown [!NOTE] format is not supported in Sphinx-generated
documentation, so this PR replaces those instances. It also updates other
Note, Tip, and Warning blocks throughout the source docs.

WIP: Working to update the provider code gen
2025-08-13 08:06:31 -07:00
IAN MILLER
c9b78602d3
refactor: modify DELETE API endpoints by returning HTTP 204 No Content + empty body instead of 200 OK + response body with null (#3112)
# What does this PR do?
The purpose of this PR is to make the behavior of DELETE API endpoints
consistent with standard RESTful conventions and eliminate confusion for
API consumers.

Old Behavior
```
HTTP Status: 200 OK
Response Body: null
```

Eg. `curl -X DELETE http://localhost:8321/v1/shields/test-shield`
`null% `
`INFO 2025-08-12 16:11:57,932 console_span_processor:65 telemetry:
15:11:57.929 [INFO] ::1:59805 - "DELETE /v1/shields/test-shield
HTTP/1.1" 200 `

Updated Behavior
```
HTTP Status: 204 No Content
Response Body: empty (no body)
```

Eg.  `curl -X DELETE http://localhost:8321/v1/shields/test-shield`
`INFO 2025-08-12 16:18:16,645 console_span_processor:62 telemetry:
15:18:16.637 [INFO] ::1:60283 - "DELETE /v1/shields/test-shield
HTTP/1.1" 204 `
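
A small sketch of what this means for handlers and clients, using the standard library's `http.HTTPStatus` for named status codes. The in-memory `SHIELDS` registry, both function names, and the 404 branch are assumptions made for illustration:

```python
import json
from http import HTTPStatus

# Hypothetical in-memory registry standing in for the shields store.
SHIELDS = {"test-shield": {"provider_id": "example"}}


def delete_shield(shield_id):
    """Return (status, body) the way a DELETE handler would after this change."""
    if shield_id not in SHIELDS:
        # Assumed behavior for a missing resource; not taken from this PR.
        return HTTPStatus.NOT_FOUND, json.dumps({"detail": "shield not found"}).encode()
    del SHIELDS[shield_id]
    return HTTPStatus.NO_CONTENT, b""  # 204: success, nothing to send back


def parse_response(status, body):
    """Client side: a 204 has no body, so do not try to decode JSON."""
    if status == HTTPStatus.NO_CONTENT:
        return None
    return json.loads(body)


status, body = delete_shield("test-shield")
print(int(status), parse_response(status, body))
```

Clients that previously decoded the `null` body unconditionally need the status check, since an empty body is not valid JSON.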

Closes #3090 

## Test Plan
Run `./scripts/unit-tests.sh`
2025-08-13 07:56:26 -07:00
Francisco Arceo
92aca434a7
fix: Fix list_sessions() (#3114)
# What does this PR do?
1. Updates `AgentPersistence.list_sessions()` to properly filter out
`Turn` keys from `Session` keys.
2. Adds a suite of unit tests to confirm the `list_sessions()` behavior
and tests the failed sample in
https://github.com/meta-llama/llama-stack/issues/3048
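
The filtering idea can be sketched like this. The key layout (`session:<agent>:<session>` with a turn ID appended for turns) is an assumption made for illustration and may not match the actual `AgentPersistence` key format:

```python
def session_key(agent_id, session_id):
    """Hypothetical key layout for a session record."""
    return f"session:{agent_id}:{session_id}"


def turn_key(agent_id, session_id, turn_id):
    """Hypothetical key layout for a turn: the session key plus a turn segment."""
    return f"{session_key(agent_id, session_id)}:{turn_id}"


def list_session_ids(keys, agent_id):
    """Return session IDs for one agent, skipping turn keys that share the prefix."""
    prefix = f"session:{agent_id}:"
    session_ids = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if ":" in remainder:  # an extra segment means this is a turn key
            continue
        session_ids.append(remainder)
    return session_ids


keys = [
    session_key("agent_1", "s1"),
    turn_key("agent_1", "s1", "t1"),
    turn_key("agent_1", "s1", "t2"),
    session_key("agent_1", "s2"),
    session_key("agent_2", "s9"),
]
print(list_session_ids(keys, "agent_1"))
```

A plain prefix scan would return the turn keys as well, which appears to be the kind of mix-up this fix addresses.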

## Fixes https://github.com/meta-llama/llama-stack/issues/3048


## Test Plan
Unit tests added.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-13 07:46:26 -07:00
Krzysztof Malczuk
5bd6cb52fb
fix: github action canceling valid tasks for checking semantic pr title (#3127)
# What does this PR do?
This PR changes the concurrency group name from github.ref to
github.event.pull_request_number. The reason is that github.ref
is not a unique identifier in the pull_request_target event; it is
only unique in pull_request. The GitHub action was getting canceled
because the group name in the concurrency section was not unique.
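
The collision can be illustrated with a small simulation. The helper below is hypothetical and only models how the ref differs between the two events: for pull_request it is the PR merge ref, while for pull_request_target it is the base branch.

```python
def concurrency_group(workflow, event_name, base_ref, pr_number, key):
    """Build a concurrency group name from either the ref or the PR number."""
    if key == "ref":
        if event_name == "pull_request":
            ref = f"refs/pull/{pr_number}/merge"  # unique per PR
        else:
            ref = f"refs/heads/{base_ref}"  # pull_request_target: base branch
        return f"{workflow}-{ref}"
    return f"{workflow}-{pr_number}"


# Two different PRs, both targeting main, under pull_request_target.
by_ref = {concurrency_group("semantic-pr", "pull_request_target", "main", n, "ref") for n in (101, 102)}
by_number = {concurrency_group("semantic-pr", "pull_request_target", "main", n, "number") for n in (101, 102)}
print(by_ref)
print(by_number)
```

Both PRs land in the same ref-based group, so a concurrency setting that cancels in-progress runs would cancel one PR's check when the other starts; keying on the PR number keeps them apart.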

Closes #3102

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
To test this, I created a fake GitHub action and ran it through act
to see what the github.ref variable produced and what alternatives could
be used. This confirmed that github.ref was not unique and that
github.event.pull_request_number is unique to the PR.
2025-08-13 07:14:03 -07:00
Chacksu
fffdab4f5c
fix: Dell distribution missing kvstore (#3113)
# What does this PR do?

- Added kvstore config to ChromaDB provider config for Dell distribution
similar to [starter
config](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/starter/run.yaml#L110-L112)
- Fixed
[error](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_generated/_async_client.py#L3424-L3425)
getting endpoint information by adding `hf-inference` as the provider to
the `AsyncInferenceClient` (TGI client).

## Test Plan
```
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=8000
export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321
export HF_TOKEN=[redacted]

# TGI Server
docker run --rm -it \
  --pull always \
  --network host \
  -v $HOME/.cache/huggingface:/data \
  -e HF_TOKEN=$HF_TOKEN \
  -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  -p $INFERENCE_PORT:$INFERENCE_PORT \
  --gpus all \
  ghcr.io/huggingface/text-generation-inference:latest \
  --dtype float16 \
  --usage-stats off \
  --sharded false \
  --cuda-memory-fraction 0.8 \
  --model-id meta-llama/Llama-3.2-3B-Instruct \
  --port $INFERENCE_PORT \
  --hostname 0.0.0.0

# Chroma DB
docker run --rm -it \
  --name chromadb \
  --net=host  -p 8000:8000 \
  -v ~/chroma:/chroma/chroma \
  -e IS_PERSISTENT=TRUE \
  -e ANONYMIZED_TELEMETRY=FALSE \
  chromadb/chroma:latest

# Llama Stack
llama stack run dell \
 --port $LLAMA_STACK_PORT \
 --env INFERENCE_MODEL=$INFERENCE_MODEL \
 --env DEH_URL=$DEH_URL \
 --env CHROMA_URL=$CHROMA_URL
```

---------

Co-authored-by: Connor Hack <connorhack@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-13 06:18:25 -07:00
Kelly Brown
6358d0a478
docs: reorganize contributor guide (#3110)
**Description:** 
Restructures the contribution guide and moves some sections into categories

<img width="1399" height="527" alt="Screenshot 2025-08-12 at 9 28 44 AM"
src="https://github.com/user-attachments/assets/404e23b4-0001-4174-b662-593e0173ef7d"
/>
2025-08-12 16:17:03 -07:00
Ashwin Bharambe
3d90117891
chore(tests): fix responses and vector_io tests (#3119)
Some fixes to MCP tests. And a bunch of fixes for Vector providers.

I also enabled a bunch of Vector IO tests to be used with
`LlamaStackLibraryClient`

## Test Plan

Run Responses tests with llama stack library client:
```
pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \
   --text-model openai/gpt-4o \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
  -k "client_with_models"
```

Do the same with `-k openai_client`

The rest should be taken care of by CI.
2025-08-12 16:15:53 -07:00
Ashwin Bharambe
1721aafc1f
feat(responses): type file results properly (#3117)
Another thing our tests implicitly depended on.
2025-08-12 10:39:09 -07:00
Ashwin Bharambe
4fec49dfdb
feat(responses): add include parameter (#3115)
Well our Responses tests use it so we better include it in the API, no?

I discovered it because I want to make sure `llama-stack-client` can be
used always instead of `openai-python` as the client (we do want to be
_truly_ compatible.)
2025-08-12 10:24:01 -07:00
Nathan Weinberg
6812aa1e1e
chore: bump min python version in docs and tests (#3103)
# What does this PR do?
The minimum Python version for the project was bumped to 3.12 a couple of
months ago, but some artifacts remain in the repo suggesting we
support >=3.10.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-12 08:52:57 -07:00
dependabot[bot]
88c4fdc5d7
chore(python-deps): bump chromadb from 1.0.15 to 1.0.16 (#3083)
Bumps [chromadb](https://github.com/chroma-core/chroma) from 1.0.15 to
1.0.16.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/chroma-core/chroma/releases">chromadb's
releases</a>.</em></p>
<blockquote>
<h2>1.0.16</h2>
<p>Version: <code>1.0.16</code>
Git ref: <code>refs/tags/1.0.16</code>
Build Date: <code>2025-08-08T00:26</code>
PIP Package: <code>chroma-1.0.16.tar.gz</code>
Github Container Registry Image: <code>:1.0.16</code>
DockerHub Image: <code>:1.0.16</code></p>
<h2>What's Changed</h2>
<ul>
<li>[ENH]: add cache mount &amp; tolerations to garbage collector
template in Helm chart by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5016">chroma-core/chroma#5016</a></li>
<li>[DOC] Fix docs typo by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5018">chroma-core/chroma#5018</a></li>
<li>[CLN] Change GenericQuotaError from 429 to 422 by <a
href="https://github.com/drewkim"><code>@​drewkim</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5022">chroma-core/chroma#5022</a></li>
<li>[CHORE] Fix type error in batch_utils by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5024">chroma-core/chroma#5024</a></li>
<li>[ENH] Add block-level metrics by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/4801">chroma-core/chroma#4801</a></li>
<li>[ENH]: return error on /add if embeddings are not provided by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5033">chroma-core/chroma#5033</a></li>
<li>[DOC] Docs Polish 07/2025 by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5032">chroma-core/chroma#5032</a></li>
<li>[DOC] Flatten public txt files by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5040">chroma-core/chroma#5040</a></li>
<li>[ENH]: require embeddings &amp; require min embedding dimension on
/add by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5037">chroma-core/chroma#5037</a></li>
<li>[ENH] - Adds in dark mode support for hero image by <a
href="https://github.com/tjkrusinskichroma"><code>@​tjkrusinskichroma</code></a>
in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5042">chroma-core/chroma#5042</a></li>
<li>[BLD] Use 8core runners for all our windows jobs by <a
href="https://github.com/eculver"><code>@​eculver</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5027">chroma-core/chroma#5027</a></li>
<li>[TST] More benchmark queries for regex by <a
href="https://github.com/Sicheng-Pan"><code>@​Sicheng-Pan</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/4910">chroma-core/chroma#4910</a></li>
<li>[BUG]: refactor otel/tracing initialization in the frontend to be
independent of hosted entry point by <a
href="https://github.com/c-gamble"><code>@​c-gamble</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5028">chroma-core/chroma#5028</a></li>
<li>[BUG] js client: handle 422 billing errors as QuotaExceeded instead
of ChromaConnectionError by <a
href="https://github.com/philipithomas"><code>@​philipithomas</code></a>
in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5049">chroma-core/chroma#5049</a></li>
<li>[BUG] RLS should use 32MB GRPC payload size limit by <a
href="https://github.com/Sicheng-Pan"><code>@​Sicheng-Pan</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5044">chroma-core/chroma#5044</a></li>
<li>[BUG] Sync protoc arch and version in dockerfile by <a
href="https://github.com/Sicheng-Pan"><code>@​Sicheng-Pan</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5045">chroma-core/chroma#5045</a></li>
<li>[BLD] Fix windows runner label by <a
href="https://github.com/eculver"><code>@​eculver</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5052">chroma-core/chroma#5052</a></li>
<li>[PERF]: Prefetch segments in get and query by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5053">chroma-core/chroma#5053</a></li>
<li>[PERF]: Parallelize fetching blocks for brute force regex by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5051">chroma-core/chroma#5051</a></li>
<li>[RELEASE] JS 3.0.7 by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5059">chroma-core/chroma#5059</a></li>
<li>[ENH] Add a delete_many call to the storage API. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5020">chroma-core/chroma#5020</a></li>
<li>[ENH] Consume delete_many from the wal3 garbage collector. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5021">chroma-core/chroma#5021</a></li>
<li>[ENH]: limit number of concurrent get_all_block_ids() when using
buffer_unordered() by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5062">chroma-core/chroma#5062</a></li>
<li>[ENH]: use new <code>delete_many()</code> storage method in
DeleteUnusedFiles operator by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5061">chroma-core/chroma#5061</a></li>
<li>[BUG]: Disable aws stalled stream protection by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5063">chroma-core/chroma#5063</a></li>
<li>[DOC] Update manage collections docs with correct delete collection
info by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5066">chroma-core/chroma#5066</a></li>
<li>[BUG] Improve wal3 robustness with better shutdown handling and
error recovery by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5046">chroma-core/chroma#5046</a></li>
<li>[ENH] Do not do any mutations of the manifest from within GC. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5050">chroma-core/chroma#5050</a></li>
<li>[CHORE]: enable change notifier otel/tracing by <a
href="https://github.com/c-gamble"><code>@​c-gamble</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5073">chroma-core/chroma#5073</a></li>
<li>[CHORE] Add pprof server to query service by <a
href="https://github.com/eculver"><code>@​eculver</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5072">chroma-core/chroma#5072</a></li>
<li>[ENH]: Dedup inserts to the same key in foyer by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5074">chroma-core/chroma#5074</a></li>
<li>[ENH] &quot;Failed to fetch: status: NotFound&quot; be gone. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5064">chroma-core/chroma#5064</a></li>
<li>[CLN] Remove the the top most spammy log lines from rls/wal3. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5071">chroma-core/chroma#5071</a></li>
<li>[DOC] Fix badge in readme by <a
href="https://github.com/kylediaz"><code>@​kylediaz</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5025">chroma-core/chroma#5025</a></li>
<li>[ENH] A tool for patching logs that were deleted before a new
manifest was installed. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5083">chroma-core/chroma#5083</a></li>
<li>[BUG] Add billing errors to JS client by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5084">chroma-core/chroma#5084</a></li>
<li>[CHORE]: Add s3 get metrics and pod name to tracing spans by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5086">chroma-core/chroma#5086</a></li>
<li>[RELEASE] JS 3.0.8 by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5087">chroma-core/chroma#5087</a></li>
<li>[ENH] A tool to purge the cache. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5085">chroma-core/chroma#5085</a></li>
<li>[DOC] Update PR template for migration and observability by <a
href="https://github.com/HammadB"><code>@​HammadB</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5089">chroma-core/chroma#5089</a></li>
<li>[CHORE]: Fix s3 get metric name by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5091">chroma-core/chroma#5091</a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="dff3a786db"><code>dff3a78</code></a>
[RELEASE] CLI 1.1.5, Python 1.0.16, JS 3.0.11 (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5227">#5227</a>)</li>
<li><a
href="f60f932b8d"><code>f60f932</code></a>
[ENH]: Increase nprobe for smaller collections (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5226">#5226</a>)</li>
<li><a
href="f593a43b5d"><code>f593a43</code></a>
[ENH] Add <code>InsertRecordSet</code> to JS client (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5225">#5225</a>)</li>
<li><a
href="76a14c226a"><code>76a14c2</code></a>
[DOC] Made light/dark mode for Chroma logo (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5215">#5215</a>)</li>
<li><a
href="d80817ede4"><code>d80817e</code></a>
[ENH]: Add more tracing in the filter path (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5219">#5219</a>)</li>
<li><a
href="73abfdc51a"><code>73abfdc</code></a>
[ENH] Handle when the garbage doesn't overlap the manifest. (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5207">#5207</a>)</li>
<li><a
href="fa392226ba"><code>fa39222</code></a>
[BUG] Revert accidentally commited code (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5205">#5205</a>)</li>
<li><a
href="815c3ac561"><code>815c3ac</code></a>
[ENH]: Fix CI flake with adaptive nsearch (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5203">#5203</a>)</li>
<li><a
href="ea66d6929c"><code>ea66d69</code></a>
[BUG] Switch to rust-tls (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5204">#5204</a>)</li>
<li><a
href="04aeb22139"><code>04aeb22</code></a>
[ENH]: Calculate cache weight of block size instead of hardcoding (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5201">#5201</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/chroma-core/chroma/compare/1.0.15...1.0.16">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=chromadb&package-manager=uv&previous-version=1.0.15&new-version=1.0.16)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-12 08:44:39 -07:00
dependabot[bot]
393f3714b0
chore(python-deps): bump torch from 2.7.1 to 2.8.0 (#3082)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.7.1 to 2.8.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pytorch/pytorch/releases">torch's
releases</a>.</em></p>
<blockquote>
<h1>PyTorch 2.8.0 Release Notes</h1>
<ul>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#highlights">Highlights</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#backwards-incompatible-changes">Backwards
Incompatible Changes</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#deprecations">Deprecations</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#new-features">New
Features</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#improvements">Improvements</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#bug-fixes">Bug
fixes</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#performance">Performance</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#documentation">Documentation</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#developers">Developers</a></li>
</ul>
<h1>Highlights</h1>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ba56102387"><code>ba56102</code></a>
Cherrypick: Add the RunLLM widget to the website (<a
href="https://redirect.github.com/pytorch/pytorch/issues/159592">#159592</a>)</li>
<li><a
href="c525a02c89"><code>c525a02</code></a>
[dynamo, docs] cherry pick torch.compile programming model docs into 2.8
(<a
href="https://redirect.github.com/pytorch/pytorch/issues/15">#15</a>...</li>
<li><a
href="a1cb3cc05d"><code>a1cb3cc</code></a>
[Release Only] Remove nvshmem from list of preload libraries (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158925">#158925</a>)</li>
<li><a
href="c76b2356bc"><code>c76b235</code></a>
Move out super large one off foreach_copy test (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158880">#158880</a>)</li>
<li><a
href="20a0e225a0"><code>20a0e22</code></a>
Revert &quot;[Dynamo] Allow inlining into AO quantization modules (<a
href="https://redirect.github.com/pytorch/pytorch/issues/152934">#152934</a>)&quot;
(<a
href="https://redirect.github.com/pytorch/pytorch/issues/158">#158</a>...</li>
<li><a
href="9167ac8c75"><code>9167ac8</code></a>
[MPS] Switch Cholesky decomp to column wise (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158237">#158237</a>)</li>
<li><a
href="5534685c62"><code>5534685</code></a>
[MPS] Reimplement <code>tri[ul]</code> as Metal shaders (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158867">#158867</a>)</li>
<li><a
href="d19e08d74b"><code>d19e08d</code></a>
Cherry pick PR 158746 (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158801">#158801</a>)</li>
<li><a
href="a6c044ab9a"><code>a6c044a</code></a>
[cherry-pick] Unify torch.tensor and torch.ops.aten.scalar_tensor
behavior (#...</li>
<li><a
href="620ebd0646"><code>620ebd0</code></a>
[Dynamo] Use proper sources for constructing dataclass defaults (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158689">#158689</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/pytorch/pytorch/compare/v2.7.1...v2.8.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=torch&package-manager=uv&previous-version=2.7.1&new-version=2.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-12 08:44:24 -07:00
Matthew Farrellee
b70e2f1f09
fix(dep): update to openai >= 1.99.6 and use new Function location (#3087)
# What does this PR do?

closes #3072 

## Test Plan

ci
2025-08-12 08:40:32 -07:00
Mustafa Elbehery
4a13ef45e9
fix: Implement missing run_moderation method in PromptGuardSafetyImpl (#3101)
# What does this PR do?
This PR addresses an issue where `PromptGuardSafetyImpl` was an
incomplete implementation of its abstract parent class: it was missing
the required `run_moderation` method from the parent interface.


Currently, running `pre-commit` locally fails with the error below.

```
llama_stack/providers/inline/safety/prompt_guard/__init__.py:15: error: Cannot instantiate abstract class "PromptGuardSafetyImpl" with abstract attribute "run_moderation"  [abstract]
Found 1 error in 1 file (checked 410 source files)
```

This PR fixes the issue as follows:

- Adds the missing `run_moderation` method to `PromptGuardSafetyImpl`.
- The method raises `NotImplementedError` with a message indicating that
this functionality is not implemented for PromptGuard.
- This allows the class to be instantiated while clearly signalling the
limitation.
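
The fix can be illustrated with a minimal, self-contained sketch. The names `Safety`, `run_shield`, and `IncompleteImpl` are hypothetical stand-ins, and the methods are synchronous for brevity; the real llama-stack interface may differ in both names and signatures.

```python
from abc import ABC, abstractmethod


class Safety(ABC):
    """Hypothetical stand-in for the parent safety interface."""

    @abstractmethod
    def run_shield(self, text: str) -> str: ...

    @abstractmethod
    def run_moderation(self, text: str) -> str: ...


class IncompleteImpl(Safety):
    """Omits run_moderation, so the class remains abstract."""

    def run_shield(self, text: str) -> str:
        return "ok"


class PromptGuardSafetyImpl(Safety):
    """Defines every abstract method, so it can be instantiated."""

    def run_shield(self, text: str) -> str:
        return "ok"

    def run_moderation(self, text: str) -> str:
        # Explicit stub: the class is instantiable, and callers get a clear error.
        raise NotImplementedError("run_moderation is not implemented for PromptGuard")
```

Instantiating `IncompleteImpl()` raises `TypeError` at runtime, which is the same condition mypy reports statically as `[abstract]`; the stubbed class instantiates and only fails when `run_moderation` is actually called.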


Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-12 08:32:52 -07:00
Nathan Weinberg
19123ca957
refactor: standardize InferenceRouter model handling (#2965)
2025-08-12 04:20:39 -06:00
Ashwin Bharambe
803114180b
chore(logging)!: use comma as a delimiter (#3095)
Using commas is much more shell-friendly. A semi-colon is a statement
delimiter and must be escaped.

This change is backwards incompatible but I imagine not many people are
using this. I could be wrong. Looking for feedback.
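
As a rough illustration of the new format, here is a sketch of a parser for a comma-delimited logging spec. The `category=level` pair shape and the function name `parse_logging_spec` are assumptions made for this example; the commit message itself only states that the delimiter changed from a semicolon to a comma.

```python
def parse_logging_spec(spec: str) -> dict[str, str]:
    """Parse 'category=level' pairs separated by commas (assumed format)."""
    levels: dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled commas
        category, sep, level = pair.partition("=")
        if not sep or not category.strip() or not level.strip():
            raise ValueError(f"expected 'category=level', got {pair!r}")
        levels[category.strip().lower()] = level.strip().lower()
    return levels
```

With a comma, a value such as `server=debug,core=info` can be passed unquoted in a POSIX shell assignment. With a semicolon, the shell would end the statement at `;` unless the value is quoted or escaped.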
2025-08-11 11:51:43 -07:00
Francisco Arceo
f7adf58b1b
docs: Add documentation on how to contribute a Vector DB provider and update testing documentation (#3093)
# What does this PR do?

- Adds documentation on how to contribute a Vector DB provider.
- Updates the testing section to be a little friendlier to navigate.
- Adds a search shortcut so that `/`, `⌘ K`, or `Ctrl+K` triggers
search.


<img width="1903" height="1346" alt="Screenshot 2025-08-11 at 10 10
12 AM"
src="https://github.com/user-attachments/assets/6995b3b8-a2ab-4200-be72-c5b03a784a29"
/>

<img width="1915" height="1438" alt="Screenshot 2025-08-11 at 10 10
25 AM"
src="https://github.com/user-attachments/assets/1f54d30e-5be1-4f27-b1e9-3c3537dcb8e9"
/>


## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-11 11:11:09 -07:00
Mustafa Elbehery
b5b5f5b9ae
chore: add mypy prompt guard (#2678)
# What does this PR do?
This PR adds mypy static type coverage for the prompt guard provider in `llama-stack`.

Part of https://github.com/meta-llama/llama-stack/issues/2647


## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-11 08:40:40 -07:00
Francisco Arceo
7448a4a88c
chore: Updating UI Sidebar (#3081)
# What does this PR do?
This updates the sidebar to look a little more like other popular ones.

<img width="1913" height="1352" alt="Screenshot 2025-08-08 at 11 25
31 PM"
src="https://github.com/user-attachments/assets/00738412-1101-48ec-8864-cde4a8733ec1"
/>

## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-11 07:39:52 -07:00
Matthew Farrellee
8faff92591
chore: remove redundant code in unregister_toolgroup (#3092)
# What does this PR do?

removes redundant code

## Test Plan

ci
2025-08-11 07:38:54 -07:00
Eran Cohen
a4bad6c0b4
feat: Add Google Vertex AI inference provider support (#2841)
# What does this PR do?
- Add new Vertex AI remote inference provider with litellm integration
- Support for Gemini models through Google Cloud Vertex AI platform
- Uses Google Cloud Application Default Credentials (ADC) for
authentication
- Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro,
gemini-2.0-flash.
- Updated provider registry to include vertexai provider
- Updated starter template to support Vertex AI configuration
- Added comprehensive documentation and sample configuration

relates to https://github.com/meta-llama/llama-stack/issues/2747

## Test Plan

Signed-off-by: Eran Cohen <eranco@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-11 08:22:04 -04:00
Francisco Arceo
78a59a4dbe
chore: Adding GitHub Stars, trends, and contributor shout out to README (#3079)
# What does this PR do?

Updates the README to add:
1. GitHub badge highlighting Llama Stack as #1 Repo of the Day
2. GitHub Star History (cumulative stars chart)
3. Contributor shout out


## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-10 21:11:14 -04:00
Varsha
69dc789e15
docs: Add unsupported search mode info about FAISS (#3089) 2025-08-10 17:34:34 -06:00
Varsha
ce72a28525
docs: Update doc on search modes for Milvus (#3078)
# What does this PR do?
Update Milvus doc on using search modes. 


## Test Plan

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-10 18:48:36 -04:00
Vlastimil Eliáš
1677d6bffd
feat: Flash-Lite 2.0 and 2.5 models added to Gemini inference provider (#3058)
PR adds Flash-Lite 2.0 and 2.5 models to the Gemini inference provider

Closes #3046 

## Test Plan
I was not able to locate any existing test for this provider, so I
performed manual testing. But the change is really trivial and
straightforward.
2025-08-08 13:48:15 -07:00
ehhuang
0b5a794c27
fix: telemetry logger spams when queue is full (#3070)
# What does this PR do?


## Test Plan
Ran a stress test on chat completion endpoint locally:

For 10 concurrent users over 3 minutes:
Before:
<img width="1440" height="201" alt="image"
src="https://github.com/user-attachments/assets/24e0d580-186e-4e24-931e-2b936c5859b6"
/>

After:
<img width="1434" height="204" alt="image"
src="https://github.com/user-attachments/assets/4b806d88-f822-41e9-b25a-018cc4bec866"
/>

(Will send scripts in a future PR.)
2025-08-08 13:47:36 -07:00
Francisco Arceo
9b70bb9d4b
feat(ui): Adding Vector Store Files to Admin UI (#3041)
# What does this PR do?
This PR updates the UI to add new pages:
1. `/files/{file_id}`
2. `files/{file_id}/contents`
3. `files/{file_id}/contents/{content_id}`

- Each entry in the list of files is clickable and takes the user to the
File Details page.
- The File Details page shows all of the file's content.
- The Content Details page shows an individual parsed chunk.

These pages use only our existing OpenAI-compatible APIs. I have a separate
branch where I expose the embedding, and there the portal is populated
correctly. I included the FE rendering code for that in this PR.

1. `vector-stores/{vector_store_id}/files/{file_id}` 
<img width="1913" height="1351" alt="Screenshot 2025-08-06 at 10 20
12 PM"
src="https://github.com/user-attachments/assets/08010d5e-60c8-4bd9-9f3e-a2731ed1ad55"
/>

2. `vector-stores/{vector_store_id}/files/{file_id}/contents`
<img width="1920" height="1272" alt="Screenshot 2025-08-06 at 10 21
23 PM"
src="https://github.com/user-attachments/assets/3b91e67b-5d64-4fe6-91b6-18f14587e850"
/>

3.
`vector-stores/{vector_store_id}/files/{file_id}/contents/{content_id}`
<img width="1916" height="1273" alt="Screenshot 2025-08-06 at 10 21
45 PM"
src="https://github.com/user-attachments/assets/d38ca996-e8d9-460c-9e39-7ff0cb5ec0dd"
/>

## Test Plan
I tested this locally and reviewed the code. I generated a significant
share of the code with Claude and some manual intervention. After this,
I'll begin adding tests to the UI.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-08 07:44:06 -07:00
Jiayi Ni
9e78f2da96
docs: fix the docs for NVIDIA Inference Provider (#3055)
# What does this PR do?
Fix the NVIDIA inference docs by updating API methods, model IDs, and
embedding example.

## Test Plan
N/A
2025-08-08 11:27:55 +02:00
Ashwin Bharambe
e90fe25890
fix(tests): move llama stack client init back to fixture (#3071)
See inline comments
2025-08-07 15:29:53 -07:00
Ashwin Bharambe
5f1ddd35e4
chore(tests): refactor and move responses tests away from verifications (#3068)
This PR kills the verifications infrastructure which is no longer used.
It was relocated to the `llama-stack-evals`
(https://github.com/meta-llama/llama-stack-evals) repository previously.

The Responses tests used this infrastructure, but that was not strictly
necessary; it was only somewhat useful back when @bbrownin introduced the
tests. On Discord, we agreed that the tests can be moved to our regular
integration test infra.

## Test Plan

Some tests currently fail (although they do run!). I will send a
follow-up PR that makes them all pass.
2025-08-07 13:48:16 -07:00
Dean Wampler
342550c1e2
docs: Added comment about a known limitation of AgentEventLogger (#2930)
# What does this PR do?
`AgentEventLogger` only supports streaming responses, so I suggest
adding a comment near the bottom of `demo_script.py` letting the user
know this, e.g., if they change the `stream` value to `False` in the
call to `create_turn`, they need to comment out the logging lines.

See https://github.com/llamastack/llama-stack-client-python/issues/15 
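
The guard described above can be sketched as follows. This is a minimal illustration, not the client library's code: `handle_turn` and `fake_logger` are hypothetical names, and `fake_logger` only stands in for the real event logger.

```python
from typing import Any, Callable, Iterable, List


def handle_turn(
    response: Any,
    stream: bool,
    log_events: Callable[[Iterable[Any]], Iterable[str]],
) -> List[str]:
    """Return printable lines for one turn.

    `log_events` is a stand-in for the event logger (hypothetical signature);
    it is only called when the response is a stream of events.
    """
    if stream:
        # Streaming: the response is an iterable of events the logger can walk.
        return [str(line) for line in log_events(response)]
    # Non-streaming: the response is a single object, so skip the logger.
    return [str(response)]


def fake_logger(events: Iterable[Any]) -> Iterable[str]:
    # Stand-in for the real logger: formats each event as one line.
    for event in events:
        yield f"event: {event}"
```

With `stream=False` the logger is never invoked, which is the same effect as commenting out the logging lines in `demo_script.py`.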


## Test Plan

---------

Signed-off-by: Dean Wampler <dean.wampler@ibm.com>
2025-08-07 10:09:57 -07:00
Varsha
e3928e6a29
feat: Implement hybrid search in Milvus (#2644)
# What does this PR do?
This PR implements hybrid search for Milvus, based on Milvus's built-in
support.

To test:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=long --disable-warnings --asyncio-mode=auto
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-07 09:42:03 +02:00
Nathan Weinberg
5a2d323eca
docs: add use of custom exceptions to code style guide (#3049)
# What does this PR do?
Adds a blurb to `CONTRIBUTING.md` encouraging the use of the
standardized custom exception classes for resources where applicable.

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-06 14:12:08 -07:00
slekkala1
26d3d25c87
feat: Add moderations create api (#3020)
# What does this PR do?
This PR adds an OpenAI-compatible moderations API. It is currently
implemented only for the Llama Guard safety provider.
Next steps are image support, expansion to other safety providers, and
deprecation of `run_shield`.


## Test Plan
Added 2 new tests covering safe and unsafe text prompts for the new
OpenAI-compatible moderations API:
`SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest
-v tests/integration/safety/test_safety.py
--text-model=llama3.2:3b-instruct-fp16
--embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`
(I had an issue with the previous PR
https://github.com/meta-llama/llama-stack/pull/2994 while updating it and
accidentally closed it, so I opened this new one.)
2025-08-06 13:51:23 -07:00
Charlie Doern
0caef40e0d
fix: telemetry fixes (inference and core telemetry) (#2733)
# What does this PR do?

I found a few issues while adding new metrics for various APIs.

Currently, metrics are only propagated in `chat_completion` and
`completion`.

Since most providers use the `openai_..` routes as the default in
`llama-stack-client inference chat-completion`, metrics are currently
not working as expected.

To get them working, the following had to be done:

1. Get the completion as usual.
2. Use new `openai_` versions of the metric-gathering functions, which
read the already-populated `.usage` field of the `OpenAI..` response types.
3. Define a `stream_generator` which counts the tokens and computes the
metrics (only for stream=True).
4. Add the metrics to the response.
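
The `stream_generator` step can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this PR: chunks are plain strings, one whitespace-separated word counts as one token, and `fake_stream` and `collect` are hypothetical helpers.

```python
import asyncio
from typing import AsyncIterator, Dict


async def stream_generator(
    chunks: AsyncIterator[str],
    prompt_tokens: int,
    metrics: Dict[str, int],
) -> AsyncIterator[str]:
    """Pass chunks through unchanged while counting completion tokens."""
    completion_tokens = 0
    async for chunk in chunks:
        # Assumption: one whitespace-separated word is one token.
        completion_tokens += len(chunk.split())
        yield chunk
    # Metrics are only known once the stream is exhausted.
    metrics["prompt_tokens"] = prompt_tokens
    metrics["completion_tokens"] = completion_tokens
    metrics["total_tokens"] = prompt_tokens + completion_tokens


async def fake_stream() -> AsyncIterator[str]:
    # Hypothetical stand-in for a provider's streaming response.
    for piece in ["Hello there", "how are", "you"]:
        yield piece


async def collect() -> tuple:
    metrics: Dict[str, int] = {}
    seen = [c async for c in stream_generator(fake_stream(), 3, metrics)]
    return seen, metrics
```

Wrapping the stream this way leaves the chunks untouched for the caller, and the totals become available only after the last chunk, which is why this approach applies to stream=True only.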


NOTE: I could not add metrics to `openai_completion` where stream=True
because that ONLY returns an `OpenAICompletion`, not an AsyncGenerator
that we can manipulate.


Also, acquire the lock and add the event to the span, as the other
`_log_...` methods do.

some new output:

`llama-stack-client inference chat-completion --message hi`

<img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM"
src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326"
/>


and in the client:

<img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM"
src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007"
/>

These were not previously being recorded, nor were they being printed to
the server, due to improper console sink handling.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 13:37:40 -07:00
Ashwin Bharambe
c252dfa3ef
fix(ci): allow tests to skip llama stack client instantiation (#3052)
2025-08-06 11:15:41 -07:00
IAN MILLER
8ba04205ac
docs: remove pure venv references (#3047)
# What does this PR do?
Remove pure venv (without uv) references in docs


## Test Plan
2025-08-06 10:42:34 -07:00
Nathan Weinberg
e9fced773a
refactor: introduce common 'ResourceNotFoundError' exception (#3032)
# What does this PR do?
1. Introduce new base custom exception class `ResourceNotFoundError`
2. All other "not found" exception classes now inherit from
`ResourceNotFoundError`
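
A minimal sketch of such a hierarchy, for illustration only. `ResourceNotFoundError` is the name from this PR; the `ValueError` base, the constructor arguments, the message format, and the two subclasses shown are assumptions.

```python
class ResourceNotFoundError(ValueError):
    """Base class for all "not found" errors (ValueError base is an assumption)."""

    def __init__(self, resource_name: str, resource_type: str) -> None:
        # Hypothetical message format shared by every subclass.
        super().__init__(f"{resource_type} '{resource_name}' not found")


class ModelNotFoundError(ResourceNotFoundError):
    """Hypothetical subclass for a missing model."""

    def __init__(self, model_name: str) -> None:
        super().__init__(model_name, "Model")


class ShieldNotFoundError(ResourceNotFoundError):
    """Hypothetical subclass for a missing shield."""

    def __init__(self, shield_name: str) -> None:
        super().__init__(shield_name, "Shield")
```

With a shared base class, callers can catch `ResourceNotFoundError` once instead of listing every resource-specific exception.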

Closes #3030

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-06 10:22:55 -07:00
Ashwin Bharambe
dfce05d0c5
fix(docs): update llama stack build CLI doc (#3050) 2025-08-06 09:32:09 -07:00
ehhuang
3e695cf320
chore: update postgres_demo with new config (#3045)
# What does this PR do?

closes https://github.com/meta-llama/llama-stack/issues/3044

## Test Plan
matches starter's template
2025-08-06 07:48:40 -07:00
Mohamed Rebai
7eff1bb3ec
ci(pre-commit): enforce presence of 'upload-time' field in uv.lock (#2920)
# What does this PR do?
This PR sets a minimum uv version of `0.7.0` for the project. The diff issue
happens because the `upload-time` field in the `uv.lock` file did not
exist in older uv versions (before `0.6.15`). Enforcing the minimum version
prevents large diffs in PRs from developers who use older versions of uv.
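
A check of this kind could look roughly like the sketch below. It is not the hook added in this PR: `lock_has_upload_time` and `main` are hypothetical names, and the check is a plain text search rather than a TOML parse.

```python
import sys
from pathlib import Path


def lock_has_upload_time(lock_text: str) -> bool:
    """Return True if the lock file text contains an `upload-time` field."""
    return "upload-time = " in lock_text


def main(path: str = "uv.lock") -> int:
    """Return 0 if the field is present, 1 otherwise (pre-commit style exit code)."""
    text = Path(path).read_text(encoding="utf-8")
    if lock_has_upload_time(text):
        return 0
    print(
        f"{path} has no 'upload-time' field; regenerate it with a newer uv",
        file=sys.stderr,
    )
    return 1
```

A pre-commit entry would call `main()` and exit with its return value, so a lock file written by an old uv fails the hook before it reaches review.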

Closes #2887

---------

Co-authored-by: Charlie Doern <charlie@doern.me>
2025-08-06 07:46:59 -07:00
Ashwin Bharambe
7f834339ba
chore(misc): make tests and starter faster (#3042)
A bunch of miscellaneous cleanup focused on tests, which also ended up
speeding up the starter distro substantially.

- Pulled llama stack client init for tests into `pytest_sessionstart` so
it does not clobber output
- Profiling that showed where we were doing lots of heavy imports
for starter, so I made them lazy
- starter now starts 20+ seconds faster on my Mac
- A few other smallish refactors for `compat_client`
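
Making an import lazy usually means moving it from module level into the function that needs it. A minimal sketch, with `json` standing in for a genuinely heavy dependency and `parse_payload` as a hypothetical function:

```python
def parse_payload(raw: str) -> dict:
    """Parse a JSON payload, importing the parser only on first use."""
    # Imported here rather than at module top, so importing this module
    # does not pay for the dependency until the function is called.
    import json  # stand-in for a heavy dependency

    return json.loads(raw)
```

Python caches modules in `sys.modules`, so only the first call pays the import cost.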
2025-08-05 14:55:05 -07:00
IAN MILLER
e12524af85
feat: create unregister shield API endpoint in Llama Stack (#2853)
# What does this PR do?

Extend the Shields protocol with the ability to unregister previously
registered shields, and add a CLI for shield management.

Closes #2581 

## Test Plan

First, test the shields API:
1. Install and start Ollama:

`ollama serve`


2. Pull Llama Guard Model in Ollama:

`ollama pull llama-guard3:8b`

3. Configure env variables:

```
export ENABLE_OLLAMA=ollama
export OLLAMA_URL=http://localhost:11434
```

4. Build Llama Stack distro:

`llama stack build --template starter --image-type venv  `

5. Start Llama Stack server:

`llama stack run starter --port 8321`

6. Check if Ollama model is available:

`curl -X GET http://localhost:8321/v1/models | jq '.data[] |
select(.provider_id=="ollama")'`

7. Register a new Shield using Ollama provider:

```
curl -X POST http://localhost:8321/v1/shields \
 -H "Content-Type: application/json" \
 -d '{
   "shield_id": "test-shield",
   "provider_id": "llama-guard",
   "provider_shield_id": "ollama/llama-guard3:8b",
   "params": {}
 }'
```

`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}`

8. Check if shield was registered:

`curl -X GET http://localhost:8321/v1/shields/test-shield`


`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}`

9. Run shield:

```
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "test-shield",
    "messages": [
      {
        "role": "user",
        "content": "How can I hack into someone computer?"
      }
    ],
    "params": {}
  }'
```

`{"violation":{"violation_level":"error","user_message":"I can't answer that. Can I help with something else?","metadata":{"violation_type":"S2"}}}`

10. Unregister shield:

`curl -X DELETE http://localhost:8321/v1/shields/test-shield`

`null`

11. Verify shield was deleted:

`curl -X GET http://localhost:8321/v1/shields/test-shield`

`{"detail":"Invalid value: Shield 'test-shield' not found"}`

All tests passed 

```
========================================================================== 430 passed, 194 warnings in 19.54s ==========================================================================
/Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited
  loop.close()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Wrote HTML report to htmlcov-3.12/index.html

```
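
For anyone re-running steps 7 to 11 from a script rather than by hand, the same request sequence can be sketched in Python. This is only an illustration using the standard library: the endpoints, port, and payloads are copied from the curl commands above, while the helper names (`make_request`, `shield_lifecycle`, `send`) are made up for this example and are not part of Llama Stack.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8321"  # port used in the `llama stack run` step


def make_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the local server."""
    data = json.dumps(body).encode() if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def shield_lifecycle(
    shield_id: str = "test-shield",
    prompt: str = "How can I hack into someone computer?",
) -> list:
    """Return the requests for steps 7-11, in order (hypothetical helper)."""
    register = {
        "shield_id": shield_id,
        "provider_id": "llama-guard",
        "provider_shield_id": "ollama/llama-guard3:8b",
        "params": {},
    }
    run = {
        "shield_id": shield_id,
        "messages": [{"role": "user", "content": prompt}],
        "params": {},
    }
    return [
        make_request("POST", "/v1/shields", register),        # step 7: register
        make_request("GET", f"/v1/shields/{shield_id}"),      # step 8: check
        make_request("POST", "/v1/safety/run-shield", run),   # step 9: run
        make_request("DELETE", f"/v1/shields/{shield_id}"),   # step 10: unregister
        make_request("GET", f"/v1/shields/{shield_id}"),      # step 11: verify deletion
    ]


def send(req: urllib.request.Request) -> tuple:
    """Send one request and return (status, body); needs the server from step 5."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()


if __name__ == "__main__":
    # Print the plan only; call send(req) on each entry once the server is up.
    for req in shield_lifecycle():
        print(req.get_method(), req.full_url)
```

Building the requests separately from sending them keeps the sequence inspectable without a running server.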
2025-08-05 07:33:46 -07:00
github-actions[bot]
e565b91182 build: Bump version to 0.2.17
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 7s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 13s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 13s
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 7s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
Unit Tests / unit-tests (3.13) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 12s
Pre-commit / pre-commit (push) Successful in 1m38s
2025-08-05 01:43:30 +00:00
Ashwin Bharambe
ea46f74092 fix: rectify typo in MANIFEST.in due to #2975 2025-08-04 18:22:49 -07:00
ehhuang
bb6b6041d6
chore: fix: integration tests failures marked as successful (#3039) 2025-08-04 17:06:28 -07:00
Francisco Arceo
eac1e0c7d4
chore: Fixing Markdown renderer (#3038) 2025-08-04 14:16:09 -07:00
Nathan Weinberg
68b0071861
chore: standardize session not found error (#3031)
# What does this PR do?
1. Creates a new `SessionNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379
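
The commit message does not include the class itself, so the following is only a sketch of what a dedicated not-found error could look like. The `ValueError` base class, the constructor argument, the message text, and the `get_session` helper are all assumptions made for illustration, not the code from this PR.

```python
class SessionNotFoundError(ValueError):
    """Raised when a session lookup fails (hypothetical shape, not the PR's code)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        super().__init__(f"Session '{session_id}' not found")


def get_session(sessions: dict, session_id: str) -> dict:
    """Look up a session and raise the dedicated error instead of a bare KeyError."""
    try:
        return sessions[session_id]
    except KeyError:
        raise SessionNotFoundError(session_id) from None


if __name__ == "__main__":
    try:
        get_session({}, "abc")
    except SessionNotFoundError as err:
        print(err)
```

A dedicated class lets callers catch the specific condition while existing handlers for the broader base class keep working.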

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-04 13:12:02 -07:00
Nathan Weinberg
05cfa213b6
chore: standardize tool group not found error (#2986)
# What does this PR do?
1. Creates a new `ToolGroupNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-04 11:41:33 -07:00
dependabot[bot]
55a2694c80
chore(python-deps): bump openai from 1.97.1 to 1.98.0 (#3025)
Bumps [openai](https://github.com/openai/openai-python) from 1.97.1 to
1.98.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/releases">openai's
releases</a>.</em></p>
<blockquote>
<h2>v1.98.0</h2>
<h2>1.98.0 (2025-07-30)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.97.2...v1.98.0">v1.97.2...v1.98.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> manual updates (<a
href="88a8036c5e">88a8036</a>)</li>
</ul>
<h2>v1.97.2</h2>
<h2>1.97.2 (2025-07-30)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.97.1...v1.97.2">v1.97.1...v1.97.2</a></p>
<h3>Chores</h3>
<ul>
<li><strong>client:</strong> refactor streaming slightly to better
future proof it (<a
href="71c0c74713">71c0c74</a>)</li>
<li><strong>project:</strong> add settings file for vscode (<a
href="29c22c90fd">29c22c9</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/blob/main/CHANGELOG.md">openai's
changelog</a>.</em></p>
<blockquote>
<h2>1.98.0 (2025-07-30)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.97.2...v1.98.0">v1.97.2...v1.98.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> manual updates (<a
href="88a8036c5e">88a8036</a>)</li>
</ul>
<h2>1.97.2 (2025-07-30)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.97.1...v1.97.2">v1.97.1...v1.97.2</a></p>
<h3>Chores</h3>
<ul>
<li><strong>client:</strong> refactor streaming slightly to better
future proof it (<a
href="71c0c74713">71c0c74</a>)</li>
<li><strong>project:</strong> add settings file for vscode (<a
href="29c22c90fd">29c22c9</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a3315d9fcc"><code>a3315d9</code></a>
release: 1.98.0 (<a
href="https://redirect.github.com/openai/openai-python/issues/2503">#2503</a>)</li>
<li><a
href="48188cc8d5"><code>48188cc</code></a>
release: 1.97.2 (<a
href="https://redirect.github.com/openai/openai-python/issues/2494">#2494</a>)</li>
<li>See full diff in <a
href="https://github.com/openai/openai-python/compare/v1.97.1...v1.98.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=openai&package-manager=uv&previous-version=1.97.1&new-version=1.98.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-04 11:40:56 -07:00
Ashwin Bharambe
cc87995e2b
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out.

`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.

Updated `server.py` to remove the "config_or_template" backward
compatibility since it has been a couple of releases since that change.
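
One common way to keep an old flag working while warning about it is a custom `argparse` action. The sketch below illustrates that pattern only; it is not the actual `llama stack build` implementation, and the `DeprecatedAlias` class, the `build_parser` function, and the warning text are assumptions.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Store the value under the new destination and warn about the old flag."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; use --distro instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, values)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build")
    parser.add_argument("--distro", dest="distro", help="distribution name")
    # Old spelling: writes to the same destination, hidden from --help.
    parser.add_argument(
        "--template", dest="distro", action=DeprecatedAlias, help=argparse.SUPPRESS
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--template", "starter"])
    print(args.distro)
```

Both spellings populate the same attribute, so downstream code only has to read `args.distro`.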
2025-08-04 11:34:17 -07:00
dependabot[bot]
12f964437a
chore(python-deps): bump opentelemetry-exporter-otlp-proto-http from 1.35.0 to 1.36.0 (#3027)
Some checks failed
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 6s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 25s
Python Package Build Test / build (3.12) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s
Test Llama Stack Build / build-single-provider (push) Failing after 19s
Update ReadTheDocs / update-readthedocs (push) Failing after 7s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 28s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 34s
Unit Tests / unit-tests (3.12) (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 25s
Unit Tests / unit-tests (3.13) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 31s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Has started running
Test Llama Stack Build / build (push) Failing after 12s
Pre-commit / pre-commit (push) Successful in 1m46s
Bumps
[opentelemetry-exporter-otlp-proto-http](https://github.com/open-telemetry/opentelemetry-python)
from 1.35.0 to 1.36.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/open-telemetry/opentelemetry-python/blob/main/CHANGELOG.md">opentelemetry-exporter-otlp-proto-http's
changelog</a>.</em></p>
<blockquote>
<h2>Version 1.36.0/0.57b0 (2025-07-29)</h2>
<ul>
<li>
<p>Add missing Prometheus exporter documentation
(<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4485">#4485</a>)</p>
</li>
<li>
<p>Overwrite logging.config.fileConfig and logging.config.dictConfig to
ensure
the OTLP <code>LogHandler</code> remains attached to the root logger.
Fix a bug that
can cause a deadlock to occur over <code>logging._lock</code> in some
cases (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4636">#4636</a>).</p>
</li>
<li>
<p>otlp-http-exporter: set default value for param
<code>timeout_sec</code> in <code>_export</code> method
(<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4691">#4691</a>)</p>
</li>
<li>
<p>Update OTLP gRPC/HTTP exporters: calling shutdown will now interrupt
exporters that are sleeping
before a retry attempt, and cause them to return failure immediately.
Update BatchSpan/LogRecordProcessors: shutdown will now complete after
30 seconds of trying to finish
exporting any buffered telemetry, instead of continuing to export until
all telemetry was exported.
(<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4638">#4638</a>).</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1aaa2a2587"><code>1aaa2a2</code></a>
Prepare release 1.36.0/0.57b0 (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4704">#4704</a>)</li>
<li><a
href="f9ca4755af"><code>f9ca475</code></a>
Use <code>@pytest.mark.flaky</code> decorator instead of
<code>@flaky.flaky</code> (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4700">#4700</a>)</li>
<li><a
href="eb1a4c574c"><code>eb1a4c5</code></a>
otlp-http-exporter: set default value for param <code>timeout_sec</code>
in <code>_export</code> me...</li>
<li><a
href="23aad5e4ad"><code>23aad5e</code></a>
Add permissions that were missed on the first pass (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4692">#4692</a>)</li>
<li><a
href="344c647774"><code>344c647</code></a>
Add minimum token permissions for all github workflow files (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4663">#4663</a>)</li>
<li><a
href="ff9dc82d3a"><code>ff9dc82</code></a>
Migrate from opentelemetrybot to otelbot (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4685">#4685</a>)</li>
<li><a
href="d4e606846e"><code>d4e6068</code></a>
Interrupt exporter retry backoff sleeps when shutdown is called. Update
Batch...</li>
<li><a
href="a28b0cadce"><code>a28b0ca</code></a>
Fix broken link in Prometheus exporter README. Fixes <a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4399">#4399</a>
(<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4485">#4485</a>)</li>
<li><a
href="9746645818"><code>9746645</code></a>
Introducing tox-uv (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4516">#4516</a>)</li>
<li><a
href="57cb935e88"><code>57cb935</code></a>
Fix issue where deadlock can occur over logging._lock (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4636">#4636</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/open-telemetry/opentelemetry-python/compare/v1.35.0...v1.36.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=opentelemetry-exporter-otlp-proto-http&package-manager=uv&previous-version=1.35.0&new-version=1.36.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-04 09:37:58 -07:00
dependabot[bot]
48b49e318f
chore(python-deps): bump weaviate-client from 4.16.4 to 4.16.5 (#3026)

Bumps
[weaviate-client](https://github.com/weaviate/weaviate-python-client)
from 4.16.4 to 4.16.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/releases">weaviate-client's
releases</a>.</em></p>
<blockquote>
<h2>v3.13.0 - Support for Weaviate v1.18</h2>
<h2>What's Changed</h2>
<ul>
<li>Extend CRUD operations for single data objects and reference with
consistency level by <a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/234">weaviate/weaviate-python-client#234</a></li>
<li>Extend batch operations with consistency level by <a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/240">weaviate/weaviate-python-client#240</a></li>
<li>Add Cursor api by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/241">weaviate/weaviate-python-client#241</a></li>
<li>Add support for backup Azure module by <a
href="https://github.com/antas-marcin"><code>@​antas-marcin</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/246">weaviate/weaviate-python-client#246</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
made their first contribution in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/234">weaviate/weaviate-python-client#234</a></li>
<li><a
href="https://github.com/antas-marcin"><code>@​antas-marcin</code></a>
made their first contribution in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/246">weaviate/weaviate-python-client#246</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v3.12.0...v3.13.0">https://github.com/weaviate/weaviate-python-client/compare/v3.12.0...v3.13.0</a></p>
<h2>v3.12.1b - Support for weaviate v1.18</h2>
<h2>What's Changed</h2>
<ul>
<li>Extend CRUD operations for single data objects and reference with
consistency level by <a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/234">weaviate/weaviate-python-client#234</a></li>
<li>Extend batch operations with consistency level by <a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/240">weaviate/weaviate-python-client#240</a></li>
<li>Add Cursor api by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/241">weaviate/weaviate-python-client#241</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
made their first contribution in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/234">weaviate/weaviate-python-client#234</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v3.12.0...v3.12.1b">https://github.com/weaviate/weaviate-python-client/compare/v3.12.0...v3.12.1b</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/blob/main/docs/changelog.rst">weaviate-client's
changelog</a>.</em></p>
<blockquote>
<h2>Version 4.16.5</h2>
<p>This patch version includes:
- Add <code>dimensions</code> property to Google vectorizers in
<code>Configure.Vectors</code></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="731cbf0b9a"><code>731cbf0</code></a>
Update changelog (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1768">#1768</a>)</li>
<li><a
href="2627bf39c1"><code>2627bf3</code></a>
Bump ruff from 0.12.4 to 0.12.5 (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1761">#1761</a>)</li>
<li><a
href="401a1e2ff0"><code>401a1e2</code></a>
Bump coverage from 7.9.2 to 7.10.1 (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1760">#1760</a>)</li>
<li><a
href="44aef22189"><code>44aef22</code></a>
Bump authlib from 1.6.0 to 1.6.1 (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1749">#1749</a>)</li>
<li><a
href="dca002e39e"><code>dca002e</code></a>
Add <code>dimensions</code> property to Google vectorizers in
<code>Configure.Vectors</code> (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1767">#1767</a>)</li>
<li>See full diff in <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.4...v4.16.5">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=weaviate-client&package-manager=uv&previous-version=4.16.4&new-version=4.16.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-04 09:37:31 -07:00
Matthew Farrellee
4411e6e362
chore(ci): remove reportlab dep (#3033)
# What does this PR do?

Remove the reportlab dependency by replacing dynamic PDF generation with a
pre-computed PDF.

## Test Plan

ci
2025-08-04 09:36:13 -07:00
Eran Cohen
e5b542dd8e
feat: switch to async completion in LiteLLM OpenAI mixin (#3029)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 13s
Unit Tests / unit-tests (3.12) (push) Failing after 11s
Python Package Build Test / build (3.13) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 17s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 21s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 27s
Test External API and Providers / test-external (venv) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 25s
Unit Tests / unit-tests (3.13) (push) Failing after 25s
Pre-commit / pre-commit (push) Successful in 1m10s
2025-08-03 12:08:56 -07:00
Varsha
dbfc15123e
test: Implement vector store search test (#3001)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 13s
Python Package Build Test / build (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 9s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 8s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 14s
Python Package Build Test / build (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 17s
Test Llama Stack Build / build-single-provider (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 17s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 7s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 45s
Update ReadTheDocs / update-readthedocs (push) Failing after 35s
Pre-commit / pre-commit (push) Successful in 1m30s
# What does this PR do?
Implement vector store search test


## Test Plan
```
pytest tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes --stack-config=http://localhost:8321 --embedding-model=all-MiniLM-L6-v2 -v
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-02 15:57:38 -07:00
Varsha
3c2aee610d
refactor: Remove double filtering based on score threshold (#3019)
# What does this PR do?
Remove score_threshold based check from `OpenAIVectorStoreMixin`

Closes: https://github.com/meta-llama/llama-stack/issues/3018
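
A minimal sketch of why the second check is redundant, assuming the provider query already drops results below `score_threshold`. `Chunk`, `provider_query`, and `search` are hypothetical names for illustration, not the actual `OpenAIVectorStoreMixin` code:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float


def provider_query(chunks: list[Chunk], score_threshold: float) -> list[Chunk]:
    # Hypothetical provider-side query: the threshold is applied here, once.
    return [c for c in chunks if c.score >= score_threshold]


def search(chunks: list[Chunk], score_threshold: float) -> list[Chunk]:
    # Hypothetical mixin-level search: it trusts the provider's result and
    # does not re-apply the threshold.
    return provider_query(chunks, score_threshold)
```

Under that assumption, filtering again in the caller can never remove anything and only duplicates logic.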


## Test Plan
2025-08-02 15:57:03 -07:00
ehhuang
1e3b5aa9b8
chore: CI action names (#3014)
# What does this PR do?


## Test Plan

CI
<img width="795" height="162" alt="image"
src="https://github.com/user-attachments/assets/78dedfa6-809c-4d82-9eb3-6479234dd657"
/>
2025-08-02 15:56:42 -07:00
dependabot[bot]
edc19698fb
chore(python-deps): bump huggingface-hub from 0.34.2 to 0.34.3 (#3028)
Bumps [huggingface-hub](https://github.com/huggingface/huggingface_hub)
from 0.34.2 to 0.34.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/huggingface_hub/releases">huggingface-hub's
releases</a>.</em></p>
<blockquote>
<h2>[v0.34.3] Jobs improvements and <code>whoami</code> user prefix</h2>
<ul>
<li>[Jobs] Update uv image <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3270">#3270</a>
by <a href="https://github.com/lhoestq"><code>@​lhoestq</code></a></li>
<li>[Update] HF Jobs Documentation <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3268">#3268</a>
by <a
href="https://github.com/ariG23498"><code>@​ariG23498</code></a></li>
<li>Add 'user:' prefix to whoami command output <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3267">#3267</a>
by <a href="https://github.com/gary149"><code>@​gary149</code></a></li>
</ul>
<p>Full Changelog: <a
href="https://github.com/huggingface/huggingface_hub/compare/v0.34.2...v0.34.3">https://github.com/huggingface/huggingface_hub/compare/v0.34.2...v0.34.3</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0bbc5e1b10"><code>0bbc5e1</code></a>
Release: v0.34.3</li>
<li><a
href="f464fc15f3"><code>f464fc1</code></a>
update uv image (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3270">#3270</a>)</li>
<li><a
href="24c77eb319"><code>24c77eb</code></a>
[Update] HF Jobs Documentation (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3268">#3268</a>)</li>
<li><a
href="977c018e3d"><code>977c018</code></a>
Add 'user:' prefix to whoami command output for consistency (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3267">#3267</a>)</li>
<li>See full diff in <a
href="https://github.com/huggingface/huggingface_hub/compare/v0.34.2...v0.34.3">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-02 15:53:46 -07:00
IAN MILLER
a749d5f4a4
refactor: remove Conda support from Llama Stack (#2969)
# What does this PR do?
This PR removes Conda support from Llama Stack.

Closes #2539

## Test Plan
2025-08-02 15:52:59 -07:00
ehhuang
f2eee4e417
chore: create integration-tests script (#3016)
2025-08-01 17:38:49 -07:00
ehhuang
6ac710f3b0
fix(recording): endpoint resolution (#3013)
2025-08-01 16:23:54 -07:00
Matthew Farrellee
140ee7d337
fix: sambanova inference provider (#2996)
# What does this PR do?

closes #2995 

update SambaNovaInferenceAdapter to efficiently use LiteLLMOpenAIMixin

## Test Plan

```
$ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct
...
======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ========================
```
2025-08-01 09:09:14 -07:00
Francisco Arceo
0527c0fb15
chore: Update README for supported DBs (#3005)
# What does this PR do?
Update README for supported DBs


## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-01 08:23:36 -07:00
Varsha
1f0766308d
feat: Add openAI compatible APIs to Qdrant (#2465)
# What does this PR do?
Adds support for the OpenAI-compatible vector store APIs to Qdrant.

Closes #2463


## Test Plan

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-01 00:41:34 -04:00
ehhuang
194abe7734
test: use llama stack build when starting server (#2999)
# What does this PR do?
This should be more robust, as it is sometimes run without running the
build first.

## Test Plan
OLLAMA_URL=http://localhost:11434 LLAMA_STACK_TEST_INFERENCE_MODE=replay
LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings
LLAMA_STACK_CONFIG=server:starter uv run --with pytest-repeat pytest
tests/integration/telemetry
--text-model="ollama/llama3.2:3b-instruct-fp16" -vvs
2025-07-31 21:09:14 -07:00
Ashwin Bharambe
0b08d64ddb
feat(ci): introduce workflow for re-recording inference outputs (#3002) 2025-07-31 17:30:47 -07:00
Francisco Arceo
33cca26154
chore: Enabling Integration tests for Weaviate (#2882)
# What does this PR do?

This PR (1) enables the files API for Weaviate and (2) enables
integration tests for Weaviate, which adds a docker container to the
github action.

This PR also handles a couple of edge cases in creating the
collection and ensuring the tests all pass.

## Test Plan
CI enabled

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-31 20:29:50 -04:00
Ashwin Bharambe
369286f95b fix(ci): syntax error in the disabled workflow
2025-07-31 15:35:42 -07:00
Ashwin Bharambe
89ff93182c
feat(ci): only run on 3.12, run on both 3.12 and 3.13 nightly (#3000)
We don't need to run on all python versions all the time
2025-07-31 15:32:05 -07:00
Ashwin Bharambe
f4489eeb83
fix(ci): simplify integration tests replay mode (#2997)
We are going to split record and replay workflows completely to simplify
the concurrency key design.

We can add vision tests by just adding to our matrix.
2025-07-31 15:18:18 -07:00
Matthew Farrellee
218c89fff1
feat: Add clear error message when API key is missing (#2992)
# What does this PR do?

Improve user experience by providing specific guidance when no API key
is available, showing both provider data header and config options with
the correct field name for each provider.

Also adds comprehensive test coverage for API key resolution scenarios.

addresses #2990 for providers using litellm openai mixin

## Test Plan

`./scripts/unit-tests.sh
tests/unit/providers/inference/test_litellm_openai_mixin.py`
2025-07-31 16:33:16 -04:00
Ashwin Bharambe
22f79bdb9e fix(ci): lets attempt another fix for concurrency 2025-07-31 13:22:24 -07:00
Ashwin Bharambe
18576349ca fix(ci): simplified concurrency and job eligibility criteria 2025-07-31 13:11:04 -07:00
Ashwin Bharambe
d1b300ead9 fix(ci, nvidia): do not use module level pytest skip for now 2025-07-31 12:32:31 -07:00
Ashwin Bharambe
752fd3b1c1 fix(ci): use single quotes please 2025-07-31 11:56:25 -07:00
Ashwin Bharambe
5ba25efd54 fix(ci): ensure workflow runs when manually run or scheduled 2025-07-31 11:54:51 -07:00
Ashwin Bharambe
27d866795c
feat(ci): add support for running vision inference tests (#2972)
This PR significantly refactors the Integration Tests workflow. The main
goal behind the PR was to enable recording of vision tests, which had
never been run as part of our CI. During debugging, I ended up
making several other changes, refactoring the workflow and hopefully
increasing its robustness.

After doing the experiments, I have updated the trigger event to be
`pull_request_target` so this workflow can get write permissions by
default, but it will run with source code from the base (main) branch in
the source repository only. If you do change the workflow, you'd need to
experiment using the `workflow_dispatch` triggers. This should not be
news to anyone using GitHub Actions (except me!)

It is likely to be a little rocky though while I learn more about GitHub
Actions, etc. Please be patient :)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-31 11:50:42 -07:00
Charlie Doern
709c974bd8
fix: integration tests not triggering on PR open (#2985)
# What does this PR do?

I realized that when a new PR is opened, the integration tests aren't
triggering (or aren't always?) since the replay logic was introduced

Amend the concurrency logic a bit to trigger on opened PRs.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-31 11:36:44 -07:00
Nehanth Narendrula
b41d696e4f
fix: Post Training Model change in Tests in order to make it less intensive (#2991)
# What does this PR do?

Changed from `ibm-granite/granite-3.3-2b-instruct` to
`HuggingFaceTB/SmolLM2-135M-Instruct` so it is not as resource intensive in
CI.

Idea came from -
https://github.com/meta-llama/llama-stack/pull/2984#issuecomment-3140400830
2025-07-31 11:22:34 -07:00
Nathan Weinberg
ffb6306fbd
fix: remove redundant code from unregister_vector_db (#2983)
get_vector_db() will raise an exception if a vector store won't be
returned

client handling is redundant
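
A minimal sketch of the pattern, assuming a lookup that raises rather than returning `None`. `Registry` and `VectorDBNotFoundError` are hypothetical names for illustration, not the routing table's actual classes:

```python
class VectorDBNotFoundError(ValueError):
    """Hypothetical error type used only for this illustration."""


class Registry:
    def __init__(self) -> None:
        self._dbs: dict[str, dict] = {}

    def register(self, vector_db_id: str) -> None:
        self._dbs[vector_db_id] = {"id": vector_db_id}

    def get_vector_db(self, vector_db_id: str) -> dict:
        # Raises instead of returning None, so callers never see a missing value.
        if vector_db_id not in self._dbs:
            raise VectorDBNotFoundError(f"Vector DB '{vector_db_id}' not found")
        return self._dbs[vector_db_id]

    def unregister_vector_db(self, vector_db_id: str) -> None:
        # No `if existing is None` branch is needed: the lookup already raised.
        existing = self.get_vector_db(vector_db_id)
        del self._dbs[existing["id"]]
```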

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-31 09:22:04 -07:00
Christian Zaccaria
ea8dd58144
chore: Remove coverage badge from README.md (#2976)
# What does this PR do?
It looks like the coverage badge is still present in the README. This PR
removes it.

For more context: https://github.com/meta-llama/llama-stack/pull/2950
2025-07-31 09:21:30 -07:00
Kelly Brown
8a6c0fb930
docs: Reformat external provider documentation (#2982)
**Description** 
This PR adjusts the external providers documentation to align with the
new providers format. It also splits the content into sections covering
the existing external providers and how to create them.

<img width="1049" height="478" alt="Screenshot 2025-07-31 at 9 48 26 AM"
src="https://github.com/user-attachments/assets/f13599cb-2fd1-4e57-8ca9-27b067264e33"
/>

Open to feedback and adjusting titles
2025-07-31 09:21:13 -07:00
Nehanth Narendrula
3a574ef23c
fix: remove unused DPO parameters from schema and tests (#2988)
# What does this PR do?

I removed these DPO parameters from the schema in [this
PR](https://github.com/meta-llama/llama-stack/pull/2804), but I may not
have done it correctly, since they were reintroduced in [this
commit](cb7354a9ce (diff-4e9a8cb358213d6118c4b6ec2a76d0367af06441bf0717e13a775ade75e2061dR15081))—likely
due to a pre-commit hook.

I've made the changes again, and the pre-commit hook automatically
updated the spec sheet.
2025-07-31 09:11:08 -07:00
Charlie Doern
5c33bc1353
fix: post_training ci (#2984)
2025-07-31 08:26:06 -07:00
Nehanth Narendrula
cf73146132
feat: Enable DPO training with HuggingFace inline provider (#2825)
What does this PR do?

This PR adds support for Direct Preference Optimization (DPO) training
via the existing HuggingFace inline provider. It introduces a new DPO
training recipe, config schema updates, dataset integration, and
end-to-end testing to support preference-based fine-tuning with TRL.
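
For reference, a minimal sketch of the standard DPO objective for a single preference pair, written in plain Python. This is not the recipe added by this PR (which relies on TRL); the function name and the default `beta` are assumptions for illustration:

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(beta * margin)), written as log(1 + exp(-beta * margin)).
    # Not numerically hardened: very negative margins can overflow exp().
    return math.log1p(math.exp(-beta * margin))
```

When the policy and the reference agree, the margin is zero and the loss is `log(2)`; the loss falls as the policy favors the chosen response more strongly than the reference does.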

Test Plan

Added integration test:

tests/integration/post_training/test_post_training.py::TestPostTraining::test_preference_optimize

Ran tests on both CPU and CUDA environments

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-30 23:33:36 -07:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. To
prepare for that, this is a precursor.

cc @leseb
2025-07-30 23:30:53 -07:00
Francisco Arceo
f3d5459647
feat(UI): adding MVP playground UI (#2828)
# What does this PR do?
I've been tinkering a little with a simple chat playground in the UI, so
I'm opening the PR with what's kind of a WIP.

If you look at the first commit, that includes the big part of the
changes. The rest of the files changed come from installing the
`shadcn` components.

Note this is missing a lot; e.g.,
- sessions
- document upload
- audio (the shadcn components install these by default from
https://shadcn-chatbot-kit.vercel.app/docs/components/chat)

I still need to wire up a lot more to make it actually fully functional
but it does basic chat using the LS Typescript Client.

Basic demo: 

<img width="1329" height="1430" alt="Image"
src="https://github.com/user-attachments/assets/917a2096-36d4-4925-b83b-f1f2cda98698"
/>

<img width="1319" height="1424" alt="Image"
src="https://github.com/user-attachments/assets/fab1583b-1c72-4bf3-baf2-405aee13c6bb"
/>



## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-30 19:44:16 -07:00
Ashwin Bharambe
d6ae2b0f47
fix(ci): more correct concurrency key for workflows (#2973)
See comment inline. We don't want a random label to pre-empt an existing
workflow which had gone ahead.
2025-07-30 18:23:14 -07:00
Nathan Weinberg
406ca72957
fix: remove redundant code from unregister_dataset (#2971)
`get_dataset()` already raises an exception if a dataset cannot be
returned, so the client handling is redundant.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 16:40:01 -07:00
Sai Prashanth S
cb7354a9ce
docs: Add detailed docstrings to API models and update OpenAPI spec (#2889)
This PR focuses on improving the developer experience by adding
comprehensive docstrings to the API data models across the Llama Stack.
These docstrings provide detailed explanations for each model and its
fields, making the API easier to understand and use.

**Key changes:**
- **Added Docstrings:** Added reST formatted docstrings to Pydantic
models in the `llama_stack/apis/` directory. This includes models for:
  - Agents (`agents.py`)
  - Benchmarks (`benchmarks.py`)
  - Datasets (`datasets.py`)
  - Inference (`inference.py`)
  - And many other API modules.
- **OpenAPI Spec Update:** Regenerated the OpenAPI specification
(`docs/_static/llama-stack-spec.yaml` and
`docs/_static/llama-stack-spec.html`) to include the new docstrings.
This will be reflected in the API documentation, providing richer
information to users.

**Impact:**
- Developers using the Llama Stack API will have a better understanding
of the data structures.
- The auto-generated API documentation is now more informative.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-30 16:32:59 -07:00
Nathan Weinberg
cd5c6a2fcd
chore: standardize vector store not found error (#2968)
# What does this PR do?
1. Creates a new `VectorStoreNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 15:19:16 -07:00
Nathan Weinberg
272a3e9937
chore: standardize dataset not found error (#2962)
# What does this PR do?
1. Adds a broad schema for custom exception classes in the Llama Stack
project
2. Creates a new `DatasetNotFoundError` class
3. Implements the new class where appropriate 
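
As a rough illustration of the pattern (not the project's actual code), a resource-specific not-found error could look like the sketch below. The `ValueError` base class, the message wording, and the `get_dataset` helper with its in-memory registry are all assumptions made for the example.

```python
class DatasetNotFoundError(ValueError):
    """Raised when a dataset ID cannot be resolved (illustrative sketch)."""

    def __init__(self, dataset_id: str) -> None:
        self.dataset_id = dataset_id
        super().__init__(f"Dataset '{dataset_id}' not found.")


# Hypothetical in-memory registry standing in for the real dataset store.
_DATASETS = {"demo": {"rows": 3}}


def get_dataset(dataset_id: str) -> dict:
    """Return the dataset, or raise the dedicated error instead of a generic one."""
    try:
        return _DATASETS[dataset_id]
    except KeyError:
        raise DatasetNotFoundError(dataset_id) from None
```

Callers can then catch `DatasetNotFoundError` specifically, while code that catches the assumed base class keeps working.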

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 14:52:46 -07:00
IAN MILLER
25d3dfa30f
fix: fix No module named 'ollama' in test_inference_recordings.py (#2967)
# What does this PR do?
This PR fixes the following unit test errors, which occur on an
up-to-date main branch:
```
FAILED tests/unit/distribution/test_inference_recordings.py::TestInferenceRecording::test_recording_mode - ModuleNotFoundError: No module named 'ollama'
FAILED tests/unit/distribution/test_inference_recordings.py::TestInferenceRecording::test_replay_mode - ModuleNotFoundError: No module named 'ollama'
FAILED tests/unit/distribution/test_inference_recordings.py::TestInferenceRecording::test_replay_missing_recording - ModuleNotFoundError: No module named 'ollama'
FAILED tests/unit/distribution/test_inference_recordings.py::TestInferenceRecording::test_embeddings_recording - ModuleNotFoundError: No module named 'ollama'
=============================== 4 failed, 499 passed, 198 warnings in 34.50s ================================
```



## Test Plan
Run `./scripts/unit-tests.sh`
2025-07-30 16:33:33 -04:00
Nathan Weinberg
c5622c79de
chore: standardize model not found error (#2964)
# What does this PR do?
1. Creates a new `ModelNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 12:19:53 -07:00
Ashwin Bharambe
266e2afb9c
fix(ci): slightly update workflow trigger (#2966)
We want to avoid re-triggering the workflow when random other labels are
added (e.g., `meta-cla`, etc.) Also no point restarting the workflow
when someone _unlabels_.
2025-07-30 12:04:13 -07:00
Kelly Brown
026caa5551
docs: part 1 - fix warnings in documentation generation (#2861)
**Description**
This PR removes some of the warnings emitted when uv builds the docs.
- Errors appear when generating docs about .md files not appearing in
a toctree. ~~Adding content to the `providers-gen.py` file that adds `---
orphan: true ---` to each file.~~ Added a toctree generator to the
`providers-gen.py` file, which gets rid of the errors in the builds.
- Deletes the `_openai_compat` files, an extension of PR #2849
- Adds the `files` APIs section to the `providers` toctree on the index
page
- Manually adds `--- orphan: true ---` to the advanced APIs. I'll try
to find a way to modify the providers code gen so it adds it
automatically, but this fixes the errors.
- Adds `testing.md` to the `contributing` toctree
- Adds `starting_llama_stack_server.md` to the `distributions` toctree

There are some other warnings I'm still looking at, but this PR gets rid
of most of the toctree errors.
There's also an issue with the actual distribution-codegen that I can
investigate in another PR. Opened a bug for it here: #2873
2025-07-30 10:50:10 -07:00
ehhuang
38d5c44354
chore: fix k8s config (#2959)
# What does this PR do?


## Test Plan
deployed to EKS
2025-07-30 10:11:59 -07:00
Ashwin Bharambe
fd2aaf4978
fix: use OLLAMA_URL to activate Ollama provider in starter (#2963)
We tried to always keep Ollama enabled. However doing so makes the
provider implementation half-assed -- should it error when it cannot
connect to Ollama or not? What happens during periodic model refresh?
Etc. Instead do the same thing we do for vLLM -- use the `OLLAMA_URL` to
conditionally enable the provider.
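
A minimal sketch of the idea in Python, assuming a hypothetical `enabled_providers` helper. The starter distribution's real mechanism is not shown in this message, and `VLLM_URL` is assumed here as the analogous variable for vLLM.

```python
import os
from collections.abc import Mapping


def enabled_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Enable a remote inference provider only when its URL variable is set."""
    # Hypothetical mapping of provider name -> environment variable.
    url_vars = {"ollama": "OLLAMA_URL", "vllm": "VLLM_URL"}
    # An unset or empty variable leaves the provider disabled.
    return [name for name, var in url_vars.items() if env.get(var)]
```

With this shape, a provider that is not configured is simply absent, so it never has to decide whether a failed connection is an error.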

## Test Plan

Run `uv run llama stack build --template starter --image-type venv
--run` with and without `OLLAMA_URL` set. Verify using
`llama-stack-client provider list` that ollama is correctly enabled.
2025-07-30 10:11:17 -07:00
Matthew Farrellee
b69bafba30
fix(library_client): improve initialization error handling and prevent AttributeError (#2944)
# What does this PR do?

- Initialize route_impls to None in constructor to prevent
AttributeError
- Consolidate initialization checks to single point in request() method
- Improve error message to be more helpful ("Please call initialize()
first")
- Add comprehensive test suite to prevent regressions

The library client now has better error handling when users forget to
call initialize(), showing a clear ValueError instead of confusing
AttributeError. All initialization validation is now centralized in the
request() method, with internal methods (_call_non_streaming,
_call_streaming, _convert_body) relying on this single check for
cleaner, more maintainable code.
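
The pattern described above can be sketched roughly as follows. `LibraryClientSketch`, its route table, and the exact error text are hypothetical stand-ins, not the library client's real implementation.

```python
class LibraryClientSketch:
    """Illustrative stand-in for a client that must be initialized before use."""

    def __init__(self) -> None:
        # Defined up front so an uninitialized client never raises AttributeError.
        self.route_impls = None

    def initialize(self) -> None:
        # Hypothetical route table; a real client would build this from its config.
        self.route_impls = {("GET", "/health"): lambda: {"status": "OK"}}

    def request(self, method: str, path: str):
        # Single, central initialization check.
        if self.route_impls is None:
            raise ValueError("Client not initialized. Please call initialize() first.")
        return self._call_non_streaming(method, path)

    def _call_non_streaming(self, method: str, path: str):
        # Internal helpers rely on the check already performed in request().
        return self.route_impls[(method, path)]()
```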

closes #2943 

## Test Plan

`./scripts/unit-tests.sh`
2025-07-30 11:58:47 -04:00
Ashwin Bharambe
9b69b6ac05 fix: pre-commit issue
2025-07-29 17:52:36 -07:00
Ashwin Bharambe
f6afb3c26b
feat(ci): keep only one re-recording job because independent recordings will conflict (#2956)
A couple of important updates:

- When recording tests, we cannot be generating a matrix because all the
independent recordings will conflict.
- In fact, we no longer need a matrix on test types at all: the tests
themselves are now very fast, so the overhead of `llama stack build`,
setting up `uv`, etc. costs much more than the tests do.
- Refactored the running of tests into an independent action.
2025-07-29 17:48:04 -07:00
Ashwin Bharambe
b237df8f18
feat(ci): use replay mode, setup ollama if specific label exists on PR (#2955)
This PR makes setting up Ollama optional for CI. By default, we use
`replay` mode for inference requests and use the stored results from the
`tests/integration/recordings/` directory.

Every so often, users will update tests which will need us to re-record.
To do this, we check for the existence of a label `re-record-tests` on
the PR. If detected,
- ollama is spun up
- inference mode is set to record
- after the tests are done, if any new changes are detected, they are
pushed back to the PR
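
The decision logic above can be summarized in a short Python sketch. The function name and the returned keys are hypothetical; the actual workflow expresses this in GitHub Actions configuration.

```python
def ci_settings_for_pr(labels: list[str]) -> dict:
    """Pick the CI inference settings from the labels on a pull request."""
    rerecord = "re-record-tests" in labels
    return {
        # Default is replay from stored recordings; record only on request.
        "inference_mode": "record" if rerecord else "replay",
        # Ollama is only needed when real inference calls are made.
        "setup_ollama": rerecord,
        # Changed recordings are pushed back to the PR after a recording run.
        "push_changed_recordings": rerecord,
    }
```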

## Test Plan

This is GitHub CI. Gotta test it live.
2025-07-29 16:50:26 -07:00
Ashwin Bharambe
0ac503ec0d
feat(tests): record responses for evals and telemetry tests (#2954)
Continuing with https://github.com/meta-llama/llama-stack/pull/2952

This also includes a "fix" to inference store related tests so that we
pull a large number of inference responses from the DB so as to always
find the one we just wrote.
2025-07-29 15:46:21 -07:00
Ashwin Bharambe
81c7d6fa2e
chore(ci): disable post training tests (#2953)
Post training tests need _much_ better thinking before we can re-enable
them to be run on every single PR. Running periodically should be
approached only when it is shown that the tests are reliable and as
light-weight as can be; otherwise, it is just kicking the can down the
road.
2025-07-29 14:20:09 -07:00
Ashwin Bharambe
072d20a124
feat(test): record agents, safety and vector_io integration tests (#2952)
Continue to build on top of
https://github.com/meta-llama/llama-stack/pull/2941

## Test Plan

Run server with `LLAMA_STACK_TEST_INFERENCE_MODE=record` and then run
the integration tests with `--stack-config=server:starter`. Then restart
the server with `LLAMA_STACK_TEST_INFERENCE_MODE=replay` and re-run the
tests. Verify that no request hit Ollama at any point.
2025-07-29 14:02:14 -07:00
Matthew Farrellee
2d1ab3ca55
fix: use same image_name logic for build & run config (#2949)
# What does this PR do?

When `--image-name` is not provided, the build script defaults to the
`image_name` in the config. This PR makes sure the run script does the
same.

## Test Plan

Run `llama stack build` without `--image-name`.
2025-07-29 12:54:21 -07:00
Francisco Arceo
6ac973ec80
chore: Delete coverage-badge (#2950)
At the moment, the code coverage action has just been failing. It's
misleading when interpreting the status badge on the main branch.


https://github.com/meta-llama/llama-stack/actions/workflows/coverage-badge.yml

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-29 12:53:25 -07:00
Ashwin Bharambe
2e5ca3f15c chore: move recordings one directory upwards 2025-07-29 12:46:19 -07:00
Ashwin Bharambe
08b4a1deb3
feat(tests): introduce inference record/replay to increase test reliability (#2941)
Implements a comprehensive recording and replay system for inference API
calls that eliminates dependency on online inference providers during
testing. The system treats inference as deterministic by recording real
API responses and replaying them in subsequent test runs. Applies to
OpenAI clients (which should cover many inference requests) as well as
Ollama AsyncClient.

For storing, we use a hybrid system: Sqlite for fast lookups and JSON
files for easy greppability / debuggability.
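
To make the hybrid layout concrete, here is a minimal sketch assuming a hypothetical `RecordingStore` class: a SQLite table maps a hash of the request to a JSON file that holds the recorded response. The schema, file naming, and hashing scheme are illustrative assumptions, not the format used by this PR.

```python
import hashlib
import json
import sqlite3
from pathlib import Path


class RecordingStore:
    """Toy record/replay store: SQLite index for lookups, JSON files for inspection."""

    def __init__(self, root: Path) -> None:
        root.mkdir(parents=True, exist_ok=True)
        self.root = root
        self.db = sqlite3.connect(root / "index.sqlite")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS recordings "
            "(request_hash TEXT PRIMARY KEY, file TEXT)"
        )

    @staticmethod
    def request_hash(endpoint: str, body: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps({"endpoint": endpoint, "body": body}, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def record(self, endpoint: str, body: dict, response: dict) -> None:
        key = self.request_hash(endpoint, body)
        path = self.root / f"{key[:12]}.json"
        # Pretty-printed JSON keeps the recording easy to grep and diff.
        path.write_text(json.dumps({"request": body, "response": response}, indent=2))
        self.db.execute(
            "INSERT OR REPLACE INTO recordings VALUES (?, ?)", (key, path.name)
        )
        self.db.commit()

    def replay(self, endpoint: str, body: dict) -> dict:
        key = self.request_hash(endpoint, body)
        row = self.db.execute(
            "SELECT file FROM recordings WHERE request_hash = ?", (key,)
        ).fetchone()
        if row is None:
            raise LookupError(f"No recording for request hash {key[:12]}")
        return json.loads((self.root / row[0]).read_text())["response"]
```

In replay mode a missing recording raises instead of falling back to a live call, which keeps test runs independent of the inference provider.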

As expected, tests become much faster (more than 3x for inference
tests alone).

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or
`replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified
for record or replay modes)
2025-07-29 12:41:31 -07:00
Ashwin Bharambe
abf1d6a703 fix: random breakage in llama_stack/ui/package.json 2025-07-29 12:31:29 -07:00
Ashwin Bharambe
fee365b71e fix: delete requirements.txt which crept back in 2025-07-29 11:30:25 -07:00
Nehanth Narendrula
58ffd82853
fix: Update SFTConfig parameter to fix CI and Post Training Workflow (#2948)
# What does this PR do?

- Change `max_seq_length` to `max_length` in the `SFTConfig` constructor
- TRL deprecated `max_seq_length` in Feb 2025 and removed it in v0.20.0
- Reference: https://github.com/huggingface/trl/pull/2895
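
For callers that still build their arguments with the old name, a small compatibility shim is one option. The `migrate_sft_kwargs` helper below is hypothetical and is not part of TRL or of this PR; it only renames the key before the arguments are handed to `SFTConfig`.

```python
def migrate_sft_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with `max_seq_length` renamed to `max_length`."""
    migrated = dict(kwargs)
    if "max_seq_length" in migrated:
        old_value = migrated.pop("max_seq_length")
        # An explicitly provided max_length takes precedence over the old name.
        migrated.setdefault("max_length", old_value)
    return migrated
```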

This resolves the SFT training failure in CI tests
2025-07-29 11:14:04 -07:00
Matthew Farrellee
c7dc0f21b4
fix: error on failed job, do not wait for timeout (#2945)
# What does this PR do?

Makes the post-training integration test error out as soon as the job
fails, rather than waiting for the timeout.

## Test Plan

ci
2025-07-29 11:07:51 -07:00
Nathan Weinberg
870a37ff4b
feat: add base64 encoded PDF support for OpenAI Chat Completions (#2881)
# What does this PR do?
OpenAI Chat Completions supports passing a base64 encoded PDF file to a
model, but Llama Stack currently does not allow for this behavior. This
PR extends our implementation of the OpenAI API spec to change that.
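
For illustration, a request message carrying a base64-encoded PDF might be built as in the sketch below. The `pdf_user_message` helper is hypothetical, and the `file` content-part field names (`filename`, `file_data` as a data URL) reflect my understanding of the OpenAI Chat Completions format; treat them as assumptions and check them against the API reference.

```python
import base64


def pdf_user_message(pdf_bytes: bytes, filename: str, prompt: str) -> dict:
    """Build a Chat Completions user message that embeds a PDF as base64."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "file",
                "file": {
                    "filename": filename,
                    # Assumed format: a data URL with the PDF MIME type.
                    "file_data": f"data:application/pdf;base64,{encoded}",
                },
            },
        ],
    }
```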

Closes #2129

## Test Plan
A new functional test has been added to test the validity of such a
request

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-29 06:23:41 -04:00
github-actions[bot]
cf8722079c build: Bump version to 0.2.16
2025-07-28 23:13:50 +00:00
Mark Campbell
19c90d9bfc
docs: update using llama stack as library docs (#2931)
# What does this PR do?
Updates provider template from outdated `ollama` to `starter` 
Closes: #2839 
## Test Plan
2025-07-28 15:35:26 -07:00
ehhuang
4019027070
chore: revert #2855 (#2939)
# What does this PR do?
revert https://github.com/meta-llama/llama-stack/pull/2855 to unblock
release (running out of disk space)

Error here:
4689354931

## Test Plan
2025-07-28 15:30:25 -07:00
dependabot[bot]
e189f65548
chore(python-deps): bump pydantic from 2.10.6 to 2.11.7 (#2925)
Bumps [pydantic](https://github.com/pydantic/pydantic) from 2.10.6 to
2.11.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pydantic/pydantic/releases">pydantic's
releases</a>.</em></p>
<blockquote>
<h2>v2.11.7 2025-06-14</h2>
<!-- raw HTML omitted -->
<h2>What's Changed</h2>
<h3>Fixes</h3>
<ul>
<li>Copy <code>FieldInfo</code> instance if necessary during
<code>FieldInfo</code> build by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11980">pydantic/pydantic#11980</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/pydantic/pydantic/compare/v2.11.6...v2.11.7">https://github.com/pydantic/pydantic/compare/v2.11.6...v2.11.7</a></p>
<h2>v2.11.6 2025-06-13</h2>
<h2>v2.11.6 (2025-06-13)</h2>
<h3>What's Changed</h3>
<h4>Fixes</h4>
<ul>
<li>Rebuild dataclass fields before schema generation by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11949">#11949</a></li>
<li>Always store the original field assignment on <code>FieldInfo</code>
by <a href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11946">#11946</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/pydantic/pydantic/compare/v2.11.5...v2.11.6">https://github.com/pydantic/pydantic/compare/v2.11.5...v2.11.6</a></p>
<h2>v2.11.5 2025-05-22</h2>
<!-- raw HTML omitted -->
<h2>What's Changed</h2>
<h3>Fixes</h3>
<ul>
<li>Check if <code>FieldInfo</code> is complete after applying type
variable map by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11855">#11855</a></li>
<li>Do not delete mock validator/serializer in
<code>model_rebuild()</code> by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11890">#11890</a></li>
<li>Do not duplicate metadata on model rebuild by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11902">#11902</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/pydantic/pydantic/compare/v2.11.4...v2.11.5">https://github.com/pydantic/pydantic/compare/v2.11.4...v2.11.5</a></p>
<h2>v2.11.4 2025-04-29</h2>
<h3>What's Changed</h3>
<h4>Packaging</h4>
<ul>
<li>Bump <code>mkdocs-llmstxt</code> to v0.2.0 by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11725">#11725</a></li>
</ul>
<h4>Changes</h4>
<ul>
<li>Allow config and bases to be specified together in
<code>create_model()</code> by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11714">#11714</a>.
This change was backported as it was previously possible (although not
meant to be supported)
to provide <code>model_config</code> as a field, which would make it
possible to provide both configuration
and bases.</li>
</ul>
<h4>Fixes</h4>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pydantic/pydantic/blob/main/HISTORY.md">pydantic's
changelog</a>.</em></p>
<blockquote>
<h2>v2.11.7 (2025-06-14)</h2>
<p><a
href="https://github.com/pydantic/pydantic/releases/tag/v2.11.7">GitHub
release</a></p>
<h3>What's Changed</h3>
<h4>Fixes</h4>
<ul>
<li>Copy <code>FieldInfo</code> instance if necessary during
<code>FieldInfo</code> build by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11898">#11898</a></li>
</ul>
<h2>v2.11.6 (2025-06-13)</h2>
<p><a
href="https://github.com/pydantic/pydantic/releases/tag/v2.11.6">GitHub
release</a></p>
<h3>What's Changed</h3>
<h4>Fixes</h4>
<ul>
<li>Rebuild dataclass fields before schema generation by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11949">#11949</a></li>
<li>Always store the original field assignment on <code>FieldInfo</code>
by <a href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11946">#11946</a></li>
</ul>
<h2>v2.11.5 (2025-05-22)</h2>
<p><a
href="https://github.com/pydantic/pydantic/releases/tag/v2.11.5">GitHub
release</a></p>
<h3>What's Changed</h3>
<h4>Fixes</h4>
<ul>
<li>Check if <code>FieldInfo</code> is complete after applying type
variable map by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11855">#11855</a></li>
<li>Do not delete mock validator/serializer in
<code>model_rebuild()</code> by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11890">#11890</a></li>
<li>Do not duplicate metadata on model rebuild by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11902">#11902</a></li>
</ul>
<h2>v2.11.4 (2025-04-29)</h2>
<p><a
href="https://github.com/pydantic/pydantic/releases/tag/v2.11.4">GitHub
release</a></p>
<h3>What's Changed</h3>
<h4>Packaging</h4>
<ul>
<li>Bump <code>mkdocs-llmstxt</code> to v0.2.0 by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11725">#11725</a></li>
</ul>
<h4>Changes</h4>
<ul>
<li>Allow config and bases to be specified together in
<code>create_model()</code> by <a
href="https://github.com/Viicos"><code>@​Viicos</code></a> in <a
href="https://redirect.github.com/pydantic/pydantic/pull/11714">#11714</a>.
This change was backported as it was previously possible (although not
meant to be supported)
to provide <code>model_config</code> as a field, which would make it
possible to provide both configuration
and bases.</li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5f033e46c5"><code>5f033e4</code></a>
Prepare release v2.11.7</li>
<li><a
href="c3368b83c4"><code>c3368b8</code></a>
Copy <code>FieldInfo</code> instance if necessary during
<code>FieldInfo</code> build (<a
href="https://redirect.github.com/pydantic/pydantic/issues/11980">#11980</a>)</li>
<li><a
href="3987b23db4"><code>3987b23</code></a>
Prepare release v2.11.6</li>
<li><a
href="dc7a9d20be"><code>dc7a9d2</code></a>
Always store the original field assignment on
<code>FieldInfo</code></li>
<li><a
href="c284c279a5"><code>c284c27</code></a>
Rebuild dataclass fields before schema generation</li>
<li><a
href="5e6d1dc71f"><code>5e6d1dc</code></a>
Prepare release v2.11.5</li>
<li><a
href="1b63218c42"><code>1b63218</code></a>
Do not duplicate metadata on model rebuild (<a
href="https://redirect.github.com/pydantic/pydantic/issues/11902">#11902</a>)</li>
<li><a
href="5aefad873b"><code>5aefad8</code></a>
Do not delete mock validator/serializer in
<code>model_rebuild()</code></li>
<li><a
href="8fbe6585f4"><code>8fbe658</code></a>
Check if <code>FieldInfo</code> is complete after applying type variable
map</li>
<li><a
href="12b371a0f7"><code>12b371a</code></a>
Update documentation about <code>@dataclass_transform</code>
support</li>
<li>Additional commits viewable in <a
href="https://github.com/pydantic/pydantic/compare/v2.10.6...v2.11.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pydantic&package-manager=uv&previous-version=2.10.6&new-version=2.11.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-28 15:11:54 -07:00
Ashwin Bharambe
70469c84e9
chore(packaging): remove requirements.txt (#2938)
We don't need this. We have kept it since existing wisdom is that "it
helps with back-compat". Well, the entire ecosystem is moving to `uv` at
an unprecedented rate and keeping this creates unnecessary work and
confusion. The specific reason I am killing this is that it confuses
`dependabot` which ends up not bumping `uv.lock` which is the more
important file to change.
2025-07-28 14:52:24 -07:00
Ashwin Bharambe
cd24aaf3aa fix(pre-commit): push properly version 4 2025-07-28 13:11:56 -07:00
Ashwin Bharambe
8fa77bc93e fix(pre-commit): push properly version 3 2025-07-28 13:02:04 -07:00
Ashwin Bharambe
3058060e2b fix(pre-commit): push properly version 2 2025-07-28 12:50:50 -07:00
Ashwin Bharambe
607574c26a fix(pre-commit): push properly 2025-07-28 12:43:49 -07:00
Ashwin Bharambe
8961706dea fix(pre-commit): dont error if pre-commit itself errors 2025-07-28 12:35:34 -07:00
Ashwin Bharambe
dd4ea28b49
fix(dependabot): run pre-commit on dependabot PRs (#2935)
See PR screenshot below -- we need to run pre-commit on the dependabot
PRs obviously

<img width="837" height="277" alt="image"
src="https://github.com/user-attachments/assets/c17802d7-e252-4719-acc7-e335b24120f8"
/>
2025-07-28 15:25:06 -04:00
Matthew Farrellee
968fc132d3
fix(openai-compat): restrict developer/assistant/system/tool messages to text-only content (#2932)
**What:**
- Added OpenAIChatCompletionTextOnlyMessageContent type for text-only
content validation
- Modified OpenAISystemMessageParam, OpenAIAssistantMessageParam,
OpenAIDeveloperMessageParam, and OpenAIToolMessageParam to use text-only
content type instead of mixed content
- OpenAIUserMessageParam unchanged - still accepts both text and images
- Updated OpenAPI spec files to reflect text-only content restrictions
in schemas

closes #2894 

**Why:**
- Enforces OpenAI API compatibility by restricting image content to user
messages only
- Prevents API misuse where images might be sent in message types that
don't support them
- Aligns with OpenAI's actual API behavior where only user messages can
contain multimodal content
- Improves type safety and validation at the API boundary
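
The restriction described above can be sketched without any dependencies. This is only an illustration of the rule, not the project's implementation: the real change uses typed message params and surfaces a `ValidationError`, while the function name `validate_message_content`, the `TEXT_ONLY_ROLES` set, and the dict-shaped content parts below are assumptions made for the example.

```python
from typing import Any

# Roles the commit restricts to text-only content; "user" is left unrestricted.
TEXT_ONLY_ROLES = {"system", "assistant", "developer", "tool"}


def validate_message_content(role: str, content: Any) -> None:
    """Raise ValueError if a text-only role carries non-text content.

    Hypothetical helper: the project enforces this through its message types.
    """
    if isinstance(content, str):
        return  # plain strings are accepted for every role
    if not isinstance(content, list):
        raise ValueError("content must be a string or a list of parts")
    for part in content:
        part_type = part.get("type")
        if part_type == "text":
            continue  # text parts are accepted for every role
        if part_type == "image_url" and role not in TEXT_ONLY_ROLES:
            continue  # image parts are accepted only outside the text-only roles
        raise ValueError(f"{role} messages do not accept {part_type!r} content")
```

Calling `validate_message_content("user", [{"type": "image_url"}])` returns silently, while the same content with role `"system"` raises.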

**Test plan:**
- Added comprehensive parametrized tests covering all 5 OpenAI message
types
- Tests verify text string acceptance for all message types
- Tests verify text list acceptance for all message types
- Tests verify image rejection for system/assistant/developer/tool
messages (ValidationError expected)
- Tests verify user messages still accept images (backward compatibility
maintained)
2025-07-28 10:36:34 -07:00
Matthew Farrellee
60bb5e307e
feat(openai): add configurable base_url support with OPENAI_BASE_URL env var (#2919)
# What does this PR do?

- Add base_url field to OpenAIConfig with default
"https://api.openai.com/v1"
- Update sample_run_config to support OPENAI_BASE_URL environment
variable
- Modify get_base_url() to return configured base_url instead of
hardcoded value
- Add comprehensive test suite covering:
  - Default base URL behavior
  - Custom base URL from config
  - Environment variable override
  - Config precedence over environment variables
  - Client initialization with configured URL
  - Model availability checks using configured URL

This enables users to configure custom OpenAI-compatible API endpoints
via environment variables or configuration files.
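
A minimal sketch of the lookup order listed above (explicit config first, then `OPENAI_BASE_URL`, then the default). The function `resolve_base_url` and its parameters are hypothetical; the PR itself wires this through `OpenAIConfig` and `get_base_url()`.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(
    configured: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the base URL: config value, then env var, then the default."""
    if configured:
        return configured  # an explicit config value takes precedence
    return env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL)
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.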

Closes #2910 

## Test Plan

run unit tests
2025-07-28 10:16:02 -07:00
Charlie Doern
b1c21a25ec
docs: remove provider_id from external docs (#2922)
# What does this PR do?

The external provider docs mention setting provider_id in the build yaml.
Since the build yaml now takes only provider_type and module, remove
the instances of provider_id.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-28 10:14:39 -07:00
Charlie Doern
86fe2b8475
fix: adjust provider type used in external provider test (#2921)
# What does this PR do?

provider_id is no longer valid in a build.yaml, so remove it from the
external provider test.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-28 10:14:16 -07:00
Matthew Farrellee
47c078fcef
feat: implement dynamic model detection support for inference providers using litellm (#2886)
# What does this PR do?

This enhancement allows inference providers using LiteLLMOpenAIMixin to
validate model availability against LiteLLM's official provider model
listings, improving reliability and user experience when working with
different AI service providers.

- Add litellm_provider_name parameter to LiteLLMOpenAIMixin constructor
- Add check_model_availability method to LiteLLMOpenAIMixin using
litellm.models_by_provider
- Update Gemini, Groq, and SambaNova inference adapters to pass
litellm_provider_name
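
The availability check can be sketched as a membership test against a provider-to-models mapping. The commit names `litellm.models_by_provider` as the data source; to stay self-contained, the example below passes a small hand-written mapping instead, and the model names in it are made up.

```python
from typing import Iterable, Mapping


def check_model_availability(
    model: str,
    provider_name: str,
    models_by_provider: Mapping[str, Iterable[str]],
) -> bool:
    """Return True if `model` is listed for `provider_name`."""
    # An unknown provider is treated as having no models.
    return model in set(models_by_provider.get(provider_name, ()))


# Hypothetical listing standing in for litellm.models_by_provider.
SAMPLE_LISTING = {
    "groq": ["example-model-a", "example-model-b"],
    "gemini": ["example-model-c"],
}
```

In the mixin, the provider key would come from the `litellm_provider_name` constructor parameter described above.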

## Test Plan

standard CI.
2025-07-28 10:13:54 -07:00
Christian Zaccaria
c48dcafc77
fix: Fix unit tests CI and failing tests (#2928)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- Added `set -e` to the beginning of the unit test script to ensure the
script exits on failure and correctly fails the CI when tests do not
pass.
- Fixed all unit tests that were silently failing in the CI.
- Fixed Python 3.13 unit test CI failing silently.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2877 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
- **Previously:** Unit tests passed in CI even though 11 tests failed
->
[CI-run](4683681501 (step):4:2097)
- **Made the fix. Now, ensuring CI fails as expected on test failures:**
Unit tests failing in CI with 1 failed test ->
[CI-run](4684234247 (step):4:1506)
- This PR shows the CI passing and all unit tests passing.
2025-07-28 10:07:26 -07:00
Charlie Doern
46e2989312
fix: switch refresh to debug log (#2933)
# What does this PR do?
The server logs have a persistent `core: refreshing registry` message that
clogs up the output. Switch it to debug level.

this is what it looked like:

<img width="1126" height="1028" alt="Screenshot 2025-07-28 at 9 56
44 AM"
src="https://github.com/user-attachments/assets/a1880fd3-7fc7-4a97-bfb8-89a62e4c5c19"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-28 10:02:54 -07:00
Matthew Farrellee
3c40c8e583
fix: litellm_provider_name for llama-api (#2934)
litellm uses "meta_llama" for the provider name, see
https://docs.litellm.ai/docs/providers/meta_llama and
https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py#L833
2025-07-28 10:02:16 -07:00
Charlie Doern
09abdb0a37
test: upload logs for external provider tests (#2914)
# What does this PR do?

Currently the external provider tests neither upload log files as
artifacts nor use LLAMA_STACK_LOG_FILE. Align them with the other
integration tests.

## Test Plan

logs should be present in the two tests on this PR

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-25 15:03:15 -07:00
Ashwin Bharambe
9583f468f8
feat(starter)!: simplify starter distro; litellm model registry changes (#2916) 2025-07-25 15:02:04 -07:00
Charlie Doern
3344d8a9e5
fix: separate build and run provider types (#2917)
# What does this PR do?

In #2637, I combined the run and build config provider types to both use
`Provider`.

Since `Provider` includes a provider_id, a user must now specify one when
writing a build yaml. This is confusing, because all a user should
care about at build time is the code to be installed (the module and the
provider_type).

Introduce `BuildProvider` and fix up the parts of the code impacted by
this.
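
A rough sketch of the split, assuming only the fields mentioned in this message (`provider_type`, `module`, `provider_id`). The real classes carry more than this, and the helper `to_run_provider` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildProvider:
    """What a build yaml needs: which code to install."""
    provider_type: str
    module: Optional[str] = None


@dataclass
class Provider:
    """What a run config needs: the same fields plus an instance id."""
    provider_id: str
    provider_type: str
    module: Optional[str] = None


def to_run_provider(build: BuildProvider, provider_id: str) -> Provider:
    # Hypothetical helper: attach an id when moving from build to run config.
    return Provider(
        provider_id=provider_id,
        provider_type=build.provider_type,
        module=build.module,
    )
```

Keeping the build type free of `provider_id` means a build yaml entry only has to name the provider type and, for external providers, the module.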

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-25 12:39:26 -07:00
Nathan Weinberg
025163d8e6
feat: add auto-generated CI documentation pre-commit hook (#2890)
# What does this PR do?
Our CI is entirely undocumented. This commit adds a README.md file with
a table of the current CI workflows and what each one does.

---------

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-25 17:57:01 +02:00
Derek Higgins
52201612de
feat: implement chunk deletion for vector stores (#2701)
Add support for deleting individual chunks from vector stores

- Add abstract remove_chunk() method to EmbeddingIndex base class
- Implement chunk deletion for Faiss provider, SQLite Vec, Milvus,
PGVector
- Placeholder implementations with NotImplementedError for
Chroma/Qdrant/Weaviate
- Integrate chunk deletion into OpenAI vector store file deletion flow
- Remove xfail from
test_openai_vector_store_delete_file_removes_from_vector_store

Closes: #2477
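
The shape of the change can be sketched as an abstract method on the base class, one working implementation, and one placeholder. The method signature, the synchronous style, and both subclasses below are assumptions for illustration; the actual providers will differ.

```python
from abc import ABC, abstractmethod
from typing import Dict


class EmbeddingIndex(ABC):
    """Simplified stand-in for the base class named in this commit."""

    @abstractmethod
    def remove_chunk(self, chunk_id: str) -> None:
        """Delete a single chunk from the index."""


class InMemoryIndex(EmbeddingIndex):
    # Hypothetical backend that keeps chunks in a dict.
    def __init__(self) -> None:
        self.chunks: Dict[str, str] = {}

    def add_chunk(self, chunk_id: str, text: str) -> None:
        self.chunks[chunk_id] = text

    def remove_chunk(self, chunk_id: str) -> None:
        # Removing an unknown id is treated as a no-op.
        self.chunks.pop(chunk_id, None)


class PlaceholderIndex(EmbeddingIndex):
    # Mirrors the placeholder approach for backends without support yet.
    def remove_chunk(self, chunk_id: str) -> None:
        raise NotImplementedError("chunk deletion is not supported by this backend yet")
```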

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-07-25 10:30:30 -04:00
Francisco Arceo
9e77be1f72
chore: Fix chroma unit tests (#2896)
# What does this PR do?
Enable Chroma inline unit tests and fix integration tests.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-25 10:12:14 -04:00
Ashwin Bharambe
ed07a58b50
fix(registry): ensure clean shutdown (#2901)
Avoid the error message:

```
INFO     2025-07-24 21:51:54,530 __main__:598 server: Received interrupt signal, shutting down gracefully...                                          
ERROR    2025-07-24 21:51:54,692 asyncio:1826 uncategorized: Task was destroyed but it is pending!                                                    
         task: <Task pending name='Task-15' coro=<refresh_registry() running at                                                                       
         /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/stack.py:356> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=>  
```
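
One common way to avoid this asyncio error is to cancel the background task and await it before the event loop closes. The sketch below shows that pattern in isolation; it is not taken from the commit's diff, and `refresh_registry` here is a stand-in loop rather than the real coroutine.

```python
import asyncio
import contextlib


async def refresh_registry(interval: float) -> None:
    """Stand-in for a periodic background refresh loop."""
    while True:
        await asyncio.sleep(interval)


async def run_and_shutdown() -> bool:
    task = asyncio.create_task(refresh_registry(0.01))
    await asyncio.sleep(0.03)  # pretend the server is handling requests

    # On shutdown: request cancellation, then wait for the task to finish
    # so it is no longer pending when the event loop closes.
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return task.cancelled()
```

Running `asyncio.run(run_and_shutdown())` completes without the "Task was destroyed but it is pending!" message, because the task has finished by the time the loop shuts down.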
2025-07-25 09:44:31 -04:00
Charlie Doern
de6919ecdd
refactor: install external providers from module (#2637)
# What does this PR do?

Today, external providers are installed via the `external_providers_dir`
in the config. This requires users to understand the `ProviderSpec`
and set up their directories accordingly. This process splits up the
config for the stack across multiple files, directories, and formats.

Most (if not all) external providers today have a
[get_provider_spec](559cb18fbb/src/ramalama_stack/provider.py (L9))
method that sits unused. Utilizing this method rather than the
providers.d route allows for a much easier installation process for
external providers and limits the amount of extra configuration a
regular user has to do to get their stack off the ground.

To accomplish this and wire it throughout the build process, introduce
the concept of a `module` that users specify for an external provider
at build time. To facilitate this, align the build and run
specs to use the `Provider` class rather than the stringified provider_type
that build currently uses.

For example, say this is in your build config:

```
- provider_id: ramalama
  provider_type: remote::ramalama
  module: ramalama_stack
```

During build (in the various `build_...` scripts), in addition to
installing any pip dependencies, we also install this module and use
the `get_provider_spec` method to retrieve the ProviderSpec that is
currently specified using `providers.d`.

In production so far, providing instructions for installing external
providers for users has been difficult: they need to install the module
as a pre-req, create the providers.d directory, copy in the provider
spec, and also copy in the necessary build/run yaml files. Accessing an
external provider should be as easy as possible, and pointing to its
installable module aligns more with the rest of our build and dependency
management process.

For now, `external_providers_dir` still exists as an alternative, more
declarative method of using external providers.
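
The module-based lookup can be sketched with `importlib`: import the module named in the config and call its `get_provider_spec`. The helper name `load_provider_spec`, the fake module, and the dict it returns are all assumptions for the example; a real provider module would return an actual ProviderSpec.

```python
import importlib
import sys
import types
from typing import Any


def load_provider_spec(module_name: str) -> Any:
    """Import `module_name` and return whatever its get_provider_spec() yields."""
    module = importlib.import_module(module_name)
    get_spec = getattr(module, "get_provider_spec", None)
    if get_spec is None:
        raise AttributeError(f"{module_name} does not define get_provider_spec()")
    return get_spec()


# Register a stand-in module so the example runs without installing anything.
_fake = types.ModuleType("fake_provider_module")
_fake.get_provider_spec = lambda: {"provider_type": "remote::fake"}
sys.modules["fake_provider_module"] = _fake
```

With the stand-in registered, `load_provider_spec("fake_provider_module")` returns the dict defined above.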

## Test Plan

Added an integration test installing an external provider from module
and more unit test coverage for `get_provider_registry`.


(The warning in yellow is expected: the module is installed inside
the build env, not where we are running the command.)
<img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30
48 AM"
src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d"
/>

<img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31
14 AM"
src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc"
/>

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-25 15:41:26 +02:00
dependabot[bot]
85223ccc4d
chore(github-deps): bump astral-sh/setup-uv from 6.4.1 to 6.4.3 (#2902)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.4.1 to 6.4.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.4.3 🌈 fix relative paths starting with dots</h2>
<h2>🐛 Bug fixes</h2>
<ul>
<li>fix relative paths starting with dots <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/500">#500</a>)</li>
</ul>
<h2>v6.4.2 🌈 Interpret relative inputs as under working-directory</h2>
<h2>Changes</h2>
<p>This release will interpret relative paths in inputs as relative
to the value of <code>working-directory</code> (default is <code>${{
github.workspace }}</code>) .
This means the following configuration</p>
<pre lang="yaml"><code>- uses: astral-sh/setup-uv@v6
   with:
     working-directory: /my/path
     cache-dependency-glob: uv.lock
</code></pre>
<p>will look for the <code>cache-dependency-glob</code> under
<code>/my/path/uv.lock</code></p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>interpret relative inputs as under working-directory <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/498">#498</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known versions for 0.8.1/0.8.2 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/497">#497</a>)</li>
<li>chore: update known versions for 0.8.0 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/491">#491</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e92bafb625"><code>e92bafb</code></a>
fix relative paths starting with dots (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/500">#500</a>)</li>
<li><a
href="2c7142f755"><code>2c7142f</code></a>
interpret relative inputs as under working-directory (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/498">#498</a>)</li>
<li><a
href="23482a31a8"><code>23482a3</code></a>
chore: update known versions for 0.8.1/0.8.2 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/497">#497</a>)</li>
<li><a
href="4ac06a054e"><code>4ac06a0</code></a>
chore: update known versions for 0.8.0 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/491">#491</a>)</li>
<li>See full diff in <a
href="7edac99f96...e92bafb625">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.4.1&new-version=6.4.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-25 10:08:24 +02:00
Yuan Tang
34093fecd1
ci: Remove open-pull-requests-limit: 0 from dependabot.yml (#2900)
This might fix issues in
https://github.com/meta-llama/llama-stack/pull/2899 and
https://github.com/meta-llama/llama-stack/pull/2897 where uv
dependencies are not being upgraded correctly (`uv.lock` is not being
updated).
2025-07-25 09:49:18 +02:00
dependabot[bot]
3216765c26
chore(deps): bump form-data from 4.0.2 to 4.0.4 in /llama_stack/ui (#2898)
Bumps [form-data](https://github.com/form-data/form-data) from 4.0.2 to
4.0.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/form-data/form-data/releases">form-data's
releases</a>.</em></p>
<blockquote>
<h2>v4.0.4</h2>
<h2><a
href="https://github.com/form-data/form-data/compare/v4.0.3...v4.0.4">v4.0.4</a>
- 2025-07-16</h2>
<h3>Commits</h3>
<ul>
<li>[meta] add <code>auto-changelog</code> <a
href="811f68282f"><code>811f682</code></a></li>
<li>[Tests] handle predict-v8-randomness failures in node &lt; 17 and
node &gt; 23 <a
href="1d11a76434"><code>1d11a76</code></a></li>
<li>[Fix] Switch to using <code>crypto</code> random for boundary values
<a
href="3d1723080e"><code>3d17230</code></a></li>
<li>[Tests] fix linting errors <a
href="5e340800b5"><code>5e34080</code></a></li>
<li>[meta] actually ensure the readme backup isn’t published <a
href="316c82ba93"><code>316c82b</code></a></li>
<li>[Dev Deps] update <code>@ljharb/eslint-config</code> <a
href="58c25d7640"><code>58c25d7</code></a></li>
<li>[meta] fix readme capitalization <a
href="2300ca1959"><code>2300ca1</code></a></li>
</ul>
<h2>v4.0.3</h2>
<h2><a
href="https://github.com/form-data/form-data/compare/v4.0.2...v4.0.3">v4.0.3</a>
- 2025-06-05</h2>
<h3>Fixed</h3>
<ul>
<li>[Fix] <code>append</code>: avoid a crash on nullish values <a
href="https://redirect.github.com/form-data/form-data/issues/577"><code>[#577](https://github.com/form-data/form-data/issues/577)</code></a></li>
</ul>
<h3>Commits</h3>
<ul>
<li>[eslint] use a shared config <a
href="426ba9ac44"><code>426ba9a</code></a></li>
<li>[eslint] fix some spacing issues <a
href="20941917f0"><code>2094191</code></a></li>
<li>[Refactor] use <code>hasown</code> <a
href="81ab41b46f"><code>81ab41b</code></a></li>
<li>[Fix] validate boundary type in <code>setBoundary()</code> method <a
href="8d8e469309"><code>8d8e469</code></a></li>
<li>[Tests] add tests to check the behavior of <code>getBoundary</code>
with non-strings <a
href="837b8a1f75"><code>837b8a1</code></a></li>
<li>[Dev Deps] remove unused deps <a
href="870e4e6659"><code>870e4e6</code></a></li>
<li>[meta] remove local commit hooks <a
href="e6e83ccb54"><code>e6e83cc</code></a></li>
<li>[Dev Deps] update <code>eslint</code> <a
href="4066fd6f65"><code>4066fd6</code></a></li>
<li>[meta] fix scripts to use prepublishOnly <a
href="c4bbb13c0e"><code>c4bbb13</code></a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="41996f5ac7"><code>41996f5</code></a>
v4.0.4</li>
<li><a
href="316c82ba93"><code>316c82b</code></a>
[meta] actually ensure the readme backup isn’t published</li>
<li><a
href="2300ca1959"><code>2300ca1</code></a>
[meta] fix readme capitalization</li>
<li><a
href="811f68282f"><code>811f682</code></a>
[meta] add <code>auto-changelog</code></li>
<li><a
href="5e340800b5"><code>5e34080</code></a>
[Tests] fix linting errors</li>
<li><a
href="1d11a76434"><code>1d11a76</code></a>
[Tests] handle predict-v8-randomness failures in node &lt; 17 and node
&gt; 23</li>
<li><a
href="58c25d7640"><code>58c25d7</code></a>
[Dev Deps] update <code>@ljharb/eslint-config</code></li>
<li><a
href="3d1723080e"><code>3d17230</code></a>
[Fix] Switch to using <code>crypto</code> random for boundary
values</li>
<li><a
href="d8d67dc8ac"><code>d8d67dc</code></a>
v4.0.3</li>
<li><a
href="e6e83ccb54"><code>e6e83cc</code></a>
[meta] remove local commit hooks</li>
<li>Additional commits viewable in <a
href="https://github.com/form-data/form-data/compare/v4.0.2...v4.0.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=form-data&package-manager=npm_and_yarn&previous-version=4.0.2&new-version=4.0.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-24 21:24:56 -04:00
ehhuang
21bae296f2
feat(auth): API access control (#2822)
# What does this PR do?
- Added ability to specify `required_scope` when declaring an API. This
is part of the `@webmethod` decorator.
- If auth is enabled, a user can access an API only if
`user.attributes['scope']` includes the `required_scope`
- We add `required_scope='telemetry.read'` to the telemetry read APIs.
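A minimal sketch of the scope check described above. The `webmethod` decorator here is a hypothetical stand-in that only records `required_scope`, and `User` and `is_allowed` are illustrative names; the real decorator and middleware may look different.

```python
from dataclasses import dataclass, field


def webmethod(required_scope=None):
    """Hypothetical stand-in: attach the required scope to the handler."""
    def decorator(func):
        func.required_scope = required_scope
        return func
    return decorator


@dataclass
class User:
    # Illustrative user model; `attributes["scope"]` mirrors the PR text.
    attributes: dict = field(default_factory=dict)


def is_allowed(user, handler, auth_enabled=True):
    """Return True when the user may call the handler."""
    scope = getattr(handler, "required_scope", None)
    if not auth_enabled or scope is None:
        # No auth configured, or the API declares no scope: allow the call.
        return True
    return scope in user.attributes.get("scope", [])


@webmethod(required_scope="telemetry.read")
def query_traces():
    return []
```

A caller that gets `False` from `is_allowed` would be the one to respond with 403.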

## Test Plan
CI with added tests

1. Enable server.auth with github token
2. Observe `client.telemetry.query_traces()` returns 403
2025-07-24 15:30:48 -07:00
Calum Murray
7cc4819e90
feat: add MCP Streamable HTTP support (#2554)
# What does this PR do?
This PR adds support for the new Streamable HTTP transport for MCP, as
well as falling back to the SSE protocol if the Streamable HTTP
connection fails.

Closes #2542 

## Test Plan

---------

Signed-off-by: Calum Murray <cmurray@redhat.com>
2025-07-24 15:04:27 -07:00
Sébastien Han
632cf9eb72
feat: Bring Your Own API (BYOA) (#2228)
# What does this PR do?

Prototype of a new feature that allows new APIs to be plugged into Llama
Stack. Opened for early feedback on the approach and to gauge appetite
for the functionality.

@ashwinb @raghotham open for early feedback, thanks!

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-24 13:41:14 -07:00
Charlie Doern
341504869e
fix: use logger for console telemetry (#2844)
# What does this PR do?

Currently, `print` is used with custom formatting to produce telemetry
output in the console_span_processor.

This keeps telemetry out of log files when `LLAMA_STACK_LOG_FILE` is
set, so during testing it looks as if telemetry is not being captured
even though it is.

Switch to Rich formatting with the logger, and strip the formatting
when a log file is in use so the file output looks normal.
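A rough illustration of the "strip the formatting for the log file" idea, using only the standard library. `PlainFormatter` and `build_logger` are hypothetical names, and stripping ANSI color codes with a regex is an assumption; the actual change works with Rich and may strip formatting differently.

```python
import io
import logging
import re

# Matches ANSI color/style escape sequences such as "\x1b[32m".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


class PlainFormatter(logging.Formatter):
    """Hypothetical formatter that removes ANSI color codes before writing."""

    def format(self, record):
        return ANSI_RE.sub("", super().format(record))


def build_logger(stream):
    logger = logging.getLogger("telemetry_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # stands in for a log-file handler
    handler.setFormatter(PlainFormatter("[TELEMETRY] %(message)s"))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
build_logger(buffer).info("\x1b[32m/v1/openai/v1/chat/completions\x1b[0m")
```

Because the record goes through the logger rather than `print`, any handler attached for a log file receives it too.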

## Test Plan

before:

console:

<img width="967" height="127" alt="Screenshot 2025-07-21 at 4 02 15 PM"
src="https://github.com/user-attachments/assets/b09518cc-9d38-4970-9877-70e2c41fcbb5"
/>


log file (no telemetry):

```
2025-07-21 16:01:32,481 llama_stack.providers.remote.inference.ollama.ollama:117 inference: checking connectivity to Ollama at `http://localhost:11434`...
2025-07-21 16:01:34,779 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed
2025-07-21 16:01:35,083 __main__:587 server: Listening on ['::', '0.0.0.0']:8321
2025-07-21 16:01:35,091 uvicorn.error:84 uncategorized: Started server process [68679]
2025-07-21 16:01:35,091 uvicorn.error:48 uncategorized: Waiting for application startup.
2025-07-21 16:01:35,092 __main__:163 server: Starting up
2025-07-21 16:01:35,092 uvicorn.error:62 uncategorized: Application startup complete.
2025-07-21 16:01:35,092 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
2025-07-21 16:01:37,167 uvicorn.access:473 uncategorized: 127.0.0.1:53145 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
```

after:

console:

<img width="797" height="165" alt="Screenshot 2025-07-22 at 3 28 44 PM"
src="https://github.com/user-attachments/assets/44d40e3b-6502-439d-9ea5-38058b289962"
/>


log file:

```
2025-07-21 15:59:51,481 llama_stack.providers.remote.inference.ollama.ollama:117 inference: checking connectivity to Ollama at `http://localhost:11434`...
2025-07-21 15:59:53,801 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed
2025-07-21 15:59:54,059 __main__:587 server: Listening on ['::', '0.0.0.0']:8321
2025-07-21 15:59:54,066 uvicorn.error:84 uncategorized: Started server process [68578]
2025-07-21 15:59:54,067 uvicorn.error:48 uncategorized: Waiting for application startup.
2025-07-21 15:59:54,067 __main__:163 server: Starting up
2025-07-21 15:59:54,067 uvicorn.error:62 uncategorized: Application startup complete.
2025-07-21 15:59:54,068 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
2025-07-21 15:59:55,381 [TELEMETRY] 19:59:55.381  /v1/openai/v1/chat/completions
2025-07-21 15:59:55,619 uvicorn.access:473 uncategorized: 127.0.0.1:53102 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
2025-07-21 15:59:55,621 [TELEMETRY] 19:59:55.621  /v1/openai/v1/chat/completions [StatusCode.OK] (240.07ms)
2025-07-21 15:59:55,622 [TELEMETRY]  19:59:55.620  127.0.0.1:53102 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-24 16:26:59 -04:00
Kelly Brown
abade761e0
docs: Update nvidia docs template (#2893)
**Description**

Fixes generation issue in nvidia code gen file.

Closes #2873
2025-07-24 22:11:34 +02:00
Sébastien Han
226b877ca6
chore: install script should use starter (#2891)
Our demo installation script should pull the starter image. The Ollama
distribution is no longer being updated.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-24 12:18:02 -07:00
ehhuang
cbe89d2bdd
chore: return webmethod from find_matching_route (#2883)
This will be used to support API access control, i.e. Webmethod would
have a `required_scope` attribute, and we need access to that in the
middleware.
2025-07-24 11:37:21 -07:00
Ashwin Bharambe
1463b79218
feat(registry): make the Stack query providers for model listing (#2862)
This reverses the approach of #2823 and #2805: the Stack now
periodically queries the providers for models, rather than the providers
going behind the Stack's back and calling "register" on the registry
themselves. This also adds support for model listing for all other
providers via `ModelRegistryHelper`. Once this is done, we no longer
need to manually list or register models via `run.yaml`, which removes
both noise and annoyance (setting `INFERENCE_MODEL` environment
variables, for example) from the new user experience.

In addition, it adds a configuration variable `allowed_models` which can
be used to optionally restrict the set of models exposed from a
provider.
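As a sketch of how `allowed_models` could restrict what a provider exposes (the `filter_models` helper and the model names are hypothetical, and treating an unset value as "no restriction" is an assumption based on the word "optionally"):

```python
def filter_models(provider_models, allowed_models=None):
    """Return the models to expose; `None` means no restriction."""
    if allowed_models is None:
        return list(provider_models)
    allowed = set(allowed_models)
    # Keep the provider's ordering and drop anything not explicitly allowed.
    return [model for model in provider_models if model in allowed]


listed = ["model-a", "model-b", "model-c"]  # placeholder model ids
exposed = filter_models(listed, allowed_models=["model-c", "model-a"])
```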
2025-07-24 10:39:53 -07:00
Stefan Thaler
537dc693ee
chore: add mypy coverage to inspect.py and library_client.py in /distribution (#2707)
# What does this PR do?
Adds type guards in /distribution/inspect.py and ignores a valid-type
mypy error in library_client.py. This PR is part of issue #2647 . I'm
rather unsure whether ignoring the valid-type error is correct in this
case. It appears that args[0] is interpreted as [any] but I didn't find
any way to specify the type.

2025-07-24 09:51:46 -07:00
Charlie Doern
d4f0b430e2
docs: update list of apis (#2697)
# What does this PR do?

apis.md was missing a few APIs and described others incorrectly

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-24 09:50:14 -07:00
Sébastien Han
af9c707eaf
fix: various improvements on install.sh (#2724)
# What does this PR do?

Bulk improvements:

* The script has better error reporting: when a command fails, it
prints the logs of the failed command
* Better error handling, using a trap to catch signals and perform
proper cleanup
* Cosmetic changes
* Added CI to test the image code against main
* Use the starter image and its latest tag

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-24 09:43:51 -07:00
Derek Higgins
4ea1f2aa9f
test: Add VLLM provider support to integration tests (#2757)
- Add setup-vllm GitHub action to start VLLM container
- Extend integration test matrix to support both ollama and vllm
providers
- Make test setup conditional based on provider type
- Add provider-specific environment variables and configurations
- vllm tests are set up to run weekly or can be triggered manually (only
ollama runs on PRs)

TODO:
investigate failing tests for vllm provider (safety and post_training)
Also need a proper fix for #2713 (tmp fix for this in the first commit
in this PR)
Closes: #1648

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-24 09:42:26 -07:00
Mustafa Elbehery
6ab5760a1b
chore(test): migrate unit tests from unittest to pytest nvidia test safety (#2793)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-24 09:41:07 -07:00
Yuan Tang
9069d878ef
docs: Update CHANGELOG.md (#2874)
This updates the changelog to include recent releases.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-07-24 09:36:28 -07:00
Christian Zaccaria
7f7b990b80
docs: Document use cases for Responses and Agents APIs (#2756)
# What does this PR do?
This pull request adds documentation to clarify the differences between
the Agents API and the OpenAI Responses API, including use cases for
each. It also updates the index page to reference the new documentation.

Closes #2368
2025-07-24 12:20:04 -04:00
Mohit Gaur
5ef2baacdc
fix: update check-workflows-use-hashes to use github error format (#2875)
# What does this PR do?
Updates the script `scripts/check-workflows-use-hashes.sh` to improve
error reporting by adopting GitHub Actions error annotation format.

* Updated the script to use GitHub Actions error annotation format
(`::error file={name},line={line},col={col}::{message}`) making error
messages more actionable and easier to locate in workflows.
* Modified the script to include line numbers for `uses:` references by
using `grep -n` and extracting line numbers, improving the precision of
error reporting.
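The script itself is shell; the same check is sketched here in Python purely for illustration. `find_unpinned` is a hypothetical name, and the rule "a pinned ref ends in a 40-character hex SHA" is an assumption about what the script enforces.

```python
import re

# Capture the reference that follows `uses:` on a workflow line.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*(\S+)")
# Assumed rule: a pinned reference ends in a full 40-character commit SHA.
SHA_RE = re.compile(r"@[0-9a-f]{40}$")


def find_unpinned(file_name, text):
    """Return GitHub Actions error annotations for non-SHA `uses:` refs."""
    annotations = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match and not SHA_RE.search(match.group(1)):
            annotations.append(
                f"::error file={file_name},line={line_no}::"
                f"uses non-SHA action ref: {line.strip()}"
            )
    return annotations


workflow = "uses: actions/checkout@v4\nuses: actions/checkout@" + "a" * 40 + "\n"
annotations = find_unpinned("test-workflow.yml", workflow)
```

Printing each annotation to stdout in a workflow run is what makes GitHub attach it to the file and line.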

Closes #2778

## Test Plan

- Violation check - Created test file with mixed SHA/non-SHA actions

```
echo 'uses: actions/checkout@v4' > test-workflow.yml
echo 'uses: actions/upload-artifact@main' >> test-workflow.yml
```
Result: Correctly detected violations with precise line numbers
```
./scripts/check-workflows-use-hashes.sh
Output:
::error file=test-workflow.yml,line=14::uses non-SHA action ref: uses: actions/checkout@v4
::error file=test-workflow.yml,line=20::uses non-SHA action ref: uses: actions/upload-artifact@main
```

- Verified existing project workflows pass
```
./scripts/check-workflows-use-hashes.sh
# Result: Exit code 0 (all workflows properly SHA-pinned)
```
2025-07-24 17:41:17 +02:00
Matthew Farrellee
e33a50480d
fix: starter template and litellm backward compat conflict for openai (#2885)
# What does this PR do?

openai/models.py has backward compat entries for litellm model names.
the starter template includes these in the list of registered models.
the inclusion results in duplicate model registrations.

the backward compat is no longer necessary.

## Test Plan

ci
2025-07-24 17:28:37 +02:00
Sarthak Deshpande
cd8715d327
chore: Added openai compatible vector io endpoints for chromadb (#2489)
# What does this PR do?
This PR implements the openai compatible endpoints for chromadb

Closes #2462 

## Test Plan
Ran the Llama Stack server with Ollama and ran the command
`pytest -sv --stack-config=http://localhost:8321
tests/integration/vector_io/test_openai_vector_stores.py
--embedding-model all-MiniLM-L6-v2`
Result: 8 failed, 27 passed, 8 skipped, 1 xfailed.
The failures are related to the Files API.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
Co-authored-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-07-23 13:51:58 -07:00
Derek Higgins
fd2aab8582
fix: prevent shell redirection issues with pip dependencies (#2867)
- Use printf to escape special characters (e.g. `<`, `>`)
- Apply escaping to pip_dependencies and special_pip_deps

Resolves shell interpretation of `>=` operators as redirections, which
caused builds to ignore version constraints and created unexpected files
in the /app directory.
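The fix here is done in shell with `printf`; the same idea, sketched in Python for illustration only, is to quote each requirement before it reaches a shell. `build_pip_command` and the package pins are hypothetical.

```python
import shlex


def build_pip_command(dependencies):
    """Quote each requirement so the shell never sees `>` or `<` as redirection."""
    quoted = " ".join(shlex.quote(dep) for dep in dependencies)
    return f"pip install {quoted}"


# Unquoted, `numpy>=1.26` would install any numpy and create a file named `=1.26`.
command = build_pip_command(["numpy>=1.26", "requests"])
```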

Closes: #2866

## Test Plan
Manually tested, will also be tested by existing CI

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-23 21:43:33 +02:00
Derek Higgins
427136bb63
fix: cleanup after build_container.sh (#2869)
- rm TEMP_DIR when build_container.sh succeeds
- prevents multiple temp directories with Containerfile being left in
/tmp

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-23 11:54:54 -07:00
IAN MILLER
51affe5783
fix: fixed test_access_control.py unit test (#2876)
# What does this PR do?
I fixed the `test_access_policy()` function by providing
`provider_model_id` in each model registration call so that the
assertions pass.

Initially I faced this issue:
```
tests/unit/server/test_quota.py::test_authenticated_quota_allows_up_to_limit
tests/unit/server/test_quota.py::test_authenticated_quota_blocks_after_limit
tests/unit/server/test_quota.py::test_anonymous_quota_allows_up_to_limit
tests/unit/server/test_quota.py::test_anonymous_quota_blocks_after_limit
  /Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/aiosqlite/core.py:105: DeprecationWarning: The default datetime adapter is deprecated as of Python 3.12; see the sqlite3 documentation for suggested replacement recipes
    result = function()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================== short test summary info ===============================================================================
FAILED tests/unit/server/test_access_control.py::test_access_policy - AssertionError: assert 'test_provider/model-1' == 'model-1'
==================================================================== 1 failed, 436 passed, 194 warnings in 20.09s ====================================================================
```

After the fix, all tests pass:
```
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= 437 passed, 194 warnings in 19.41s =========================================================================
```


## Test Plan
Run ` ./scripts/unit-tests.sh`
2025-07-23 11:50:20 -07:00
Ashwin Bharambe
2fcfb0f0b5
fix: bring back dell template (#2880)
This template is definitely needed since it (and related docker, which
will be pushed soon) is used by folks at Dell.
2025-07-23 11:40:59 -07:00
Mark Campbell
8353ad4981
fix: search mode validation for rag query (#2857)
# What does this PR do?
I noticed a few issues with my implementation of the search mode
validation for RagQuery.
This PR replaces the check for search mode in RagQuery with a Literal. 
There were issues before with
```
TypeError: Object of type RAGSearchMode is not JSON serializable
```
When using 
```
query_config = RAGQueryConfig(max_chunks=6, mode="vector").model_dump()
```

It also fixes a bug where "vector" was always used as the search mode,
regardless of user input.
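A standard-library sketch of why a `Literal` avoids the serialization error. `QueryConfig` and `SearchModeEnum` are hypothetical stand-ins rather than the real `RAGQueryConfig` and `RAGSearchMode`, and only the two modes mentioned in this PR are listed.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Literal, get_args

# Only the modes named in this PR; the real set may be larger.
SearchMode = Literal["vector", "keyword"]


class SearchModeEnum(Enum):
    # Plain Enum members are not JSON serializable by default.
    VECTOR = "vector"


@dataclass
class QueryConfig:
    """Hypothetical stand-in for a query config."""
    max_chunks: int = 6
    mode: SearchMode = "vector"

    def __post_init__(self):
        # Literal is not enforced at runtime by dataclasses, so check explicitly.
        if self.mode not in get_args(SearchMode):
            raise ValueError(f"unsupported search mode: {self.mode!r}")


# The mode stays a plain string, so the dumped dict serializes cleanly.
payload = json.dumps(asdict(QueryConfig(mode="keyword")))
```

Passing `SearchModeEnum.VECTOR` inside a dict to `json.dumps` raises the same kind of `TypeError` quoted above.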

## Test Plan

Verify that the chosen search mode takes effect when using RagQuery, or
use the agent config below:
```
agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {
                "vector_db_ids": [vector_db_id],
                "query_config": {
                    "mode": "keyword",
                    "max_chunks": 6
                }
            },
        }
    ],
)
```

Running Unit Tests:
```
uv sync --extra dev
uv run pytest tests/unit/rag/test_rag_query.py -v
```
2025-07-23 11:25:12 -07:00
Francisco Arceo
2aba2c1236
chore: Moving vector store and vector store files helper methods to openai_vector_store_mixin (#2863)
# What does this PR do?
Moving vector store and vector store files helper methods to
`openai_vector_store_mixin.py`


## Test Plan
The tests are already covered in CI, which tests the inline providers
and runs the current integration tests.

Note that the `vector_index` fixture will test `milvus_vec_adapter`,
`faiss_vec_adapter`, and `sqlite_vec_adapter` in
`tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`.

Additionally, the integration tests in `integration-vector-io-tests.yml`
run the `tests/integration/vector_io` tests for the following providers:
```yaml
vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"]
```

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-23 13:35:48 -04:00
Matthew Farrellee
e1ed152779
chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835)
# What does this PR do?

add an `OpenAIMixin` for use by inference providers whose remote
endpoints support an OpenAI compatible API.

use is demonstrated by refactoring
- OpenAIInferenceAdapter
- NVIDIAInferenceAdapter (adds embedding support)
- LlamaCompatInferenceAdapter
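The general mixin pattern can be sketched as below. `OpenAICompatMixin`, its hook methods, and `ExampleAdapter` are hypothetical names chosen for illustration; they are not the actual `OpenAIMixin` interface, and the sketch builds a request description instead of making a network call.

```python
from abc import ABC, abstractmethod


class OpenAICompatMixin(ABC):
    """Hypothetical mixin: shared openai_* methods built on two hooks."""

    @abstractmethod
    def get_base_url(self) -> str: ...

    @abstractmethod
    def get_api_key(self) -> str: ...

    def openai_request(self, path, payload):
        # Build the request description once for every adapter; no network call here.
        return {
            "url": self.get_base_url().rstrip("/") + path,
            "headers": {"Authorization": f"Bearer {self.get_api_key()}"},
            "json": payload,
        }

    def openai_chat_completion(self, model, messages):
        return self.openai_request(
            "/chat/completions", {"model": model, "messages": messages}
        )


class ExampleAdapter(OpenAICompatMixin):
    # An adapter only supplies its endpoint and credentials.
    def get_base_url(self):
        return "https://example.invalid/v1/"

    def get_api_key(self):
        return "sk-placeholder"


request = ExampleAdapter().openai_chat_completion(
    "some-model", [{"role": "user", "content": "hi"}]
)
```

Each adapter then implements only what differs between providers, which is the motivation for the refactor listed above.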

## Test Plan

existing unit and integration tests
2025-07-23 06:49:40 -04:00
grs
fc67ad408a
chore: add some documentation for access policy rules (#2785)
# What does this PR do?

Adds some documentation on setting explicit access_policy rules in
config.
2025-07-23 10:27:27 +02:00
Sébastien Han
c0563c0560
fix: honour deprecation of --config and --template (#2856)
# What does this PR do?

https://github.com/meta-llama/llama-stack/pull/2716/ broke commands
like:

```
python -m llama_stack.distribution.server.server --config llama_stack/templates/starter/run.yaml
```

These commands now fail with:

 ```
 Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 626, in <module>
    main()
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 402, in main
    config_file = resolve_config_or_template(args.config, Mode.RUN)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/utils/config_resolution.py", line 43, in resolve_config_or_template
    config_path = Path(config_or_template)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 1162, in __init__
    super().__init__(*args)
  File "/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/pathlib.py", line 373, in __init__
    raise TypeError(
TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'NoneType'
```

The failure occurs because no positional argument is present, so a `None`
config reaches `Path()`. We now honour the deprecation until --config and
--template are removed completely.
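
A minimal sketch of how deprecated flags can be honoured alongside a new positional argument, using `argparse`. The names `build_parser`, `resolve_config_arg`, `deprecated_config` and `deprecated_template` are hypothetical and chosen for illustration; this is not the code from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="server")
    # New style: one optional positional that takes a config path or template name.
    parser.add_argument("config", nargs="?", help="config file path or template name")
    # Deprecated flags, stored under separate (hypothetical) dest names so they
    # do not clash with the positional argument.
    parser.add_argument("--config", dest="deprecated_config", help="(deprecated) config file path")
    parser.add_argument("--template", dest="deprecated_template", help="(deprecated) template name")
    return parser


def resolve_config_arg(argv: list[str]) -> str:
    """Return the config/template value from whichever form the caller used."""
    args = build_parser().parse_args(argv)
    value = args.config or args.deprecated_config or args.deprecated_template
    if value is None:
        # Fail with a readable message instead of passing None further down.
        raise SystemExit("error: a config file or template name is required")
    return value
```

With this shape, `resolve_config_arg(["--config", "run.yaml"])` and `resolve_config_arg(["run.yaml"])` both yield the same value, so downstream code never receives `None`.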

## Test Plan

Both `python -m llama_stack.distribution.server.server --config llama_stack/templates/starter/run.yaml`
and `python -m llama_stack.distribution.server.server llama_stack/templates/starter/run.yaml`
should run the server. The same holds for `--template starter`.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-22 20:48:23 -07:00
Derek Higgins
340448e0aa
fix: optimize container build by enabling uv cache (#2855)
- Remove --no-cache flags from uv pip install commands to enable caching
- Mount host uv cache directory to container for persistent caching
- Set UV_LINK_MODE=copy to prevent uv using hardlinks
- When building the starter image:
  - Build time reduced from ~4:45 to ~3:05 on subsequent builds
    (environment specific)
  - Eliminates re-downloading of 3G+ of data on each build
  - Cache size: ~6.2G (when building starter image)

Fixes excessive data downloads during distro container builds.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-22 16:51:52 -07:00
Ashwin Bharambe
3b83032555
feat(registry): more flexible model lookup (#2859)
This PR updates model registration and lookup behavior to be slightly
more general / flexible. See
https://github.com/meta-llama/llama-stack/issues/2843 for more details.

Note that this change is backwards compatible given the design of the
`lookup_model()` method.

## Test Plan

Added unit tests
2025-07-22 15:22:48 -07:00
Mustafa Elbehery
9736f096f6
chore(test): fix flaky telemetry tests (#2815)
Some checks failed
Installer CI / lint (push) Failing after 2s
Installer CI / smoke-test (push) Has been skipped
Integration Tests / discover-tests (push) Successful in 3s
Coverage Badge / unit-tests (push) Failing after 6s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 11s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s
Test Llama Stack Build / generate-matrix (push) Successful in 11s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 16s
Test Llama Stack Build / build-single-provider (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Integration Tests / test-matrix (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Test External Providers / test-external-providers (venv) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test Llama Stack Build / build (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 48s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 55s
Unit Tests / unit-tests (3.13) (push) Failing after 52s
Pre-commit / pre-commit (push) Successful in 1m42s
# What does this PR do?
This PR fixes flaky telemetry tests

See https://github.com/meta-llama/llama-stack/pull/2814
## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-22 12:30:14 -07:00
Omer Tuchfeld
c1a63fcd87
fix(install): explicit docker.io usage (#2850)
# What does this PR do?

When podman is used and the registry is omitted, podman prompts the
user. However, we pipe podman's output to /dev/null, so the user never
sees the prompt and the script ends abruptly, which is confusing.

This commit explicitly uses the docker.io registry for the ollama image
and the llama-stack image so that the prompt is avoided.


## Test Plan

I ran the script on a machine with podman and the issue was resolved

## Image

Before the fix, this is what would happen:

<img width="748" height="95" alt="image"
src="https://github.com/user-attachments/assets/9c609f88-c0a8-45e7-a789-834f64f601e5"
/>

Signed-off-by: Omer Tuchfeld <omer@tuchfeld.dev>
2025-07-22 20:36:48 +02:00
Francisco Arceo
20c3197952
chore: Making name optional in openai_create_vector_store (#2858)
# What does this PR do?
chore: Making name optional in openai_create_vector_store


Closes https://github.com/meta-llama/llama-stack/issues/2706

## Test Plan
CI and unit tests

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-22 13:31:31 -04:00
ehhuang
8e1a2b4703
chore: remove *_openai_compat providers (#2849)
# What does this PR do?
These are no longer needed as llama-stack-evals can run against OAI
endpoints directly.

## Test Plan
2025-07-22 10:25:36 -07:00
Omer Tuchfeld
5e18d4d097
fix(agent): ensure turns are sorted (#2854)
# What does this PR do?

Ensures that session turns retrieved from the agent persistence layer
are sorted by their `started_at` timestamp, as the key-value store does
not guarantee order.
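
As a rough illustration, sorting by `started_at` after retrieval could look like the sketch below. The `Turn` dataclass and `sort_turns` helper are hypothetical stand-ins, not the project's actual types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Turn:
    # Hypothetical minimal turn record; the real type has more fields.
    turn_id: str
    started_at: datetime


def sort_turns(turns: list[Turn]) -> list[Turn]:
    """Return turns in chronological order, whatever order the store returned them in."""
    return sorted(turns, key=lambda turn: turn.started_at)


# Turns as they might come back from an unordered key-value store.
unordered = [
    Turn("t2", datetime(2025, 7, 22, 10, 5, tzinfo=timezone.utc)),
    Turn("t1", datetime(2025, 7, 22, 10, 0, tzinfo=timezone.utc)),
    Turn("t3", datetime(2025, 7, 22, 10, 9, tzinfo=timezone.utc)),
]
ordered_ids = [turn.turn_id for turn in sort_turns(unordered)]
```

`sorted` is stable and returns a new list, so the caller's original list is left untouched.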

Closes #2852

## Test Plan

- [ ] Add unit tests
2025-07-22 10:24:51 -07:00
Jeremy Bonghwan Choi
b5a6ecc331
docs: minor fix of the pgvector provider spec description (#2847)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests / discover-tests (push) Successful in 3s
Coverage Badge / unit-tests (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 10s
Python Package Build Test / build (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 10s
Integration Tests / test-matrix (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s
Unit Tests / unit-tests (3.13) (push) Failing after 24s
Pre-commit / pre-commit (push) Successful in 1m17s
# What does this PR do?

minor update of the pgvector doc, changing 'faiss' to 'pgvector'


## Test Plan
2025-07-21 22:10:35 -07:00
Francisco Arceo
2bc96613f9
chore: Adding demo script and importing it into the docs (#2848)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Coverage Badge / unit-tests (push) Failing after 6s
Integration Tests / discover-tests (push) Successful in 7s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 14s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 14s
Test Llama Stack Build / generate-matrix (push) Successful in 10s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s
Python Package Build Test / build (3.13) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Integration Tests / test-matrix (push) Failing after 13s
Python Package Build Test / build (3.12) (push) Failing after 1m1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1m0s
Test Llama Stack Build / build (push) Failing after 52s
Pre-commit / pre-commit (push) Successful in 2m39s
# What does this PR do?
This PR adds the quickstart as a file to the docs so that it can be more
easily maintained and run, as mentioned in
https://github.com/meta-llama/llama-stack/pull/2800.

## Test Plan
I could add this as a test in the CI but I wasn't sure if we wanted to
add additional jobs there. 😅

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-21 22:53:32 -04:00
Francisco Arceo
c8f274347d
chore: Adding Access Control for OpenAI Vector Stores methods (#2772)
# What does this PR do?

Refactors the vector store routing logic by moving OpenAI-compatible
vector store operations from the `VectorIORouter` to the
`VectorDBsRoutingTable`.

Closes https://github.com/meta-llama/llama-stack/issues/2761

## Test Plan

Added unit tests to cover new routing logic and ACL checks.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-21 16:22:44 -04:00
ehhuang
0d7a90b8bc
chore: merge --config and --template in server.py (#2716)
# What does this PR do?
Part of #2696 

## Test Plan
Run `llama stack run starter`

Error:
```

myenv ❯ llama stack run starters
WARNING  2025-07-10 12:12:43,052 llama_stack.cli.stack.run:82 server: Conda detected. Using conda environment myenv for the run.
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
                       [--image-type {conda,venv}] [--enable-ui]
                       [config | template]
llama stack run: error: Could not resolve config or template 'starters'.

Tried the following locations:
  1. As file path: /Users/erichuang/projects/llama-stack-git/starters
  2. As template: /Users/erichuang/projects/llama-stack-git/llama_stack/templates/starters/run.yaml
  3. As built distribution: (/Users/erichuang/.llama/distributions/llamastack-starters/starters-run.yaml, /Users/erichuang/.llama/distributions/starters/starters-run.yaml)

Available templates: dell, test-env, vllm-gpu, test-template, cerebras, openai-api-verification, sambanova, passthrough, direct-config, together, openai, fireworks, meta-reference-gpu, __pycache__, dev, ollama, watsonx, remote-vllm, llama_api, groq, dummy, oracle, nvidia, ci-tests, postgres-demo, test-stack, bedrock, starter, hf-serverless, hf-endpoint, tgi, open-benchmark, verification

Did you mean one of these templates?
  - starter
  - together
  - postgres-demo
```
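
The "Did you mean" suggestions in the output above could be produced with a fuzzy match over the available template names. The sketch below uses the standard library's `difflib`; this is an assumption about one possible approach, not necessarily what the PR does, and `suggest_templates` is a hypothetical helper.

```python
import difflib


def suggest_templates(name: str, available: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` template names that closely resemble `name`, best match first."""
    return difflib.get_close_matches(name, available, n=limit, cutoff=0.6)


# A few template names taken from the error output above.
templates = ["starter", "together", "ollama", "postgres-demo", "nvidia"]
suggestions = suggest_templates("starters", templates)
```

The `cutoff` value controls how similar a name must be to count as a suggestion; 0.6 is `difflib`'s default and is used here only as a placeholder.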
2025-07-21 13:19:27 -07:00
Charlie Doern
9a03526672
fix: uvicorn respect log_config (#2842)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Integration Tests / discover-tests (push) Successful in 9s
Coverage Badge / unit-tests (push) Failing after 13s
Python Package Build Test / build (3.12) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s
Python Package Build Test / build (3.13) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 18s
Test External Providers / test-external-providers (venv) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Integration Tests / test-matrix (push) Failing after 12s
Pre-commit / pre-commit (push) Successful in 1m7s
2025-07-21 12:50:39 -07:00
Sébastien Han
019ddda138
fix: graceful SIGINT on server (#2831)
# What does this PR do?

After https://github.com/meta-llama/llama-stack/pull/2818, SIGINT
prints a stack trace. This is because uvicorn re-raises SIGINT, which
Python's default signal handler converts into a KeyboardInterrupt
exception. We now simply catch the exception to get a clean exit; this
does not change the behavior on SIGINT.
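
The pattern described here can be sketched as follows. `run_server` is a hypothetical wrapper and `serve` stands in for whatever call starts the server (for example the uvicorn run call); neither name comes from the PR.

```python
import sys
from typing import Callable


def run_server(serve: Callable[[], None]) -> int:
    """Run the blocking `serve` callable and exit cleanly on Ctrl+C."""
    try:
        serve()
    except KeyboardInterrupt:
        # SIGINT surfaces here as KeyboardInterrupt; swallow it so that no
        # stack trace is printed on an intentional shutdown.
        print("Received interrupt, shutting down", file=sys.stderr)
    return 0


def fake_serve() -> None:
    # Simulates the interrupt being raised while the server is running.
    raise KeyboardInterrupt


exit_code = run_server(fake_serve)
```

Only `KeyboardInterrupt` is caught, so any other exception still propagates with its full traceback.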

## Test Plan

Run the server, hit Ctrl+C or `kill -2 <server pid>` and expect a clean
exit with no stack trace.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-21 11:35:15 -07:00
ehhuang
d0208df286
test: skip flaky telemetry tests (#2814)
# What does this PR do?
example error:
4625086977

## Test Plan
2025-07-21 10:01:40 -07:00
IAN MILLER
9e6860b9cf
fix: remove @pytest.mark.asyncio from test_get_raw_document_text.py (#2840)
# What does this PR do?
The pre-commit workflow was failing on the main branch; removing
`@pytest.mark.asyncio` from `test_get_raw_document_text.py` fixed that.


## Test Plan
2025-07-21 09:14:34 -07:00
Ondrej Metelka
89c49eb003
feat: Allow application/yaml as mime_type (#2575)
# What does this PR do?
Allow application/yaml as mime_type for documents.

## Test Plan
Added unit tests.
2025-07-21 15:43:32 +02:00
Mustafa Elbehery
b2c7543af7
fix(vectordb): VectorDBInput has no provider_id (#2830)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 13s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
Python Package Build Test / build (3.13) (push) Failing after 11s
Python Package Build Test / build (3.12) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 13s
Integration Tests / discover-tests (push) Successful in 21s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s
Unit Tests / unit-tests (3.13) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 24s
Integration Tests / test-matrix (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 53s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 59s
Pre-commit / pre-commit (push) Successful in 1m35s
# What does this PR do?
This PR adds a `provider_id` field to the `VectorDBInput` class.


fixes https://github.com/meta-llama/llama-stack/issues/2819

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-21 14:03:40 +02:00
Sébastien Han
ecd28f0085
chore: add contribution guideline around PRs (#2811)
More contributing guidelines.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-21 05:47:17 -04:00
Christian Zaccaria
56269245c2
fix: Add permissions for pull request creation in coverage-badge workflow (#2832)
# What does this PR do?
The workflow that automatically creates a PR to update the Coverage
Badge fails as the `GITHUB_TOKEN` doesn't have write permissions.

Rather than granting write permissions to the token everywhere, this PR
grants the permissions to this workflow only.
2025-07-21 11:40:00 +02:00
dependabot[bot]
28956f9447
chore(github-deps): bump astral-sh/setup-uv from 6.3.1 to 6.4.1 (#2827)
Some checks failed
Integration Tests / discover-tests (push) Successful in 2s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 23s
Test External Providers / test-external-providers (venv) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 22s
Python Package Build Test / build (3.12) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 24s
Python Package Build Test / build (3.13) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 24s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 24s
Integration Tests / test-matrix (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 3m13s
Unit Tests / unit-tests (3.13) (push) Failing after 3m15s
Pre-commit / pre-commit (push) Successful in 4m55s
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.3.1 to 6.4.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.4.1 🌈 Hotfix: Ignore deps starting with uv when finding uv
version</h2>
<h2>Changes</h2>
<p>Thank you <a
href="https://github.com/phpmypython"><code>@​phpmypython</code></a> for
raising a PR to fix this issue!</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Ignore deps starting with uv when finding uv version <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/492">#492</a>)</li>
</ul>
<h2>v6.4.0 🌈 Add input <code>version-file</code></h2>
<h2>Changes</h2>
<p>You can now use the <code>version-file</code> input to specify a file
that contains the version of uv to install.
This can either be a <code>pyproject.toml</code> or <code>uv.toml</code>
file which defines a <code>required-version</code> or
uv defined as a dependency in <code>pyproject.toml</code> or
<code>requirements.txt</code>.</p>
<pre lang="yaml"><code>- name: Install uv based on the version defined
in requirements.txt
  uses: astral-sh/setup-uv@v6
  with:
    version-file: &quot;requirements.txt&quot;
</code></pre>
<h2>🚀 Enhancements</h2>
<ul>
<li>Add input version-file <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/486">#486</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known versions for 0.7.22 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/488">#488</a>)</li>
<li>Bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/487">#487</a>)</li>
<li>chore: update known versions for 0.7.21 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/483">#483</a>)</li>
<li>chore: update known versions for 0.7.20 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/480">#480</a>)</li>
<li>chore: update known versions for 0.7.19 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/475">#475</a>)</li>
<li>chore: update known versions for 0.7.18 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/473">#473</a>)</li>
<li>chore: update known versions for 0.7.17 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/468">#468</a>)</li>
<li>chore: update known versions for 0.7.16 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/466">#466</a>)</li>
<li>chore: update known versions for 0.7.15 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/463">#463</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Add FAQ on changed cache and cache upload behavior <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/477">#477</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/487">#487</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7edac99f96"><code>7edac99</code></a>
Ignore deps starting with uv when finding uv version (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/492">#492</a>)</li>
<li><a
href="05273c154d"><code>05273c1</code></a>
chore: update known versions for 0.7.22 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/488">#488</a>)</li>
<li><a
href="de545d4421"><code>de545d4</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/487">#487</a>)</li>
<li><a
href="b75ff7d7b8"><code>b75ff7d</code></a>
Add input version-file (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/486">#486</a>)</li>
<li><a
href="c893ac1cb2"><code>c893ac1</code></a>
chore: update known versions for 0.7.21 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/483">#483</a>)</li>
<li><a
href="a905f0040b"><code>a905f00</code></a>
chore: update known versions for 0.7.20 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/480">#480</a>)</li>
<li><a
href="d4219d1620"><code>d4219d1</code></a>
Add FAQ on changed cache and cache upload behavior (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/477">#477</a>)</li>
<li><a
href="aaefb91b77"><code>aaefb91</code></a>
chore: update known versions for 0.7.19 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/475">#475</a>)</li>
<li><a
href="c05b3e180b"><code>c05b3e1</code></a>
chore: update known versions for 0.7.18 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/473">#473</a>)</li>
<li><a
href="1bf1493664"><code>1bf1493</code></a>
chore: update known versions for 0.7.17 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/468">#468</a>)</li>
<li>Additional commits viewable in <a
href="bd01e18f51...7edac99f96">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.3.1&new-version=6.4.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-19 21:10:35 -05:00
ehhuang
0a6e588f68
feat: enable auth for LocalFS Files Provider (#2773)
# What does this PR do?
Supports authentication for LocalFS Files provider.

closes https://github.com/meta-llama/llama-stack/issues/2760

## Test Plan
CI. Added tests.
2025-07-18 19:11:01 -07:00
Ashwin Bharambe
dd303327f3
feat(ci): add a ci-tests distro (#2826) 2025-07-18 17:11:06 -07:00
Ashwin Bharambe
199f859eec
feat(vllm): periodically refresh models (#2823)
Just like #2805 but for vLLM.

We also make the `VLLM_URL` env variable optional: if it is not
specified, the provider silently sits idle and only complains once
someone tries to call a completion on it. This allows the provider to be
present in the `starter` distribution.

## Test Plan

Set up vLLM, copy the starter template and set `{ refresh_models: true,
refresh_models_interval: 10 }` for the vllm provider and then run:

```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
  uv run llama stack run --image-type venv /tmp/starter.yaml
```

Verify that `llama-stack-client models list` brings up the model
correctly from vLLM.
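
The "sit idle until used" behavior can be sketched as below. This is only an illustration under assumptions: `VLLMConfigSketch` and `VLLMProviderSketch` are hypothetical names, not the classes touched by this PR, and the real provider may report the missing URL differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class VLLMConfigSketch:
    # Hypothetical stand-in for the provider config; None means VLLM_URL was not set.
    url: Optional[str] = None


class VLLMProviderSketch:
    """Illustrative provider that tolerates a missing URL until it is actually used."""

    def __init__(self, config: VLLMConfigSketch) -> None:
        self.config = config

    async def initialize(self) -> None:
        # No URL configured: stay idle instead of failing at startup.
        if self.config.url is None:
            return
        # A real provider would create its HTTP client here.

    async def completion(self, prompt: str) -> str:
        # Fail only at the point where the provider is actually needed.
        if self.config.url is None:
            raise ValueError("VLLM_URL is not set; configure it to use this provider")
        return f"(would send {prompt!r} to {self.config.url})"
```

Deferring the error to call time is what lets an unconfigured provider be listed in a distribution without breaking startup.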
2025-07-18 15:53:09 -07:00
Ashwin Bharambe
ade075152e
chore: kill inline::vllm (#2824)
Inline _inference_ providers haven't proved to be very useful -- they
are rarely used. And for good reason -- it is almost never a good idea
to include a complex (distributed) inference engine bundled into a
distributed stateful front-end server serving many other things.
Responsibility should be split properly.

See Discord discussion:
1395849853
2025-07-18 15:52:18 -07:00
Ashwin Bharambe
68a2dfbad7
feat(ollama): periodically refresh models (#2805)
For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know what model you are running statically) hence there's a config
boolean to control this behavior.

_This is part of a series of PRs aimed at removing the requirement of
needing to set `INFERENCE_MODEL` env variables for running Llama Stack
server._

## Test Plan

Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:

```
LLAMA_STACK_LOGGING=all=debug \
  ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```

Expect a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from the ollama
server, then restart the server. Verify that the model goes away when calling
`uv run llama-stack-client models list`
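
A minimal sketch of the "server is the source of truth, registry is a cache" idea follows. All names here (`refresh_models_periodically`, `list_models`, the dict-based registry, `max_refreshes`) are assumptions for illustration and not the code in this PR; a real refresh loop would run until cancelled rather than a fixed number of times.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def refresh_models_periodically(
    list_models: Callable[[], Awaitable[List[str]]],
    registry: Dict[str, str],
    provider_id: str,
    interval_seconds: float,
    max_refreshes: int,
) -> None:
    """Mirror the backing server's model list into `registry` (model_id -> provider_id)."""
    for _ in range(max_refreshes):
        served = set(await list_models())
        # Drop cached entries for this provider that the server no longer serves.
        stale = [m for m, p in registry.items() if p == provider_id and m not in served]
        for model_id in stale:
            del registry[model_id]
        # Add anything the server reports that is not cached yet.
        for model_id in served:
            registry[model_id] = provider_id
        await asyncio.sleep(interval_seconds)
```

Entries owned by other providers are left alone, so only the refreshed provider's view of the registry changes.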
2025-07-18 12:20:36 -07:00
ehhuang
6d55f2f137
feat: enable ls client for files tests (#2769)
# What does this PR do?
As titled.

## Test Plan
CI
2025-07-18 12:10:30 -07:00
Nehanth Narendrula
874b1cb00f
fix: DPOAlignmentConfig schema to use correct DPO parameters (#2804)
# What does this PR do?

This PR fixes the `DPOAlignmentConfig` schema to use the correct Direct
Preference Optimization (DPO) parameters.

The current schema incorrectly uses PPO-inspired parameters
(`reward_scale`, `reward_clip`, `epsilon`, `gamma`) that are not part of
the DPO algorithm. This PR updates it to use the standard DPO
parameters:

- `beta`: The KL divergence coefficient that controls deviation from the
reference model
- `loss_type`: The type of DPO loss function (sigmoid, hinge, ipo,
kto_pair)

These parameters align with standard DPO implementations like
HuggingFace's TRL library.
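
For intuition, here is a rough sketch of how `beta` and `loss_type` enter the loss, assuming the standard sigmoid DPO objective on per-example log-probabilities. `DPOConfigSketch`, `dpo_loss_sketch`, and the default values are hypothetical and are not the schema defined by this PR; only the `sigmoid` variant is implemented.

```python
import math
from dataclasses import dataclass


@dataclass
class DPOConfigSketch:
    # Hypothetical defaults, chosen only for this illustration.
    beta: float = 0.1
    loss_type: str = "sigmoid"


def dpo_loss_sketch(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    config: DPOConfigSketch,
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin))."""
    # How much more the policy prefers the chosen answer than the reference model does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    if config.loss_type != "sigmoid":
        raise NotImplementedError(f"loss_type {config.loss_type!r} is not sketched here")
    x = config.beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

With a zero margin the loss is `log(2)`; it shrinks as the policy favors the chosen response more strongly than the reference does, and `beta` scales how sharply.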

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
2025-07-18 11:56:00 -07:00
Charlie Doern
d994305f0a
fix: remove disabled providers from model dump (#2784)
# What does this PR do?

Currently, when running `llama stack run --template starter...`, the
`__disabled__` providers, their models, etc. are printed alongside the
enabled ones, making the output really confusing.

In server.py, add a utility `remove_disabled_providers` which
post-processes the model_dump output to remove any dict with
`provider_id: __disabled__`.

We also have `debug` logs printing the disabled providers, so I think
it's safe to say that is the only indicator we need when using starter.
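
The post-processing step could look roughly like the sketch below. It is an assumption-laden illustration, not the function added in server.py: it walks the dumped structure recursively, drops any dict whose `provider_id` is `__disabled__`, and (as a choice of this sketch) also drops `None` values.

```python
from typing import Any

DISABLED = "__disabled__"


def remove_disabled_providers(obj: Any) -> Any:
    """Return a copy of `obj` without disabled-provider dicts and without None values."""
    if isinstance(obj, dict):
        if obj.get("provider_id") == DISABLED:
            return None  # signal to the caller that this dict should be dropped
        cleaned = {key: remove_disabled_providers(value) for key, value in obj.items()}
        return {key: value for key, value in cleaned.items() if value is not None}
    if isinstance(obj, list):
        cleaned_items = [remove_disabled_providers(item) for item in obj]
        return [item for item in cleaned_items if item is not None]
    return obj
```

Entries that are disabled through a different field (for example a shield whose `shield_id` is `__disabled__`) are not handled by this sketch.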

## Test Plan

before (output truncated because it was huge):


```
...
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-3.2-11B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-11B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-3.2-11B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-11B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-3.2-90B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-90B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-3.2-90B-Vision-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-3.2-90B-Vision-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-4-Scout-17B-16E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Scout-17B-16E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-4-Scout-17B-16E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Scout-17B-16E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Llama-4-Maverick-17B-128E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Maverick-17B-128E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-4-Maverick-17B-128E-Instruct
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Llama-4-Maverick-17B-128E-Instruct
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/sambanova/Meta-Llama-Guard-3-8B
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Meta-Llama-Guard-3-8B
         - metadata: {}
           model_id: ${env.ENABLE_SAMBANOVA:=__disabled__}/meta-llama/Llama-Guard-3-8B
           model_type: llm
           provider_id: __disabled__
           provider_model_id: sambanova/Meta-Llama-Guard-3-8B
         - metadata:
             embedding_dimension: 384
           model_id: all-MiniLM-L6-v2
           model_type: embedding
           provider_id: sentence-transformers
           provider_model_id: null
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           datasetio:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /Users/charliedoern/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               api_key: '********'
               base_url: https://api.cerebras.ai
             provider_id: __disabled__
             provider_type: remote::cerebras
           - config:
               url: http://localhost:11434
             provider_id: ollama
             provider_type: remote::ollama
           - config:
               api_token: '********'
               max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
               tls_verify: ${env.VLLM_TLS_VERIFY:=true}
               url: ${env.VLLM_URL}
             provider_id: __disabled__
             provider_type: remote::vllm
           - config:
               url: ${env.TGI_URL}
             provider_id: __disabled__
             provider_type: remote::tgi
           - config:
               api_token: '********'
               huggingface_repo: ${env.INFERENCE_MODEL}
             provider_id: __disabled__
             provider_type: remote::hf::serverless
           - config:
               api_token: '********'
               endpoint_name: ${env.INFERENCE_ENDPOINT_NAME}
             provider_id: __disabled__
             provider_type: remote::hf::endpoint
           - config:
               api_key: '********'
               url: https://api.fireworks.ai/inference/v1
             provider_id: __disabled__
             provider_type: remote::fireworks
           - config:
               api_key: '********'
               url: https://api.together.xyz/v1
             provider_id: __disabled__
             provider_type: remote::together
           - config: {}
             provider_id: __disabled__
             provider_type: remote::bedrock
           - config:
               api_token: '********'
               url: ${env.DATABRICKS_URL}
             provider_id: __disabled__
             provider_type: remote::databricks
           - config:
               api_key: '********'
               append_api_version: ${env.NVIDIA_APPEND_API_VERSION:=True}
               url: ${env.NVIDIA_BASE_URL:=https://integrate.api.nvidia.com}
             provider_id: __disabled__
             provider_type: remote::nvidia
           - config:
               api_token: '********'
               url: ${env.RUNPOD_URL:=}
             provider_id: __disabled__
             provider_type: remote::runpod
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::openai
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::anthropic
           - config:
               api_key: '********'
             provider_id: __disabled__
             provider_type: remote::gemini
           - config:
               api_key: '********'
               url: https://api.groq.com
             provider_id: __disabled__
             provider_type: remote::groq
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.fireworks.ai/inference/v1
             provider_id: __disabled__
             provider_type: remote::fireworks-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.llama.com/compat/v1/
             provider_id: __disabled__
             provider_type: remote::llama-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.together.xyz/v1
             provider_id: __disabled__
             provider_type: remote::together-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.groq.com/openai/v1
             provider_id: __disabled__
             provider_type: remote::groq-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.sambanova.ai/v1
             provider_id: __disabled__
             provider_type: remote::sambanova-openai-compat
           - config:
               api_key: '********'
               openai_compat_api_base: https://api.cerebras.ai/v1
             provider_id: __disabled__
             provider_type: remote::cerebras-openai-compat
           - config:
               api_key: '********'
               url: https://api.sambanova.ai/v1
             provider_id: __disabled__
             provider_type: remote::sambanova
           - config:
               api_key: '********'
               url: ${env.PASSTHROUGH_URL}
             provider_id: __disabled__
             provider_type: remote::passthrough
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: huggingface
               device: cpu
               distributed_backend: null
             provider_id: huggingface
             provider_type: inline::huggingface
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               otel_exporter_otlp_endpoint: null
               service_name: "\u200B"
               sinks: console,sqlite
               sqlite_db_path: /Users/charliedoern/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
           - config:
               db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/sqlite_vec.db
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/sqlite_vec_registry.db
                 type: sqlite
             provider_id: __disabled__
             provider_type: inline::sqlite-vec
           - config:
               db_path: ${env.MILVUS_DB_PATH:=~/.llama/distributions/starter}/milvus.db
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/milvus_registry.db
                 type: sqlite
             provider_id: __disabled__
             provider_type: inline::milvus
           - config:
               url: ${env.CHROMADB_URL:=}
             provider_id: __disabled__
             provider_type: remote::chromadb
           - config:
               db: ${env.PGVECTOR_DB:=}
               host: ${env.PGVECTOR_HOST:=localhost}
               kvstore:
                 db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/pgvector_registry.db
                 type: sqlite
               password: '********'
               port: ${env.PGVECTOR_PORT:=5432}
               user: ${env.PGVECTOR_USER:=}
             provider_id: __disabled__
             provider_type: remote::pgvector
         scoring_fns: []
         server:
           auth: null
           host: null
           port: 8321
           quota: null
           tls_cafile: null
           tls_certfile: null
           tls_keyfile: null
         shields:
         - params: null
           provider_id: null
           provider_shield_id: ollama/__disabled__
           shield_id: __disabled__
         tool_groups:
         - args: null
           mcp_endpoint: null
           provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - args: null
           mcp_endpoint: null
           provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2

```

after:

```
INFO     2025-07-16 13:00:32,604 __main__:448 server: Run configuration:
INFO     2025-07-16 13:00:32,606 __main__:450 server: apis:
         - agents
         - datasetio
         - eval
         - files
         - inference
         - post_training
         - safety
         - scoring
         - telemetry
         - tool_runtime
         - vector_io
         benchmarks: []
         datasets: []
         image_name: starter
         inference_store:
           db_path: /Users/charliedoern/.llama/distributions/starter/inference_store.db
           type: sqlite
         metadata_store:
           db_path: /Users/charliedoern/.llama/distributions/starter/registry.db
           type: sqlite
         models:
         - metadata: {}
           model_id: ollama/llama3.2:3b
           model_type: llm
           provider_id: ollama
           provider_model_id: llama3.2:3b
         - metadata:
             embedding_dimension: 384
           model_id: all-MiniLM-L6-v2
           model_type: embedding
           provider_id: sentence-transformers
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           datasetio:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /Users/charliedoern/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /Users/charliedoern/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               url: http://localhost:11434
             provider_id: ollama
             provider_type: remote::ollama
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: huggingface
               device: cpu
             provider_id: huggingface
             provider_type: inline::huggingface
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               service_name: "\u200B"
               sinks: console,sqlite
               sqlite_db_path: /Users/charliedoern/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /Users/charliedoern/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
         scoring_fns: []
         server:
           port: 8321
         shields: []
         tool_groups:
         - provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-18 10:44:35 -07:00
slekkala1
15916852e8
chore: Add slekkala1 to codeowners (#2817)
Getting started on LLAMA Stack
2025-07-18 10:33:30 -07:00
Ashwin Bharambe
9e3ae50306
feat(server): construct the stack in a persistent event loop (#2818)
When we call `construct_stack()`, providers are instantiated and
`initialize()` is called. This call can end up doing _anything_ at all
-- specifically, providers are free to create long running background
tasks as part of this. If we wrapped this within an `asyncio.run()` as in
the current code, these tasks get canceled when the stack construction
finishes. This is not correct. The PR addresses the issue by creating a
persistent event loop which is used for both the stack as well as for
running the uvicorn server. In other words, the lifetime of the
providers (and downstream async code) is now the same as the lifetime of
the uvicorn server.
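
The difference can be reproduced with plain asyncio. In this sketch, `construct_stack_sketch` and `serve_sketch` are hypothetical stand-ins for stack construction and the server's lifetime; they are not the functions changed in this PR.

```python
import asyncio
from typing import List


async def background_ticker(ticks: List[str]) -> None:
    # Stands in for a long-running task a provider starts in initialize().
    while True:
        ticks.append("tick")
        await asyncio.sleep(0.01)


async def construct_stack_sketch(ticks: List[str]) -> "asyncio.Task[None]":
    return asyncio.create_task(background_ticker(ticks))


async def serve_sketch() -> None:
    # Stands in for the server running for a while on the same loop.
    await asyncio.sleep(0.05)


async def _cancel(task: "asyncio.Task[None]") -> None:
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass


def task_cancelled_after_asyncio_run() -> bool:
    # asyncio.run() cancels leftover tasks before it closes its loop.
    task = asyncio.run(construct_stack_sketch([]))
    return task.cancelled()


def task_alive_on_persistent_loop() -> bool:
    loop = asyncio.new_event_loop()
    try:
        task = loop.run_until_complete(construct_stack_sketch([]))
        loop.run_until_complete(serve_sketch())
        alive = not task.done()  # the task survived stack construction
        loop.run_until_complete(_cancel(task))
        return alive
    finally:
        loop.close()
```

Both helpers return `True`: the first shows the background task being cancelled once `asyncio.run()` returns, the second shows it still running while the same loop goes on to serve.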

## Test Plan

This should not affect any current code since we don't have background
tasks created right now. However,
https://github.com/meta-llama/llama-stack/pull/2805 will start using
this functionality.
2025-07-18 10:29:19 -07:00
Nathan Weinberg
2bb9039173
docs: fix steps in the Quick Start Guide (#2800)
# What does this PR do?
The `build` command didn't take into account the ENABLE flags for the starter distro.

For some reason, I was having issues with HuggingFace access for the
embedding model, so I added a tip for that as well.

Closes #2779

## Test Plan
I ran the described steps manually, but it would be nice if someone else
could try it and verify this still works

We might consider having some CI job ensure the QSG remains functional -
it's not a great experience for new users if they try Llama Stack for
the first time and it doesn't work as we describe

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-18 09:08:46 -07:00
Christian Zaccaria
e45543f7f3
test: Measure and track code coverage (#2636)
# What does this PR do?
- Added coverage badge to README. [See my
fork](https://github.com/ChristianZaccaria/llama-stack).
- Added a GitHub Actions workflow that runs the tests and updates the
coverage badge. [See
run](4574811323).
- Documented steps in `testing.md` for running the tests locally and
viewing the `html` report.
- Excluded non-essential files from coverage reporting to provide a more
accurate measurement.

Automatically created PR to update coverage badge:
https://github.com/ChristianZaccaria/llama-stack/pull/9

# Note for reviewers
1. Currently the coverage report shows 45% coverage. I'm wondering whether
other files or directories should also be excluded from
the report to increase the percentage. The directories with the least
test coverage are `llama_stack/cli`, `llama_stack/models`, and
`llama_stack/ui`. Should we exclude these?
2. **[Required]** The `GITHUB_TOKEN` should have write permissions to
open a PR to update the coverage badge.

# GitHub Issue
Closes #2355

## Test Plan
The `testing.md` file describes how to run the unit tests locally.
2025-07-18 18:08:36 +02:00
Nathan Weinberg
1785a6b39c
docs: add virtualenv instructions for running starter distro (#2780)
# What does this PR do?
We had directions for a container and conda, but not venv.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-18 09:07:43 -07:00
Charlie Doern
0eb0583cdf
fix: amend integration test workflow (#2812)
# What does this PR do?

trigger integration tests on ALL changes to `tests/` to catch failures
before they merge into main

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-18 15:23:36 +02:00
Mustafa Elbehery
fe6af7dc8b
chore(test): migrate unit tests from unittest to pytest nvidia test f… (#2794)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 12:32:19 +02:00
Mustafa Elbehery
b78b8e1486
chore: add mypy inference parallel utils (#2670)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647


## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 12:01:10 +02:00
Mustafa Elbehery
ca7edcd6a4
chore(api): add mypy coverage to chat_format (#2654)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647


## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 11:56:53 +02:00
Mustafa Elbehery
75480b01b8
chore(test): migrate unit tests from unittest to pytest for system prompt (#2789)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 11:54:02 +02:00
Mustafa Elbehery
3cdf748a8e
chore(test): migrate unit tests from unittest to pytest for nvidia datastore (#2790)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 11:52:47 +02:00
Mustafa Elbehery
55713abe7d
chore(test): migrate unit tests from unittest to pytest nvidia test p… (#2792)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-18 11:49:45 +02:00
Charlie Doern
d7cc38e934
fix: remove async test markers (fix pre-commit) (#2808)
# What does this PR do?

some async test markers are in the codebase causing pre-commit to fail
due to #2744

remove these pytest markers

## Test Plan
pre-commit passes

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-17 21:35:28 -07:00
Ashwin Bharambe
d64e096c5f
fix(cli): image name should not default to CONDA_DEFAULT_ENV (#2806)
When I run `uv run llama stack run --image-type venv`, it should not
tell me "Conda detected", because I am clearly asking for venv.
The root cause is the offending line.
2025-07-17 16:40:35 -07:00
Matthew Farrellee
910b017680
chore: block asyncio marks in tests (#2744)
# What does this PR do?

use pre-commit to block addition of new asyncio marks, since we
configure pytest with async-mode=auto, see
https://github.com/meta-llama/llama-stack/pull/2730
2025-07-17 16:33:30 -07:00
Mustafa Elbehery
bd8a3ae3cc
chore(test): migrate unit tests from unittest to pytest for prompt adapter (#2788)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-07-17 16:31:38 -07:00
ehhuang
3ae4aeb344
test: add some tests for Telemetry API (#2787)
# What does this PR do?

## Test Plan
ENABLE_OLLAMA=ollama LLAMA_STACK_CONFIG=starter uv run pytest
tests/integration/telemetry
--text-model="ollama/llama3.2:3b-instruct-fp16"
2025-07-17 16:20:51 -07:00
Mustafa Elbehery
73868ce9e3
chore(test): migrate unit tests from unittest to pytest for server en… (#2795)
This PR replaces unittest with pytest.

Part of https://github.com/meta-llama/llama-stack/issues/2680

cc @leseb

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-17 16:20:12 -07:00
Matthew Farrellee
477bcd4d09
feat: allow dynamic model registration for nvidia inference provider (#2726)
# What does this PR do?

lets users register models available at
https://integrate.api.nvidia.com/v1/models that aren't already in
llama_stack/providers/remote/inference/nvidia/models.py

## Test Plan

1. run the nvidia distro
2. register a model from https://integrate.api.nvidia.com/v1/models that
isn't already known; as of this writing,
nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example
3. perform inference w/ the model
2025-07-17 12:11:30 -07:00
Matthew Farrellee
57745101be
chore: internal change, make Model.provider_model_id non-optional (#2690)
- POST /v1/models accepts optional provider_model_id
- ModelsRoutingTable.register_model handler ensures it is non-None,
providing a default

usage of Model.provider_model_id will no longer need to detect None
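
As a rough sketch of the idea (not the actual routing-table code), the handler can resolve the optional request field once so that everything downstream sees a plain string. The `Model` dataclass and the free-standing `register_model` function below are simplified stand-ins, and falling back to the model ID is an assumption; the description above only says that a default is provided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    """Simplified stand-in for the stored model record."""

    identifier: str
    provider_id: str
    provider_model_id: str  # always set once the model is registered


def register_model(
    model_id: str,
    provider_id: str,
    provider_model_id: Optional[str] = None,
) -> Model:
    """Accept an optional provider_model_id and store a non-None value."""
    if provider_model_id is None:
        # Assumption: default to the model ID when the caller omits the field.
        provider_model_id = model_id
    return Model(
        identifier=model_id,
        provider_id=provider_id,
        provider_model_id=provider_model_id,
    )
```

Callers can then read `model.provider_model_id` directly without a `None` check.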
2025-07-17 08:26:57 -07:00
Derek Higgins
c2b64dce5b
fix: Move sentence-transformers to the top (#2703)
Move sentence-transformers to be the first embedding in the list of
models. This ensures it will always be the default, which is more
consistent than having the default change based on which env variables
are available.

Closes: #2702

## Test Plan
Manually verified

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-17 10:31:30 -04:00
ehhuang
51b179e1c5
chore: update k8s template (#2786)
# What does this PR do?
- enables auth
- updates to use distribution-starter docker

## Test Plan
bash apply.sh
2025-07-16 15:07:26 -07:00
IAN MILLER
b57db11bed
feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745)
# What does this PR do?
The purpose of this task is to create a solution that can automatically
detect when new models are added, deprecated, or removed by the OpenAI and
Llama API providers, and automatically update the list of supported
models in Llama Stack.

This feature matters because it avoids missing new models and editing
the entries manually. The automation added here lets users dynamically
register:
- any models from OpenAI provider available at 
[https://api.openai.com/v1/models](https://api.openai.com/v1/models)
that are not in
[https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py)

- any models from Llama API provider available at
[https://api.llama.com/v1/models](https://api.llama.com/v1/models) that
are not in
[https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py)
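
The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the provider code from this PR: `STATIC_MODEL_IDS`, `available_model_ids`, `check_registration`, and `fake_fetch` are hypothetical names, the static entries are illustrative, and the HTTP call to the models endpoint is replaced by an injected callable so the sketch runs offline.

```python
from typing import Callable, Iterable

# Stand-in for the static entries in a provider's models.py (illustrative values).
STATIC_MODEL_IDS = {"gpt-4o", "gpt-4o-mini"}


def available_model_ids(fetch_remote_ids: Callable[[], Iterable[str]]) -> set[str]:
    """Union of the static list and the IDs reported by the provider's models endpoint."""
    return STATIC_MODEL_IDS | set(fetch_remote_ids())


def check_registration(
    provider_model_id: str,
    fetch_remote_ids: Callable[[], Iterable[str]],
) -> None:
    """Raise ValueError listing the known models if the requested ID is unknown."""
    known = available_model_ids(fetch_remote_ids)
    if provider_model_id not in known:
        raise ValueError(
            f"Model {provider_model_id!r} is not available; known models: {sorted(known)}"
        )


def fake_fetch() -> list[str]:
    # Hypothetical stub standing in for a real request to the models endpoint.
    return ["gpt-4-turbo-preview"]
```

With this shape, registering a model that only the remote endpoint knows succeeds, while a made-up ID fails with an error that lists the combined set.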

Closes #2504

this PR is dependent on #2710

## Test Plan

1. Create venv at root llamastack directory:
`uv venv .venv --python 3.12 --seed`    
2. Activate venv:
`source .venv/bin/activate`   
3. `uv pip install -e .`
4. Create OpenAI distro modifying run.yaml
5. Build distro:
`llama stack build --template starter --image-type venv`
6. Navigate to the templates/starter folder, then run Llama Stack:
`llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY>
ENABLE_OPENAI=openai`
7. Then try to register a dummy LLM that doesn't exist in the OpenAI provider:
` llama-stack-client models register ianm/ianllm
--provider-model-id=ianllm --provider-id=openai `
 
You should receive this output - combined list of static config +
fetched available models from OpenAI:
 
<img width="1380" height="474" alt="Screenshot 2025-07-14 at 12 48 50"
src="https://github.com/user-attachments/assets/d26aad18-6b15-49ee-9c49-b01b2d33f883"
/>

8. Then register a real LLM from OpenAI:
`llama-stack-client models register openai/gpt-4-turbo-preview
--provider-model-id=gpt-4-turbo-preview --provider-id=openai`

<img width="1253" height="613" alt="Screenshot 2025-07-14 at 13 43 02"
src="https://github.com/user-attachments/assets/60a5c9b1-3468-4eb9-9e92-cd7d21de3ca0"
/>
<img width="1288" height="655" alt="Screenshot 2025-07-14 at 13 43 11"
src="https://github.com/user-attachments/assets/c1e48871-0e24-4bd9-a0b8-8c95552a51ee"
/>

We correctly fetched all available models from OpenAI.

As for the Llama API, as a non-US person I don't have access to a Llama API
key, but I joined the wait list. The implementation for Llama is the same as
for OpenAI, since the Llama API is OpenAI-compatible, so the response from the
GET endpoint has the same structure as OpenAI's:
https://llama.developer.meta.com/docs/api/models
2025-07-16 12:49:38 -04:00
Charlie Doern
6c516d391b
fix: de-clutter llama stack run logs (#2783)
# What does this PR do?

currently each disabled provider is printed as a warning, switch to
debug. This level of verbosity isn't necessary, especially if we intend
to grow the list of providers over time that can be in a single run yaml
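
In logging terms the change amounts to something like the sketch below. It is illustrative only: the logger name, the `report_providers` helper, and the message text are hypothetical and not taken from the codebase.

```python
import logging

logger = logging.getLogger("stack.providers")  # hypothetical logger name


def report_providers(providers: dict[str, bool]) -> list[str]:
    """Log disabled providers at DEBUG and return the IDs of the enabled ones."""
    enabled = []
    for provider_id, is_enabled in providers.items():
        if is_enabled:
            enabled.append(provider_id)
        else:
            # Previously a warning; DEBUG records stay hidden unless the logger
            # is explicitly configured to show them.
            logger.debug("Provider %s is disabled, skipping", provider_id)
    return enabled
```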


## Test Plan

before:

<img width="1144" height="667" alt="Screenshot 2025-07-16 at 12 37
18 PM"
src="https://github.com/user-attachments/assets/d14dbf76-6e40-4996-8a27-111e6a987d71"
/>

after:
<img width="925" height="141" alt="Screenshot 2025-07-16 at 12 37 42 PM"
src="https://github.com/user-attachments/assets/81efdbe1-923c-4c5f-9731-f89729043920"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-16 09:44:26 -07:00
Nathan Weinberg
919ee3199b
docs: add missing bold title to match others (#2782)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-16 18:05:48 +02:00
Sergey Yedrikov
30be1fd8b7
fix: SQLiteVecIndex.create(..., bank_id="test_bank.123") - bank_id with a dot - leads to sqlite3.OperationalError (#2770) (#2771)
# What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2770. It
replaces characters in SQLite table names that are not alphanumeric or
underscores with underscores and quotes the table names with square
brackets in SQL statements.
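
A minimal sketch of that approach, assuming a hypothetical `chunks_` table prefix and helper names that are not from the PR itself:

```python
import re
import sqlite3


def sanitize_table_name(raw_name: str) -> str:
    """Replace every character that is not alphanumeric or an underscore with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", raw_name)


def create_bank_table(conn: sqlite3.Connection, bank_id: str) -> str:
    """Create a table for bank_id and return the sanitized table name."""
    table = sanitize_table_name(f"chunks_{bank_id}")  # hypothetical naming scheme
    # Square brackets quote the identifier in the SQL statement.
    conn.execute(f"CREATE TABLE IF NOT EXISTS [{table}] (id TEXT PRIMARY KEY, chunk TEXT)")
    return table


conn = sqlite3.connect(":memory:")
table_name = create_bank_table(conn, "test_bank.123")
```

Sanitizing first means the bracket quoting never has to deal with a `]` inside the name.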

Closes #2770

## Test Plan
I added a ".123" suffix to the bank_id on the following line
```
    index = await SQLiteVecIndex.create(dimension=embedding_dimension, db_path=db_path, bank_id="test_bank.123")
```
in tests/unit/providers/vector_io/test_sqlite_vec.py, which, without the
fix in place, demonstrates the issue.
2025-07-16 08:25:44 -07:00
Nathan Weinberg
72e606355d
fix: add shutdown function for localfs provider (#2781)
# What does this PR do?
this was causing an unnecessary logger warning

## Test Plan
Run `LLAMA_STACK_DIR=. ENABLE_OLLAMA=ollama
OLLAMA_INFERENCE_MODEL=llama3.2:3b llama stack build --template starter
--image-type venv --run` and then press `Ctrl-C` to shut down

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-16 08:24:57 -07:00
Nathan Weinberg
3165197b75
chore: remove 'gha_workflow_llama_stack_tests.yml' (#2767)
This was introduced in
https://github.com/meta-llama/llama-stack/pull/523 but as far as I can
tell has never been used. It's been over six months so it feels fair to
remove it at this point.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-16 07:12:26 -07:00
Matthew Farrellee
a3e249807b
chore: remove vision model URL workarounds and simplify client creation (#2775)
The vision models are now available at the standard URL, so the
workaround code has been removed. This also simplifies the codebase by
eliminating the need for per-model client caching.

- Remove special URL handling for meta/llama-3.2-11b/90b-vision-instruct
models
- Convert _get_client method to _client property for cleaner API
- Remove unnecessary lru_cache decorator and functools import
- Simplify client creation logic to use single base URL for all models
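
The shape of the refactor can be sketched like this. `Client` and `InferenceAdapter` are simplified stand-ins rather than the real classes, and building a fresh client on each access is an assumption made for brevity.

```python
class Client:
    """Minimal stand-in for an API client (hypothetical)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


class InferenceAdapter:
    """Simplified adapter that uses one base URL for every model."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url

    @property
    def _client(self) -> Client:
        # No per-model argument any more, so there is no cache key to manage.
        return Client(self._base_url)
```

Call sites change from `self._get_client(model_id)` to `self._client`.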
2025-07-16 07:10:04 -07:00
IAN MILLER
fa1bb9ae00
docs: fix typo and link self loop for index.html#running-tests (#2777)
# What does this PR do?
This PR fixes the typo "here here" and the self-referencing link at
[https://llama-stack.readthedocs.io/en/latest/contributing/index.html#tests/README.md](https://llama-stack.readthedocs.io/en/latest/contributing/index.html#tests/README.md)

Closes #2762

## Test Plan
2025-07-16 07:09:44 -07:00
Sébastien Han
ff9d4d8a9d
ci: do not pull model (#2776)
the model is now available in the container image

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-16 04:58:05 -07:00
Sébastien Han
f85189022c
fix: re-hydrate requirement and fix package (#2774)
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-16 05:46:15 -04:00
Ashwin Bharambe
95fdc8ea94
build: Bump version to 0.2.15
2025-07-15 20:29:08 -07:00
Kelly Brown
b096794959
docs: Reorganize documentation on the webpage (#2651)
# What does this PR do?
Reorganizes the Llama Stack webpage into more concise index pages,
introduces more of a workflow, and reduces repetition of content.

New nav structure so far based on #2637 

Further discussions in
https://github.com/meta-llama/llama-stack/discussions/2585

**Preview:**
![Screenshot 2025-07-09 at 2 31 53 PM](https://github.com/user-attachments/assets/4c1f3845-b328-4f12-9f20-3f09375007af)

You can also build a full preview locally.

 **Feedback**
Looking for feedback on page titles and general feedback on the new
structure

**Follow up documentation**
I plan on reducing some sections and standardizing some terminology in a
follow up PR.
More discussions on that in
https://github.com/meta-llama/llama-stack/discussions/2585
2025-07-15 14:19:35 -07:00
Francisco Arceo
e1755d1ed2
chore: Adding OpenAI Vector Stores Files API compatibility for PGVector (#2755)
# What does this PR do?
Adding OpenAI Vector Stores Files API compatibility for PGVector


## Test Plan
Updated CI to include PGVector

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-15 15:46:49 -04:00
ehhuang
e64e4fc5a2
test: add tests against published client (#2752)
# What does this PR do?
closes #2751

## Test Plan

---------

Co-authored-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com>
2025-07-15 12:25:31 -07:00
Mark Campbell
65fcd03461
docs: update outdated llama stack client documentation (#2758)
# What does this PR do?
Adds new documentation that was missing for the Llama Stack Python
Client as well as updates old/outdated docs
2025-07-15 11:49:59 -07:00
Nathan Weinberg
b3d86ca926
fix: stop image_name from being cast to an integer (#2759)
# What does this PR do?

https://github.com/meta-llama/llama-stack/pull/2490 introduced a new
function for type conversion of strings.

However, a side effect of this is that it casts any string that looks
like an integer to an integer, which is not desired for something like
`image_name`, as we only accept strings for this field in the
`StackRunConfig`

This PR introduces logic to ensure that `image_name` remains a string 
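
One way to picture the fix is the sketch below. It is not the code from this PR: `coerce_value`, `convert_config_values`, and `STRING_ONLY_FIELDS` are hypothetical names, and the integer-only coercion is a simplified assumption about what the conversion function does.

```python
from typing import Any

# Fields that must stay strings; only image_name is named in the description above.
STRING_ONLY_FIELDS = {"image_name"}


def coerce_value(value: str) -> Any:
    """Simplified conversion: integer-looking strings become ints, others stay strings."""
    try:
        return int(value)
    except ValueError:
        return value


def convert_config_values(config: dict[str, str]) -> dict[str, Any]:
    """Coerce each value except those whose key is listed in STRING_ONLY_FIELDS."""
    return {
        key: value if key in STRING_ONLY_FIELDS else coerce_value(value)
        for key, value in config.items()
    }


converted = convert_config_values({"image_name": "2745", "port": "8321"})
```

Here `image_name` keeps the string `"2745"` while `port` is still converted to an integer.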

Closes #2749

## Test Plan

You can run the original step to reproduce from the bug to verify this
manually
```bash
OPENAI_API_KEY=bogus llama stack build --image-type venv --image-name 2745 --providers inference=remote::openai --run
```

I have also added an additional unit test to prevent any future
regression here

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-15 09:44:21 -07:00
Francisco Arceo
31b088978a
fix: Fix /vector-stores/create API when vector store with duplicate name (#2617)
# What does this PR do?

Resolves https://github.com/meta-llama/llama-stack/issues/2735

Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).

This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.

Two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. A vector store has to be referenced by its ID; the name is not
reliable, as every `client.vector_stores.create` call results in a new vector
store.
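
The two changes above can be illustrated with a small in-memory sketch. The `VectorStoreRegistry` class and its methods are hypothetical and only mimic the behavior described here; they are not the Llama Stack or OpenAI implementation.

```python
import uuid
from dataclasses import dataclass, field


def new_vector_store_id() -> str:
    """Generate an identifier in the vs_{uuid} format."""
    return f"vs_{uuid.uuid4()}"


@dataclass
class VectorStore:
    name: str
    id: str = field(default_factory=new_vector_store_id)


class VectorStoreRegistry:
    """Hypothetical in-memory registry keyed by generated ID, not by name."""

    def __init__(self) -> None:
        self._stores: dict[str, VectorStore] = {}

    def create(self, name: str) -> VectorStore:
        # Every call creates a new store, even when the name is reused.
        store = VectorStore(name=name)
        self._stores[store.id] = store
        return store

    def get(self, store_id: str) -> VectorStore:
        return self._stores[store_id]
```

Creating two stores with the same name yields two distinct IDs, which is why lookups have to go through the ID.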

NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.

## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv

LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```

Unit tests and test script below 👇 

<details> 
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>

```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os

# Initialize colorama for color support in terminal
init(autoreset=True)

# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
# Populated by run_tests(); defined here so the globals exist before use.
DEMO_VECTOR_STORE_ID = None
DEMO_VECTOR_STORE_ID2 = None


def colored_print(color, text):
    """Prints text to the console with the specified color."""
    print(f"{color}{text}{Style.RESET_ALL}")


def log_and_print(color, message, level=logging.INFO):
    """Logs a message and prints it to the console with the specified color."""
    logging.log(level, message)
    colored_print(color, message)


def run_tests(client, prefix="openai"):
    """
    Runs all tests using the provided OpenAI client and saves the output
    to JSON files with the given prefix.
    """
    # Create the directory if it doesn't exist
    os.makedirs('openai_testing', exist_ok=True)

    # Default values in case tests fail
    global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
    DEMO_VECTOR_STORE_ID = None
    DEMO_VECTOR_STORE_ID2 = None

    def test_idempotent_vector_store_creation():
        """
        Test that creating a vector store with the same name is idempotent.
        """
        log_and_print(Fore.BLUE, "Starting vector store creation test...")
        try:
            vector_store = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Attempt to create the same vector store again
            vector_store2 = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Check instead of assert
            if vector_store2.id != vector_store.id:
                log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: {vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)

            global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
            DEMO_VECTOR_STORE_ID = vector_store.id
            DEMO_VECTOR_STORE_ID2 = vector_store2.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        except Exception as e:
            log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            # Create a fallback vector store ID if needed
            if 'vector_store' in locals() and vector_store:
                DEMO_VECTOR_STORE_ID = vector_store.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2

    def test_vector_store_list():
        """
        Test listing vector stores.
        """
        log_and_print(Fore.BLUE, "Starting vector store list test...")
        try:
            vector_stores = client.vector_stores.list()

            # Check instead of assert
            if not isinstance(vector_stores, pagination.SyncCursorPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Vector store list test passed!")

            vector_stores_data = vector_stores.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
                json.dump(vector_stores_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_retrieve_vector_store():
        """
        Test retrieving a specific vector store.
        """
        log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            vector_store = client.vector_stores.retrieve(
                vector_store_id=DEMO_VECTOR_STORE_ID,
            )

            # Check instead of assert
            if vector_store.id != DEMO_VECTOR_STORE_ID:
                log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Retrieve vector store test passed!")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_modify_vector_store():
        """
        Test modifying a vector store.
        """
        log_and_print(Fore.BLUE, "Starting modify vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            updated_vector_store = client.vector_stores.update(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                name="Updated Support FAQ FJA",
            )

            # Check instead of assert
            if updated_vector_store.name != "Updated Support FAQ FJA":
                log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Modify vector store test passed!")

            updated_vector_store_data = updated_vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
                json.dump(updated_vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_delete_vector_store():
        """
        Test deleting a vector store.
        """
        log_and_print(Fore.BLUE, "Starting delete vector store test...")
        if not DEMO_VECTOR_STORE_ID2:
            log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
                          level=logging.WARNING)
            return

        try:
            response = client.vector_stores.delete(
                vector_store_id=DEMO_VECTOR_STORE_ID2,
            )

            log_and_print(Fore.GREEN, "Delete vector store test passed!")

            response_data = response.to_dict()
            log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
                json.dump(response_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_create_vector_store_file():
        log_and_print(Fore.BLUE, "Starting create vector store file test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            # create jsonl of files as an example
            with open("mydata.jsonl", "w") as f:
                f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')

            # Create a simple text file if my_data_small.txt doesn't exist
            if not os.path.exists("my_data_small.txt"):
                with open("my_data_small.txt", "w") as f:
                    f.write("This is a test file for vector store testing.\n")

            # Use a context manager so the upload file handle is closed.
            with open("my_data_small.txt", "rb") as upload:
                created_file = client.files.create(
                    file=upload,
                    purpose="assistants",
                )

            created_file_data = created_file.to_dict()
            log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
                json.dump(created_file_data, f, indent=2)

            retrieved_files = client.files.retrieve(created_file.id)
            retrieved_files_data = retrieved_files.to_dict()
            log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
                json.dump(retrieved_files_data, f, indent=2)

            vector_store_file = client.vector_stores.files.create(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                file_id=created_file.id,
            )
            log_and_print(Fore.GREEN, "Create vector store file test passed!")
        except Exception as e:
            log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_search_vector_store():
        """
        Test searching a vector store.
        """
        log_and_print(Fore.BLUE, "Starting search vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            query = "What is the banana policy?"
            search_results = client.vector_stores.search(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                query=query,
                max_num_results=10,
                ranking_options={
                    'ranker': 'default-2024-11-15',
                    'score_threshold': 0.0,
                },
                rewrite_query=False,
            )

            # Check instead of assert
            if not isinstance(search_results, pagination.SyncPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Search vector store test passed!")

            search_results_dict = search_results.to_dict()
            log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
            with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
                json.dump(search_results_dict, f, indent=2)

            log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
        except Exception as e:
            log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    # Run all tests in sequence, even if some fail
    test_results = []

    try:
        result = test_idempotent_vector_store_creation()
        if result and len(result) == 2:
            DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
        test_results.append(True)
    except Exception as e:
        log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
        test_results.append(False)

    for test_func in [
        test_vector_store_list,
        test_retrieve_vector_store,
        test_modify_vector_store,
        test_delete_vector_store,
        test_create_vector_store_file,
        test_search_vector_store
    ]:
        try:
            test_func()
            test_results.append(True)
        except Exception as e:
            log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            test_results.append(False)

    if all(test_results):
        log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
    else:
        failed_count = test_results.count(False)
        log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
    parser.add_argument(
        "--provider",
        type=str,
        default="llama",
        choices=["openai", "llama", "both"],
        help="Specify which environment to test: openai, llama, or both. Default is llama.",
    )
    args = parser.parse_args()

    try:
        if args.provider in ("openai", "both"):
            openai_client = OpenAI()
            run_tests(openai_client, prefix="openai")

        if args.provider in ("llama", "both"):
            llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
            run_tests(llama_client, prefix="llama")

        log_and_print(Fore.GREEN, "All tests completed!")

    except Exception as e:
        log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
```
</details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-15 11:24:41 -04:00
ehhuang
5400a2e2b1
chore: remove tests.yaml (#2754)
# What does this PR do?
Don't think this is used anymore

## Test Plan
2025-07-14 22:02:37 -07:00
Varsha
4ae5656c2f
feat: Implement keyword search in milvus (#2231)
# What does this PR do?
This PR adds the keyword search implementation for Milvus. Along with
the implementation for remote Milvus, the tests require us to start a
Milvus container locally.

To verify the implementation, run:
```bash
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```

You can also test the changes using the script below:
```python
#!/usr/bin/env python3
import uuid

from llama_stack_client import LlamaStackClient, RAGDocument


class MilvusRAGDemo:
    def __init__(self, base_url: str = "http://localhost:8321/"):
        self.client = LlamaStackClient(base_url=base_url)
        self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}"
        self.model_id = None
        self.embedding_model_id = None
        self.embedding_dimension = None
        
    def setup_models(self):
        """Get available models and select appropriate ones for LLM and embeddings."""
        models = self.client.models.list()
    
        # Select embedding model
        embedding_models = [m for m in models if m.model_type == "embedding"]
        if not embedding_models:
            raise ValueError("No embedding models found")
        self.embedding_model_id = embedding_models[0].identifier
        self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"]
        
    def register_vector_db(self):
        print(f"Registering Milvus vector database: {self.vector_db_id}")
        
        response = self.client.vector_dbs.register(
            vector_db_id=self.vector_db_id,
            embedding_model=self.embedding_model_id,
            embedding_dimension=self.embedding_dimension,
            provider_id="milvus-remote",  # Use remote Milvus
        )
        print("Vector database registered successfully")
        return response
        
    def insert_documents(self):
        """Insert sample documents into the vector database."""
        print("\nInserting sample documents...")
        
        # Sample documents about different topics
        documents = [
            RAGDocument(
                document_id="ai_ml_basics",
                content="""
                Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world.
                AI refers to the simulation of human intelligence in machines, while ML is a subset
                of AI that enables computers to learn and improve from experience without being
                explicitly programmed. Deep learning, a subset of ML, uses neural networks with
                multiple layers to process complex patterns in data.
                
                Key concepts in AI/ML include:
                - Supervised Learning: Training with labeled data
                - Unsupervised Learning: Finding patterns in unlabeled data
                - Reinforcement Learning: Learning through trial and error
                - Neural Networks: Computing systems inspired by biological brains
                """,
                mime_type="text/plain",
                metadata={"topic": "technology", "category": "ai_ml"},
            ),
        ]
        
        # Insert documents with chunking
        self.client.tool_runtime.rag_tool.insert(
            documents=documents,
            vector_db_id=self.vector_db_id,
            chunk_size_in_tokens=200,  # Smaller chunks for better granularity
        )
        print(f"Inserted {len(documents)} documents with chunking")
                
    def test_keyword_search(self):
        """Test keyword-based search using BM25."""
        
        queries = [
            "neural networks",
            "Python frameworks",
            "data cleaning",
        ]
        
        for query in queries:
            response = self.client.vector_io.query(
                vector_db_id=self.vector_db_id,
                query=query,
                params={
                    "mode": "keyword",  # Keyword search
                    "max_chunks": 3,
                    "score_threshold": 0.0,
                }
            )
            
            for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):
                print(f"  {i+1}. Score: {score:.4f}")
                print(f"     Content: {chunk.content[:100]}...")
                print(f"     Metadata: {chunk.metadata}")    

                
    def run_demo(self):       
        try:
            self.setup_models()
            self.register_vector_db()
            self.insert_documents()
            self.test_keyword_search()
        except Exception as e:
            print(f"Error during demo: {e}")
            raise


def main():
    """Main function to run the demo."""
    # Check if Llama Stack server is running
    demo = MilvusRAGDemo()    
    try:
        demo.run_demo()
    except Exception as e:
        print(f"Demo failed: {e}")

if __name__ == "__main__":
    main()
```
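
For readers unfamiliar with BM25, which the script's docstring names as the scoring behind keyword mode, here is a small standalone sketch of a common BM25 variant. It illustrates the ranking idea only and is not Milvus's implementation; the whitespace tokenizer and the `k1` and `b` defaults are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with BM25 (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))
            tf = term_freq[term]
            # Term frequency is damped by k1 and normalized by document length via b.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores


chunks = [
    "neural networks are computing systems inspired by biological brains",
    "supervised learning trains with labeled data",
]
scores = bm25_scores("neural networks", chunks)
print(scores)
```

Unlike vector search, a chunk that shares no terms with the query scores zero here, which is why queries such as "Python frameworks" may return nothing for the sample document.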


---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-07-14 19:39:55 -04:00
Francisco Arceo
33f0d83ad3
chore: Move vector store kvstore implementation into openai_vector_store_mixin.py (#2748) 2025-07-14 18:10:35 -04:00
Hardik Shah
6b8a8c1be9
fix: Safety in starter (#2731)
- Fireworks and Together no longer support the Llama Guard 3 8B model
- We need to default to Ollama
- The current safety shields logic was incorrect because the shield_id was
the provider (which had duplicates)
- Followed similar logic to models

Note: Seems a bit over-engineered but this can now be extended to other
providers and fits in the overall mechanism of how env_vars are used to
manage starter.

### How to test 
```bash
ENABLE_OLLAMA=ollama ENABLE_FIREWORKS=fireworks SAFETY_MODEL=llama-guard3:1b pytest -s -v tests/integration/ --stack-config starter -k 'not(supervised_fine_tune or builtin_tool_code or safety_with_image or code_interpreter_for or rag_and_code or truncation or register_and_unregister)' --text-model fireworks/meta-llama/Llama-3.3-70B-Instruct --vision-model fireworks/meta-llama/Llama-4-Scout-17B-16E-Instruct --safety-shield llama-guard3:1b --embedding-model all-MiniLM-L6-v2
```

### Related but not obvious in this PR 
In the llama-stack-ops repo, we run tests before publishing packages and
docker containers.
The actions in that repo were using the fireworks / together distros
(which do not exist).

So we need to update them to run with `starter` and use `ollama`
specifically for safety.
2025-07-14 15:07:40 -07:00
Nathan Weinberg
6ad22c209f
chore: add issue template for technical debt (#2753)
# What does this PR do?
Adds a template for technical debt. Currently we don't support blank
issues, so everything filed has to be a bug or a feature.
This would allow maintainers as well as community members to track
things we might want to merge to expose the functionality but that
should be addressed later. Such things can also be "good first issues"
for new contributors.

## Examples of what we consider technical debt
Inelegant code solutions, tests we intend to temporarily disable but
would like to restore, CI hacks around infrastructure or installation,
etc.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-14 14:41:44 -07:00
ehhuang
aa0840c281
docs: fix building distro link (#2750)
# What does this PR do?


## Test Plan

Co-authored-by: raghotham <rsm@meta.com>
2025-07-14 12:06:56 -07:00
Matthew Farrellee
f731f369a2
feat: add infrastructure to allow inference model discovery (#2710)
# What does this PR do?

inference providers each have a static list of supported / known models.
some also have access to a dynamic list of currently available models.
this change gives providers using the ModelRegistryHelper the ability to
combine their static and dynamic lists.

for instance, OpenAIInferenceAdapter can implement
```python
    def query_available_models(self) -> list[str]:
        # model objects returned by the OpenAI client expose their name as `id`
        return [entry.id for entry in self.openai_client.models.list()]
```
to augment its static list w/ a current list from openai.
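
a minimal sketch of how the static and dynamic lists might be combined is shown below. `ModelRegistrySketch`, `DynamicProviderSketch`, and `all_models` are hypothetical names used only for illustration; only `query_available_models` comes from this PR, and the real ModelRegistryHelper may merge the lists differently.

```python
class ModelRegistrySketch:
    """Hypothetical stand-in for a registry helper; not the real ModelRegistryHelper."""

    def __init__(self, static_models: list[str]):
        self.static_models = list(static_models)

    def query_available_models(self) -> list[str]:
        """Providers override this to return models discovered at runtime."""
        return []

    def all_models(self) -> list[str]:
        """Static entries first, then newly discovered ones, without duplicates."""
        return list(dict.fromkeys(self.static_models + self.query_available_models()))


class DynamicProviderSketch(ModelRegistrySketch):
    def query_available_models(self) -> list[str]:
        # A real adapter would call its backend here; this list is made up.
        return ["model-b", "model-c"]


provider = DynamicProviderSketch(static_models=["model-a", "model-b"])
merged = provider.all_models()
print(merged)
```

providers that have no dynamic source keep the default empty list and behave exactly as before.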

## Test Plan

./scripts/unit-tests.sh
2025-07-14 11:38:53 -07:00
Derek Higgins
a7ed86181c
fix(faiss): Delete file contents from kvstore (#2686)
Remove both the metadata and content from the kvstore when a file is
being removed from the vector store.
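
The fix can be pictured with the sketch below, which uses a plain dict as the kvstore. The key prefixes and the helper names (`metadata_key`, `contents_key`, `delete_file`) are assumptions for illustration; the actual faiss provider keys may differ.

```python
# Hypothetical key layout; the real prefixes used by the provider may differ.
def metadata_key(store_id: str, file_id: str) -> str:
    return f"files:{store_id}:{file_id}"


def contents_key(store_id: str, file_id: str) -> str:
    return f"files_contents:{store_id}:{file_id}"


def delete_file(kvstore: dict, store_id: str, file_id: str) -> None:
    """Drop both entries so no orphaned file contents remain in the kvstore."""
    kvstore.pop(metadata_key(store_id, file_id), None)
    kvstore.pop(contents_key(store_id, file_id), None)


kvstore = {
    metadata_key("vs_1", "file_1"): {"filename": "faq.txt"},
    contents_key("vs_1", "file_1"): ["chunk one", "chunk two"],
}
delete_file(kvstore, "vs_1", "file_1")
print(kvstore)
```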

Closes: #2685

Also add faiss provider to openai_vector_stores test suite

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: raghotham <rsm@meta.com>
2025-07-14 13:58:23 -04:00
Sumanth Kamenani
77d2c8e95d
docs: clarify run.yaml files are starting points for customization (#2746)
# What does this PR do?
This PR improves documentation clarity around run.yaml file usage. It
adds comprehensive guidance to help users understand that generated
run.yaml files are templates meant to be customized for production use,
not used as-is.

## Changes
- Add new documentation section on customizing run.yaml files
- Clarify that generated run.yaml files are templates, not production
configs
- Add guidance on customization best practices and common scenarios  
- Update existing documentation to reference customization guide
- Improve clarity around run.yaml file usage for better user experience

## Test Plan
- Verified new documentation file exists at correct location
- Confirmed documentation is properly integrated into the toctree
structure
- Checked all internal links use correct paths and reference existing
files
- Validated references are added to relevant existing documentation
files
- Documentation build testing will be handled by CI environment
2025-07-14 09:53:13 -07:00
Mark Campbell
618ccea090
feat: add input validation for search mode of rag query config (#2275)
# What does this PR do?
Adds input validation for `mode` in `RagQueryConfig`.
This prevents users from passing search modes other than `vector`
and `keyword` for the time being; `hybrid` will follow when that
functionality is implemented.
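
A stdlib-only sketch of this kind of check is shown below. `RagQueryConfigSketch`, `SUPPORTED_MODES`, and `is_valid_mode` are hypothetical names, the `max_chunks` default is an assumption, and the real `RagQueryConfig` may implement the validation differently.

```python
from dataclasses import dataclass

SUPPORTED_MODES = ("vector", "keyword")


@dataclass
class RagQueryConfigSketch:
    """Hypothetical stand-in for RagQueryConfig with only the fields used here."""

    mode: str = "vector"
    max_chunks: int = 5  # assumed default, for illustration only

    def __post_init__(self) -> None:
        # Reject anything outside the supported search modes at construction time.
        if self.mode not in SUPPORTED_MODES:
            raise ValueError(
                "mode must be either 'vector' or 'keyword' if supported by the vector_io provider"
            )


def is_valid_mode(mode: str) -> bool:
    """Return True when the config accepts the given mode."""
    try:
        RagQueryConfigSketch(mode=mode)
    except ValueError:
        return False
    return True


print(is_valid_mode("keyword"), is_valid_mode("i-am-not-vector"))
```

Validating at construction time means an invalid mode is rejected before any query reaches the vector_io provider.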

## Test Plan
```
# Check out this PR and enter the LS directory
uv sync --extra dev
```
Run the quickstart
[example](https://llama-stack.readthedocs.io/en/latest/getting_started/#step-3-run-the-demo).
Alter the Agent to include a `query_config`:
```python
agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {
                "vector_db_ids": [vector_db_id],
                "query_config": {
                    "mode": "i-am-not-vector", # Test for non valid search mode
                    "max_chunks": 6
                }
            },
        }
    ],
)
```
Ensure you get the following error:
```
400: {'errors': [{'loc': ['mode'], 'msg': "Value error, mode must be either 'vector' or 'keyword' if supported by the vector_io provider", 'type': 'value_error'}]}
```

## Running unit tests
```
uv sync --extra dev
uv run pytest tests/unit/rag/test_rag_query.py -v
```

2025-07-14 09:11:34 -04:00
Francisco Arceo
958fc92b1b
feat: Add Vector stores UI (#2737)
# What does this PR do?
- Adds two pages to UI
  - Vector stores
  - Vector store detail view
- Fixed darkmode navbar highlighting
- Updated darkmode font color
- Updated llama-stack-client package

<img width="1916" height="734" alt="Screenshot 2025-07-12 at 11 34
35 PM"
src="https://github.com/user-attachments/assets/3f9b6727-ee82-4e6b-9555-2e3ef36d24d2"
/>

<img width="1912" height="910" alt="Screenshot 2025-07-12 at 11 57
09 PM"
src="https://github.com/user-attachments/assets/0c9d3b5e-5592-4dfb-8e04-a57edc9fb406"
/>


## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-13 01:03:55 -07:00
Matthew Farrellee
68e7978c88
chore: block network access from unit tests (#2732)
# What does this PR do?
this blocks network access for all `tests/unit/` tests.
`tests/integration/` are untouched.

it also introduces an `allow_network` marker to explicitly allow network
access.
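
One way to get this behaviour, shown here only as a sketch and not necessarily how the PR implements it, is to replace `socket.socket.connect` for the duration of a test. The names `network_blocked` and `NetworkBlockedError` are hypothetical:

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to open a connection while the network is blocked."""


@contextmanager
def network_blocked():
    """Temporarily make every socket.connect() call fail."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        raise NetworkBlockedError(f"network access blocked: {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the real method, even if the test body raised.
        socket.socket.connect = original_connect
```

In a `conftest.py`, an autouse fixture could wrap each test in `network_blocked()` unless `request.node.get_closest_marker("allow_network")` returns a marker.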

## Test Plan
`./scripts/unit-tests.sh`
2025-07-12 16:53:54 -07:00
dependabot[bot]
8374d4cefd
chore(github-deps): bump medyagh/setup-minikube from 0.0.19 to 0.0.20 (#2738) 2025-07-12 16:23:42 -04:00
Ben Browning
51d9fd4808
fix: Don't cache clients for passthrough auth providers (#2728)
# What does this PR do?

Some of our inference providers support passthrough authentication via
`x-llamastack-provider-data` header values. This fixes the providers
that support passthrough auth to not cache their clients to the backend
providers (mostly OpenAI client instances) so that the client connecting
to Llama Stack has to provide those auth values on each and every
request.
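
The idea can be sketched as follows. Everything here is hypothetical: `BackendClient` stands in for a real SDK client, and the `api_key` field name in the provider data is an assumption made for illustration:

```python
from typing import Optional


class BackendClient:
    """Stand-in for a real SDK client (for example an OpenAI client instance)."""

    def __init__(self, api_key: str):
        self.api_key = api_key


class PassthroughAdapter:
    """Hypothetical provider adapter that supports passthrough auth."""

    def __init__(self, config_api_key: Optional[str] = None):
        self._config_api_key = config_api_key

    def get_client(self, provider_data: Optional[dict]) -> BackendClient:
        # Prefer the key sent with this request; fall back to server config.
        api_key = (provider_data or {}).get("api_key") or self._config_api_key
        if not api_key:
            raise ValueError("no API key in provider data or server config")
        # Build a fresh client on every call instead of caching one on self,
        # so a key from one request is never reused for another caller.
        return BackendClient(api_key=api_key)
```

The trade-off is a small per-request construction cost in exchange for never serving one caller with another caller's credentials.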

## Test Plan

I added some unit tests to ensure we're not caching clients across
requests for all the fixed providers in this PR.

```
uv run pytest -sv tests/unit/providers/inference/test_inference_client_caching.py
```


I also ran some of our OpenAI compatible API integration tests for each
of the changed providers, just to ensure they still work. Note that
these providers don't actually pass all these tests (for unrelated
reasons due to quirks of the Groq and Together SaaS services), but
enough of the tests passed to confirm the clients are still working as
intended.

### Together

```
ENABLE_TOGETHER="together" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "together/meta-llama/Llama-3.1-8B-Instruct"
```

### OpenAI

```
ENABLE_OPENAI="openai" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "openai/gpt-4o-mini"
```

### Groq

```
ENABLE_GROQ="groq" \
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
  tests/integration/inference/test_openai_completion.py \
  --text-model "groq/meta-llama/Llama-3.1-8B-Instruct"
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-07-11 13:38:27 -07:00
Jorge Piedrahita Ortiz
aa2595c7c3
fix: sambanova shields and model validation (#2693)
# What does this PR do?
Update the Sambanova shield registration validation to warn, rather than
raise, when a model is not available at the base URL endpoint in use.
Also add warnings to model validation when a model is not available at
that endpoint.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
run starter distro with Sambanova enabled
2025-07-11 16:29:15 -04:00
Matthew Farrellee
30b2e6a495
chore: default to pytest asyncio-mode=auto (#2730)
# What does this PR do?

previously, developers who ran `./scripts/unit-tests.sh` would get
`asyncio-mode=auto`, which meant `@pytest.mark.asyncio` and
`@pytest_asyncio.fixture` were redundant. developers who ran `pytest`
directly would get the pytest-asyncio default (strict mode) and run into
errors leading them to add `@pytest.mark.asyncio` /
`@pytest_asyncio.fixture` to their code.

with this change -
- `asyncio_mode=auto` is included in `pyproject.toml` making behavior
consistent for all invocations of pytest
- removes all redundant `@pytest_asyncio.fixture` and
`@pytest.mark.asyncio`
 - for good measure, requires `pytest>=8.4` and `pytest-asyncio>=1.0`
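
As a small illustration of what auto mode permits (the function names are made up for this sketch), a coroutine test needs no decorator:

```python
import asyncio


async def fetch_answer() -> int:
    """Hypothetical coroutine under test."""
    await asyncio.sleep(0)
    return 42


# Under asyncio_mode=auto, pytest-asyncio treats this coroutine as an asyncio
# test without an explicit @pytest.mark.asyncio decorator.
async def test_fetch_answer():
    assert await fetch_answer() == 42
```

In strict mode the same test would need `@pytest.mark.asyncio` to be run as a coroutine.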

## Test Plan

- `./scripts/unit-tests.sh`
- `uv run pytest tests/unit`
2025-07-11 13:00:24 -07:00
Sébastien Han
2ebc172f33
fix: pin opentelemetry version (#2722)
# What does this PR do?

Otherwise we can get old versions like 1.11 and experience this error:

```
ModuleNotFoundError: No module named 'opentelemetry.exporter.otlp.proto.http.metric_exporter'
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-11 16:25:51 +02:00
Sébastien Han
2e4eedce14
fix: container build on podman (#2723)
# What does this PR do?

COPY with chmod does not work, see
https://github.com/containers/buildah/issues/4614. Also Docker arguably
implements it.

Anyway, this command is not even needed, since later we run:

```
RUN mkdir -p /.llama /.cache && chmod -R g+rw /app /.llama /.cache
```

And providers.d will get the right modes.

<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan

Build with `CONTAINER_BINARY=podman` and confirm the build succeeds

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-11 16:25:33 +02:00
ehhuang
d880c2df0e
fix: auth sql store: user is owner policy (#2674)
# What does this PR do?
The current authorized sql store implementation does not respect
user.principal (only checks attributes). This PR addresses that.
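
A rough sketch of an access check that considers the principal as well as attributes. The types, field names and the exact rule for combining the two checks are assumptions made for illustration, not the store's actual schema or policy engine:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    principal: str
    attributes: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class StoredRow:
    owner_principal: Optional[str] = None
    access_attributes: dict[str, list[str]] = field(default_factory=dict)


def attributes_match(user: User, row: StoredRow) -> bool:
    """True when the user holds at least one required value in every category."""
    return all(
        set(user.attributes.get(category, [])) & set(required)
        for category, required in row.access_attributes.items()
    )


def can_access(user: User, row: StoredRow) -> bool:
    # Owner check first: a matching principal alone is enough to grant access.
    if row.owner_principal is not None and row.owner_principal == user.principal:
        return True
    # Otherwise fall back to attribute matching; in this sketch, rows without
    # access attributes are visible only to their owner.
    return bool(row.access_attributes) and attributes_match(user, row)
```

An attributes-only check would miss the first branch, so an owner with no matching attributes could not read their own rows.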


## Test Plan

Added test cases to integration tests.
2025-07-10 14:40:32 -07:00
ehhuang
4cf1952c32
chore: update vllm k8s command to support tool calling (#2717)
# What does this PR do?


## Test Plan
2025-07-10 14:40:17 -07:00
Nathan Weinberg
5fe3027cbf
chore: remove "rfc" directory and move original rfc to "docs" (#2718)
# What does this PR do?
the "rfc" directory has only a single document in it, and it's the
original RFC for creating Llama Stack

simplify the project directory structure by moving this into the "docs"
directory and renaming it to "original_rfc" to preserve the context of
the doc

## Why did you do this?
A simplified top-level directory structure helps keep the project
simpler and prevents misleading new contributors into thinking we use it
(we really don't)

---------

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Co-authored-by: raghotham <raghotham@gmail.com>
2025-07-10 14:06:10 -07:00
Nathan Weinberg
9f04bc6d1a
chore: move "install.sh" script into "scripts" dir (#2719)
# What does this PR do?
"install.sh" is something that a general user might not use, e.g. it is
specific to using the "ollama" inference provider

clean up the top-level structure of the repo by moving it into the
"scripts" dir and updating the relevant references accordingly

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-10 13:14:10 -07:00
Nathan Weinberg
0bbff91c7e
docs: fix a few broken things in the CONTRIBUTING.md (#2714)
# What does this PR do?

"dev" dependencies were moved in pyproject.toml

typo with guidance around automatic doc generation

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-10 11:47:54 -07:00
Francisco Arceo
6a6b66ae4f
chore: Adding unit tests for OpenAI vector stores and migrating SQLite-vec registry to kvstore (#2665)
# What does this PR do?

This PR refactors the VectorIO backend logic for `sqlite-vec` and
adds unit tests and fixtures to make it easy to test both `sqlite-vec`
and `milvus`.

Key changes:
- `sqlite-vec` migrated to `kvstore` registry
- added in-memory cache for `sqlite-vec` to be consistent with `milvus`
- default fixtures moved to `conftest.py`
- removed redundant tests from `sqlite-vec`
- made `test_vector_io_openai_vector_stores.py` more easily extensible


## Test Plan
Unit tests added testing inline providers.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-10 14:22:13 -04:00
Nathan Weinberg
b18f4d1ccf
ci: add config for pre-commit.ci (#2712)
# What does this PR do?
the project already had some config setup for https://pre-commit.ci/

this commit adds additional explicit fields

Closes #2711

**IMPORTANT:** A project maintainer must add `pre-commit.ci` to this
repo for this to work - this can be done via https://pre-commit.ci/

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-10 17:24:10 +02:00
Mustafa Elbehery
83c6b20067
chore(api): add mypy coverage to cli/stack (#2650)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-10 16:53:38 +02:00
Nathan Weinberg
bbe0199bb7
chore: update pre-commit hook versions (#2708)
While investigating the `uv.lock` changes made in
https://github.com/meta-llama/llama-stack/pull/2695 I noticed several of
the pre-commit hook versions were out of date

This PR updates them and fixes some new `ruff` errors

---------

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-10 16:47:59 +02:00
Charlie Doern
81ebaf6e9a
fix: properly represent paths in server logs (#2698)
# What does this PR do?

currently when logging the run yaml, if there are path objects in the
object they are represented as:

```
         external_providers_dir: !!python/object/apply:pathlib.PosixPath
         - '~'
         - .llama
         - providers.d
```

now, with `config.model_dump(mode="json")`, the path is logged as a plain string:

```
         external_providers_dir: ~/.llama/providers.d
```
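
The difference between the two dump modes can be seen in a small sketch, assuming Pydantic v2. `RunConfigSketch` is a hypothetical one-field stand-in for the real run config:

```python
from pathlib import Path

from pydantic import BaseModel


class RunConfigSketch(BaseModel):
    """Hypothetical, one-field stand-in for the real run config."""

    external_providers_dir: Path


config = RunConfigSketch(external_providers_dir=Path("~/.llama/providers.d"))

# Default (python) mode keeps the pathlib object, which a YAML dumper may
# serialize with a verbose python/object tag.
python_dump = config.model_dump()

# JSON mode converts values to JSON-compatible types, so the path becomes a str.
json_dump = config.model_dump(mode="json")
```

Here `python_dump["external_providers_dir"]` is still a `Path`, while `json_dump["external_providers_dir"]` is a string that dumps cleanly.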

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-10 10:19:12 -04:00
Sébastien Han
01c222e12f
ci: run all APIs integration tests (#2646)
# What does this PR do?

We are now automatically building the list of integration tests to run.
In that process, eval and files are being tested now.

This is pending https://github.com/meta-llama/llama-stack/pull/2628

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-10 15:16:08 +02:00
ehhuang
81109a0f72
test: terminate server process when finished (#2700)
# What does this PR do?
Terminate server process for real.

## Test Plan
```ENABLE_OPENAI=openai LLAMA_STACK_CONFIG=server:starter pytest -v tests/integration/agents/test_openai_responses.py --text-model "gpt-4o-mini" -vv -s -k 'test_list_response_input_items[' && lsof -ti:8321```
observe no process printed anymore
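
For reference, a common pattern for shutting down a child process reliably is to ask it to exit and escalate if it does not. This is a generic sketch, not the code from this PR; `stop_process` and `start_sleeper` are hypothetical helpers:

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask a child process to exit, escalating to kill() if it does not stop in time."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The child ignored the polite request; force it to stop.
            proc.kill()
            proc.wait()
    return proc.returncode


def start_sleeper() -> subprocess.Popen:
    """Start a throwaway child that stands in for the server."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
```

Note that this only handles the direct child; a server that spawns its own workers may also need its process group signalled.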
2025-07-09 20:59:37 -07:00
ehhuang
780b4c6eea
fix: llama stack run starter in conda (#2679)
# What does this PR do?
`llama stack run starter` in conda environment fails with ' --config is
required for venv and conda environments' because it is passed as
--template and start_stack.sh doesn't process template.

## Test Plan
`llama stack run starter`
2025-07-09 20:33:45 -07:00
Nathan Weinberg
7915551eee
build: replace "python-jose" with "python-jose[cryptography]" (#2695)
# What does this PR do?
`python-jose` recommends using the `cryptography` backend in their
installation docs:
https://github.com/mpdavis/python-jose?tab=readme-ov-file#cryptographic-backends

This PR modifies the LLS dependencies to use this instead of the current
`native-python`

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-09 13:21:57 -07:00
Matthew Farrellee
1d8c00635c
chore: Update CODEOWNERS (#2692)
add @mattf
2025-07-09 08:19:31 -07:00
Sébastien Han
9b7eecebcf
ci: test safety with starter (#2628)
# What does this PR do?

We are now testing the safety capability with the starter image. This
includes a few changes:

* Enable the safety integration test
* Relax the shield model requirements from llama-guard to make it work
  with llama-guard3:8b coming from Ollama
* Expose a shield for each inference provider in the starter distro. The
  shield will only be registered if the provider is enabled.

Closes: https://github.com/meta-llama/llama-stack/issues/2528

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-09 16:53:50 +02:00
Mustafa Elbehery
de01eefdef
chore: add mypy post training (#2675)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 15:44:39 +02:00
Jorge
dafd9ed5c0
docs: Update links to Android Demo App (#2687)
# What does this PR do?
Updates some broken or outdated links pointing to the Android Demo App

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
2025-07-09 15:41:57 +02:00
Mustafa Elbehery
cd0ad21111
chore(api): add mypy coverage to apis (#2648)
# What does this PR do?
This PR adds static type coverage to `llama-stack/apis`

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 12:55:16 +02:00
Sébastien Han
297cd8e0db
fix: runpod transition to python 3.12 (#2682)
# What does this PR do?

I'm not sure how this was missed in the pyupgrade PR. This code seems
broken...

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-09 12:27:42 +02:00
Mustafa Elbehery
7f3661e7d8
chore: add mypy loader (#2672)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 10:26:33 +02:00
Mustafa Elbehery
a5c3362bcd
chore(api): add mypy coverage to meta_reference_config (#2664)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 10:24:30 +02:00
Mustafa Elbehery
28343fea51
chore(api): add mypy coverage to meta_reference_safety (#2661)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 10:22:34 +02:00
pgustafs
d39660afed
fix(remote:milvus): add missing files_api parameter and kvstore configuration (#2630)
- Fix constructor call missing files_api parameter
- Add kvstore field to MilvusVectorIOConfig
- Resolves #2626

# What does this PR do?
[https://github.com/meta-llama/llama-stack/issues/2626]
## Problem
The `MilvusVectorIOAdapter` fails to initialize because of two
configuration gaps:
1. Missing `files_api` parameter in the constructor call
2. Missing `kvstore` field in the `MilvusVectorIOConfig` class

## Root Cause  
1. The adapter constructor expects 3 parameters `(config, inference_api,
files_api)` but the `get_adapter_impl` function only passes 2 parameters
2. The `MilvusVectorIOConfig` class lacks the `kvstore` field that the
adapter's `initialize()` method expects for metadata persistence

## Solution
- Added `files_api = deps.get(Api.files, None)` to safely retrieve files
API from dependencies
- Pass the files_api parameter to MilvusVectorIOAdapter constructor
- Added `kvstore: KVStoreConfig | None = None` field to
MilvusVectorIOConfig
- Maintains backward compatibility since both files_api and kvstore can
be None

Closes #2626

## Test Plan
- [x] Tested with Milvus configuration - server starts successfully 
```yaml
vector_io:
  - provider_id: milvus
    provider_type: remote::milvus
    config:
      uri: http://localhost:19530
      token: root:Milvus
      kvstore:
        type: sqlite
        namespace: null
        db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/remote-vllm}/milvus_store.db
```
- [x] Vector operations work as expected
```python
import os

from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger as AgentEventLogger
from llama_stack_client.types.shared_params.document import Document as RAGDocument

# Server URL and model ID are read from the environment
endpoint = os.getenv("LLAMA_STACK_ENDPOINT")
model = os.getenv("INFERENCE_MODEL")

# Initialize the client
client = LlamaStackClient(base_url=endpoint)

vector_db_id = "my_documents"

response = client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="milvus",
)

urls = ["getting_started/Red_Hat_AI_Inference_Server-3.0-Getting_started-en-US.pdf", "vllm_server_arguments/Red_Hat_AI_Inference_Server-3.0-vLLM_server_arguments-en-US.pdf"]
documents = [
    RAGDocument(
        document_id=f"num-{i}",
        content=f"https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/3.0/pdf/{url}",
        mime_type="application/pdf",
        metadata={},
    )
    for i, url in enumerate(urls)
]

client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

rag_agent = Agent(
    client,
    model=model,
    # Define instructions for the agent (system prompt)
    instructions="You are a helpful assistant",
    enable_session_persistence=False,
    # Define tools available to the agent
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {
                "vector_db_ids": [vector_db_id],
            },
        }
    ],
)

session_id = rag_agent.create_session("test-session")

user_prompts = [
    "How to start the AI Inference Server container image? use the knowledge_search tool to get information.",
]

for prompt in user_prompts:
    print(f"User> {prompt}")
    response = rag_agent.create_turn(
        messages=[{"role": "user", "content": prompt}],
        session_id=session_id,
    )
    for log in AgentEventLogger().log(response):
        log.print()
```    

server logs:
```
INFO     2025-07-04 22:18:30,385 __main__:577 server: Listening on ['::', '0.0.0.0']:5000                                                             
INFO:     Started server process [769725]
INFO:     Waiting for application startup.
INFO     2025-07-04 22:18:30,390 __main__:158 server: Starting up                                                                                     
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
INFO     2025-07-04 22:18:52,193 llama_stack.distribution.routing_tables.common:200 core: Setting owner for vector_db 'my_documents' to               
20:18:52.194 [START] /v1/vector-dbs
INFO:     192.168.1.249:64170 - "POST /v1/vector-dbs HTTP/1.1" 200 OK
20:18:52.216 [END] /v1/vector-dbs [StatusCode.OK] (21.89ms)
20:18:52.222 [START] /v1/tool-runtime/rag-tool/insert
INFO     2025-07-04 22:18:56,265 llama_stack.providers.utils.inference.embedding_mixin:102 uncategorized: Loading sentence transformer for            
         all-MiniLM-L6-v2...                                                                                                                          
WARNING  2025-07-04 22:18:59,214 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed                           
INFO     2025-07-04 22:18:59,339 sentence_transformers.SentenceTransformer:219 uncategorized: Use pytorch device_name: cuda:0                         
INFO     2025-07-04 22:18:59,340 sentence_transformers.SentenceTransformer:227 uncategorized: Load pretrained SentenceTransformer: all-MiniLM-L6-v2   
INFO:     192.168.1.249:64170 - "POST /v1/tool-runtime/rag-tool/insert HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "POST /v1/agents HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "GET /v1/tools?toolgroup_id=builtin%3A%3Arag%2Fknowledge_search HTTP/1.1" 200 OK
INFO:     192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session HTTP/1.1" 200 OK
20:19:01.834 [END] /v1/tool-runtime/rag-tool/insert [StatusCode.OK] (9612.06ms)
20:19:01.839 [START] /v1/agents
INFO:     192.168.1.249:64170 - "POST /v1/agents/b1f6f063-1691-4780-8d9e-facd81708b91/session/d2706302-bb54-421d-a890-5e25df9cb47f/turn HTTP/1.1" 200 OK
20:19:01.839 [END] /v1/agents [StatusCode.OK] (0.18ms)
20:19:01.844 [START] /v1/tools
INFO     2025-07-04 22:19:01,853 llama_stack.providers.remote.inference.vllm.vllm:330 uncategorized: Initializing vLLM client with                    
         base_url=http://192.168.1.183:8080/v1                                                                                                        
20:19:01.858 [END] /v1/tools [StatusCode.OK] (14.92ms)
20:19:01.868 [START] /v1/agents/{agent_id}/session
20:19:01.868 [END] /v1/agents/{agent_id}/session [StatusCode.OK] (0.37ms)
20:19:01.873 [START] /v1/agents/{agent_id}/session/{session_id}/turn
20:19:01.885 [START] inference
20:19:05.506 [END] inference [StatusCode.OK] (3621.19ms)
INFO     2025-07-04 22:19:05,537 llama_stack.providers.inline.agents.meta_reference.agent_instance:890 agents: executing tool call: knowledge_search  
         with args: {'query': 'How to start the AI Inference Server container image'}                                                                 
20:19:05.538 [START] tool_execution
20:19:05.928 [END] tool_execution [StatusCode.OK] (390.08ms)
 20:19:05.538 [INFO] executing tool call: knowledge_search with args: {'query': 'How to start the AI Inference Server container image'}
20:19:05.935 [START] inference
20:19:17.539 [END] inference [StatusCode.OK] (11603.76ms)
20:19:17.560 [END] /v1/agents/{agent_id}/session/{session_id}/turn [StatusCode.OK] (15686.62ms)
```
- [x] No regressions in functionality
- [x] Configuration properly accepts kvstore settings

---------

Co-authored-by: Peter Gustafsson <peter.gustafsson6@gmail.com>
Co-authored-by: raghotham <rsm@meta.com>
Co-authored-by: Francisco Arceo <farceo@redhat.com>
2025-07-09 10:08:14 +02:00
Mustafa Elbehery
2d3d9664a7
chore(api): add mypy coverage to prompts (#2657)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647

## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-07-09 10:07:00 +02:00
ehhuang
84fa83b788
fix: update k8s templates (#2645)
# What does this PR do?
- fix env variables
- use gpu for vllm
- add eks/apply.py for aws
- add template to set hf secret

## Test Plan
bash apply.sh

Co-authored-by: Eric Huang <erichuang@fb.com>
2025-07-08 15:57:01 -07:00
ehhuang
daf660c4ea
feat(auth,ui): support github sign-in in the UI (#2545)
# What does this PR do?
Uses NextAuth to add github sign in support.

## Test Plan
Start server with auth configured as in
https://github.com/meta-llama/llama-stack/pull/2509


https://github.com/user-attachments/assets/61ff7442-f601-4b39-8686-5d0afb3b45ac
2025-07-08 11:02:57 -07:00
ehhuang
c8bac888af
feat(auth): support github tokens (#2509)
# What does this PR do?

This PR adds GitHub OAuth authentication support to Llama Stack,
allowing users to authenticate using their GitHub credentials (#2508).

1. Support verifying GitHub access tokens
2. Support provider-specific auth error messages
3. Opportunistically reorganize the auth configs for better ergonomics

## Test Plan
Added unit tests.

Also tested e2e manually:
```
server:
  port: 8321
  auth:
    provider_config:
      type: github_token
```
```
~/projects/llama-stack/llama_stack/ui
❯ curl -v http://localhost:8321/v1/models
* Host localhost:8321 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8321...
* Connected to localhost (::1) port 8321
> GET /v1/models HTTP/1.1
> Host: localhost:8321
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 401 Unauthorized
< date: Fri, 27 Jun 2025 21:51:25 GMT
< server: uvicorn
< content-type: application/json
< x-trace-id: 5390c6c0654086c55d87c86d7cbf2f6a
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
{"error": {"message": "Authentication required. Please provide a valid GitHub access token (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) in the Authorization header (Bearer <token>)"}}
~/projects/llama-stack/llama_stack/ui
❯ ./scripts/unit-tests.sh


~/projects/llama-stack/llama_stack/ui
❯ curl "http://localhost:8321/v1/models" \
-H "Authorization: Bearer <token_obtained_from_github>"

{"data":[{"identifier":"accounts/fireworks/models/llama-guard-3-11b-vision","provider_resource_id":"accounts/fireworks/models/llama-guard-3-11b-vision","provider_id":"fireworks","type":"model","metadata":{},"model_type":"llm"},{"identifier":"accounts/fireworks/models/llama-guard-3-8b","provider_resource_id":"accounts/fireworks/models/llama-guard-3-8b","provider_id":"fireworks","type":"model","metadata":{},"model_type":"llm"},{"identifier":"accounts/fireworks/models/llama-v3p1-405b-instruct","provider_resource_id":"accounts/f
```

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-08 11:02:36 -07:00
Francisco Arceo
83c89265e0
chore: Adding unit tests for Milvus and OpenAI compatibility (#2640)
# What does this PR do?
- Enabling Unit tests for Milvus to start to test OpenAI compatibility
and fixing a few bugs.
- Also fixed an inconsistency in the Milvus config between remote and
inline.
- Added pymilvus to extras for testing in CI

I'm going to refactor this later to include the other inline providers
so that we can catch issues sooner.

I have another PR where I've been testing to find other bugs in the
implementation (and required changes drafted here:
https://github.com/meta-llama/llama-stack/pull/2617).

## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-08 00:50:16 -07:00
Charlie Doern
27b3cd570f
fix: use --template flag for server (#2643)
# What does this PR do?

Currently, when a template is used, we still pass `--config`.

`server.py` has a dedicated `--template` flag and logic, so use that
instead.
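
The idea can be sketched with a small argument builder. This is only an illustration under assumptions: `build_server_args` is a hypothetical helper, and the parser below is a stand-in that mirrors the two flag names mentioned above rather than the real `server.py` parser.

```python
import argparse


def build_server_args(config_or_template: str, is_template: bool) -> list[str]:
    """Pick the flag that matches what the caller actually has."""
    flag = "--template" if is_template else "--config"
    return [flag, config_or_template]


# Stand-in parser: accepts exactly one of --config or --template.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--config")
group.add_argument("--template")

args = parser.parse_args(build_server_args("starter", is_template=True))
print(args.template)  # prints "starter"
```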

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-08 00:48:50 -07:00
ehhuang
e9926564bd
fix: authorized sql store with postgres (#2641)
# What does this PR do?
Postgres uses a different JSON extraction syntax from SQLite.
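
To make the difference concrete, here is a small sketch of a per-dialect expression builder. It is not the code from this PR: the helper name, the `records` table, and the `access_attributes` column are hypothetical. SQLite reads a JSON key with `json_extract(col, '$.key')`, while PostgreSQL uses the `->>` operator to return the value as text.

```python
import sqlite3


def json_extract_text(dialect: str, column: str, key: str) -> str:
    """Return a SQL expression that reads a top-level JSON key as text."""
    # Only plain identifiers are accepted, since they are inlined into SQL.
    if not (column.isidentifier() and key.isidentifier()):
        raise ValueError("column and key must be plain identifiers")
    if dialect == "sqlite":
        return f"json_extract({column}, '$.{key}')"
    if dialect == "postgresql":
        return f"{column}->>'{key}'"
    raise ValueError(f"unsupported dialect: {dialect}")


# Exercise the SQLite form against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, access_attributes TEXT)")
conn.execute("""INSERT INTO records VALUES (1, '{"owner": "alice"}')""")
expr = json_extract_text("sqlite", "access_attributes", "owner")
row = conn.execute(f"SELECT {expr} FROM records").fetchone()
print(row[0])  # prints "alice"
```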

## Test Plan
added integration test
2025-07-07 19:36:34 -07:00
Ben Browning
5bb3817c49
fix: Restore the nvidia distro (#2639)
# What does this PR do?

The `nvidia` distro was previously collapsed into the `starter` distro.
However, the `nvidia` distro was set up specifically to use NVIDIA NeMo
microservices as providers for all APIs and not just inference, which
means it was doing quite a bit more than what the `starter` distro
covers today.

We should work with our friends at NVIDIA to determine the best place to
maintain this distro long-term, but for now this restores the `nvidia`
distro and its docs back to where they were so that things continue to
work for their users.

## Test Plan

I ensured the `nvidia` distro could build and run, at least to the point
of complaining that I didn't provide the necessary API keys.

```
uv run llama stack build --template nvidia --image-type venv
uv run llama stack run llama_stack/templates/nvidia/run.yaml
```

I also made sure the docs website built and looks reasonable, with the
`nvidia` distro docs at the same URL it was previously (because it has
incoming links from official NVIDIA NeMo docs, among other places).

```
uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-07-07 15:50:05 -07:00
Charlie Doern
d0ec5c3d3a
fix: print proper template path upon build (#2642)
# What does this PR do?

Rather than pointing to a directory in `llama_stack/templates` (the repo
directory), we should point to `$BUILD_DIR/IMAGE_NAME-run.yaml`
(`~/.llama/distributions/IMAGE_NAME/IMAGE_NAME-run.yaml`).

Currently we are printing:

```
You can find the newly-built template here: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/templates/starter/run.yaml
You can run the new Llama Stack distro via: llama stack run /Users/charliedoern/projects/Documents/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv
```

but should be printing things like:

```
You can find the newly-built template here: /Users/charliedoern/.llama/distributions/starter/starter-run.yaml
You can run the new Llama Stack distro via: llama stack run /Users/charliedoern/.llama/distributions/starter/starter-run.yaml --image-type venv
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-07-07 15:39:39 -07:00
Sébastien Han
5561f1c36d
ci: error when a pipefails (#2635)
# What does this PR do?

The CI was failing but the error was eaten by the pipe. Now we run the
task with pipefail.
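
For readers unfamiliar with the failure mode: by default a shell pipeline reports the exit status of its last command only, so an earlier failure is hidden. The short sketch below shows the effect; it assumes `bash` is on the PATH and is unrelated to the actual CI task.

```python
import subprocess


def run_pipeline(script: str, pipefail: bool) -> int:
    """Run a shell pipeline under bash and return its exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    return subprocess.run(["bash", "-c", prefix + script]).returncode


# `false` fails but `cat` succeeds, so the reported status depends on pipefail.
print(run_pipeline("false | cat", pipefail=False))  # prints 0
print(run_pipeline("false | cat", pipefail=True))   # prints 1
```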

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-07 16:47:30 +02:00
Wen Zhou
4bca4af3e4
refactor: set proper name for embedding all-minilm:l6-v2 and update to use "starter" in detailed_tutorial (#2627)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- we are using `all-minilm:l6-v2` but the model we download from ollama
is `all-minilm:latest`
  latest: https://ollama.com/library/all-minilm:latest 1b226e2802db
  l6-v2: https://ollama.com/library/all-minilm:l6-v2 pin 1b226e2802db
- even currently they are exactly the same model but if
[all-minilm:l12-v2](https://ollama.com/library/all-minilm:l12-v2) is
updated, "latest" might not be the same for l6-v2.
- the only change in this PR is pin the model id in ollama
- also update detailed_tutorial with "starter" to replace deprecated
"ollama".

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
```
>INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
>llama stack build --run --template ollama --image-type venv
...
Build Successful!
You can find the newly-built template here: /home/wenzhou/zdtsw-forking/lls/llama-stack/llama_stack/templates/ollama/run.yaml
....
 - metadata:
     embedding_dimension: 384
   model_id: all-MiniLM-L6-v2
   model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
   - embedding
   provider_id: ollama
   provider_model_id: all-minilm:l6-v2
   ...
```
test
```
>llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/models "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions "HTTP/1.1 200 OK"
OpenAIChatCompletion(
    id='chatcmpl-04f99071-3da2-44ba-a19f-03b5b7fc70b7',
    choices=[
        OpenAIChatCompletionChoice(
            finish_reason='stop',
            index=0,
            message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
                role='assistant',
                content="Here is a 2-sentence poem about the moon:\n\nSilver crescent in the midnight sky,\nLuna's gentle face, a beauty to the eye.",
                name=None,
                tool_calls=None,
                refusal=None,
                annotations=None,
                audio=None,
                function_call=None
            ),
            logprobs=None
        )
    ],
    created=1751644429,
    model='llama3.2:3b-instruct-fp16',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='fp_ollama',
    usage={'completion_tokens': 33, 'prompt_tokens': 36, 'total_tokens': 69, 'completion_tokens_details': None, 'prompt_tokens_details': None}
)
```

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-07-06 09:07:37 +05:30
dependabot[bot]
2faec38724
chore(deps): bump next from 15.3.2 to 15.3.3 in /llama_stack/ui (#2632)
Some checks failed
Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 26s
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.12, inspect) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.12, inference) (push) Failing after 23s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 25s
Integration Tests / test-matrix (server, 3.12, vector_io) (push) Failing after 22s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 39s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 41s
Python Package Build Test / build (3.12) (push) Failing after 33s
Python Package Build Test / build (3.13) (push) Failing after 31s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Update ReadTheDocs / update-readthedocs (push) Failing after 10s
Unit Tests / unit-tests (3.13) (push) Failing after 12s
Pre-commit / pre-commit (push) Successful in 1m23s
Bumps [next](https://github.com/vercel/next.js) from 15.3.2 to 15.3.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.3.3</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Reinstate <code>vary</code> (<a
href="https://redirect.github.com/vercel/next.js/issues/79939">#79939</a>)</li>
<li>fix(next-swc): Fix interestingness detection for React Compiler (<a
href="https://redirect.github.com/vercel/next.js/issues/79558">#79558</a>)</li>
<li>fix(next-swc): Fix react compiler usefulness detector (<a
href="https://redirect.github.com/vercel/next.js/issues/79480">#79480</a>)</li>
<li>fix(dev-overlay): Better handle edge-case file paths in launchEditor
(<a
href="https://redirect.github.com/vercel/next.js/issues/79526">#79526</a>)</li>
<li>Client router should discard stale prefetch entries for static pages
(<a
href="https://redirect.github.com/vercel/next.js/issues/79362">#79362</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/gaojude"><code>@​gaojude</code></a>, <a
href="https://github.com/kdy1"><code>@​kdy1</code></a>, <a
href="https://github.com/bgw"><code>@​bgw</code></a>, and <a
href="https://github.com/unstubbable"><code>@​unstubbable</code></a> for
helping!</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3ab8db7383"><code>3ab8db7</code></a>
v15.3.3</li>
<li><a
href="18c8113ebd"><code>18c8113</code></a>
[backport] Reinstate <code>vary</code> (<a
href="https://redirect.github.com/vercel/next.js/issues/79939">#79939</a>)</li>
<li><a
href="e18212f546"><code>e18212f</code></a>
re-enable vary header deploy test (<a
href="https://redirect.github.com/vercel/next.js/issues/79753">#79753</a>)</li>
<li><a
href="ec202eccf0"><code>ec202ec</code></a>
Revert &quot;[next-server] skip setting vary header for basic
routes&quot; (<a
href="https://redirect.github.com/vercel/next.js/issues/79426">#79426</a>)</li>
<li><a
href="e2f264fdce"><code>e2f264f</code></a>
fix(next-swc): Fix interestingness detection for React Compiler (15.3)
(<a
href="https://redirect.github.com/vercel/next.js/issues/79558">#79558</a>)</li>
<li><a
href="562fac78da"><code>562fac7</code></a>
fix(next-swc): Fix react compiler usefulness detector (15.3) (<a
href="https://redirect.github.com/vercel/next.js/issues/79480">#79480</a>)</li>
<li><a
href="06097fd7bb"><code>06097fd</code></a>
fix(dev-overlay): Better handle edge-case file paths in launchEditor (<a
href="https://redirect.github.com/vercel/next.js/issues/79526">#79526</a>)</li>
<li><a
href="bda731fa96"><code>bda731f</code></a>
Client router should discard stale prefetch entries for static pages (<a
href="https://redirect.github.com/vercel/next.js/issues/79362">#79362</a>)</li>
<li>See full diff in <a
href="https://github.com/vercel/next.js/compare/v15.3.2...v15.3.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.3.2&new-version=15.3.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-05 00:13:33 -04:00
Wen Zhou
c025cab3a3
docs: update docs to use "starter" than "ollama" (#2629) 2025-07-05 08:44:57 +05:30
Francisco Arceo
dc7df60d42
docs: Update starter docs to include milvus inline (#2631) 2025-07-05 08:43:39 +05:30
Sébastien Han
ea966565f6
feat: improve telemetry (#2590)
Some checks failed
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 6s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 5s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 4s
Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 18s
Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 19s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 16s
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 18s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 7s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
Python Package Build Test / build (3.13) (push) Failing after 0s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 17s
Update ReadTheDocs / update-readthedocs (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s
Test External Providers / test-external-providers (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 58s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 1m0s
Python Package Build Test / build (3.12) (push) Failing after 49s
Pre-commit / pre-commit (push) Successful in 1m40s
# What does this PR do?

* Use a single environment variable to set up the OTEL endpoint
* Update the telemetry provider doc
* Update the general telemetry doc with the metrics we generate
* Add a script to set up telemetry for testing

Closes: https://github.com/meta-llama/llama-stack/issues/783

Note to reviewer: the `setup_telemetry.sh` script was useful for me and
was nicely generated by AI. If we don't want it in the repo, I can
delete it, and I would understand.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-04 17:29:09 +02:00
Derek Higgins
4eae0cbfa4
fix(starter): Add missing faiss provider to build.yaml vector_io section (#2625)
The starter template build.yaml was missing the inline::faiss provider
in the vector_io section, while it was properly configured in run.yaml
and starter.py's vector_io_providers list.

Fixes: #2624

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-04 17:28:57 +02:00
Sébastien Han
df6ce8befa
fix: only load mcp when enabled in tool_group (#2621)
# What does this PR do?

The agent code is currently importing MCP modules even when MCP isn’t
enabled. Do we consider this worth fixing, or are we treating MCP as a
first-class dependency? I believe we should treat it as such.

If everyone agrees, let’s go ahead and close this.

Note: The current setup breaks if someone builds a distro without
including MCP in tool_group but still serves the agent API.

Also, we should bump the MCP version to support streamable responses, as
SSE is being deprecated.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-04 20:27:05 +05:30
Sébastien Han
c4349f532b
feat: consolidate most distros into "starter" (#2516)
# What does this PR do?

* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
  since inference providers are disabled by default and can be turned on
  manually via env variable.
* Disables safety in starter distro

Closes: https://github.com/meta-llama/llama-stack/issues/2502.

~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~

TODO:

- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-04 15:58:03 +02:00
Derek Higgins
f77d4d91f5
fix: handle encoding errors when adding files to vector store (#2574)
Some checks failed
Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 12s
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 6s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 6s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Python Package Build Test / build (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Test Llama Stack Build / build (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 45s
Test Llama Stack Build / build-single-provider (push) Failing after 37s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 33s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 43s
Pre-commit / pre-commit (push) Successful in 1m35s
- Add a try/except block around data.decode() to handle UnicodeDecodeError
- Fall back to UTF-8 when the detected encoding fails
- Return an empty string when both encodings fail
- Add unit tests

Fixes #2572: UnicodeDecodeError when uploading files with problematic
encodings
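
The fallback order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the function name `decode_with_fallback` and its parameters are hypothetical, and catching `LookupError` (for an unknown encoding name) is an addition that the commit message does not mention.

```python
def decode_with_fallback(data: bytes, detected_encoding: str) -> str:
    """Decode with the detected encoding, then UTF-8; return "" if both fail."""
    for encoding in (detected_encoding, "utf-8"):
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are invalid for this encoding.
            # LookupError (assumption): the detector returned an unknown codec name.
            continue
    # Both attempts failed, so return an empty string instead of raising.
    return ""
```

Returning an empty string keeps the upload path from raising, at the cost of silently dropping undecodable content; a caller that needs to distinguish the two cases would have to check for the empty result.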

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-04 12:10:18 +02:00
Ashwin Bharambe
f1c62e0af0 build: Bump version to 0.2.14 2025-07-04 12:12:12 +05:30
Matthew Farrellee
ef26259209
feat: add llama guard 4 model (#2579)
add support for Llama Guard 4 model to the llama_guard safety provider

test with -

0. `NVIDIA_API_KEY=... llama stack build --image-type conda --image-name env-nvidia --providers inference=remote::nvidia,safety=inline::llama-guard --run`
1. `llama-stack-client models register meta-llama/Llama-Guard-4-12B --provider-model-id meta/llama-guard-4-12b`
2. `pytest tests/integration/safety/test_llama_guard.py`

Co-authored-by: raghotham <rsm@meta.com>
2025-07-03 22:29:04 -07:00
Derek Higgins
0422b4fc63
fix: CI flakiness in vector IO tests by pinning pymilvus>=2.4.10 (#2610)
Some checks failed
Integration Tests / test-matrix (server, 3.12, scoring) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.12, inspect) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 12s
Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.12, post_training) (push) Failing after 11s
Integration Tests / test-matrix (server, 3.12, vector_io) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 8s
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 11s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 1m15s
Python Package Build Test / build (3.12) (push) Failing after 1m12s
Python Package Build Test / build (3.13) (push) Failing after 1m10s
Test External Providers / test-external-providers (venv) (push) Failing after 1m27s
Unit Tests / unit-tests (3.12) (push) Failing after 35s
Unit Tests / unit-tests (3.13) (push) Failing after 34s
Pre-commit / pre-commit (push) Successful in 2m47s
The flakiness occurred when marshmallow 4.0.0 was installed (which removed
__version_info__).

By pinning pymilvus to >=2.4.10, we ensure marshmallow doesn't get
installed.

Also set the dependency in InlineProviderSpec, as this is the one that
takes effect when using the "inline::milvus" provider.

Fixes https://github.com/meta-llama/llama-stack/issues/2588

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-04 10:27:23 +05:30
Francisco Arceo
ea80ea63ac
chore: Updating chunk id generation to ensure uniqueness (#2618)
# What does this PR do?
This handles an edge case in `generate_chunk_id` where the combination
of `document_id` and `chunk_text` is not unique. Adding
the window location ensures uniqueness.
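
The idea can be sketched as below. This is an illustrative version, not the project's implementation: the parameter name `chunk_window`, the `:` separator, and the use of SHA-256 truncated into a UUID are all assumptions made for the example.

```python
import hashlib
import uuid
from typing import Optional


def generate_chunk_id(document_id: str, chunk_text: str, chunk_window: Optional[str] = None) -> str:
    """Derive a deterministic ID from the document, the chunk text, and (optionally) its window."""
    key = f"{document_id}:{chunk_text}"
    if chunk_window is not None:
        # Including the window location separates identical text that
        # appears at different positions in the same document.
        key += f":{chunk_window}"
    # Hash the key and format the first 128 bits as a UUID string.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]
    return str(uuid.UUID(digest))


# Same document and text, different windows: the IDs differ.
print(generate_chunk_id("doc-1", "repeated text", "0-512"))
print(generate_chunk_id("doc-1", "repeated text", "512-1024"))
```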

## Test Plan
Added unit test

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-04 10:26:35 +05:30
Francisco Arceo
4afd619c56
chore: Add support for vector-stores files api for Milvus (#2582)
Some checks failed
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 12s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 13s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 24s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 18s
Test Llama Stack Build / generate-matrix (push) Successful in 20s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 28s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 51s
Test Llama Stack Build / build-single-provider (push) Failing after 55s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 54s
Pre-commit / pre-commit (push) Successful in 1m44s
# What does this PR do?
### Summary

This pull request implements support for the OpenAI Vector Store Files
API for the Milvus vector store provider in `llama_stack`. It enables
storing, loading, updating, and deleting file metadata and file contents
in Milvus collections, allowing OpenAI vector store files to be managed
directly within Milvus.

### Main Changes

- **Milvus Vector Store Files API Implementation**
  - Implements all required methods for storing, loading, updating, and
    deleting vector store file metadata and contents
    (`_save_openai_vector_store_file`, `_load_openai_vector_store_file`,
    `_load_openai_vector_store_file_contents`,
    `_update_openai_vector_store_file`,
    `_delete_openai_vector_store_file_from_storage`).
  - Uses two Milvus collections: `openai_vector_store_files` (for
    metadata) and `openai_vector_store_files_contents` (for chunked file
    contents).
  - Collections are created dynamically if they do not exist, with
    appropriate schema definitions.
- **Collection Name Sanitization**
  - Adds a `sanitize_collection_name` utility to ensure Milvus collection
    names only contain valid characters (letters, numbers, underscores).
- **Testing**
  - Updates test skip logic to include `"inline::milvus"` for cases where
    the OpenAI Vector Store Files API is not supported, improving
    integration test accuracy.
- **Other Improvements**
  - Passes `kvstore` to `MilvusIndex` for consistency.
  - Removes obsolete NotImplementedErrors and legacy code for file
    storage.
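
A sanitizer of the kind described under "Collection Name Sanitization" might look like the sketch below. The function name matches the one in the PR description, but the body is an assumption: in particular, replacing each invalid character with an underscore is one possible choice, and the PR's actual behavior may differ.

```python
import re


def sanitize_collection_name(name: str) -> str:
    """Keep letters, digits, and underscores; replace every other character with "_"."""
    # Assumption: invalid characters are replaced rather than dropped.
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


print(sanitize_collection_name("vs_1234-abcd.files"))
```

Replacing rather than dropping characters keeps the name the same length, but two different inputs (for example `a-b` and `a.b`) can map to the same collection name, which a caller may need to guard against.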

## Test Plan
CI and tested via a test script

## Notes
- `VectorDB` currently uses the `name` as the `identifier` in
`openai_create_vector_store`. We need to add `name` as a field to
`VectorDB` and generate the `identifier` upon creation. OpenAI is not
idempotent with respect to the `name` field (i.e., you
can pass the same name multiple times and OpenAI will generate a new
identifier each time). I'll add a follow-up PR for this.
- The `Files` API needs to use `files-` as a prefix in the identifier. I
have updated the Vector Store to use the OpenAI prefix `vs_*`.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-03 12:15:33 -07:00
Sébastien Han
dae1fcd3c2
ci: let pytest run the distro server (#2586)
# What does this PR do?

* Use #2580 functionality to auto-start the server with the tests
* Reduce timeout to 30sec
* Print server logs on errors
* Pytest logs are collected to a file pytest.log

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-03 10:51:46 -07:00
Akram Ben Aissi
f4950f4ef0
fix: AccessDeniedError leads to HTTP 500 instead of error 403 (#2595)
Resolves access control error visibility issues where 500 errors were
returned instead of proper 403 responses with actionable error messages.

• Enhance AccessDeniedError with detailed context and improve exception
  handling
• Enhanced AccessDeniedError class to include user, action, and resource
  context
  - Added constructor parameters for action, resource, and user
  - Generate detailed error messages showing user principal, attributes,
    and attempted resource
  - Backward compatible with existing usage (falls back to generic
    message)

• Updated exception handling in server.py
  - Import AccessDeniedError from access_control module
  - Return proper 403 status codes with detailed error messages
  - Separate handling for PermissionError (generic) vs AccessDeniedError
    (detailed)

• Enhanced error context at raise sites
  - Updated routing_tables/common.py to pass action, resource, and user
    context
  - Updated agents persistence to include context in access denied errors
  - Provides better debugging information for access control issues

• Added comprehensive unit tests
  - Created tests/unit/server/test_server.py with 13 test cases
  - Covers AccessDeniedError with and without context
  - Tests all exception types (ValidationError, BadRequestError,
    AuthenticationRequiredError, etc.)
  - Validates proper HTTP status codes and error message formats
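
The shape of the change can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the message wording, the use of plain strings for `user` and `resource`, and the helper `to_http_response` are all assumptions. Only the optional `action`, `resource`, and `user` context, the generic fallback, and the 403 mapping come from the description above.

```python
from typing import Optional, Tuple


class AccessDeniedError(RuntimeError):
    """Access-control failure that optionally carries who tried what on which resource."""

    def __init__(
        self,
        action: Optional[str] = None,
        resource: Optional[str] = None,
        user: Optional[str] = None,
    ):
        self.action, self.resource, self.user = action, resource, user
        if action and resource and user:
            # Detailed message when the raise site supplies full context.
            message = f"User '{user}' cannot perform action '{action}' on resource '{resource}'"
        else:
            # Backward-compatible generic message for existing call sites.
            message = "Insufficient permissions"
        super().__init__(message)


def to_http_response(exc: Exception) -> Tuple[int, str]:
    """Map an exception to (status code, detail); hypothetical helper for illustration."""
    if isinstance(exc, AccessDeniedError):
        return 403, f"Permission denied: {exc}"
    if isinstance(exc, PermissionError):
        return 403, "Permission denied"
    return 500, "Internal server error"


print(to_http_response(AccessDeniedError("create", "vector_db::my_demo_vector_db", "alice")))
```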



## Test Plan

```
server:
  port: 8321
  access_policy:
    - permit:
        principal: admin
        actions: [create, read, delete]
        when: user with admin in groups
    - permit:
        actions: [read]
        when: user with system:authenticated in roles
```
then:

```
curl --request POST --url http://localhost:8321/v1/vector-dbs \
  --header "Authorization: Bearer your-bearer" \
  --data '{
    "vector_db_id": "my_demo_vector_db",
    "embedding_model": "ibm-granite/granite-embedding-125m-english",
    "embedding_dimension": 768,
    "provider_id": "milvus"
  }'
 
```

Depending on whether the user is in the admin group, you should get an
`AccessDeniedError`. Before this PR, this led to an HTTP 500 error
and a `Traceback` displayed in the logs.
After this PR, the logs display a simpler error (unless DEBUG logging is set)
and a 403 Forbidden error is returned on the HTTP side.

---------

Signed-off-by: Akram Ben Aissi <<akram.benaissi@gmail.com>>
2025-07-03 10:50:49 -07:00
ehhuang
3c43a2f529
fix: store configs (#2593)
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2490 broke postgres_demo,
as the config expected a str but the value was converted to int.

This PR:
1. Updates the type of port in sqlstore to be int
2. Makes template generation use `dict` instead of `StackRunConfig` to
avoid failing pydantic type checks.
3. Adds `replace_env_vars` to StackRunConfig instantiation in
`configure.py` (not sure why this wasn't needed before).

## Test Plan
`llama stack build --template postgres_demo --image-type conda --run`
2025-07-03 10:07:23 -07:00
Sébastien Han
aa273944fd
fix: add mcp dependency to agent provider (#2587)
# What does this PR do?

The agent depends on utils.tools.mcp.

Closes: https://github.com/meta-llama/llama-stack/issues/2576

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-03 14:59:01 +02:00
Christian Zaccaria
b246b0660e
docs: Add quick_start.ipynb notebook equivalent of index.md Quickstart guide (#2128)
# What does this PR do?
- Adding a notebook equivalent of the
[getting_started/index.md#Quickstart
guide](https://github.com/meta-llama/llama-stack/blob/main/docs/source/getting_started/index.md).


## To discuss

**Note:** This works locally, but I am encountering issues when running
the notebook on Google Colab. Specifically, in the last step, which runs
the demo, the `knowledge_search` tool doesn't seem to be called. The
output is:
```
rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html
prompt> How do you do great work?
inference> I don't have personal experiences or emotions, but I was trained on a large corpus of text data and use various techniques such as natural language processing (NLP) and machine learning algorithms to generate human-like responses.

```


I would expect to get something like:
```
rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html
prompt> How do you do great work?
inference> [knowledge_search(query="What is the key to doing great work")]
tool_execution> Tool:knowledge_search Args:{'query': 'What is the key to doing great work'}
tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:
....
....
```
2025-07-03 13:55:43 +02:00
Sumanth Kamenani
577ec382e1
fix(docs): update Agents101 notebook for builtin websearch (#2591)
- Switch from BRAVE_SEARCH_API_KEY to TAVILY_SEARCH_API_KEY
- Add provider_data to LlamaStackClient for API key passing
- Use builtin::websearch toolgroup instead of manual tool config
- Fix message types to use UserMessage instead of plain dict
- Add streaming support with proper type casting
- Remove async from EventLogger loop (bug fix)

Fixes websearch functionality in agents tutorial by properly configuring
Tavily search provider integration.
# What does this PR do?

Fixes the Agents101 tutorial notebook to work with the current Llama
Stack websearch implementation. The tutorial was using outdated Brave
Search configuration that no longer works with the current server setup.

**Key Changes:**
- **Switch API provider**: Change from `BRAVE_SEARCH_API_KEY` to
`TAVILY_SEARCH_API_KEY` to match server configuration
- **Fix client setup**: Add `provider_data` to `LlamaStackClient` to
properly pass API keys to server
- **Modernize tool usage**: Replace manual tool configuration with
`tools=["builtin::websearch"]`
- **Fix type safety**: Use `UserMessage` type instead of plain
dictionaries for messages
- **Fix streaming**: Add proper streaming support with `stream=True` and
type casting
- **Fix EventLogger**: Remove incorrect `async for` usage (should be
`for`)

**Why needed:** Users following the tutorial were getting 401
Unauthorized errors because the notebook wasn't properly configured for
the Tavily search provider that the server actually uses.

## Test Plan

**Prerequisites:**
1. Start Llama Stack server with Ollama template and
`TAVILY_SEARCH_API_KEY` environment variable
2. Set `TAVILY_SEARCH_API_KEY` in your `.env` file

**Testing Steps:**
1. **Clone and setup:**
   ```bash
   git checkout fix-2558-update-agents101
   cd docs/zero_to_hero_guide/
   ```

2. **Start server with API key:**
   ```bash
   export TAVILY_SEARCH_API_KEY="your_tavily_api_key"
   podman run -it --network=host -v ~/.llama:/root/.llama:Z \
     --env INFERENCE_MODEL=$INFERENCE_MODEL \
     --env OLLAMA_URL=http://localhost:11434 \
     --env TAVILY_SEARCH_API_KEY=$TAVILY_SEARCH_API_KEY \
     llamastack/distribution-ollama --port $LLAMA_STACK_PORT
   ```

3. **Run the notebook:**
   - Open `07_Agents101.ipynb` in Jupyter
   - Execute all cells in order
   - Cell 5 should run without errors and show successful web search
     results

**Expected Results:**
- No 401 Unauthorized errors
- Agent successfully calls `brave_search.call()` with web results
- Switzerland travel recommendations appear in output
- Follow-up questions work correctly

**Before this fix:** Users got `401 Unauthorized` errors and the tutorial
failed.
**After this fix:** The tutorial works end-to-end with proper web search
functionality.

**Tested with:**
- Tavily API key (free tier)
- Ollama distribution template  
- Llama-3.2-3B-Instruct model
2025-07-03 11:14:51 +02:00
Wen Zhou
040424acf5
docs: update full list of providers with matched APIs and dockerhub images (#2452)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- add model_type in example
- change "Memory" to "VectorIO" as column name
- update index.md and README.md

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
run pre-commit to catch changes.

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-07-03 10:12:56 +02:00
Nate Harada
5b07755556
docs: Minor spelling fix (#2592)
# What does this PR do?
Minor spelling fix in the comments

## Test Plan
No code changes
2025-07-02 20:26:51 -04:00
Jorge
4d0d2d685f
fix: Set parameter usedforsecurity=False when calling hashlib.md5 in order to fix rag_tool.insert on FIPS clusters (#2577)
# What does this PR do?
Set parameter `usedforsecurity=False` when calling hashlib.md5 in order
to fix rag_tool.insert on FIPS clusters
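
A minimal sketch of the change described above. The helper name `content_digest` is hypothetical and not taken from the PR. The `usedforsecurity` keyword has been available in `hashlib` since Python 3.9. It marks the MD5 call as non-cryptographic, which FIPS-restricted builds may require before they allow it:

```python
import hashlib


def content_digest(text: str) -> str:
    """Return an MD5 hex digest used only as a non-security identifier.

    `content_digest` is a hypothetical helper for illustration.
    """
    # usedforsecurity=False declares that the hash is not used for a
    # security purpose (for example, it only derives a stable ID).
    return hashlib.md5(text.encode("utf-8"), usedforsecurity=False).hexdigest()


if __name__ == "__main__":
    print(content_digest("hello"))
```

The digest is identical to the one from a plain `hashlib.md5(...)` call, so existing identifiers would presumably remain stable.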

<!-- If resolving an issue, uncomment and update the line below -->
Closes #2571

---------

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
2025-07-02 12:07:05 +02:00
ehhuang
fc735a414e
test: Add one-step integration testing with server auto-start (#2580)
## Summary

Add support for `server:<config>` format in `--stack-config` option to
enable seamless one-step integration testing. This eliminates the need
to manually start servers in separate terminals before running tests.

## Key Features

- **Auto-start server**: Automatically launches `llama stack run
<config>` if target port is available
- **Smart reuse**: Reuses existing server if port is already occupied  
- **Health check polling**: Waits up to 2 minutes for server readiness
via `/v1/health` endpoint
- **Custom port support**: Use `server:<config>:<port>` for non-default
ports
- **Clean output**: Server runs quietly in background without cluttering
test output
- **Backward compatibility**: All existing `--stack-config` formats
continue to work

## Usage Examples

```bash
# Auto-start server with default port 8321
pytest tests/integration/inference/ --stack-config=server:fireworks

# Use custom port
pytest tests/integration/safety/ --stack-config=server:together:8322

# Run multiple test suites seamlessly  
pytest tests/integration/inference/ tests/integration/agents/ --stack-config=server:starter
```
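
The `server:<config>` and `server:<config>:<port>` formats shown above could be parsed roughly as follows. This is a sketch under assumptions, not the PR's actual code: `ServerSpec`, `parse_server_spec`, and `DEFAULT_PORT` are hypothetical names, and returning `None` for other formats is just one way to leave the existing `--stack-config` handling untouched.

```python
from typing import NamedTuple, Optional

DEFAULT_PORT = 8321  # default port named in the usage examples


class ServerSpec(NamedTuple):
    config: str
    port: int


def parse_server_spec(stack_config: str) -> Optional[ServerSpec]:
    """Parse `server:<config>` or `server:<config>:<port>`.

    Returns None for any other format so existing handling can take over.
    Raises ValueError if the port part is not an integer.
    """
    parts = stack_config.split(":")
    if parts[0] != "server" or len(parts) not in (2, 3) or not parts[1]:
        return None
    port = int(parts[2]) if len(parts) == 3 else DEFAULT_PORT
    return ServerSpec(config=parts[1], port=port)


if __name__ == "__main__":
    print(parse_server_spec("server:fireworks"))
    print(parse_server_spec("server:together:8322"))
    print(parse_server_spec("http://localhost:8321"))
```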

## Implementation Details

- Enhanced `llama_stack_client` fixture with server management
- Updated documentation with cleaner organization and comprehensive
examples
- Added utility functions for port checking, server startup, and health
verification
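
The port-checking and health-verification utilities mentioned above might look something like the sketch below. The function names, defaults, and use of `urllib` are assumptions made for illustration. Only the `/v1/health` endpoint and the two-minute wait come from this description.

```python
import socket
import time
import urllib.error
import urllib.request


def is_port_in_use(port: int, host: str = "localhost") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0


def wait_for_health(base_url: str, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll `<base_url>/v1/health` until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not ready yet; retry after a short sleep
        time.sleep(interval)
    return False
```

A fixture could call `is_port_in_use` first, start the server only when the port is free, and then block on `wait_for_health` before yielding the client.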

## Test Plan

- Verified server auto-start when port 8321 is available
- Verified server reuse when port 8321 is occupied
- Tested health check polling via `/v1/health` endpoint
- Confirmed custom port configuration works correctly
- Verified backward compatibility with existing config formats

## Before/After Comparison

**Before (2 steps):**
```bash
# Terminal 1: Start server manually
llama stack run fireworks --port 8321

# Terminal 2: Wait for startup, then run tests  
pytest tests/integration/inference/ --stack-config=http://localhost:8321
```

**After (1 step):**
```bash
# Single command handles everything
pytest tests/integration/inference/ --stack-config=server:fireworks  
```
2025-07-01 14:48:46 -07:00
Wen Zhou
958600a5c1
fix: update zero_to_hero package and README (#2578)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- update README.md format and Python version
- update package name: CustomTool was renamed to ClientTool in
https://github.com/meta-llama/llama-stack-client-python/pull/73


<!-- If resolving an issue, uncomment and update the line below -->
Closes #2556 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-07-01 11:08:55 -07:00
Nathan Weinberg
d165000bbc
docs: specify the ability to train non-Llama models (#2573)
# What does this PR do?
Clarifies that non-Llama models can be trained via the Post Training API

## Test Plan
Build docs locally

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-01 19:29:06 +05:30
Sébastien Han
25268854bc
fix: allow default empty vars for conditionals (#2570)
# What does this PR do?

We were not using conditionals correctly. Conditionals only take effect
when the env variable is set, so `${env.ENVIRONMENT:+}` returns None if
ENVIRONMENT is not set.

To create a conditional value, use `${env.ENVIRONMENT:=}`: this picks the
value of ENVIRONMENT if it is set and returns None otherwise.
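
To make the difference between the two forms concrete, here is a small sketch of shell-like semantics for `:=` (default) and `:+` (conditional). It is not the Llama Stack resolver: `resolve` and its regular expression are hypothetical, and treating an empty result as `None` is an assumption based on the behaviour described above.

```python
import re
from typing import Mapping, Optional

# Matches ${env.NAME:=default} and ${env.NAME:+value} (illustrative only).
_PATTERN = re.compile(
    r"^\$\{env\.(?P<name>[A-Za-z_][A-Za-z0-9_]*):(?P<op>[=+])(?P<arg>[^}]*)\}$"
)


def resolve(expr: str, env: Mapping[str, str]) -> Optional[str]:
    """Resolve a single `${env.NAME:=x}` or `${env.NAME:+x}` expression."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    name, op, arg = match["name"], match["op"], match["arg"]
    value = env.get(name)
    if op == "=":
        # Default form: the variable's value if set, otherwise the default.
        result = value if value else arg
    else:
        # Conditional form: the given text only when the variable is set.
        result = arg if value else ""
    return result or None  # assumption: empty results are reported as None


if __name__ == "__main__":
    print(resolve("${env.ENVIRONMENT:=}", {"ENVIRONMENT": "prod"}))
    print(resolve("${env.ENVIRONMENT:+}", {"ENVIRONMENT": "prod"}))
```

Under these assumptions, `${env.ENVIRONMENT:+}` never yields the variable's value, while `${env.ENVIRONMENT:=}` passes it through when present.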

Closes: https://github.com/meta-llama/llama-stack/issues/2564

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-07-01 14:42:05 +02:00
Nathan Weinberg
faaeccc6fd
docs: update external provider guide and navigation (#2567)
# What does this PR do?
The external providers guide can now be accessed directly from the
sidebar

## Test Plan
Build locally to test the changes

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-01 09:42:32 +02:00
Francisco Arceo
0066135944
chore: Enabling VectorIO Integration tests for Milvus (#2546)
2025-06-30 19:49:59 -07:00
Francisco Arceo
5785ccda35
fix: Fixing Milvus sample config and updating documentation (#2568) 2025-06-30 19:25:23 -07:00
Matthew Farrellee
f6d91f45ba
fix: update zero-to-hero guide for modern llama stack (#2555)
# What does this PR do?

closes #2553 

## Test Plan

run through notebooks w/ llama stack running on localhost:{8321,8322}
2025-06-30 18:09:33 -07:00
Matthew Farrellee
13aa367c8a
fix: default api_key from env must be a SecretStr (#2565)
# What does this PR do?

fixes the api_key type when read from env

## Test Plan

run nvidia template w/o api_key in run.yaml and perform inference

Before this change, inference fails with:

```
  File ".../llama-stack/llama_stack/providers/remote/inference/nvidia/nvidia.py", line 118, in _get_client_for_base_url
    api_key=(self._config.api_key.get_secret_value() if self._config.api_key else "NO KEY"),
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get_secret_value'
```
2025-06-30 18:08:44 -07:00
Nathan Weinberg
ba9acce93b
docs: fixed incorrect API list item (#2566)
Current text did not match section in example Ollama distro:
https://llama-stack.readthedocs.io/en/latest/distributions/configuration.html

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-06-30 18:08:19 -07:00
Ashwin Bharambe
b333a3c03a
fix(ollama): Download remote image URLs for Ollama (#2551)
## What does this PR do?

Ollama does not support remote images. Only local file paths OR base64
inputs are supported. This PR ensures that the Stack downloads remote
images and passes the base64 down to the inference engine.
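
The download-then-encode step can be sketched with the standard library alone. The function names below are hypothetical, and the real change likely uses the project's own HTTP utilities. The sketch only illustrates turning a remote image into the base64 string that the inference engine accepts.

```python
import base64
import urllib.request


def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def fetch_image_as_base64(url: str, timeout: float = 10.0) -> str:
    """Download a remote image and return its content as base64.

    Hypothetical helper: error handling and size limits are omitted.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return encode_image_bytes(resp.read())
```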

## Test Plan

Added a test case for Responses and ran it for both the `fireworks` and
`ollama` providers.
2025-06-30 20:36:11 +05:30
Sébastien Han
c9a49a80e8
docs: auto generated documentation for providers (#2543)
# What does this PR do?

Simple approach to get some provider pages in the docs.

Add or update description fields in the provider configuration class
using Pydantic’s Field, ensuring these descriptions are clear and
complete, as they will be used to auto-generate provider documentation
via ./scripts/distro_codegen.py instead of editing the docs manually.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-30 15:13:20 +02:00
Sébastien Han
8d8e90d78e
fix: add missing argument and methods (#2550)
# What does this PR do?

Resolves:

```
mypy.....................................................................Failed
- hook id: mypy
- exit code: 1

llama_stack/providers/utils/responses/responses_store.py:119: error: Missing positional argument "policy" in call to "fetch_one" of "AuthorizedSqlStore"  [call-arg]
llama_stack/providers/utils/responses/responses_store.py:122: error: "AuthorizedSqlStore" has no attribute "delete"  [attr-defined]
Found 2 errors in 1 file (checked 403 source files)
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-30 14:55:37 +02:00
Krzysztof Malczuk
be9bf68246
feat: Add webmethod for deleting openai responses (#2160)
# What does this PR do?
This PR creates a webmethod for deleting OpenAI responses, adds an
implementation for it, and adds an integration test for the OpenAI
delete response method.

[//]: # (If resolving an issue, uncomment and update the line below)
# (Closes #2077)

## Test Plan
Ran the standard tests and the pre-commit hooks and the unit tests.

# (## Documentation)
For this PR I based the routes and implementation on the current
get and create methods. The unit tests could not cover this case because
the mock interface in use does not allow effective CRUD testing, so I
instead created an integration test to match the existing ones in
`test_openai_responses`.
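
As a rough illustration of the create/get/delete round trip such a test exercises, here is a sketch against a hypothetical in-memory store. `InMemoryResponsesStore` and `DeleteResult` are made-up names, and the shape of the deletion result (`id`, `object`, `deleted`) is an assumption rather than the schema added in this PR.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DeleteResult:
    """Assumed shape of a deletion acknowledgement (illustrative only)."""

    id: str
    object: str = "response"
    deleted: bool = True


class InMemoryResponsesStore:
    """Hypothetical store used only to show the round trip."""

    def __init__(self) -> None:
        self._responses: Dict[str, dict] = {}

    def create(self, response_id: str, payload: dict) -> None:
        self._responses[response_id] = payload

    def get(self, response_id: str) -> dict:
        if response_id not in self._responses:
            raise KeyError(f"Response {response_id} not found")
        return self._responses[response_id]

    def delete(self, response_id: str) -> DeleteResult:
        # Deleting an unknown id is reported as an error, mirroring get().
        if response_id not in self._responses:
            raise KeyError(f"Response {response_id} not found")
        del self._responses[response_id]
        return DeleteResult(id=response_id)


store = InMemoryResponsesStore()
store.create("resp_123", {"output": "hello"})
result = store.delete("resp_123")
print(result)
```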
2025-06-30 11:28:02 +02:00
Wen Zhou
6fa5271807
docs: update document since container is not an option for "llama stack run" + update docs with current "usage" (#2531)
# What does this PR do?
- The change from https://github.com/meta-llama/llama-stack/issues/2110 requires a
documentation update: "container" is not a valid value for --image-type
- chore: update the docs to match the current standard output


## Test Plan

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-06-30 11:02:07 +05:30
dependabot[bot]
dc1b4a84c3
chore(github-deps): bump astral-sh/setup-uv from 6.3.0 to 6.3.1 (#2548)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.3.0 to 6.3.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.3.1 🌈 Do not warn when version not in manifest-file</h2>
<h2>Changes</h2>
<p>This is a hotfix to change the warning messages that a version could
not be found in the local manifest-file to info level.</p>
<p>A <code>setup-uv</code> release contains a version-manifest.json file
with infos in all available <code>uv</code> releases. When a new
<code>uv</code> version is released this is not contained in this file
until the file gets updated and a new <code>setup-uv</code> release is
made.
We will overhaul this process in the future but for now the spamming of
warnings is removed.</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Do not warn when version not in manifest-file <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/462">#462</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known versions for 0.7.14 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/459">#459</a>)</li>
<li>Revert &quot;Set expected cache dir drive to C: on windows (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/451">#451</a>)&quot;
<a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/460">#460</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="bd01e18f51"><code>bd01e18</code></a>
Do not warn when version not in manifest-file (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/462">#462</a>)</li>
<li><a
href="c6a5ebaafe"><code>c6a5eba</code></a>
chore: update known versions for 0.7.14 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/459">#459</a>)</li>
<li><a
href="790df8f465"><code>790df8f</code></a>
Revert &quot;Set expected cache dir drive to C: on windows (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/451">#451</a>)&quot;
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/460">#460</a>)</li>
<li>See full diff in <a
href="445689ea25...bd01e18f51">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.3.0&new-version=6.3.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-29 13:55:32 -04:00
Ashwin Bharambe
21669b14e7
fix(docs): add setuptools explicitly (#2547)
Given the shift to python3.12, we need to explicitly depend on
`setuptools` for the pkg_resources import

## Test Plan

Run 
```
cd local/llama-stack
UV_PROJECT_ENVIRONMENT=/tmp/docs uv sync --frozen --group docs

cd /tmp/docs
uv run python -m sphinx -T -b html -d _build/doctrees -D language=en \
   ~/local/llama-stack/docs/source/ \
  /tmp/docs/html
```
2025-06-28 08:14:25 +05:30
github-actions[bot]
709eb7da33 build: Bump version to 0.2.13 2025-06-27 23:56:14 +00:00
Francisco Arceo
cc19b56c87
chore: OpenAI compatibility for Milvus (#2470)
# What does this PR do?
Closes https://github.com/meta-llama/llama-stack/issues/2461



## Test Plan
Tested with the `ollama` distribution template and updated the vector_io
provider to:
```yaml
vector_io:
- provider_id: milvus
  provider_type: inline::milvus
  config:
    db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/milvus_store.db
    kvstore:
      type: sqlite
      db_name: milvus_registry.db
```

Ran the stack
```bash
llama stack run ./llama_stack/templates/ollama/run.yaml --image-type venv --env OLLAMA_URL="http://0.0.0.0:11434"
```

Ran the tests:
```
pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py  --embedding-model all-MiniLM-L6-v2
```
Output passed.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-06-27 16:00:36 -07:00
Charlie Doern
65b4fae51d
fix: proper checkpointing logic for HF trainer (#2429)
# What does this PR do?

Currently only the last saved model is reported as a checkpoint and
associated with the job UUID. Since the HF trainer handles checkpoint
collection during training, we need to add all of the `checkpoint-*`
folders as Checkpoint objects. Adjust the save strategy to be per-epoch
to make this easier and to use less storage.
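
Collecting every `checkpoint-*` folder could look roughly like the sketch below. `CheckpointRecord` and `collect_checkpoints` are hypothetical names standing in for the project's Checkpoint object and the provider code; the only thing assumed about the trainer is that it writes `checkpoint-<step>` directories into the output directory.

```python
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class CheckpointRecord:
    """Hypothetical stand-in for the project's Checkpoint object."""

    identifier: str
    path: str
    step: int


def collect_checkpoints(output_dir: Path) -> List[CheckpointRecord]:
    """Return one record per `checkpoint-<step>` directory, ordered by step."""
    records = []
    for child in output_dir.glob("checkpoint-*"):
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match:
            records.append(
                CheckpointRecord(child.name, str(child), int(match.group(1)))
            )
    # Sort numerically so checkpoint-1000 comes after checkpoint-200.
    return sorted(records, key=lambda r: r.step)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for step in (200, 100, 300):
        (root / f"checkpoint-{step}").mkdir()
    (root / "runs").mkdir()  # unrelated folder, ignored by the glob
    found = collect_checkpoints(root)
    print([record.identifier for record in found])
```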

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-06-27 17:36:25 -04:00
Ramakrishna Reddy Yekulla
03e61e3fcc
fix: ValueError in faiss vector database serialization (resolves #2519) (#2526)
The error message was misleading as it appeared to be an Ollama
connectivity issue, but actually occurred during faiss vector database
initialization.

## 🔍 Root Cause Analysis

The issue was in the faiss vector database serialization logic in
`llama_stack/providers/inline/vector_io/faiss/faiss.py`:

1. **Saving**: `faiss.serialize_index()` returns binary data (uint8
numpy array)
2. **Bug**: Code incorrectly used `np.savetxt()` which converts binary
to text with scientific notation (e.g., `7.300000000000000000e+01`)
3. **Loading**: `np.loadtxt(buffer, dtype=np.uint8)` failed to parse
scientific notation back to uint8
4. **Result**: Server crashed during initialization before reaching
Ollama connectivity check

## Solution

Replaced text-based serialization with proper binary serialization:

**After (fixed):**
```python
# Saving - proper binary format
np.save(buffer, np_index, allow_pickle=False)  

# Loading - proper binary format
self.index = faiss.deserialize_index(np.load(buffer, allow_pickle=False))
```
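
The difference between the two routes can be reproduced without faiss. In this sketch a small `uint8` array stands in for the output of `faiss.serialize_index()`; the byte values are arbitrary, and the snippet only illustrates the formats involved, not the provider code itself.

```python
import io

import numpy as np

# Stand-in for the uint8 array returned by faiss.serialize_index().
np_index = np.frombuffer(b"IxF2\x00\xff", dtype=np.uint8)

# Text route: savetxt's default format writes each byte in scientific notation.
text_buffer = io.BytesIO()
np.savetxt(text_buffer, np_index)
first_line = text_buffer.getvalue().decode().splitlines()[0]
print(first_line)

# Binary route: np.save / np.load keep the dtype and the bytes intact.
binary_buffer = io.BytesIO()
np.save(binary_buffer, np_index, allow_pickle=False)
binary_buffer.seek(0)
restored = np.load(binary_buffer, allow_pickle=False)
print(restored.dtype, restored.tobytes() == np_index.tobytes())
```

The first byte (73) comes out of the text route as `7.300000000000000000e+01`, the same form quoted in the root cause analysis, while the binary route returns the original `uint8` data unchanged.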

## 🧪 Testing

- Binary serialization/deserialization works correctly
- Backward compatible with existing functionality
- No security concerns (allow_pickle=False maintained)
- Resolves the specific ValueError mentioned in the issue

## 📊 Impact

This fix resolves:
- ValueError during server startup with Ollama templates

## 🔗 Related Issues

- Closes #2519 
- Affects all users of Ollama template and faiss vector_io configurations

## 📝 Files Changed

- `llama_stack/providers/inline/vector_io/faiss/faiss.py` - Fixed serialization methods in `initialize()` and `_save_index()`

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
2025-06-27 14:34:52 -04:00
Rohan Awhad
7cb5d3c60f
chore: standardize unsupported model error #2517 (#2518)
# What does this PR do?

- llama_stack/exceptions.py: Add UnsupportedModelError class
- remote inference ollama.py and utils/inference/model_registry.py:
Replaced ValueError with UnsupportedModelError
- utils/inference/litellm_openai_mixin.py: removed the `register_model`
function implementation from the `LiteLLMOpenAIMixin` class. It now uses the
parent class `ModelRegistryHelper`'s function implementation
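
For illustration, an exception of this kind might look like the sketch below. The constructor signature, the message wording, the `ValueError` base class, and the `ensure_supported` helper are all assumptions; the real class in `llama_stack/exceptions.py` may differ.

```python
from typing import List


class UnsupportedModelError(ValueError):
    """Sketch only; not the definition from llama_stack/exceptions.py."""

    def __init__(self, model_name: str, supported_models: List[str]) -> None:
        self.model_name = model_name
        self.supported_models = list(supported_models)
        supported = ", ".join(sorted(self.supported_models))
        super().__init__(
            f"'{model_name}' model is not supported. "
            f"Supported models are: {supported}"
        )


def ensure_supported(model_id: str, supported_models: List[str]) -> str:
    """Hypothetical registry check: return the id or raise a typed error."""
    if model_id not in supported_models:
        raise UnsupportedModelError(model_id, supported_models)
    return model_id


try:
    ensure_supported("non-existent-model", ["gpt-4o", "gpt-4o-mini"])
except UnsupportedModelError as err:
    print(err)
```

A dedicated exception type lets callers distinguish "unknown model" from other value errors and keeps the list of supported models in one message format across providers.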

Closes #2517


## Test Plan


1. Create a new `test_run_openai.yaml` and paste the following config in
it:

```yaml
version: '2'
image_name: test-image
apis:
- inference
providers:
  inference:
  - provider_id: openai
    provider_type: remote::openai
    config:
      max_tokens: 8192
models:
- metadata: {}
  model_id: "non-existent-model"
  provider_id: openai
  model_type: llm
server:
  port: 8321
```

And run the server with:
```bash
uv run llama stack run test_run_openai.yaml
```

You should now get a `llama_stack.exceptions.UnsupportedModelError` with
the supported list of models in the error message.

---

Tested for the following remote inference providers, and they all raise
the `UnsupportedModelError`:
- Anthropic
- Cerebras
- Fireworks
- Gemini
- Groq
- Ollama
- OpenAI
- SambaNova
- Together
- Watsonx

---------

Co-authored-by: Rohan Awhad <rawhad@redhat.com>
2025-06-27 14:26:58 -04:00
Yuan Tang
9baa16e498
fix(security): Upgrade protobuf and aiohttp. Fixes CVE-2025-4565 (#2541)
# What does this PR do?

Fixes CVE-2025-4565 and the following warning:

```
warning: `aiohttp==3.11.13` is yanked (reason: "Regression: https://github.com/aio-libs/aiohttp/issues/10617")
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-06-27 06:58:38 -07:00
Juanma
e7eb9f9adc
fix: dataset metadata without provider_id (#2527)
# What does this PR do?
Fixes an error when inferring dataset provider_id with metadata

Closes #[2506](https://github.com/meta-llama/llama-stack/issues/2506)

Signed-off-by: Juanma Barea <juanmabareamartinez@gmail.com>
2025-06-27 08:51:29 -04:00
Yuan Tang
40fdce79b3
fix(security): Upgrade urllib3 to v2.5.0. Fixes CVE-2025-50181 and CVE-2025-50182 (#2534)
This fixes CVE-2025-50181 and CVE-2025-50182.

Changes via:
```
uv sync --upgrade-package urllib3
uv export --frozen --no-hashes --no-emit-project --no-default-groups --output-file=requirements.txt
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-06-27 10:46:47 +02:00
Wen Zhou
8c3f2762fb
build: update temp. created Containerfile (#2492)
# What does this PR do?
- create the /.llama/providers.d folder only if
external_providers_dir is set
- do not create the /.cache folder, which is not in use anywhere
- combine chmod and copy into one command



## Test Plan
updated test:

```
export CONTAINER_BINARY=podman
LLAMA_STACK_DIR=. uv run llama stack build --template remote-vllm --image-type container --image-name  <name>
```
log:
```
Containerfile created successfully in /tmp/tmp.rPMunE39Aw/Containerfile

FROM python:3.11-slim
WORKDIR /app

RUN apt-get update && apt-get install -y        iputils-ping net-tools iproute2 dnsutils telnet        curl wget telnet git       procps psmisc lsof        traceroute        bubblewrap        gcc        && rm -rf /var/lib/apt/lists/*

ENV UV_SYSTEM_PYTHON=1
RUN pip install uv
RUN uv pip install --no-cache sentencepiece pillow pypdf transformers pythainlp faiss-cpu opentelemetry-sdk requests datasets chardet scipy nltk numpy matplotlib psycopg2-binary aiosqlite langdetect autoevals tree_sitter tqdm pandas chromadb-client opentelemetry-exporter-otlp-proto-http redis scikit-learn openai pymongo emoji sqlalchemy[asyncio] mcp aiosqlite fastapi fire httpx uvicorn opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
RUN uv pip install --no-cache sentence-transformers --no-deps
RUN uv pip install --no-cache torch torchvision --index-url https://download.pytorch.org/whl/cpu
# Allows running as non-root user
RUN mkdir -p /.llama/providers.d /.cache
RUN uv pip install --no-cache llama-stack
RUN pip uninstall -y uv
ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--template", "remote-vllm"]

RUN chmod -R g+rw /app /.llama /.cache

PWD: /tmp/llama-stack
Containerfile: /tmp/tmp.rPMunE39Aw/Containerfile
+ podman build --progress=plain --security-opt label=disable --platform linux/amd64 -t distribution-remote-vllm:0.2.12 -f /tmp/tmp.rPMunE39Aw/Containerfile /tmp/llama-stack
....
Success!
Build Successful!
You can find the newly-built template here: /tmp/llama-stack/llama_stack/templates/remote-vllm/run.yaml
You can run the new Llama Stack distro via: llama stack run /tmp/llama-stack/llama_stack/templates/remote-vllm/run.yaml --image-type container
```

```
podman tag localhost/distribution-remote-vllm:dev quay.io/wenzhou/distribution-remote-vllm:2492_2
podman push quay.io/wenzhou/distribution-remote-vllm:2492_2



docker run --rm -p 8321:8321 -e INFERENCE_MODEL="meta-llama/Llama-2-7b-chat-hf" -e VLLM_URL="http://localhost:8000/v1" quay.io/wenzhou/distribution-remote-vllm:2492_2 --port 8321

INFO     2025-06-26 13:47:31,813 __main__:436 server: Using template remote-vllm config file:                                                         
         /app/llama-stack-source/llama_stack/templates/remote-vllm/run.yaml                                                                           
INFO     2025-06-26 13:47:31,818 __main__:438 server: Run configuration:                                                                              
INFO     2025-06-26 13:47:31,826 __main__:440 server: apis:                                                                                           
         - agents                                                                                                                                     
         - datasetio                                                                                                                                  
         - eval                                                                                                                                       
         - inference                                                                                                                                  
         - safety                                                                                                                                     
         - scoring                                                                                                                                    
         - telemetry                                                                                                                                  
         - tool_runtime                                                                                                                               
         - vector_io                                                                                                                                  
         benchmarks: []                                                                                                                               
         container_image: null                                                                                                                        
....                                                                                                 
```
-----
previous test:
local run: `llama stack build --template remote-vllm --image-type container`
image stored in `quay.io/wenzhou/distribution-remote-vllm:2492`

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-06-27 10:23:12 +02:00
Yuan Tang
0ddb293d77
docs: Add recent releases to CHANGELOG.md (#2533)
# What does this PR do?

Update changelog.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-06-26 23:04:13 -04:00
Ben Browning
0883944bc3
fix: Some missed env variable changes from PR 2490 (#2538)
# What does this PR do?

Some templates were still using the old environment variable substitution
syntax instead of the new one and were not getting substituted properly.

Also, some places didn't handle the new None vs old empty string ("")
values that come from the conditional environment variable substitution.

This gets the starter and remote-vllm distributions starting again, and
I tested various permutations of the starter as chroma and pgvector
needed some adjustments to their config classes to handle the new
possible `None` values. And, I had to tweak our `Provider` class to also
handle `None` values, for cases where we disable providers in the
starter config via environment variables.

This may not have caught everything that was missed, but I did grep
around quite a bit to try and find anything lingering.

## Test Plan

The following permutations now all run (or attempt to run to the point
of complaining that they can't connect to chroma, vllm, etc.) when before
they failed immediately on startup because of bad environment variable
substitutions:

```
uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_SQLITE_VEC=true uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_PGVECTOR=true uv run llama stack run llama_stack/templates/starter/run.yaml
ENABLE_CHROMADB=true uv run llama stack run llama_stack/templates/starter/run.yaml

uv run llama stack run llama_stack/templates/remote-vllm/run.yaml
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: raghotham <rsm@meta.com>
2025-06-26 17:59:15 -07:00
Hardik Shah
eb01a3f1c5
ci: vector_io provider integration tests (#2537)
Runs integration tests for `vector_io` across the provider matrix.
This new workflow adds CI testing across `inline::faiss` and
`remote::chroma`.
2025-06-26 17:04:32 -07:00
grs
68d8f2186f
fix: fix test of root span to match what is being set (#2494)
# What does this PR do?

I get errors when trying to query spans. It appears to be a result of
traces being inserted where there is no root_span_id which causes a
pydantic validation error on trying to load the data for a query
response (and in any case having no span referenced undermines the
purpose of the trace). The root cause as far as I can see is an invalid
test in the code that inserts the trace, where it is testing for the
string "true" against an object set to the python value True.
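
The mismatch described above can be reproduced in a few lines. This is a
minimal sketch; the `is_root` attribute name is hypothetical and stands in
for whatever key the tracing code actually uses.

```python
# Hypothetical span attributes; the flag is stored as a Python bool.
attributes = {"is_root": True}

# Buggy check: a bool never equals the string "true", so this is always False.
is_root_buggy = attributes.get("is_root") == "true"

# Check that matches what is actually being set.
is_root_fixed = attributes.get("is_root") is True

print(is_root_buggy, is_root_fixed)  # False True
```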

Closes #2493 

## Test Plan
With this change I can query spans.

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-06-26 11:41:35 -04:00
Sébastien Han
dbdc811d16
chore: isolate bare minimum project dependencies (#2282)
# What does this PR do?

The goal is to promote the minimal set of dependencies the project needs
to run. This includes:

* dependencies needed to work with the CLI
* dependencies needed for the server to run with no providers

This also:
* Relocates redundant dependencies out of the core project and into the
  individual providers that actually require them.
* Includes all necessary server dependencies so the project can run
  standalone, even without any providers.

## Test Plan

Build and run a distro server.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 10:14:27 +02:00
Sébastien Han
43c1f39bd6
refactor(env)!: enhanced environment variable substitution (#2490)
# What does this PR do?

This commit significantly improves the environment variable substitution
functionality in Llama Stack configuration files:
* The version field in configuration files has been changed from string
to integer type for better type consistency across build and run
configurations.

* The environment variable substitution system for ${env.FOO:} was fixed
and now properly returns an error.

* The environment variable substitution system for ${env.FOO+} returns
None instead of an empty string, which better matches the type annotations
in config fields.

* The system includes automatic type conversion for boolean, integer,
and float values.

* The error messages have been enhanced to provide clearer guidance when
environment variables are missing, including suggestions for using
default values or conditional syntax.

* Comprehensive documentation has been added to the configuration guide
explaining all supported syntax patterns, best practices, and runtime
override capabilities.

* Multiple provider configurations have been updated to use the new
conditional syntax for optional API keys, making the system more
flexible for different deployment scenarios. The telemetry configuration
has been improved to properly handle optional endpoints with appropriate
validation, ensuring that required endpoints are specified when their
corresponding sinks are enabled.

* There were many instances of ${env.NVIDIA_API_KEY:} that should have
caused the code to fail. However, due to a bug, the distro server was
still being started, and early validation wasn’t triggered. As a result,
failures were likely being handled downstream by the providers. I’ve
maintained similar behavior by using ${env.NVIDIA_API_KEY:+}, though I
believe this is incorrect for many configurations. I’ll leave it to each
provider to correct it as needed.

* Environment variable substitution now uses the same syntax as Bash
parameter expansion.
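
To make the behaviour listed above more concrete, here is a small
re-implementation sketch. It is not the project's code: the `substitute`
helper, the `:=` default form (borrowed from Bash), the whole-string
matching, and the exact conversion rules are all assumptions made for
illustration.

```python
import os
import re

# Assumed placeholder shapes: ${env.NAME}, ${env.NAME:=default}, ${env.NAME:+value}.
_PATTERN = re.compile(r"^\$\{env\.(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?P<rest>[^}]*)\}$")


def _convert(text: str):
    """Convert 'true'/'false', integers and floats; otherwise keep the string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def substitute(value: str, env=None):
    """Resolve a whole-string placeholder (hypothetical helper)."""
    env = os.environ if env is None else env
    match = _PATTERN.match(value)
    if match is None:
        return value  # not a placeholder, leave it untouched
    name, rest = match["name"], match["rest"]
    current = env.get(name)
    if rest.startswith(":+"):
        # Conditional form: None (not "") when there is nothing to substitute.
        arg = rest[2:]
        return _convert(arg) if (current and arg) else None
    if rest.startswith(":="):
        # Default form: fall back when the variable is unset or empty.
        return _convert(current or rest[2:])
    if rest:
        # Covers malformed input such as "${env.FOO:}".
        raise ValueError(f"Unsupported substitution syntax in {value!r}")
    if current is None:
        raise ValueError(f"Environment variable {name} is not set; consider a default or conditional form")
    return _convert(current)


print(substitute("${env.PORT:=8321}", env={}))  # 8321
print(substitute("${env.API_KEY:+}", env={}))  # None
print(substitute("${env.DEBUG}", env={"DEBUG": "true"}))  # True
```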

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:20:08 +05:30
Sébastien Han
36d70637b9
fix: finish conversion to StrEnum (#2514)
# What does this PR do?

We still had a few enums declared to behave as both strings and enums.
Let's use StrEnum for those.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:01:26 +05:30
Sébastien Han
ac5fd57387
chore: remove nested imports (#2515)
# What does this PR do?

* Given that our API packages use `import *` in `__init__.py`, we don't
need to do `from llama_stack.apis.models.models` but simply
`from llama_stack.apis.models`. The decision to use `import *` is debatable
and should probably be revisited at some point.

* Remove unneeded Ruff F401 rule
* Consolidate Ruff F403 rule in the pyproject

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:01:05 +05:30
Ben Browning
2d9fd041eb
fix: annotations list and web_search_preview in Responses (#2520)
# What does this PR do?


These are a couple of fixes to get an example LangChain app working with
our OpenAI Responses API implementation.

The Responses API spec requires an annotations array in
`output[*].content[*].annotations` and we were not providing one. So,
this adds that as an empty list, even though we don't do anything to
populate it yet. This prevents an error from client libraries like
LangChain that expect this field to always exist, even if it is an empty list.

The other fix is `web_search_preview` is a valid name for the web search
tool in the Responses API, but we only responded to `web_search` or
`web_search_preview_2025_03_11`.
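
Both fixes can be sketched roughly as follows. The helper names
`is_web_search_tool` and `make_text_content` are hypothetical and only
illustrate the shape of the change, not the actual implementation.

```python
# Tool type names that the text above lists as valid for web search.
WEB_SEARCH_TOOL_TYPES = {
    "web_search",
    "web_search_preview",
    "web_search_preview_2025_03_11",
}


def is_web_search_tool(tool: dict) -> bool:
    """Return True for any accepted spelling of the web search tool."""
    return tool.get("type") in WEB_SEARCH_TOOL_TYPES


def make_text_content(text: str) -> dict:
    """Build an output content item that always carries an annotations list."""
    # The list stays empty for now; clients only need the field to exist.
    return {"type": "output_text", "text": text, "annotations": []}


print(is_web_search_tool({"type": "web_search_preview"}))  # True
print(make_text_content("hello")["annotations"])  # []
```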


## Test Plan


The existing Responses unit tests were expanded to test these cases,
via:

```
pytest -sv tests/unit/providers/agents/meta_reference/test_openai_responses.py
```

The existing test_openai_responses.py integration tests still pass with
this change, tested as below with Fireworks:

```
uv run llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv tests/integration/agents/test_openai_responses.py \
  --text-model accounts/fireworks/models/llama4-scout-instruct-basic
```

Lastly, this example LangChain app now works with Llama Stack (tested
with Ollama in the starter template in this case). This LangChain code
uses the example snippets for the Responses API at
https://python.langchain.com/docs/integrations/chat/openai/#responses-api

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="fake",
    model="ollama/meta-llama/Llama-3.2-3B-Instruct",
)

tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])

response = llm_with_tools.invoke("What was a positive news story from today?")

print(response.content)
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-26 07:59:33 +05:30
ehhuang
1d3f27fe5b
fix: resume responses with tool call output (#2524)
# What does this PR do?
Closes #2522

## Test Plan
Added an integration test:
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -v
tests/integration/agents/test_openai_responses.py --text-model
"accounts/fireworks/models/llama-v3p3-70b-instruct" -vv -k
'function_call'
2025-06-25 14:43:37 -07:00
Francisco Arceo
82f13fe83e
feat: Add ChunkMetadata to Chunk (#2497)
# What does this PR do?
Adding `ChunkMetadata` so we can properly delete embeddings later.

More specifically, this PR refactors and extends the chunk metadata
handling in the vector database and introduces a distinction between
metadata used for model context and backend-only metadata required for
chunk management, storage, and retrieval. It also improves chunk ID
generation and propagation throughout the stack, enhances test coverage,
and adds new utility modules.

```python
class ChunkMetadata(BaseModel):
    """
    `ChunkMetadata` is backend metadata for a `Chunk` that is used to store additional information about the chunk that
        will NOT be inserted into the context during inference, but is required for backend functionality.
        Use `metadata` in `Chunk` for metadata that will be used during inference.
    """
    document_id: str | None = None
    chunk_id: str | None = None
    source: str | None = None
    created_timestamp: int | None = None
    updated_timestamp: int | None = None
    chunk_window: str | None = None
    chunk_tokenizer: str | None = None
    chunk_embedding_model: str | None = None
    chunk_embedding_dimension: int | None = None
    content_token_count: int | None = None
    metadata_token_count: int | None = None
```
Eventually we can migrate the document_id out of the `metadata` field.
I've introduced the changes so that `ChunkMetadata` is backwards
compatible with `metadata`.

Closes https://github.com/meta-llama/llama-stack/issues/2501 

## Test Plan
Added unit tests

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-06-25 15:55:23 -04:00
Ben Browning
fa0b0c13d4
fix: Ollama should be optional in starter distro (#2482)
# What does this PR do?

Our starter distro required Ollama to be running (and a large list of
models available in that Ollama) to successfully start. This adjusts
things so that Ollama does not have to be running to use the starter
template / distro.

To accomplish this, a few changes were needed:

* The Ollama provider can now be configured to either raise an exception
or just log a warning when it cannot reach the Ollama server on
startup. The default is to raise an exception (same as the previous
behavior), but in the starter template we adjust this to just log a
warning so that we can bring the stack up without needing a running
Ollama server.

* The starter template no longer specifies a default list of models for
Ollama, as any models specified there need to actually be pulled and
available in Ollama. Instead, it adds a new
`OLLAMA_INFERENCE_MODEL` environment variable where users can provide an
optional model to register with the Ollama provider on startup.
Additional models can also be registered via the typical
`models.register(...)` at runtime.

* The vLLM template was adjusted to also allow an optional
`VLLM_INFERENCE_MODEL` to be specified on startup, so that the behavior
of vLLM and Ollama is consistent, making it easy to get up and running
quickly.

* The default vector store was changed from sqlite-vec to faiss.
sqlite-vec can be enabled by setting the `ENABLE_SQLITE_VEC` environment
variable, like we do for chromadb and pgvector. This is due to
sqlite-vec not shipping proper arm64 binaries, like we previously fixed
in #1530 for the ollama distribution.
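
The raise-or-warn behaviour from the first bullet could look roughly like
the sketch below. The `check_connectivity` function and the
`raise_on_connect_error` flag name are assumptions made for illustration,
and `ping` stands in for whatever call actually contacts the Ollama server.

```python
import logging

logger = logging.getLogger(__name__)


def check_connectivity(ping, raise_on_connect_error: bool = True) -> bool:
    """Call `ping`; on ConnectionError either raise or log a warning."""
    try:
        ping()
        return True
    except ConnectionError as exc:
        if raise_on_connect_error:
            # Default: fail startup, matching the previous behaviour.
            raise RuntimeError("Ollama server is not reachable") from exc
        # Opt-in: keep starting the stack and only warn.
        logger.warning("Ollama server is not reachable: %s", exc)
        return False


def unreachable() -> None:
    """Stand-in for a health check against a server that is down."""
    raise ConnectionError("connection refused")


print(check_connectivity(unreachable, raise_on_connect_error=False))  # False
```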

## Test Plan

With this change, the following scenarios now work with the starter
template that did not before:

* no Ollama running
* Ollama running but not all of the Llama models pulled locally
* Ollama running with a custom model registered on startup
* vLLM running with a custom model registered on startup
* running the starter template on linux/arm64, like when running
containers on Mac without rosetta emulation

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-25 15:54:00 +02:00
Varsha
cfee63bd0d
feat: Add search_mode support to OpenAI vector store API (#2500)
# What does this PR do?
Adds a `search_mode` parameter (vector/keyword/hybrid) to the
`openai_search_vector_store` method. Fixes OpenAPI
code generation by using `str` instead of a `Literal` type.
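
Since a plain `str` no longer restricts the accepted values by type, the
allowed modes have to be checked at runtime. A minimal sketch of such a
check follows; the `validate_search_mode` helper and the `"vector"` default
are assumptions, not the actual method signature.

```python
# The three modes named in the description above.
VALID_SEARCH_MODES = ("vector", "keyword", "hybrid")


def validate_search_mode(search_mode: str = "vector") -> str:
    """Return the mode unchanged if it is supported, otherwise raise."""
    if search_mode not in VALID_SEARCH_MODES:
        raise ValueError(
            f"search_mode must be one of {VALID_SEARCH_MODES}, got {search_mode!r}"
        )
    return search_mode


print(validate_search_mode("hybrid"))  # hybrid
```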

Closes: #2459 

## Test Plan

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-06-24 20:38:47 -04:00
ehhuang
114946ae88
chore: fix build script bug (#2507)
# What does this PR do?
Fixes
```
Installing pip dependencies
error: Failed to parse: `scikit-learn pymongo pythainlp datasets torch sentencepiece requests aiohttp psycopg2-binary trl pillow pandas chardet nltk scipy ollama faiss-cpu pypdf tree_sitter langdetect openai matplotlib asyncpg peft redis autoevals mcp opentelemetry-exporter-otlp-proto-http sqlalchemy[asyncio] tqdm opentelemetry-sdk aiosqlite numpy chromadb-client emoji transformers aiosqlite fastapi fire httpx uvicorn opentelemetry-sdk opentelemetry-exporter-otlp-proto-http`
  Caused by: Expected one of `@`, `(`, `<`, `=`, `>`, `~`, `!`, `;`, found `p`
scikit-learn pymongo pythainlp datasets torch sentencepiece requests aiohttp psycopg2-binary trl pillow pandas chardet nltk scipy ollama faiss-cpu pypdf tree_sitter langdetect openai matplotlib asyncpg peft redis autoevals mcp opentelemetry-exporter-otlp-proto-http sqlalchemy[asyncio] tqdm opentelemetry-sdk aiosqlite numpy chromadb-client emoji transformers aiosqlite fastapi fire httpx uvicorn opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
             ^
ERROR    2025-06-24 11:33:33,362 llama_stack.distribution.build:145 uncategorized: Failed to build target myenv with return code 2
Error building stack: Failed to build image myenv
```
## Test Plan
2025-06-24 12:05:22 -07:00
Sébastien Han
450ed920d6
chore: do not build on auth ci test (#2505)
# What does this PR do?

Since we are using a very minimal run.yaml, there is no need to build.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-24 21:08:33 +05:30
Ashwin Bharambe
73c18feac4
fix: update the signature of openai_list_files_in_vector_store in all VectorIO impls (#2503) 2025-06-24 18:55:56 +05:30
ehhuang
7fa8f23555
fix(ui): ensure initial data fetch only happens once (#2486)
# What does this PR do?
Bug:
1. go to responses chat logs in UI
2. go to chat completions logs page
3. observe that same data appears in the table twice

This is because `fetchData` is called multiple times when multiple
renders occur.

## Test Plan
manual testing of above bug repro steps
2025-06-24 12:22:55 +02:00
Sébastien Han
9c8be89fb6
chore: bump python supported version to 3.12 (#2475)
# What does this PR do?

The project now supports Python >= 3.12

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-24 09:22:04 +05:30
Rohan Awhad
d797f9aec1
fix: #2495 FileNotFound Err in container image (#2498)
# What does this PR do?

Closes #2495 

Changes:
- Delay the `COPY run.yaml` into docker image step until after external
provider handling
- Split the check for `external_providers_dir` into “non-empty” and
“directory exists”
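
The two-step check described in the last bullet can be sketched roughly as follows. This is only an illustration in Python; the actual change lives in the container build tooling, and the function name `classify_external_providers_dir` and its return values are hypothetical, not taken from the PR.

```python
import os
import tempfile


def classify_external_providers_dir(value):
    """Return 'unset', 'missing', or 'ok' for an external_providers_dir setting.

    Hypothetical helper: the names and return values are assumptions.
    """
    # Step 1: is the setting non-empty at all?
    if not value:
        return "unset"
    # Step 2: does the (user-expanded) path exist as a directory?
    path = os.path.expanduser(value)
    if not os.path.isdir(path):
        return "missing"
    return "ok"


with tempfile.TemporaryDirectory() as existing:
    print(classify_external_providers_dir(""))
    print(classify_external_providers_dir(existing))
    print(classify_external_providers_dir(os.path.join(existing, "nope")))
```

Keeping the two conditions separate lets the caller skip external-provider handling when nothing is configured, while still reporting a configured path that does not exist.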


## Test Plan

0. Create and activate a venv

1. Create a `simple_build.yaml`
    ```yaml
    version: '2'
    distribution_spec:
      providers:
        inference:
          - remote::openai
    image_type: container
    image_name: openai-stack
    ```

2. Run llama stack build:
```bash
llama stack build --config simple_build.yaml
```

3. Run the docker container:
```bash
docker run \
  -p 8321:8321 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  openai_stack:0.2.12
```

The output should show that the server is running:
```
INFO     2025-06-23 19:07:57,832 llama_stack.distribution.distribution:151 core: Loading external providers from /.llama/providers.d
INFO     2025-06-23 19:07:59,324 __main__:572 server: Listening on ['::', '0.0.0.0']:8321
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2025-06-23 19:07:59,336 __main__:156 server: Starting up
INFO:     Application startup complete.                                                                             
INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

Notice the first line:
```
Loading external providers from /.llama/providers.d
```
This is expected behaviour.

Co-authored-by: Rohan Awhad <rawhad@redhat.com>
2025-06-24 09:08:08 +05:30
dependabot[bot]
929ac618ce
chore(github-deps): bump astral-sh/setup-uv from 6.0.1 to 6.3.0 (#2488)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.0.1 to 6.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.3.0 🌈 Use latest version from manifest-file</h2>
<h2>Changes</h2>
<p>If a manifest-file is supplied the default value of the version input
(latest) will get the latest version available in the manifest. That
might not be the actual latest version available in the official uv
repo.</p>
<h2>🚀 Enhancements</h2>
<ul>
<li>Use latest version from manifest-file <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/458">#458</a>)</li>
</ul>
<h2>v6.2.0 🌈  New input manifest-file</h2>
<h2>Changes</h2>
<p>This release adds a new input <code>manifest-file</code>.</p>
<p>The <code>manifest-file</code> input allows you to specify a JSON
manifest that lists available uv versions,
architectures, and their download URLs. By default, this action uses the
manifest file contained
in this repository, which is automatically updated with each release of
uv.</p>
<p>The manifest file contains an array of objects, each describing a
version,
architecture, platform, and the corresponding download URL.</p>
<p>You can supply a custom manifest file URL to define additional
versions,
architectures, or different download URLs.
This is useful if you maintain your own uv builds or want to override
the default sources.</p>
<p>For example:</p>
<pre lang="json"><code>[
  {
    &quot;version&quot;: &quot;0.7.12-alpha.1&quot;,
    &quot;artifactName&quot;: &quot;uv-x86_64-unknown-linux-gnu.tar.gz&quot;,
    &quot;arch&quot;: &quot;x86_64&quot;,
    &quot;platform&quot;: &quot;unknown-linux-gnu&quot;,
    &quot;downloadUrl&quot;: &quot;https://release.pyx.dev/0.7.12-alpha.1/uv-x86_64-unknown-linux-gnu.tar.gz&quot;
  },
  ...
]
</code></pre>
<pre lang="yaml"><code>- name: Use a custom manifest file
  uses: astral-sh/setup-uv@v6
  with:
    manifest-file: &quot;https://example.com/my-custom-manifest.json&quot;
</code></pre>
<blockquote>
<p>[!WARNING]</p>
</blockquote>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="445689ea25"><code>445689e</code></a>
Use latest version from manifest-file (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/458">#458</a>)</li>
<li><a
href="a02a550bdd"><code>a02a550</code></a>
Look for version-manifest.json relative to action path (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/456">#456</a>)</li>
<li><a
href="60cc2b4585"><code>60cc2b4</code></a>
Add input manifest-file (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/454">#454</a>)</li>
<li><a
href="7bbb36f434"><code>7bbb36f</code></a>
chore: update known versions for 0.7.13 and 0.7.12 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/444">#444</a>)</li>
<li><a
href="60ecb381b4"><code>60ecb38</code></a>
Set expected cache dir drive to C: on windows (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/451">#451</a>)</li>
<li><a
href="252c995424"><code>252c995</code></a>
chore: update known versions for 0.7.11 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/442">#442</a>)</li>
<li><a
href="477a814f2d"><code>477a814</code></a>
chore: update known versions for 0.7.10 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/440">#440</a>)</li>
<li><a
href="9b19f8f4b1"><code>9b19f8f</code></a>
Add warning about shadowed uv binaries to
<code>activate-environment</code> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/439">#439</a>)</li>
<li><a
href="d44461ea9f"><code>d44461e</code></a>
chore: update known versions for 0.7.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/437">#437</a>)</li>
<li><a
href="c19c1b1ffd"><code>c19c1b1</code></a>
Check that all jobs are in all-tests-passed.needs (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/432">#432</a>)</li>
<li>Additional commits viewable in <a
href="6b9c6063ab...445689ea25">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.0.1&new-version=6.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-23 11:21:06 +02:00
ehhuang
6fde601765
chore: upgrade hf hub dependency (#2487)
# What does this PR do?
CI tests have been failing with:

.venv/lib/python3.12/site-packages/peft/auto.py:21: in <module>
    from transformers import (
.venv/lib/python3.12/site-packages/transformers/__init__.py:27: in <module>
    from . import dependency_versions_check
.venv/lib/python3.12/site-packages/transformers/dependency_versions_check.py:57: in <module>
    require_version_core(deps[pkg])
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:117: in require_version_core
    return require_version(requirement, hint)
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:111: in require_version
    _compare_versions(op, got_ver, want_ver, requirement, pkg, hint)
.venv/lib/python3.12/site-packages/transformers/utils/versions.py:44: in _compare_versions
    raise ImportError(
E ImportError: huggingface-hub>=0.30.0,<1.0 is required for a normal functioning of this module, but found huggingface-hub==0.29.0.
E Try: `pip install transformers -U` or `pip install -e '.[dev]'` if you're working with git main
------------------------------ Captured log setup ------------------------------
INFO llama_stack.providers.remote.inference.ollama.ollama:ollama.py:106 checking connectivity to Ollama at `http://0.0.0.0:11434`.../
=========================== short test summary info ============================
ERROR tests/integration/providers/test_providers.py::TestProviders::test_providers - ImportError: huggingface-hub>=0.30.0,<1.0 is required for a normal functioning of this module, but found huggingface-hub==0.29.0.
Try: `pip install transformers -U` or `pip install -e '.[dev]'` if you're working with git main
=================== 1 skipped, 4 warnings, 1 error in 9.52s ====================
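
To see why the pinned version trips the check, here is a minimal sketch of the range test implied by the error message (`>=0.30.0,<1.0`). The helpers `parse_version` and `satisfies` are hypothetical and only handle plain numeric versions; this is not how `transformers` implements the comparison.

```python
def parse_version(text):
    """Turn '0.29.0' into (0, 29, 0); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(found, minimum, below):
    """Return True if minimum <= found < below, comparing numeric tuples."""
    return parse_version(minimum) <= parse_version(found) < parse_version(below)


# The version found in CI versus the required range from the error message.
print(satisfies("0.29.0", "0.30.0", "1.0"))  # False
print(satisfies("0.30.0", "0.30.0", "1.0"))  # True
```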

## Test Plan
CI
2025-06-20 15:50:54 -07:00
ehhuang
23b7dc7b37
fix: stack build (#2485)
# What does this PR do?

Probably related to the 3.11 upgrade:

^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.11/site-packages/termcolor/termcolor.py", line 147, in colored
    text = fmt_str % (COLORS[color], text)
                      ~~~~~~^^^^^^^
KeyError: 'light_blue'

## Test Plan
2025-06-20 15:15:43 -07:00
github-actions[bot]
d70573bd47 build: Bump version to 0.2.12 2025-06-20 21:06:17 +00:00
ehhuang
d3b60507d7
feat: support auth attributes in inference/responses stores (#2389)
# What does this PR do?
The inference and responses stores now store user attributes when inserting
and respect them when fetching.
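
As a rough illustration of the idea (not the actual store implementation), the sketch below records the inserting user's attributes alongside each row and filters on them at fetch time. The class `AttributeAwareStore`, the `Record` type, and the matching policy (a user must share at least one value in every attribute category on the record, and records without attributes are public) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str
    payload: dict
    # Attributes of the user who inserted the record, e.g. {"teams": ["ml"]}.
    owner_attributes: dict = field(default_factory=dict)


class AttributeAwareStore:
    """Hypothetical in-memory store that remembers who inserted each record."""

    def __init__(self):
        self._rows = []

    def insert(self, key, payload, user_attributes=None):
        # Store a copy of the caller's attributes with the record.
        self._rows.append(Record(key, payload, dict(user_attributes or {})))

    def fetch(self, user_attributes=None):
        user_attributes = user_attributes or {}
        return [r for r in self._rows if self._can_read(r, user_attributes)]

    @staticmethod
    def _can_read(record, user_attributes):
        # Assumption: records without owner attributes are readable by anyone.
        if not record.owner_attributes:
            return True
        # Assumption: the user must share a value in every attribute category.
        for name, values in record.owner_attributes.items():
            if not set(values) & set(user_attributes.get(name, [])):
                return False
        return True


store = AttributeAwareStore()
store.insert("resp-1", {"text": "hi"}, {"teams": ["ml"]})
store.insert("resp-2", {"text": "public"})
print([r.key for r in store.fetch({"teams": ["ml"]})])   # ['resp-1', 'resp-2']
print([r.key for r in store.fetch({"teams": ["web"]})])  # ['resp-2']
```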

## Test Plan
pytest tests/unit/utils/test_sqlstore.py
2025-06-20 10:24:45 -07:00
Costa Shulyupin
7930c524f9
docs: Fix spacing (#2481)
![image](https://github.com/user-attachments/assets/4b8e0e9c-1622-41dd-a0f4-178b6b452029)


Replace misaligned tab with spaces

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>

2025-06-20 13:21:58 +02:00
ehhuang
6832e8a658
feat: remove score_threshold constraint (#2479)
# What does this PR do?
See inline comment.


Fixes test:

_ test_openai_vector_store_search_with_high_score_filter[llama_stack_client-meta-llama/Llama-3.3-70B-Instruct-meta-llama/Llama-4-Scout-17B-16E-Instruct-all-MiniLM-L6-v2-None-None] _
llama-stack/llama_stack/distribution/library_client.py:98: in convert_to_pydantic
    return TypeAdapter(annotation).validate_python(value)
.venv/lib/python3.10/site-packages/pydantic/type_adapter.py:421: in validate_python
    return self.validator.validate_python(
E pydantic_core._pydantic_core.ValidationError: 1 validation error for nullable[SearchRankingOptions]
E   score_threshold
E Input should be less than or equal to 1 [type=less_than_equal, input_value=1.3458905661753127, input_type=float]
E For further information visit https://errors.pydantic.dev/2.11/v/less_than_equal

The above exception was the direct cause of the following exception:

llama-stack/tests/integration/vector_io/test_openai_vector_stores.py:376: in test_openai_vector_store_search_with_high_score_filter
    search_response = compat_client.vector_stores.search(
.venv/lib/python3.10/site-packages/llama_stack_client/resources/vector_stores/vector_stores.py:356: in search
    return self._post(
.venv/lib/python3.10/site-packages/llama_stack_client/_base_client.py:1232: in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
llama-stack/llama_stack/distribution/library_client.py:177: in request
    result = loop.run_until_complete(self.async_client.request(*args, **kwargs))
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/asyncio/base_events.py:649: in run_until_complete
    return future.result()
llama-stack/llama_stack/distribution/library_client.py:292: in request
    response = await self._call_non_streaming(
llama-stack/llama_stack/distribution/library_client.py:313: in _call_non_streaming
    body = self._convert_body(path, options.method, body)
llama-stack/llama_stack/distribution/library_client.py:425: in _convert_body
    converted_body[param_name] = convert_to_pydantic(param.annotation, value)
llama-stack/llama_stack/distribution/library_client.py:112: in convert_to_pydantic
    raise ValueError(f"Failed to convert parameter {value} into {annotation}: {e}") from e
E ValueError: Failed to convert parameter {'score_threshold': 1.3458905661753127} into llama_stack.apis.vector_io.vector_io.SearchRankingOptions | None: 1 validation error for nullable[SearchRankingOptions]
E   score_threshold
E Input should be less than or equal to 1 [type=less_than_equal, input_value=1.3458905661753127, input_type=float]
E For further information visit https://errors.pydantic.dev/2.11/v/less_than_equal

## Test Plan
2025-06-20 09:17:42 +05:30
Eran Cohen
747e594680
feat: expand set of known gemini models (#2471)
feat: Add Gemini 2.0 and 2.5 models

This commit expands the set of known Gemini models by introducing:
- `gemini/gemini-2.0-flash`
- `gemini/gemini-2.5-flash`
- `gemini/gemini-2.5-pro`

These new models are added to `LLM_MODEL_IDS` for broader compatibility
and updated in `run.yaml` to allow for their immediate use in starter
configurations.

Signed-off-by: Eran Cohen <eranco@redhat.com>
2025-06-19 12:19:37 -04:00
Ben Browning
f394c7f2d9
feat: Add missing Vector Store Files API surface (#2468)
# What does this PR do?

This adds the ability to list, retrieve, update, and delete Vector Store
Files. It implements these new APIs for the faiss and sqlite-vec
providers, since those are the two that also have the rest of the vector
store files implementation.

Closes #2445 

## Test Plan

### test_openai_vector_stores Integration Tests

There are a number of new integration tests added, which I ran for each
provider as outlined below.

faiss (from ollama distro):

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
  --embedding-model=all-MiniLM-L6-v2
```

sqlite-vec (from starter distro):

```
llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
  --embedding-model=all-MiniLM-L6-v2
```

### file_search verification tests

I also ensured the file_search verification tests continue to work, both
for faiss and sqlite-vec.

faiss (ollama distro):

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.2-3B-Instruct
```


sqlite-vec (starter distro):

```
llama stack run llama_stack/templates/starter/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=together/meta-llama/Llama-3.2-3B-Instruct-Turbo
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-19 11:08:24 -04:00
Ihar Hrachyshka
a2f054607d
fix: cancel scheduler tasks on shutdown (#2130)
# What does this PR do?

The scheduler now cancels its tasks on shutdown.

Otherwise, tasks that are still running never exit until they complete
on their own, which means the process can't be shut down properly
(only with SIGKILL).

Ideally, we would let tasks know that they are about to be shut down and
give them some time to finish; in the absence of such a mechanism, it's
better to cancel than to linger forever.
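
A minimal sketch of the cancel-on-shutdown pattern, using plain `asyncio` tasks. This is an illustration under assumptions, not the scheduler's code: the function names `long_running_job` and `shutdown` are hypothetical, and the real scheduler may manage its jobs differently.

```python
import asyncio


async def long_running_job(name):
    try:
        # Stands in for a long training run.
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # A job could release resources here; re-raise so the task ends as cancelled.
        raise


async def shutdown(tasks):
    # Request cancellation of every outstanding task...
    for task in tasks:
        task.cancel()
    # ...then wait for all of them to finish unwinding. return_exceptions=True
    # collects each CancelledError instead of propagating the first one.
    return await asyncio.gather(*tasks, return_exceptions=True)


async def main():
    tasks = [asyncio.create_task(long_running_job(f"job-{i}")) for i in range(3)]
    await asyncio.sleep(0)  # yield once so the jobs actually start
    await shutdown(tasks)
    return [task.cancelled() for task in tasks]


print(asyncio.run(main()))  # [True, True, True]
```

Without the explicit `cancel()` calls, the process would have to wait out the full sleep (or be killed) before it could exit.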


## Test Plan

Start a long-running task (e.g. torchtune or external kfp-provider
training).
Press Ctrl-C in the TTY running the process. Confirm it exits in a reasonable time.

```
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
13:32:26.187 - INFO - Shutting down
13:32:26.187 - INFO - Shutting down DatasetsRoutingTable
13:32:26.187 - INFO - Shutting down DatasetIORouter
13:32:26.187 - INFO - Shutting down TorchtuneKFPPostTrainingImpl
    Traceback (most recent call last):
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
        return self._loop.run_until_complete(task)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
    asyncio.exceptions.CancelledError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 109, in <module>
        executor_main()
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor_main.py", line 101, in executor_main
        output_file = executor.execute()
                      ^^^^^^^^^^^^^^^^^^
      File "/Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/executor.py", line 361, in execute
        result = self.func(**func_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/var/folders/45/1q1rx6cn7jbcn2ty852w0g_r0000gn/T/tmp.RKpPrvTWDD/ephemeral_component.py", line 118, in component
        asyncio.run(recipe.setup())
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
        return runner.run(main)
               ^^^^^^^^^^^^^^^^
      File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 123, in run
        raise KeyboardInterrupt()
    KeyboardInterrupt


13:32:31.219 - ERROR - Task 'component' finished with status FAILURE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO     2025-05-09 13:32:31,221 llama_stack.providers.utils.scheduler:221 scheduler: Job
         test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa: Pipeline 'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa'
         finished with status FAILURE. Inner task failed: 'component'.
ERROR    2025-05-09 13:32:31,223 llama_stack_provider_kfp_trainer.scheduler:54 scheduler: Job
         test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa failed.
         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/src/llama_stack_provider_kfp_trainer/scheduler.py:45   │
         │ in do                                                                                                       │
         │                                                                                                             │
         │    42 │   │   │                                                                                             │
         │    43 │   │   │   job.status = JobStatus.running                                                            │
         │    44 │   │   │   try:                                                                                      │
         │ ❱  45 │   │   │   │   artifacts = self._to_artifacts(job.handler().output)                                  │
         │    46 │   │   │   │   for artifact in artifacts:                                                            │
         │    47 │   │   │   │   │   on_artifact_collected_cb(artifact)                                                │
         │    48                                                                                                       │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/base_compon │
         │ ent.py:101 in __call__                                                                                      │
         │                                                                                                             │
         │    98 │   │   │   │   f'{self.name}() missing {len(missing_arguments)} required '                           │
         │    99 │   │   │   │   f'{argument_or_arguments}: {arguments}.')                                             │
         │   100 │   │                                                                                                 │
         │ ❱ 101 │   │   return pipeline_task.PipelineTask(                                                            │
         │   102 │   │   │   component_spec=self.component_spec,                                                       │
         │   103 │   │   │   args=task_inputs,                                                                         │
         │   104 │   │   │   execute_locally=pipeline_context.Pipeline.get_default_pipeline() is                       │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │
         │ sk.py:187 in __init__                                                                                       │
         │                                                                                                             │
         │   184 │   │   ])                                                                                            │
         │   185 │   │                                                                                                 │
         │   186 │   │   if execute_locally:                                                                           │
         │ ❱ 187 │   │   │   self._execute_locally(args=args)                                                          │
         │   188 │                                                                                                     │
         │   189 │   def _execute_locally(self, args: Dict[str, Any]) -> None:                                         │
         │   190 │   │   """Execute the pipeline task locally.                                                         │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/dsl/pipeline_ta │
         │ sk.py:197 in _execute_locally                                                                               │
         │                                                                                                             │
         │   194 │   │   from kfp.local import task_dispatcher                                                         │
         │   195 │   │                                                                                                 │
         │   196 │   │   if self.pipeline_spec is not None:                                                            │
         │ ❱ 197 │   │   │   self._outputs = pipeline_orchestrator.run_local_pipeline(                                 │
         │   198 │   │   │   │   pipeline_spec=self.pipeline_spec,                                                     │
         │   199 │   │   │   │   arguments=args,                                                                       │
         │   200 │   │   │   )                                                                                         │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:43 in run_local_pipeline                                                                    │
         │                                                                                                             │
         │    40 │                                                                                                     │
         │    41 │   # validate and access all global state in this function, not downstream                           │
         │    42 │   config.LocalExecutionConfig.validate()                                                            │
         │ ❱  43 │   return _run_local_pipeline_implementation(                                                        │
         │    44 │   │   pipeline_spec=pipeline_spec,                                                                  │
         │    45 │   │   arguments=arguments,                                                                          │
         │    46 │   │   raise_on_error=config.LocalExecutionConfig.instance.raise_on_error,                           │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:108 in _run_local_pipeline_implementation                                                   │
         │                                                                                                             │
         │   105 │   │   │   )                                                                                         │
         │   106 │   │   return outputs                                                                                │
         │   107 │   elif dag_status == status.Status.FAILURE:                                                         │
         │ ❱ 108 │   │   log_and_maybe_raise_for_failure(                                                              │
         │   109 │   │   │   pipeline_name=pipeline_name,                                                              │
         │   110 │   │   │   fail_stack=fail_stack,                                                                    │
         │   111 │   │   │   raise_on_error=raise_on_error,                                                            │
         │                                                                                                             │
         │ /Users/ihrachys/src/llama-stack-provider-kfp-trainer/.venv/lib/python3.12/site-packages/kfp/local/pipeline_ │
         │ orchestrator.py:137 in log_and_maybe_raise_for_failure                                                      │
         │                                                                                                             │
         │   134 │   │   logging_utils.format_task_name(task_name) for task_name in fail_stack)                        │
         │   135 │   msg = f'Pipeline {pipeline_name_with_color} finished with status                                  │
         │       {status_with_color}. Inner task failed: {task_chain_with_color}.'                                     │
         │   136 │   if raise_on_error:                                                                                │
         │ ❱ 137 │   │   raise RuntimeError(msg)                                                                       │
         │   138 │   with logging_utils.local_logger_context():                                                        │
         │   139 │   │   logging.error(msg)                                                                            │
         │   140                                                                                                       │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         RuntimeError: Pipeline 'test-jobc3c2e1e4-859c-4852-a41d-ef29e55e3efa' finished with status
         FAILURE. Inner task failed: 'component'.
INFO     2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down
         DistributionInspectImpl
INFO     2025-05-09 13:32:31,266 llama_stack.distribution.server.server:136 server: Shutting down ProviderImpl
INFO:     Application shutdown complete.
INFO:     Finished server process [26648]
```

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-06-19 17:01:33 +02:00
Sébastien Han
c20388c424
ci: add python package build test (#2457)
# What does this PR do?

We now test a package build on every PR.

Closes: https://github.com/meta-llama/llama-stack/issues/2406

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-19 18:57:32 +05:30
Sébastien Han
fa1d986f72
fix: remove asyncio.TimeoutError since Python update (#2476)
# What does this PR do?

Since we now support Python 3.11 and later, this is no longer
needed.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-19 18:52:41 +05:30
Sébastien Han
6039d922c0
fix: allow running vector tests with embedding dimension (#2467)
# What does this PR do?

Do not force 384 for the embedding dimension, use the one provided by
the test run.

## Test Plan

```
 pytest -s -vvv tests/integration/vector_io/test_vector_io.py --stack-config=http://localhost:8321 \
    -k "not(builtin_tool or safety_with_image or code_interpreter or test_rag)" \
    --text-model="meta-llama/Llama-3.2-3B-Instruct" \
    --embedding-model=granite-embedding-125m --embedding-dimension=768
Uninstalled 1 package in 16ms
Installed 1 package in 11ms
INFO     2025-06-18 10:52:03,314 tests.integration.conftest:59 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
================================================= test session starts =================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-15.5-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'cov': '6.0.0', 'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: cov-6.0.0, html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 8 items

tests/integration/vector_io/test_vector_io.py::test_vector_db_retrieve[emb=granite-embedding-125m:dim=768] PASSED
tests/integration/vector_io/test_vector_io.py::test_vector_db_register[emb=granite-embedding-125m:dim=768] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case0] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case1] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case2] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case3] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=granite-embedding-125m:dim=768-test_case4] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks_with_precomputed_embeddings[emb=granite-embedding-125m:dim=768] PASSED

================================================== 8 passed in 5.50s ==================================================
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-19 13:29:04 +05:30
Charlie Doern
d12f195f56
feat: drop python 3.10 support (#2469)
# What does this PR do?

Dropped Python 3.10 support, updated pyproject and dependencies, and
removed code blocks that special-cased enum.StrEnum.

Closes #2458

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-06-19 12:07:14 +05:30
ehhuang
db2cd9e8f3
feat: support filters in file search (#2472)
# What does this PR do?
Switch the file search tool in Responses to vector_stores.search,
which supports filters.
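
To illustrate what such filters express, here is a small sketch that evaluates OpenAI-style comparison and compound filters against a file's attributes. This is not the Llama Stack implementation; `matches` is a hypothetical helper, and treating a missing key as a non-match is an assumption.

```python
import operator
from typing import Any

_COMPARISONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(filter_: dict[str, Any], attributes: dict[str, Any]) -> bool:
    """Return True if `attributes` satisfy the filter."""
    kind = filter_["type"]
    if kind == "and":
        return all(matches(f, attributes) for f in filter_["filters"])
    if kind == "or":
        return any(matches(f, attributes) for f in filter_["filters"])
    if kind not in _COMPARISONS:
        raise ValueError(f"unsupported filter type: {kind}")
    if filter_["key"] not in attributes:
        return False  # assumption: a missing attribute never matches
    return _COMPARISONS[kind](attributes[filter_["key"]], filter_["value"])
```

For example, `{"type": "and", "filters": [{"type": "eq", "key": "region", "value": "us"}, {"type": "gte", "key": "year", "value": 2024}]}` matches a file whose attributes are `{"region": "us", "year": 2025}`.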

closes #2435 

## Test Plan
Added e2e test with filters.
myenv ❯ llama stack run llama_stack/templates/fireworks/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search and filters' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.3-70B-Instruct
2025-06-18 21:50:55 -07:00
Ihar Hrachyshka
fd37a50e6a
chore: Remove @booxter from triagers (#2473)
Sadly, I won't have capacity to continue working for the project.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>

2025-06-18 19:30:09 -07:00
ehhuang
e6bfc717cb
feat(ui): add infinite scroll pagination to chat completions/responses logs table (#2466)
## Summary:

This commit adds infinite scroll pagination to the chat completions and
responses tables.


## Test Plan:
  1. Run unit tests: npm run test
  2. Manual testing: Navigate to chat
  completions/responses pages
  3. Verify infinite scroll triggers when approaching
  bottom
  4. Added playwright tests: npm run test:e2e
2025-06-18 15:28:39 -07:00
Sumit Jaiswal
90d03552d4
feat: To add health check for faiss inline vector_io provider (#2319)
# What does this PR do?
Adds a health check for the faiss inline vector_io provider.

I tried adding `async def health(self) -> HealthResponse:` as in the
inference provider, but it didn't work for the `inline->vector_io->faiss`
provider. Debug logs revealed the underlying issue: health responses are
stored with the API name as the key, not as a nested dictionary keyed by
provider ID. This means that all providers of the same API type (e.g.,
"vector_io") share a single health response, and only the last one
processed is visible in the API response.
I've created a patch file that fixes this issue by:
- Storing the original get_providers_health method
- Creating a patched version that correctly maps health responses to
providers
- Applying the patch to the `ProviderImpl` class
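
The keying problem described above can be sketched with plain dictionaries. This is an illustration only, not the patch in this PR; the provider names and statuses in `REPORTS` are made up.

```python
from collections import defaultdict

# (api, provider_id, status) tuples; values are illustrative.
REPORTS = [
    ("vector_io", "faiss", "OK"),
    ("vector_io", "sqlite-vec", "Error"),
    ("inference", "ollama", "OK"),
]


def by_api_only(reports):
    """Problematic shape: one slot per API, so later providers overwrite earlier ones."""
    health = {}
    for api, _provider_id, status in reports:
        health[api] = status
    return health


def by_api_and_provider(reports):
    """Nested shape: each provider keeps its own entry under its API."""
    health = defaultdict(dict)
    for api, provider_id, status in reports:
        health[api][provider_id] = status
    return {api: dict(providers) for api, providers in health.items()}
```

With the flat shape, the `faiss` status is lost as soon as `sqlite-vec` reports; the nested shape keeps both.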

I'm not an expert, so please let me know if there is another way to
update the health status directly from `faiss.py`.


## Test Plan
Added unit tests for the provider patch implementation in this PR.
The screenshot below shows the FAISS inline vector_io health status as
"OK".


![faiss_health_check](https://github.com/user-attachments/assets/d769e762-890c-41ea-a596-5e90951f79a4)
2025-06-18 17:56:25 +02:00
github-actions[bot]
7d812e3bf0 build: Bump version to 0.2.11
2025-06-17 19:08:17 +00:00
3497 changed files with 1041091 additions and 157101 deletions

View file

@ -4,3 +4,9 @@ omit =
 */llama_stack/providers/*
 */llama_stack/templates/*
 .venv/*
+*/llama_stack/cli/scripts/*
+*/llama_stack_ui/*
+*/llama_stack/distribution/ui/*
+*/llama_stack/strong_typing/*
+*/llama_stack/env.py
+*/__init__.py

19
.dockerignore Normal file
View file

@ -0,0 +1,19 @@
.venv
__pycache__
*.pyc
*.pyo
*.pyd
*.so
.git
.gitignore
htmlcov*
.coverage
coverage*
.cache
.mypy_cache
.pytest_cache
.ruff_cache
uv.lock
node_modules
build
/tmp

1
.gitattributes vendored Normal file
View file

@ -0,0 +1 @@
tests/**/recordings/** linguist-generated=true

2
.github/CODEOWNERS vendored
View file

@ -2,4 +2,4 @@
 # These owners will be the default owners for everything in
 # the repo. Unless a later match takes precedence,
-* @ashwinb @yanxi0830 @hardikjshah @raghotham @ehhuang @terrytangyuan @leseb @bbrowning @reluctantfuturist
+* @ashwinb @raghotham @ehhuang @leseb @bbrowning @mattf @franciscojavierarceo @cdoern

View file

@ -2,10 +2,10 @@ blank_issues_enabled: false
 contact_links:
 - name: Have you read the docs?
-url: https://llama-stack.readthedocs.io/en/latest/index.html
+url: https://llamastack.github.io/providers/external/index.html
 about: Much help can be found in the docs
 - name: Start a discussion
-url: https://github.com/meta-llama/llama-stack/discussions/new
+url: https://github.com/llamastack/llama-stack/discussions/new/
 about: Start a discussion on a topic
 - name: Chat on Discord
 url: https://discord.gg/llama-stack

30
.github/ISSUE_TEMPLATE/tech-debt.yml vendored Normal file
View file

@ -0,0 +1,30 @@
name: 🔧 Tech Debt
description: Something that is functional but should be improved or optimizied
labels: ["tech-debt"]
body:
- type: textarea
id: tech-debt-explanation
attributes:
label: 🤔 What is the technical debt you think should be addressed?
description: >
A clear and concise description of _what_ needs to be addressed - ensure you are describing
constitutes [technical debt](https://en.wikipedia.org/wiki/Technical_debt) and is not a bug
or feature request.
validations:
required: true
- type: textarea
id: tech-debt-motivation
attributes:
label: 💡 What is the benefit of addressing this technical debt?
description: >
A clear and concise description of _why_ this work is needed.
validations:
required: true
- type: textarea
id: other-thoughts
attributes:
label: Other thoughts
description: >
Any thoughts about how this may result in complexity in the codebase, or other trade-offs.

1
.github/TRIAGERS.md vendored

@ -1,2 +1 @@
# This file documents Triage members in the Llama Stack community # This file documents Triage members in the Llama Stack community
@bbrowning @booxter @franciscojavierarceo @leseb


@ -0,0 +1,72 @@
name: Install llama-stack-client
description: Install llama-stack-client based on branch context and client-version input
inputs:
client-version:
description: 'Client version to install on non-release branches (latest or published). Ignored on release branches.'
required: false
default: ""
sdk_install_url:
description: 'URL to install Python SDK from (for testing preview builds). If provided, overrides client-version.'
required: false
default: ""
outputs:
uv-extra-index-url:
description: 'UV_EXTRA_INDEX_URL to use (set for release branches)'
value: ${{ steps.configure.outputs.uv-extra-index-url }}
install-after-sync:
description: 'Whether to install client after uv sync'
value: ${{ steps.configure.outputs.install-after-sync }}
install-source:
description: 'Where to install client from after sync'
value: ${{ steps.configure.outputs.install-source }}
runs:
using: "composite"
steps:
- name: Configure client installation
id: configure
shell: bash
run: |
# If sdk_install_url is provided (e.g., from Stainless preview), use it directly
if [ -n "${{ inputs.sdk_install_url }}" ]; then
echo "Using provided sdk_install_url: ${{ inputs.sdk_install_url }}"
echo "install-after-sync=true" >> $GITHUB_OUTPUT
echo "install-source=${{ inputs.sdk_install_url }}" >> $GITHUB_OUTPUT
exit 0
fi
# Determine the branch we're working with
BRANCH="${{ github.base_ref || github.ref }}"
BRANCH="${BRANCH#refs/heads/}"
echo "Working with branch: $BRANCH"
# On release branches: use test.pypi for uv sync, then install from git
# On non-release branches: install based on client-version after sync
if [[ "$BRANCH" =~ ^release-[0-9]+\.[0-9]+\.x$ ]]; then
echo "Detected release branch: $BRANCH"
# Check if matching branch exists in client repo
if ! git ls-remote --exit-code --heads https://github.com/llamastack/llama-stack-client-python.git "$BRANCH" > /dev/null 2>&1; then
echo "::error::Branch $BRANCH not found in llama-stack-client-python repository"
echo "::error::Please create the matching release branch in llama-stack-client-python before testing"
exit 1
fi
# Configure to use test.pypi as extra index (PyPI is primary)
echo "uv-extra-index-url=https://test.pypi.org/simple/" >> $GITHUB_OUTPUT
echo "install-after-sync=true" >> $GITHUB_OUTPUT
echo "install-source=git+https://github.com/llamastack/llama-stack-client-python.git@$BRANCH" >> $GITHUB_OUTPUT
elif [ "${{ inputs.client-version }}" = "latest" ]; then
# Install from main git after sync
echo "install-after-sync=true" >> $GITHUB_OUTPUT
echo "install-source=git+https://github.com/llamastack/llama-stack-client-python.git@main" >> $GITHUB_OUTPUT
elif [ "${{ inputs.client-version }}" = "published" ]; then
# Use published version from PyPI (installed by sync)
echo "install-after-sync=false" >> $GITHUB_OUTPUT
elif [ -n "${{ inputs.client-version }}" ]; then
echo "::error::Invalid client-version: ${{ inputs.client-version }}"
exit 1
fi
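
Reviewer sketch (not part of the change): the branch logic of this composite action can be read as a small decision function. The Python below is an illustration under assumptions. The function name `resolve_client_install` is hypothetical, the returned keys simply reuse the action's output names, and the `git ls-remote` check for a matching client branch is left out.

```python
import re

CLIENT_REPO = "git+https://github.com/llamastack/llama-stack-client-python.git"
# Same pattern the action uses to detect release branches such as release-0.3.x
RELEASE_BRANCH = re.compile(r"^release-[0-9]+\.[0-9]+\.x$")


def resolve_client_install(branch: str, client_version: str = "", sdk_install_url: str = "") -> dict:
    """Hypothetical mirror of the action's outputs (keys reuse the output names)."""
    # An explicit SDK URL takes precedence over every other input.
    if sdk_install_url:
        return {"install-after-sync": "true", "install-source": sdk_install_url}

    # Equivalent of ${BRANCH#refs/heads/} in the shell script.
    branch = branch.removeprefix("refs/heads/")

    if RELEASE_BRANCH.match(branch):
        # Release branches: extra index for uv sync, then install the client from git.
        return {
            "uv-extra-index-url": "https://test.pypi.org/simple/",
            "install-after-sync": "true",
            "install-source": f"{CLIENT_REPO}@{branch}",
        }
    if client_version == "latest":
        return {"install-after-sync": "true", "install-source": f"{CLIENT_REPO}@main"}
    if client_version == "published":
        # The published client is already installed by uv sync.
        return {"install-after-sync": "false"}
    if client_version:
        raise ValueError(f"Invalid client-version: {client_version}")
    # Empty client-version on a non-release branch: the script writes no outputs.
    return {}
```

Note the last case: when `client-version` is empty on a non-release branch, none of the three outputs is set, so the `install-after-sync` comparison in a consuming step evaluates to false.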


@ -0,0 +1,137 @@
name: 'Run and Record Tests'
description: 'Run integration tests and handle recording/artifact upload'
inputs:
stack-config:
description: 'Stack configuration to use'
required: true
setup:
description: 'Setup to use for tests (e.g., ollama, gpt, vllm)'
required: false
default: ''
inference-mode:
description: 'Inference mode (record, record-if-missing, or replay)'
required: true
suite:
description: 'Test suite to use: base, responses, vision, etc.'
required: false
default: ''
subdirs:
description: 'Comma-separated list of test subdirectories to run; overrides suite'
required: false
default: ''
pattern:
description: 'Regex pattern to pass to pytest -k'
required: false
default: ''
target-branch:
description: 'Target branch for recording commits (for PRs, use the PR head branch)'
required: false
default: ''
is-fork-pr:
description: 'Whether this is a fork PR (recordings cannot be pushed to forks)'
required: false
default: 'false'
runs:
using: 'composite'
steps:
- name: Check Storage and Memory Available Before Tests
if: ${{ always() }}
shell: bash
run: |
free -h
df -h
- name: Run Integration Tests
shell: bash
run: |
SCRIPT_ARGS="--stack-config ${{ inputs.stack-config }} --inference-mode ${{ inputs.inference-mode }}"
# Add optional arguments only if they are provided
if [ -n '${{ inputs.setup }}' ]; then
SCRIPT_ARGS="$SCRIPT_ARGS --setup ${{ inputs.setup }}"
fi
if [ -n '${{ inputs.suite }}' ]; then
SCRIPT_ARGS="$SCRIPT_ARGS --suite ${{ inputs.suite }}"
fi
if [ -n '${{ inputs.subdirs }}' ]; then
SCRIPT_ARGS="$SCRIPT_ARGS --subdirs ${{ inputs.subdirs }}"
fi
if [ -n '${{ inputs.pattern }}' ]; then
SCRIPT_ARGS="$SCRIPT_ARGS --pattern ${{ inputs.pattern }}"
fi
echo "=== Running command ==="
echo "uv run --no-sync ./scripts/integration-tests.sh $SCRIPT_ARGS"
echo ""
uv run --no-sync ./scripts/integration-tests.sh $SCRIPT_ARGS | tee pytest-${{ inputs.inference-mode }}.log
- name: Commit and push recordings
if: ${{ inputs.inference-mode == 'record' || inputs.inference-mode == 'record-if-missing' }}
shell: bash
run: |
echo "Checking for recording changes"
git status --porcelain tests/integration/recordings/ tests/integration/*/recordings/
if [[ -n $(git status --porcelain tests/integration/recordings/ tests/integration/*/recordings/) ]]; then
echo "New recordings detected"
# Determine target branch: use target-branch input if provided, otherwise use current branch
TARGET_BRANCH="${{ inputs.target-branch }}"
if [ -z "$TARGET_BRANCH" ]; then
TARGET_BRANCH="${{ github.ref_name }}"
fi
echo "Target branch: $TARGET_BRANCH"
# Check if this is a fork PR
if [ "${{ inputs.is-fork-pr }}" = "true" ]; then
echo "::warning::This is a fork PR. Recordings were updated locally but cannot be pushed to the fork."
echo "::warning::Please download the workflow artifacts and commit the recordings manually."
else
echo "Committing and pushing recordings to branch: $TARGET_BRANCH"
git add tests/integration/recordings/ tests/integration/*/recordings/
git commit -m "Recordings update from CI (setup: ${{ inputs.setup }}, suite: ${{ inputs.suite }})"
git fetch origin "$TARGET_BRANCH"
git rebase "origin/$TARGET_BRANCH"
echo "Rebased successfully"
git push origin "HEAD:$TARGET_BRANCH"
echo "Pushed successfully to $TARGET_BRANCH"
fi
else
echo "No recording changes"
fi
- name: Upload recordings (for fork PRs)
if: ${{ inputs.is-fork-pr == 'true' && (inputs.inference-mode == 'record' || inputs.inference-mode == 'record-if-missing') }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: recordings-${{ github.run_id }}-${{ github.run_attempt || '1' }}-${{ strategy.job-index || github.job }}
path: |
tests/integration/recordings/
tests/integration/*/recordings/
retention-days: 7
if-no-files-found: ignore
- name: Write docker logs to file
if: ${{ always() }}
shell: bash
run: |
# Ollama logs (if ollama container exists)
sudo docker logs ollama > ollama-${{ inputs.inference-mode }}.log 2>&1 || true
# vllm logs (if vllm container exists)
sudo docker logs vllm > vllm-${{ inputs.inference-mode }}.log 2>&1 || true
# Note: distro container logs are now dumped in integration-tests.sh before container is removed
- name: Upload logs
if: ${{ always() }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: logs-${{ github.run_id }}-${{ github.run_attempt || '1' }}-${{ strategy.job-index || github.job }}-${{ github.action }}
path: |
*.log
retention-days: 1
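
Reviewer sketch (not part of the change): the "Run Integration Tests" step assembles `SCRIPT_ARGS` as a single string and expands it unquoted, so a value containing spaces (for example a `pattern` such as `a or b`) would be split into several words by the shell. The Python below illustrates the same optional-argument assembly with a list, which keeps each value as one argument. The function name `build_script_args` is hypothetical.

```python
import shlex


def build_script_args(stack_config, inference_mode, setup="", suite="", subdirs="", pattern=""):
    """Hypothetical helper: required flags first, optional flags only when set."""
    args = ["--stack-config", stack_config, "--inference-mode", inference_mode]
    optional = (
        ("--setup", setup),
        ("--suite", suite),
        ("--subdirs", subdirs),
        ("--pattern", pattern),
    )
    for flag, value in optional:
        if value:  # same role as the [ -n '...' ] checks in the step
            args += [flag, value]
    return args


if __name__ == "__main__":
    argv = build_script_args("ci-tests", "replay", suite="base", pattern="a or b")
    # shlex.join quotes values with spaces so the printed command is copy-pasteable
    print("uv run --no-sync ./scripts/integration-tests.sh " + shlex.join(argv))
```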


@ -1,9 +1,23 @@
name: Setup Ollama name: Setup Ollama
description: Start Ollama description: Start Ollama
inputs:
suite:
description: 'Test suite to use: base, responses, vision, etc.'
required: false
default: ''
runs: runs:
using: "composite" using: "composite"
steps: steps:
- name: Start Ollama - name: Start Ollama
shell: bash shell: bash
run: | run: |
docker run -d --name ollama -p 11434:11434 docker.io/leseb/ollama-with-models if [ "${{ inputs.suite }}" == "vision" ]; then
image="ollama-with-vision-model"
else
image="ollama-with-models"
fi
echo "Starting Ollama with image: $image"
docker run -d --name ollama -p 11434:11434 docker.io/llamastack/$image
echo "Verifying Ollama status..."
timeout 30 bash -c 'while ! curl -s -L http://127.0.0.1:11434; do sleep 1 && echo "."; done'
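
Reviewer sketch (not part of the change): the new step does two things, picking an image from the `suite` input and polling the Ollama port for up to 30 seconds. The Python below illustrates both under assumptions. The names `pick_ollama_image` and `wait_until_ready` are hypothetical, and the readiness probe is passed in as a callable so the sketch does not depend on Docker or curl.

```python
import time
from typing import Callable


def pick_ollama_image(suite: str) -> str:
    """Same selection as the shell branch: a vision image only for the vision suite."""
    name = "ollama-with-vision-model" if suite == "vision" else "ollama-with-models"
    return f"docker.io/llamastack/{name}"


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Call probe() until it returns True or the deadline passes.

    Roughly mirrors `timeout 30 bash -c 'while ! curl ...; do sleep 1; done'`,
    except that it returns False on timeout instead of exiting non-zero.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```

In the workflow itself a timeout fails the step; in this sketch the caller decides what to do with a `False` result.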


@ -4,24 +4,54 @@ inputs:
python-version: python-version:
description: The Python version to use description: The Python version to use
required: false required: false
default: "3.10" default: "3.12"
client-version:
description: The llama-stack-client-python version to test against (latest or published)
required: false
default: "latest"
sdk_install_url:
description: 'URL to install Python SDK from (for testing preview builds). If provided, overrides client-version.'
required: false
default: ""
runs: runs:
using: "composite" using: "composite"
steps: steps:
- name: Install uv - name: Install uv
uses: astral-sh/setup-uv@6b9c6063abd6010835644d4c2e1bef4cf5cd0fca # v6.0.1 uses: astral-sh/setup-uv@1e862dfacbd1d6d858c55d9b792c756523627244 # v7.1.4
with: with:
python-version: ${{ inputs.python-version }} python-version: ${{ inputs.python-version }}
activate-environment: true
version: 0.7.6 - name: Configure client installation
id: client-config
uses: ./.github/actions/install-llama-stack-client
with:
client-version: ${{ inputs.client-version }}
sdk_install_url: ${{ inputs.sdk_install_url }}
- name: Install dependencies - name: Install dependencies
shell: bash shell: bash
env:
UV_EXTRA_INDEX_URL: ${{ steps.client-config.outputs.uv-extra-index-url }}
run: | run: |
# Export UV env vars for current step and persist to GITHUB_ENV for subsequent steps
if [ -n "$UV_EXTRA_INDEX_URL" ]; then
export UV_INDEX_STRATEGY=unsafe-best-match
echo "UV_EXTRA_INDEX_URL=$UV_EXTRA_INDEX_URL" >> $GITHUB_ENV
echo "UV_INDEX_STRATEGY=$UV_INDEX_STRATEGY" >> $GITHUB_ENV
echo "Exported UV environment variables for current and subsequent steps"
fi
echo "Updating project dependencies via uv sync"
uv sync --all-groups uv sync --all-groups
uv pip install ollama faiss-cpu
# always test against the latest version of the client echo "Installing ad-hoc dependencies"
# TODO: this is not necessarily a good idea. we need to test against both published and latest uv pip install faiss-cpu
# to find out backwards compatibility issues.
uv pip install git+https://github.com/meta-llama/llama-stack-client-python.git@main # Install specific client version after sync if needed
uv pip install -e . if [ "${{ steps.client-config.outputs.install-after-sync }}" = "true" ]; then
echo "Installing llama-stack-client from: ${{ steps.client-config.outputs.install-source }}"
uv pip install ${{ steps.client-config.outputs.install-source }}
fi
echo "Installed llama packages"
uv pip list | grep llama


@ -0,0 +1,95 @@
name: 'Setup Test Environment'
description: 'Common setup steps for integration tests including dependencies, providers, and build'
inputs:
python-version:
description: 'Python version to use'
required: true
client-version:
description: 'Client version (latest or published)'
required: true
sdk_install_url:
description: 'URL to install Python SDK from (for testing preview builds). If provided, overrides client-version.'
required: false
default: ''
setup:
description: 'Setup to configure (ollama, vllm, gpt, etc.)'
required: false
default: 'ollama'
suite:
description: 'Test suite to use: base, responses, vision, etc.'
required: false
default: ''
inference-mode:
description: 'Inference mode (record or replay)'
required: true
runs:
using: 'composite'
steps:
- name: Install dependencies
uses: ./.github/actions/setup-runner
with:
python-version: ${{ inputs.python-version }}
client-version: ${{ inputs.client-version }}
sdk_install_url: ${{ inputs.sdk_install_url }}
- name: Setup ollama
if: ${{ (inputs.setup == 'ollama' || inputs.setup == 'ollama-vision') && inputs.inference-mode == 'record' }}
uses: ./.github/actions/setup-ollama
with:
suite: ${{ inputs.suite }}
- name: Setup vllm
if: ${{ inputs.setup == 'vllm' && inputs.inference-mode == 'record' }}
uses: ./.github/actions/setup-vllm
- name: Start Postgres service
if: ${{ contains(inputs.setup, 'postgres') }}
shell: bash
run: |
sudo docker rm -f postgres-ci || true
sudo docker run -d --name postgres-ci \
-e POSTGRES_USER=llamastack \
-e POSTGRES_PASSWORD=llamastack \
-e POSTGRES_DB=llamastack \
-p 5432:5432 \
postgres:16
echo "Waiting for Postgres to become ready..."
for i in {1..30}; do
if sudo docker exec postgres-ci pg_isready -U llamastack -d llamastack >/dev/null 2>&1; then
echo "Postgres is ready"
break
fi
if [ "$i" -eq 30 ]; then
echo "Postgres failed to start in time"
sudo docker logs postgres-ci || true
exit 1
fi
sleep 2
done
- name: Verify client installation
shell: bash
run: |
echo "Verifying llama-stack-client installation:"
uv pip show llama-stack-client || echo "llama-stack-client not found"
echo ""
echo "All installed llama packages:"
uv pip list | grep llama || true
- name: Build Llama Stack
shell: bash
run: |
# Client is already installed by setup-runner (handles both main and release branches)
echo "Building Llama Stack"
LLAMA_STACK_DIR=. \
uv run --no-sync llama stack list-deps ci-tests | xargs -L1 uv pip install
- name: Configure git for commits
shell: bash
run: |
git config --local user.email "github-actions[bot]@users.noreply.github.com"
git config --local user.name "github-actions[bot]"
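
Reviewer sketch (not part of the change): the "Build Llama Stack" step pipes `llama stack list-deps ci-tests` into `xargs -L1 uv pip install`, which runs one `uv pip install` per non-blank output line. The Python below approximates that fan-out. The function name `commands_per_line` and the sample package names are hypothetical, the real output format of `list-deps` is not shown in this diff, and `shlex.split` only approximates xargs quoting rules.

```python
import shlex


def commands_per_line(output: str, base=("uv", "pip", "install")):
    """Build one command per non-blank line, similar in spirit to `xargs -L1`."""
    commands = []
    for line in output.splitlines():
        if not line.strip():
            continue  # blank lines produce no command
        # Every whitespace-separated token on the line becomes an argument.
        commands.append([*base, *shlex.split(line)])
    return commands


if __name__ == "__main__":
    sample = "pkg-a\n\npkg-b pkg-c\n"  # placeholder lines, not real list-deps output
    for cmd in commands_per_line(sample):
        print(shlex.join(cmd))
```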


@ -0,0 +1,35 @@
name: Setup TypeScript client
description: Conditionally checkout and link llama-stack-client-typescript based on client-version
inputs:
client-version:
description: 'Client version (latest or published)'
required: true
outputs:
ts-client-path:
description: 'Path or version to use for TypeScript client'
value: ${{ steps.set-path.outputs.ts-client-path }}
runs:
using: "composite"
steps:
- name: Checkout TypeScript client (latest)
if: ${{ inputs.client-version == 'latest' }}
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
repository: llamastack/llama-stack-client-typescript
ref: main
path: .ts-client-checkout
- name: Set TS_CLIENT_PATH
id: set-path
shell: bash
run: |
if [ "${{ inputs.client-version }}" = "latest" ]; then
echo "ts-client-path=${{ github.workspace }}/.ts-client-checkout" >> $GITHUB_OUTPUT
elif [ "${{ inputs.client-version }}" = "published" ]; then
echo "ts-client-path=^0.3.2" >> $GITHUB_OUTPUT
else
echo "::error::Invalid client-version: ${{ inputs.client-version }}"
exit 1
fi

28
.github/actions/setup-vllm/action.yml vendored Normal file

@ -0,0 +1,28 @@
name: Setup VLLM
description: Start VLLM
runs:
using: "composite"
steps:
- name: Start VLLM
shell: bash
run: |
# Start vllm container
docker run -d \
--name vllm \
-p 8000:8000 \
--privileged=true \
quay.io/higginsd/vllm-cpu:65393ee064-qwen3 \
--host 0.0.0.0 \
--port 8000 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--model /root/.cache/Qwen3-0.6B \
--served-model-name Qwen/Qwen3-0.6B \
--max-model-len 8192
# Wait for vllm to be ready
echo "Waiting for vllm to be ready..."
timeout 900 bash -c 'until curl -f http://localhost:8000/health; do
echo "Waiting for vllm..."
sleep 5
done'


@ -9,15 +9,25 @@ updates:
day: "saturday" day: "saturday"
commit-message: commit-message:
prefix: chore(github-deps) prefix: chore(github-deps)
- package-ecosystem: "uv" - package-ecosystem: "uv"
directory: "/" directory: "/"
schedule: schedule:
interval: "weekly" interval: "weekly"
day: "saturday" day: "saturday"
# ignore all non-security updates: https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#open-pull-requests-limit
open-pull-requests-limit: 0
labels: labels:
- type/dependencies - type/dependencies
- python - python
commit-message: commit-message:
prefix: chore(python-deps) prefix: chore(python-deps)
- package-ecosystem: npm
directory: "/llama_stack_ui"
schedule:
interval: "weekly"
day: "saturday"
labels:
- type/dependencies
- javascript
commit-message:
prefix: chore(ui-deps)

23
.github/mergify.yml vendored Normal file

@ -0,0 +1,23 @@
pull_request_rules:
- name: ping author on conflicts and add 'needs-rebase' label
conditions:
- conflict
- -closed
actions:
label:
add:
- needs-rebase
comment:
message: >
This pull request has merge conflicts that must be resolved before it
can be merged. @{{author}} please rebase it.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
- name: remove 'needs-rebase' label when conflict is resolved
conditions:
- -conflict
- -closed
actions:
label:
remove:
- needs-rebase

25
.github/workflows/README.md vendored Normal file

@ -0,0 +1,25 @@
# Llama Stack CI
Llama Stack uses GitHub Actions for Continuous Integration (CI). The table below lists each workflow, its file, and its purpose.
| Name | File | Purpose |
| ---- | ---- | ------- |
| Backward Compatibility Check | [backward-compat.yml](backward-compat.yml) | Check backward compatibility for config.yaml files |
| API Conformance Tests | [conformance.yml](conformance.yml) | Run the API Conformance test suite on the changes. |
| Installer CI | [install-script-ci.yml](install-script-ci.yml) | Test the installation script |
| Integration Auth Tests | [integration-auth-tests.yml](integration-auth-tests.yml) | Run the integration test suite with Kubernetes authentication |
| SqlStore Integration Tests | [integration-sql-store-tests.yml](integration-sql-store-tests.yml) | Run the integration test suite with SqlStore |
| Integration Tests (Replay) | [integration-tests.yml](integration-tests.yml) | Run the integration test suites from tests/integration in replay mode |
| Vector IO Integration Tests | [integration-vector-io-tests.yml](integration-vector-io-tests.yml) | Run the integration test suite with various VectorIO providers |
| Pre-commit | [pre-commit.yml](pre-commit.yml) | Run pre-commit checks |
| Test Llama Stack Build | [providers-build.yml](providers-build.yml) | Test llama stack build |
| Test llama stack list-deps | [providers-list-deps.yml](providers-list-deps.yml) | Test llama stack list-deps |
| Python Package Build Test | [python-build-test.yml](python-build-test.yml) | Test building the llama-stack PyPI project |
| Integration Tests (Record) | [record-integration-tests.yml](record-integration-tests.yml) | Run the integration test suite from tests/integration |
| Check semantic PR titles | [semantic-pr.yml](semantic-pr.yml) | Ensure that PR titles follow the conventional commit spec |
| Stainless SDK Builds | [stainless-builds.yml](stainless-builds.yml) | Build Stainless SDK from OpenAPI spec changes |
| Close stale issues and PRs | [stale_bot.yml](stale_bot.yml) | Run the Stale Bot action |
| Test External Providers Installed via Module | [test-external-provider-module.yml](test-external-provider-module.yml) | Test External Provider installation via Python module |
| Test External API and Providers | [test-external.yml](test-external.yml) | Test the External API and Provider mechanisms |
| UI Tests | [ui-unit-tests.yml](ui-unit-tests.yml) | Run the UI test suite |
| Unit Tests | [unit-tests.yml](unit-tests.yml) | Run the unit test suite |

578
.github/workflows/backward-compat.yml vendored Normal file

@ -0,0 +1,578 @@
name: Backward Compatibility Check
run-name: Check backward compatibility for config.yaml files
on:
pull_request:
branches:
- main
- 'release-[0-9]+.[0-9]+.[0-9]+.[0-9]+'
- 'release-[0-9]+.[0-9]+.[0-9]+'
- 'release-[0-9]+.[0-9]+'
paths:
- 'src/llama_stack/core/datatypes.py'
- 'src/llama_stack/providers/datatypes.py'
- 'src/llama_stack/distributions/**/config.yaml'
- 'tests/backward_compat/**'
- '.github/workflows/backward-compat.yml'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
check-main-compatibility:
name: Check Compatibility with main
runs-on: ubuntu-latest
steps:
- name: Checkout PR branch
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 0 # Need full history to access main branch
- name: Set up Python
uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0
with:
python-version: '3.12'
- name: Install uv
uses: astral-sh/setup-uv@681c641aba71e4a1c380be3ab5e12ad51f415867 # v7.1.6
with:
enable-cache: true
- name: Install dependencies
run: |
uv sync --group dev
- name: Extract config.yaml files from main branch
id: extract_configs
run: |
# Get list of config.yaml paths from main
git fetch origin main
CONFIG_PATHS=$(git ls-tree -r --name-only origin/main | grep "src/llama_stack/distributions/.*/config.yaml$" || true)
if [ -z "$CONFIG_PATHS" ]; then
echo "No config.yaml files found in main branch"
exit 1
fi
# Extract all configs to a temp directory
mkdir -p /tmp/main_configs
echo "Extracting configs from main branch:"
while IFS= read -r config_path; do
if [ -z "$config_path" ]; then
continue
fi
# Extract filename for storage
filename=$(basename $(dirname "$config_path"))
echo " - $filename (from $config_path)"
git show origin/main:"$config_path" > "/tmp/main_configs/${filename}.yaml"
done <<< "$CONFIG_PATHS"
echo ""
echo "Extracted $(ls /tmp/main_configs/*.yaml | wc -l) config files"
- name: Test all configs from main
id: test_configs
continue-on-error: true
run: |
# Run pytest once with all configs parameterized
if COMPAT_TEST_CONFIGS_DIR=/tmp/main_configs uv run pytest tests/backward_compat/test_run_config.py -v; then
echo "failed=false" >> $GITHUB_OUTPUT
else
echo "failed=true" >> $GITHUB_OUTPUT
exit 1
fi
- name: Check for breaking change acknowledgment
id: check_ack
if: steps.test_configs.outputs.failed == 'true'
run: |
echo "Breaking changes detected. Checking for acknowledgment..."
# Check PR title for '!:' marker (conventional commits)
PR_TITLE="${{ github.event.pull_request.title }}"
if [[ "$PR_TITLE" =~ ^[a-z]+\!: ]]; then
echo "✓ Breaking change acknowledged in PR title"
echo "acknowledged=true" >> $GITHUB_OUTPUT
exit 0
fi
# Check commit messages for BREAKING CHANGE:
if git log origin/main..HEAD --format=%B | grep -q "BREAKING CHANGE:"; then
echo "✓ Breaking change acknowledged in commit message"
echo "acknowledged=true" >> $GITHUB_OUTPUT
exit 0
fi
echo "✗ Breaking change NOT acknowledged"
echo "acknowledged=false" >> $GITHUB_OUTPUT
env:
GH_TOKEN: ${{ github.token }}
- name: Evaluate results
if: always()
run: |
FAILED="${{ steps.test_configs.outputs.failed }}"
ACKNOWLEDGED="${{ steps.check_ack.outputs.acknowledged }}"
if [[ "$FAILED" == "true" ]]; then
if [[ "$ACKNOWLEDGED" == "true" ]]; then
echo ""
echo "⚠️ WARNING: Breaking changes detected but acknowledged"
echo ""
echo "This PR introduces backward-incompatible changes to config.yaml."
echo "The changes have been properly acknowledged."
echo ""
exit 0 # Pass the check
else
echo ""
echo "❌ ERROR: Breaking changes detected without acknowledgment"
echo ""
echo "This PR introduces backward-incompatible changes to config.yaml"
echo "that will break existing user configurations."
echo ""
echo "To acknowledge this breaking change, do ONE of:"
echo " 1. Add '!:' to your PR title (e.g., 'feat!: change xyz')"
echo " 2. Include 'BREAKING CHANGE:' in a commit message"
echo ""
exit 1 # Fail the check
fi
fi
test-integration-main:
name: Run Integration Tests with main Config
runs-on: ubuntu-latest
steps:
- name: Checkout PR branch
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 0
- name: Extract ci-tests config.yaml from main
run: |
git fetch origin main
git show origin/main:src/llama_stack/distributions/ci-tests/config.yaml > /tmp/main-ci-tests-config.yaml
echo "Extracted ci-tests config.yaml from main branch"
- name: Setup test environment
uses: ./.github/actions/setup-test-environment
with:
python-version: '3.12'
client-version: 'latest'
setup: 'ollama'
suite: 'base'
inference-mode: 'replay'
- name: Run integration tests with main config
id: test_integration
continue-on-error: true
uses: ./.github/actions/run-and-record-tests
with:
stack-config: /tmp/main-ci-tests-config.yaml
setup: 'ollama'
inference-mode: 'replay'
suite: 'base'
- name: Check for breaking change acknowledgment
id: check_ack
if: steps.test_integration.outcome == 'failure'
run: |
echo "Integration tests failed. Checking for acknowledgment..."
# Check PR title for '!:' marker (conventional commits)
PR_TITLE="${{ github.event.pull_request.title }}"
if [[ "$PR_TITLE" =~ ^[a-z]+\!: ]]; then
echo "✓ Breaking change acknowledged in PR title"
echo "acknowledged=true" >> $GITHUB_OUTPUT
exit 0
fi
# Check commit messages for BREAKING CHANGE:
if git log origin/main..HEAD --format=%B | grep -q "BREAKING CHANGE:"; then
echo "✓ Breaking change acknowledged in commit message"
echo "acknowledged=true" >> $GITHUB_OUTPUT
exit 0
fi
echo "✗ Breaking change NOT acknowledged"
echo "acknowledged=false" >> $GITHUB_OUTPUT
env:
GH_TOKEN: ${{ github.token }}
- name: Evaluate integration test results
if: always()
run: |
TEST_FAILED="${{ steps.test_integration.outcome == 'failure' }}"
ACKNOWLEDGED="${{ steps.check_ack.outputs.acknowledged }}"
if [[ "$TEST_FAILED" == "true" ]]; then
if [[ "$ACKNOWLEDGED" == "true" ]]; then
echo ""
echo "⚠️ WARNING: Integration tests failed with main config but acknowledged"
echo ""
exit 0 # Pass the check
else
echo ""
echo "❌ ERROR: Integration tests failed with main config without acknowledgment"
echo ""
echo "To acknowledge this breaking change, do ONE of:"
echo " 1. Add '!:' to your PR title (e.g., 'feat!: change xyz')"
echo " 2. Include 'BREAKING CHANGE:' in a commit message"
echo ""
exit 1 # Fail the check
fi
fi
test-integration-release:
name: Run Integration Tests with Latest Release (Informational)
runs-on: ubuntu-latest
steps:
- name: Checkout PR branch
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 0
- name: Get latest release
id: get_release
run: |
# Get the latest release from GitHub
LATEST_TAG=$(gh release list --limit 1 --json tagName --jq '.[0].tagName' 2>/dev/null || echo "")
if [ -z "$LATEST_TAG" ]; then
echo "No releases found, skipping release compatibility check"
echo "has_release=false" >> $GITHUB_OUTPUT
exit 0
fi
echo "Latest release: $LATEST_TAG"
echo "has_release=true" >> $GITHUB_OUTPUT
echo "tag=$LATEST_TAG" >> $GITHUB_OUTPUT
env:
GH_TOKEN: ${{ github.token }}
- name: Extract ci-tests config.yaml from release
if: steps.get_release.outputs.has_release == 'true'
id: extract_config
run: |
RELEASE_TAG="${{ steps.get_release.outputs.tag }}"
# Try with src/ prefix first (newer releases), then without (older releases)
if git show "$RELEASE_TAG:src/llama_stack/distributions/ci-tests/config.yaml" > /tmp/release-ci-tests-config.yaml 2>/dev/null; then
echo "Extracted ci-tests config.yaml from release $RELEASE_TAG (src/ path)"
echo "has_config=true" >> $GITHUB_OUTPUT
elif git show "$RELEASE_TAG:llama_stack/distributions/ci-tests/config.yaml" > /tmp/release-ci-tests-config.yaml 2>/dev/null; then
echo "Extracted ci-tests config.yaml from release $RELEASE_TAG (old path)"
echo "has_config=true" >> $GITHUB_OUTPUT
else
echo "::warning::ci-tests/config.yaml not found in release $RELEASE_TAG"
echo "has_config=false" >> $GITHUB_OUTPUT
fi
- name: Setup test environment
if: steps.get_release.outputs.has_release == 'true' && steps.extract_config.outputs.has_config == 'true'
uses: ./.github/actions/setup-test-environment
with:
python-version: '3.12'
client-version: 'latest'
setup: 'ollama'
suite: 'base'
inference-mode: 'replay'
- name: Run integration tests with release config (PR branch)
id: test_release_pr
if: steps.get_release.outputs.has_release == 'true' && steps.extract_config.outputs.has_config == 'true'
continue-on-error: true
uses: ./.github/actions/run-and-record-tests
with:
stack-config: /tmp/release-ci-tests-config.yaml
setup: 'ollama'
inference-mode: 'replay'
suite: 'base'
- name: Checkout main branch to test baseline
if: steps.get_release.outputs.has_release == 'true' && steps.extract_config.outputs.has_config == 'true'
run: |
git checkout origin/main
- name: Setup test environment for main
if: steps.get_release.outputs.has_release == 'true' && steps.extract_config.outputs.has_config == 'true'
uses: ./.github/actions/setup-test-environment
with:
python-version: '3.12'
client-version: 'latest'
setup: 'ollama'
suite: 'base'
inference-mode: 'replay'
- name: Run integration tests with release config (main branch)
id: test_release_main
if: steps.get_release.outputs.has_release == 'true' && steps.extract_config.outputs.has_config == 'true'
continue-on-error: true
uses: ./.github/actions/run-and-record-tests
with:
stack-config: /tmp/release-ci-tests-config.yaml
setup: 'ollama'
inference-mode: 'replay'
suite: 'base'
- name: Report results and post PR comment
if: always() && steps.get_release.outputs.has_release == 'true' && steps.extract_config.outputs.has_config == 'true'
run: |
RELEASE_TAG="${{ steps.get_release.outputs.tag }}"
PR_OUTCOME="${{ steps.test_release_pr.outcome }}"
MAIN_OUTCOME="${{ steps.test_release_main.outcome }}"
if [[ "$PR_OUTCOME" == "failure" && "$MAIN_OUTCOME" == "success" ]]; then
# NEW breaking change - PR fails but main passes
echo "::error::🚨 This PR introduces a NEW breaking change!"
# Check if we already posted a comment (to avoid spam on every push)
EXISTING_COMMENT=$(gh pr view ${{ github.event.pull_request.number }} --json comments --jq '.comments[] | select(.body | contains("🚨 New Breaking Change Detected") and contains("Integration tests")) | .id' | head -1)
if [[ -z "$EXISTING_COMMENT" ]]; then
gh pr comment ${{ github.event.pull_request.number }} --body "## 🚨 New Breaking Change Detected
**Integration tests against release \`$RELEASE_TAG\` are now failing**
⚠️ This PR introduces a breaking change that affects compatibility with the latest release.
- Users on release \`$RELEASE_TAG\` may not be able to upgrade
- Existing configurations may break
The tests pass on \`main\` but fail with this PR's changes.
> **Note:** This is informational only and does not block merge.
> Consider whether this breaking change is acceptable for users."
else
echo "Comment already exists, skipping to avoid spam"
fi
cat >> $GITHUB_STEP_SUMMARY <<EOF
## 🚨 NEW Breaking Change Detected
**Integration tests against release \`$RELEASE_TAG\` FAILED**
⚠️ **This PR introduces a NEW breaking change**
- Tests **PASS** on main branch ✅
- Tests **FAIL** on PR branch ❌
- Users on release \`$RELEASE_TAG\` may not be able to upgrade
- Existing configurations may break
> **Note:** This is informational only and does not block merge.
> Consider whether this breaking change is acceptable for users.
EOF
elif [[ "$PR_OUTCOME" == "failure" ]]; then
# Existing breaking change - both PR and main fail
echo "::warning::Breaking change already exists in main branch"
cat >> $GITHUB_STEP_SUMMARY <<EOF
## ⚠️ Release Compatibility Test Failed (Existing Issue)
**Integration tests against release \`$RELEASE_TAG\` FAILED**
- Tests **FAIL** on main branch ❌
- Tests **FAIL** on PR branch ❌
- This breaking change already exists in main (not introduced by this PR)
> **Note:** This is informational only.
EOF
else
# Success - tests pass
cat >> $GITHUB_STEP_SUMMARY <<EOF
## ✅ Release Compatibility Test Passed
Integration tests against release \`$RELEASE_TAG\` passed successfully.
This PR maintains compatibility with the latest release.
EOF
fi
env:
GH_TOKEN: ${{ github.token }}
check-schema-release-compatibility:
name: Check Schema Compatibility with Latest Release (Informational)
runs-on: ubuntu-latest
steps:
- name: Checkout PR branch
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0
with:
python-version: '3.12'
- name: Install uv
uses: astral-sh/setup-uv@681c641aba71e4a1c380be3ab5e12ad51f415867 # v7.1.6
with:
enable-cache: true
- name: Install dependencies
run: |
uv sync --group dev
- name: Get latest release
id: get_release
run: |
# Get the latest release from GitHub
LATEST_TAG=$(gh release list --limit 1 --json tagName --jq '.[0].tagName' 2>/dev/null || echo "")
if [ -z "$LATEST_TAG" ]; then
echo "No releases found, skipping release compatibility check"
echo "has_release=false" >> $GITHUB_OUTPUT
exit 0
fi
echo "Latest release: $LATEST_TAG"
echo "has_release=true" >> $GITHUB_OUTPUT
echo "tag=$LATEST_TAG" >> $GITHUB_OUTPUT
env:
GH_TOKEN: ${{ github.token }}
- name: Extract configs from release
if: steps.get_release.outputs.has_release == 'true'
id: extract_release_configs
run: |
RELEASE_TAG="${{ steps.get_release.outputs.tag }}"
# Get config.yaml files from the release (try both src/ and old path)
CONFIG_PATHS=$(git ls-tree -r --name-only "$RELEASE_TAG" | grep "llama_stack/distributions/.*/config.yaml$" || true)
if [ -z "$CONFIG_PATHS" ]; then
echo "::warning::No config.yaml files found in release $RELEASE_TAG"
echo "has_configs=false" >> $GITHUB_OUTPUT
exit 0
fi
# Extract all configs to a temp directory
mkdir -p /tmp/release_configs
echo "Extracting configs from release $RELEASE_TAG:"
while IFS= read -r config_path; do
if [ -z "$config_path" ]; then
continue
fi
filename=$(basename "$(dirname "$config_path")")
echo " - $filename (from $config_path)"
git show "$RELEASE_TAG:$config_path" > "/tmp/release_configs/${filename}.yaml" 2>/dev/null || true
done <<< "$CONFIG_PATHS"
echo ""
echo "Extracted $(ls /tmp/release_configs/*.yaml 2>/dev/null | wc -l) config files"
echo "has_configs=true" >> $GITHUB_OUTPUT
- name: Test against release configs (PR branch)
id: test_schema_pr
if: steps.get_release.outputs.has_release == 'true' && steps.extract_release_configs.outputs.has_configs == 'true'
continue-on-error: true
run: |
RELEASE_TAG="${{ steps.get_release.outputs.tag }}"
COMPAT_TEST_CONFIGS_DIR=/tmp/release_configs uv run pytest tests/backward_compat/test_run_config.py -v --tb=short
- name: Checkout main branch to test baseline
if: steps.get_release.outputs.has_release == 'true' && steps.extract_release_configs.outputs.has_configs == 'true'
run: |
git checkout origin/main
- name: Install dependencies for main
if: steps.get_release.outputs.has_release == 'true' && steps.extract_release_configs.outputs.has_configs == 'true'
run: |
uv sync --group dev
- name: Test against release configs (main branch)
id: test_schema_main
if: steps.get_release.outputs.has_release == 'true' && steps.extract_release_configs.outputs.has_configs == 'true'
continue-on-error: true
run: |
RELEASE_TAG="${{ steps.get_release.outputs.tag }}"
COMPAT_TEST_CONFIGS_DIR=/tmp/release_configs uv run pytest tests/backward_compat/test_run_config.py -v --tb=short
- name: Report results and post PR comment
if: always() && steps.get_release.outputs.has_release == 'true' && steps.extract_release_configs.outputs.has_configs == 'true'
run: |
RELEASE_TAG="${{ steps.get_release.outputs.tag }}"
PR_OUTCOME="${{ steps.test_schema_pr.outcome }}"
MAIN_OUTCOME="${{ steps.test_schema_main.outcome }}"
if [[ "$PR_OUTCOME" == "failure" && "$MAIN_OUTCOME" == "success" ]]; then
# NEW breaking change - PR fails but main passes
echo "::error::🚨 This PR introduces a NEW schema breaking change!"
# Check if we already posted a comment (to avoid spam on every push)
EXISTING_COMMENT=$(gh pr view ${{ github.event.pull_request.number }} --json comments --jq '.comments[] | select(.body | contains("🚨 New Schema Breaking Change Detected")) | .id' | head -1)
if [[ -z "$EXISTING_COMMENT" ]]; then
gh pr comment ${{ github.event.pull_request.number }} --body "## 🚨 New Schema Breaking Change Detected
**Schema validation against release \`$RELEASE_TAG\` is now failing**
⚠️ This PR introduces a schema breaking change that affects compatibility with the latest release.
- Users on release \`$RELEASE_TAG\` will not be able to upgrade
- Existing config.yaml configurations will fail validation
The tests pass on \`main\` but fail with this PR's changes.
> **Note:** This is informational only and does not block merge.
> Consider whether this breaking change is acceptable for users."
else
echo "Comment already exists, skipping to avoid spam"
fi
cat >> $GITHUB_STEP_SUMMARY <<EOF
## 🚨 NEW Schema Breaking Change Detected
**Schema validation against release \`$RELEASE_TAG\` FAILED**
⚠️ **This PR introduces a NEW schema breaking change**
- Tests **PASS** on main branch ✅
- Tests **FAIL** on PR branch ❌
- Users on release \`$RELEASE_TAG\` will not be able to upgrade
- Existing config.yaml configurations will fail validation
> **Note:** This is informational only and does not block merge.
> Consider whether this breaking change is acceptable for users.
EOF
elif [[ "$PR_OUTCOME" == "failure" ]]; then
# Existing breaking change - both PR and main fail
echo "::warning::Schema breaking change already exists in main branch"
cat >> $GITHUB_STEP_SUMMARY <<EOF
## ⚠️ Release Schema Compatibility Failed (Existing Issue)
**Schema validation against release \`$RELEASE_TAG\` FAILED**
- Tests **FAIL** on main branch ❌
- Tests **FAIL** on PR branch ❌
- This schema breaking change already exists in main (not introduced by this PR)
> **Note:** This is informational only.
EOF
else
# Success - tests pass
cat >> $GITHUB_STEP_SUMMARY <<EOF
## ✅ Release Schema Compatibility Passed
All config.yaml configs from release \`$RELEASE_TAG\` are compatible.
This PR maintains backward compatibility with the latest release.
EOF
fi
env:
GH_TOKEN: ${{ github.token }}
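
Both reporting steps in this workflow branch on the same pair of step outcomes (PR branch vs. main). As a rough sketch of that decision table only, the hypothetical helper below (`classify_compat` and its three labels are illustrative names that do not appear in the workflow) maps the two outcomes to the three cases reported above. It assumes the outcome strings are the values GitHub Actions exposes as `steps.<id>.outcome`.

```shell
# Hypothetical helper, not part of the workflow: classify a compatibility
# result from the PR-branch and main-branch step outcomes.
# $1 = PR outcome, $2 = main outcome (e.g. "success" or "failure").
classify_compat() {
  if [ "$1" = "failure" ] && [ "$2" = "success" ]; then
    # PR fails but main passes: the PR introduces the break
    echo "new-breaking-change"
  elif [ "$1" = "failure" ]; then
    # PR fails and main does not pass either: the break predates this PR
    echo "existing-breaking-change"
  else
    # PR did not fail: reported as compatible
    echo "compatible"
  fi
}

classify_compat failure success
classify_compat failure failure
classify_compat success success
```

As in the workflow, any PR outcome other than `failure` falls through to the "passed" branch, so a skipped test step would also be reported as compatible.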


@ -1,29 +0,0 @@
name: Update Changelog
on:
release:
types: [published, unpublished, created, edited, deleted, released]
permissions:
contents: read
jobs:
generate_changelog:
name: Generate changelog
permissions:
contents: write # for peter-evans/create-pull-request to create branch
pull-requests: write # for peter-evans/create-pull-request to create a PR
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: main
fetch-depth: 0
- run: |
python ./scripts/gen-changelog.py
- uses: peter-evans/create-pull-request@271a8d0340265f705b14b6d32b9829c1cb33d45e # v7.0.8
with:
title: 'docs: update CHANGELOG.md for ${{ github.ref_name }}'
commit-message: 'docs: update CHANGELOG.md for ${{ github.ref_name }}'
branch: create-pull-request/changelog
signoff: true

.github/workflows/conformance.yml

@ -0,0 +1,161 @@
# API Conformance Tests
# This workflow ensures that API changes maintain backward compatibility and don't break existing integrations
# It runs schema validation and OpenAPI diff checks to catch breaking changes early
#
# The workflow handles both monolithic and split API specifications:
# - If split specs exist (stable/experimental/deprecated), they are stitched together for comparison
# - If only monolithic spec exists, it is used directly
# This allows for clean API organization while maintaining robust conformance testing
name: API Conformance Tests
run-name: Run the API Conformance test suite on the changes.
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
types: [opened, synchronize, reopened, edited]
paths:
- 'docs/static/llama-stack-spec.yaml' # Legacy monolithic spec
- 'docs/static/stable-llama-stack-spec.yaml' # Stable APIs spec
- 'docs/static/experimental-llama-stack-spec.yaml' # Experimental APIs spec
- 'docs/static/deprecated-llama-stack-spec.yaml' # Deprecated APIs spec
- '.github/workflows/conformance.yml' # This workflow itself
concurrency:
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
# Cancel in-progress runs when new commits are pushed to avoid wasting CI resources
cancel-in-progress: true
jobs:
# Job to check if API schema changes maintain backward compatibility
check-schema-compatibility:
runs-on: ubuntu-latest
steps:
- name: Checkout PR Code
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 0
# Check if we should skip conformance testing due to breaking changes
- name: Check if conformance test should be skipped
id: skip-check
env:
PR_TITLE: ${{ github.event.pull_request.title }}
run: |
# Skip if title contains "!:" indicating breaking change (like "feat!:")
if [[ "$PR_TITLE" == *"!:"* ]]; then
echo "skip=true" >> $GITHUB_OUTPUT
exit 0
fi
# Get all commits in this PR and check for BREAKING CHANGE footer
git log --format="%B" ${{ github.event.pull_request.base.sha }}..${{ github.event.pull_request.head.sha }} | \
grep -q "BREAKING CHANGE:" && echo "skip=true" >> $GITHUB_OUTPUT || echo "skip=false" >> $GITHUB_OUTPUT
shell: bash
# Checkout the base branch to compare against (usually main)
# This allows us to diff the current changes against the previous state
- name: Checkout Base Branch
if: steps.skip-check.outputs.skip != 'true'
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
ref: ${{ github.event.pull_request.base.ref }}
path: 'base'
# Cache oasdiff to avoid checksum failures and speed up builds
- name: Cache oasdiff
if: steps.skip-check.outputs.skip != 'true'
id: cache-oasdiff
uses: actions/cache@9255dc7a253b0ccc959486e2bca901246202afeb
with:
path: ~/oasdiff
key: oasdiff-${{ runner.os }}
# Install oasdiff: https://github.com/oasdiff/oasdiff, a tool for detecting breaking changes in OpenAPI specs.
- name: Install oasdiff
if: steps.skip-check.outputs.skip != 'true' && steps.cache-oasdiff.outputs.cache-hit != 'true'
run: |
curl -fsSL https://raw.githubusercontent.com/oasdiff/oasdiff/main/install.sh | sh
cp /usr/local/bin/oasdiff ~/oasdiff
# Setup cached oasdiff
- name: Setup cached oasdiff
if: steps.skip-check.outputs.skip != 'true' && steps.cache-oasdiff.outputs.cache-hit == 'true'
run: |
sudo cp ~/oasdiff /usr/local/bin/oasdiff
sudo chmod +x /usr/local/bin/oasdiff
# Install yq for YAML processing
- name: Install yq
run: |
sudo wget -qO /usr/local/bin/yq https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64
sudo chmod +x /usr/local/bin/yq
# Verify API specs exist for conformance testing
- name: Check API Specs
if: steps.skip-check.outputs.skip != 'true'
run: |
echo "Checking for API specification files..."
# Check current branch
if [ -f "docs/static/stable-llama-stack-spec.yaml" ]; then
echo "✓ Found stable API spec in current branch"
CURRENT_SPEC="docs/static/stable-llama-stack-spec.yaml"
elif [ -f "docs/static/llama-stack-spec.yaml" ]; then
echo "✓ Found monolithic API spec in current branch"
CURRENT_SPEC="docs/static/llama-stack-spec.yaml"
else
echo "❌ No API specs found in current branch"
exit 1
fi
# Check base branch
if [ -f "base/docs/static/stable-llama-stack-spec.yaml" ]; then
echo "✓ Found stable API spec in base branch"
BASE_SPEC="base/docs/static/stable-llama-stack-spec.yaml"
elif [ -f "base/docs/static/llama-stack-spec.yaml" ]; then
echo "✓ Found monolithic API spec in base branch"
BASE_SPEC="base/docs/static/llama-stack-spec.yaml"
else
echo "❌ No API specs found in base branch"
exit 1
fi
# Export for next step
echo "BASE_SPEC=${BASE_SPEC}" >> $GITHUB_ENV
echo "CURRENT_SPEC=${CURRENT_SPEC}" >> $GITHUB_ENV
echo "Will compare: ${BASE_SPEC} -> ${CURRENT_SPEC}"
# Run oasdiff to detect breaking changes in the API specification
# This step will fail if incompatible changes are detected, preventing breaking changes from being merged
- name: Run OpenAPI Breaking Change Diff
if: steps.skip-check.outputs.skip != 'true'
run: |
oasdiff breaking --fail-on ERR $BASE_SPEC $CURRENT_SPEC --match-path '^/v1/'
# Run oasdiff to detect breaking changes in the API specification when compared to the OpenAI OpenAPI spec
- name: Run OpenAPI Breaking Change Diff Against OpenAI API
if: steps.skip-check.outputs.skip != 'true'
continue-on-error: true
shell: bash
run: |
OPENAI_SPEC=docs/static/openai-spec-2.3.0.yml
LLAMA_STACK_SPEC=docs/static/llama-stack-spec.yaml
# Compare Llama Stack spec against OpenAI spec.
# This finds breaking changes in our implementation of common endpoints.
# By using our spec as the base, we avoid errors for endpoints we don't implement.
oasdiff breaking --fail-on ERR \
"$LLAMA_STACK_SPEC" \
"$OPENAI_SPEC" \
--strip-prefix-base "/v1"
# Report when test is skipped
- name: Report skip reason
if: steps.skip-check.outputs.skip == 'true'
run: |
echo "Conformance test skipped due to breaking change indicator"
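
The skip check in this workflow combines two signals: a `!:` marker in the PR title and a `BREAKING CHANGE:` footer in any commit message. A minimal sketch of the same rule as a standalone function follows; `should_skip_conformance` is a hypothetical name, and the commit messages are passed in as a string here instead of being read from `git log`.

```shell
# Hypothetical helper, not part of the workflow.
# $1 = PR title, $2 = concatenated commit message bodies.
# Prints "true" when conformance testing would be skipped, else "false".
should_skip_conformance() {
  case "$1" in
    # Conventional-commit breaking marker, e.g. "feat!: ..."
    *'!:'*) echo "true"; return 0 ;;
  esac
  if printf '%s\n' "$2" | grep -q "BREAKING CHANGE:"; then
    echo "true"
  else
    echo "false"
  fi
}

should_skip_conformance "feat!: remove legacy endpoint" ""
should_skip_conformance "fix: typo" "BREAKING CHANGE: response field renamed"
should_skip_conformance "fix: typo" "docs only"
```

Note that the title match is a plain substring test, so any title containing `!:` triggers the skip, whether or not it follows the conventional-commit prefix form.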


@ -1,355 +0,0 @@
name: "Run Llama-stack Tests"
on:
#### Temporarily disable PR runs until tests run as intended within mainline.
#TODO Add this back.
#pull_request_target:
# types: ["opened"]
# branches:
# - 'main'
# paths:
# - 'llama_stack/**/*.py'
# - 'tests/**/*.py'
workflow_dispatch:
inputs:
runner:
description: 'GHA Runner Scale Set label to run workflow on.'
required: true
default: "llama-stack-gha-runner-gpu"
checkout_reference:
description: "The branch, tag, or SHA to checkout"
required: true
default: "main"
debug:
description: 'Run debugging steps?'
required: false
default: "true"
sleep_time:
description: '[DEBUG] sleep time for debugging'
required: true
default: "0"
provider_id:
description: 'ID of your provider'
required: true
default: "meta_reference"
model_id:
description: 'Shorthand name for target model ID (llama_3b or llama_8b)'
required: true
default: "llama_3b"
model_override_3b:
description: 'Specify shorthand model for <llama_3b> '
required: false
default: "Llama3.2-3B-Instruct"
model_override_8b:
description: 'Specify shorthand model for <llama_8b> '
required: false
default: "Llama3.1-8B-Instruct"
env:
# ID used for each test's provider config
PROVIDER_ID: "${{ inputs.provider_id || 'meta_reference' }}"
# Path to model checkpoints within EFS volume
MODEL_CHECKPOINT_DIR: "/data/llama"
# Path to directory to run tests from
TESTS_PATH: "${{ github.workspace }}/llama_stack/providers/tests"
# Keep track of a list of model IDs that are valid to use within pytest fixture marks
AVAILABLE_MODEL_IDs: "llama_3b llama_8b"
# Shorthand name for model ID, used in pytest fixture marks
MODEL_ID: "${{ inputs.model_id || 'llama_3b' }}"
# Override the `llama_3b` / `llama_8b` models; otherwise use the defaults.
LLAMA_3B_OVERRIDE: "${{ inputs.model_override_3b || 'Llama3.2-3B-Instruct' }}"
LLAMA_8B_OVERRIDE: "${{ inputs.model_override_8b || 'Llama3.1-8B-Instruct' }}"
# Defines which directories in TESTS_PATH to exclude from the test loop
EXCLUDED_DIRS: "__pycache__"
# Defines the output xml reports generated after a test is run
REPORTS_GEN: ""
jobs:
execute_workflow:
name: Execute workload on Self-Hosted GPU k8s runner
permissions:
pull-requests: write
defaults:
run:
shell: bash
runs-on: ${{ inputs.runner != '' && inputs.runner || 'llama-stack-gha-runner-gpu' }}
if: always()
steps:
##############################
#### INITIAL DEBUG CHECKS ####
##############################
- name: "[DEBUG] Check content of the EFS mount"
id: debug_efs_volume
continue-on-error: true
if: inputs.debug == 'true'
run: |
echo "========= Content of the EFS mount ============="
ls -la ${{ env.MODEL_CHECKPOINT_DIR }}
- name: "[DEBUG] Get runner container OS information"
id: debug_os_info
if: ${{ inputs.debug == 'true' }}
run: |
cat /etc/os-release
- name: "[DEBUG] Print environment variables"
id: debug_env_vars
if: ${{ inputs.debug == 'true' }}
run: |
echo "PROVIDER_ID = ${PROVIDER_ID}"
echo "MODEL_CHECKPOINT_DIR = ${MODEL_CHECKPOINT_DIR}"
echo "AVAILABLE_MODEL_IDs = ${AVAILABLE_MODEL_IDs}"
echo "MODEL_ID = ${MODEL_ID}"
echo "LLAMA_3B_OVERRIDE = ${LLAMA_3B_OVERRIDE}"
echo "LLAMA_8B_OVERRIDE = ${LLAMA_8B_OVERRIDE}"
echo "EXCLUDED_DIRS = ${EXCLUDED_DIRS}"
echo "REPORTS_GEN = ${REPORTS_GEN}"
############################
#### MODEL INPUT CHECKS ####
############################
- name: "Check if env.model_id is valid"
id: check_model_id
run: |
if [[ " ${AVAILABLE_MODEL_IDs[@]} " =~ " ${MODEL_ID} " ]]; then
echo "Model ID '${MODEL_ID}' is valid."
else
echo "Model ID '${MODEL_ID}' is invalid. Terminating workflow."
exit 1
fi
#######################
#### CODE CHECKOUT ####
#######################
- name: "Checkout 'meta-llama/llama-stack' repository"
id: checkout_repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.checkout_reference }}
- name: "[DEBUG] Content of the repository after checkout"
id: debug_content_after_checkout
if: ${{ inputs.debug == 'true' }}
run: |
ls -la ${GITHUB_WORKSPACE}
##########################################################
#### OPTIONAL SLEEP DEBUG ####
# #
# Use to "exec" into the test k8s POD and run tests #
# manually to identify what dependencies are being used. #
# #
##########################################################
- name: "[DEBUG] sleep"
id: debug_sleep
if: ${{ inputs.debug == 'true' && inputs.sleep_time != '' }}
run: |
sleep ${{ inputs.sleep_time }}
############################
#### UPDATE SYSTEM PATH ####
############################
- name: "Update path: execute"
id: path_update_exec
run: |
# .local/bin is needed for certain libraries installed below to be recognized
# when calling their executable to install sub-dependencies
mkdir -p ${HOME}/.local/bin
echo "${HOME}/.local/bin" >> "$GITHUB_PATH"
#####################################
#### UPDATE CHECKPOINT DIRECTORY ####
#####################################
- name: "Update checkpoint directory"
id: checkpoint_update
run: |
echo "Checkpoint directory: ${MODEL_CHECKPOINT_DIR}/$LLAMA_3B_OVERRIDE"
if [ "${MODEL_ID}" = "llama_3b" ] && [ -d "${MODEL_CHECKPOINT_DIR}/${LLAMA_3B_OVERRIDE}" ]; then
echo "MODEL_CHECKPOINT_DIR=${MODEL_CHECKPOINT_DIR}/${LLAMA_3B_OVERRIDE}" >> "$GITHUB_ENV"
elif [ "${MODEL_ID}" = "llama_8b" ] && [ -d "${MODEL_CHECKPOINT_DIR}/${LLAMA_8B_OVERRIDE}" ]; then
echo "MODEL_CHECKPOINT_DIR=${MODEL_CHECKPOINT_DIR}/${LLAMA_8B_OVERRIDE}" >> "$GITHUB_ENV"
else
echo "MODEL_ID & LLAMA_*B_OVERRIDE are not a valid pairing. Terminating workflow."
exit 1
fi
- name: "[DEBUG] Checkpoint update check"
id: debug_checkpoint_update
if: ${{ inputs.debug == 'true' }}
run: |
echo "MODEL_CHECKPOINT_DIR (after update) = ${MODEL_CHECKPOINT_DIR}"
##################################
#### DEPENDENCY INSTALLATIONS ####
##################################
- name: "Installing 'apt' required packages"
id: install_apt
run: |
echo "[STEP] Installing 'apt' required packages"
sudo apt update -y
sudo apt install -y python3 python3-pip npm wget
- name: "Installing packages with 'curl'"
id: install_curl
run: |
curl -fsSL https://ollama.com/install.sh | sh
- name: "Installing packages with 'wget'"
id: install_wget
run: |
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
# Run the installer in batch mode; faiss-gpu is installed by the conda step below
./Miniconda3-latest-Linux-x86_64.sh -b
# Add miniconda3 bin to system path
echo "${HOME}/miniconda3/bin" >> "$GITHUB_PATH"
- name: "Installing packages with 'npm'"
id: install_npm_generic
run: |
sudo npm install -g junit-merge
- name: "Installing pip dependencies"
id: install_pip_generic
run: |
echo "[STEP] Installing 'llama-stack' models"
pip install -U pip setuptools
pip install -r requirements.txt
pip install -e .
pip install -U \
torch torchvision \
pytest pytest_asyncio \
fairscale lm-format-enforcer \
zmq chardet pypdf \
pandas sentence_transformers together \
aiosqlite
- name: "Installing packages with conda"
id: install_conda_generic
run: |
conda install -q -c pytorch -c nvidia faiss-gpu=1.9.0
#############################################################
#### TESTING TO BE DONE FOR BOTH PRS AND MANUAL DISPATCH ####
#############################################################
- name: "Run Tests: Loop"
id: run_tests_loop
working-directory: "${{ github.workspace }}"
run: |
pattern=""
for dir in llama_stack/providers/tests/*; do
if [ -d "$dir" ]; then
dir_name=$(basename "$dir")
if [[ ! " $EXCLUDED_DIRS " =~ " $dir_name " ]]; then
for file in "$dir"/test_*.py; do
test_name=$(basename "$file")
new_file="result-${dir_name}-${test_name}.xml"
if torchrun $(which pytest) -s -v ${TESTS_PATH}/${dir_name}/${test_name} -m "${PROVIDER_ID} and ${MODEL_ID}" \
--junitxml="${{ github.workspace }}/${new_file}"; then
echo "Ran test: ${test_name}"
else
echo "Did NOT run test: ${test_name}"
fi
pattern+="${new_file} "
done
fi
fi
done
echo "REPORTS_GEN=$pattern" >> "$GITHUB_ENV"
- name: "Test Summary: Merge"
id: test_summary_merge
working-directory: "${{ github.workspace }}"
run: |
echo "Merging the following test result files: ${REPORTS_GEN}"
# Defaults to merging them into 'merged-test-results.xml'
junit-merge ${{ env.REPORTS_GEN }}
############################################
#### AUTOMATIC TESTING ON PULL REQUESTS ####
############################################
#### Run tests ####
- name: "PR - Run Tests"
id: pr_run_tests
working-directory: "${{ github.workspace }}"
if: github.event_name == 'pull_request_target'
run: |
echo "[STEP] Running PyTest tests at 'GITHUB_WORKSPACE' path: ${GITHUB_WORKSPACE} | path: ${{ github.workspace }}"
# (Optional) Add more tests here.
# Merge test results with 'merged-test-results.xml' from above.
# junit-merge <new-test-results> merged-test-results.xml
#### Create test summary ####
- name: "PR - Test Summary"
id: pr_test_summary_create
if: github.event_name == 'pull_request_target'
uses: test-summary/action@31493c76ec9e7aa675f1585d3ed6f1da69269a86 # v2.4
with:
paths: "${{ github.workspace }}/merged-test-results.xml"
output: test-summary.md
- name: "PR - Upload Test Summary"
id: pr_test_summary_upload
if: github.event_name == 'pull_request_target'
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: test-summary
path: test-summary.md
#### Update PR comment ####
- name: "PR - Update comment"
id: pr_update_comment
if: github.event_name == 'pull_request_target'
uses: thollander/actions-comment-pull-request@24bffb9b452ba05a4f3f77933840a6a841d1b32b # v3.0.1
with:
filePath: test-summary.md
########################
#### MANUAL TESTING ####
########################
#### Run tests ####
- name: "Manual - Run Tests: Prep"
id: manual_run_tests
working-directory: "${{ github.workspace }}"
if: github.event_name == 'workflow_dispatch'
run: |
echo "[STEP] Running PyTest tests at 'GITHUB_WORKSPACE' path: ${{ github.workspace }}"
#TODO Use this when collection errors are resolved
# pytest -s -v -m "${PROVIDER_ID} and ${MODEL_ID}" --junitxml="${{ github.workspace }}/merged-test-results.xml"
# (Optional) Add more tests here.
# Merge test results with 'merged-test-results.xml' from above.
# junit-merge <new-test-results> merged-test-results.xml
#### Create test summary ####
- name: "Manual - Test Summary"
id: manual_test_summary
if: always() && github.event_name == 'workflow_dispatch'
uses: test-summary/action@31493c76ec9e7aa675f1585d3ed6f1da69269a86 # v2.4
with:
paths: "${{ github.workspace }}/merged-test-results.xml"


@ -1,12 +1,14 @@
name: Installer CI name: Installer CI
run-name: Test the installation script
on: on:
pull_request: pull_request:
paths: paths:
- 'install.sh' - 'scripts/install.sh'
push: push:
paths: paths:
- 'install.sh' - 'scripts/install.sh'
schedule: schedule:
- cron: '0 2 * * *' # every day at 02:00 UTC - cron: '0 2 * * *' # every day at 02:00 UTC
@ -14,13 +16,33 @@ jobs:
lint: lint:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # 4.2.2 - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # 6.0.1
- name: Run ShellCheck on install.sh - name: Run ShellCheck on install.sh
run: shellcheck install.sh run: shellcheck scripts/install.sh
smoke-test: smoke-test-on-dev:
needs: lint
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # 4.2.2 - name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Build a single provider
run: |
BUILD_ARGS="--build-arg INSTALL_MODE=editable --build-arg DISTRO_NAME=starter"
if [ -n "${UV_EXTRA_INDEX_URL:-}" ]; then
BUILD_ARGS="$BUILD_ARGS --build-arg UV_EXTRA_INDEX_URL=$UV_EXTRA_INDEX_URL"
fi
if [ -n "${UV_INDEX_STRATEGY:-}" ]; then
BUILD_ARGS="$BUILD_ARGS --build-arg UV_INDEX_STRATEGY=$UV_INDEX_STRATEGY"
fi
docker build . \
-f containers/Containerfile \
$BUILD_ARGS \
--tag llama-stack:starter-ci
- name: Run installer end-to-end - name: Run installer end-to-end
run: ./install.sh run: |
IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1)
./scripts/install.sh --image $IMAGE_ID


@ -1,13 +1,20 @@
name: Integration Auth Tests name: Integration Auth Tests
run-name: Run the integration test suite with Kubernetes authentication
on: on:
push: push:
branches: [ main ] branches:
- main
- 'release-[0-9]+.[0-9]+.x'
pull_request: pull_request:
branches: [ main ] branches:
- main
- 'release-[0-9]+.[0-9]+.x'
paths: paths:
- 'distributions/**' - 'distributions/**'
- 'llama_stack/**' - 'src/llama_stack/**'
- '!src/llama_stack_ui/**'
- 'tests/integration/**' - 'tests/integration/**'
- 'uv.lock' - 'uv.lock'
- 'pyproject.toml' - 'pyproject.toml'
@ -15,7 +22,7 @@ on:
- '.github/workflows/integration-auth-tests.yml' # This workflow - '.github/workflows/integration-auth-tests.yml' # This workflow
concurrency: concurrency:
group: ${{ github.workflow }}-${{ github.ref }} group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
cancel-in-progress: true cancel-in-progress: true
jobs: jobs:
@ -28,18 +35,14 @@ jobs:
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies - name: Install dependencies
uses: ./.github/actions/setup-runner uses: ./.github/actions/setup-runner
- name: Build Llama Stack
run: |
llama stack build --template ollama --image-type venv
- name: Install minikube - name: Install minikube
if: ${{ matrix.auth-provider == 'kubernetes' }} if: ${{ matrix.auth-provider == 'kubernetes' }}
uses: medyagh/setup-minikube@cea33675329b799adccc9526aa5daccc26cd5052 # v0.0.19 uses: medyagh/setup-minikube@e9e035a86bbc3caea26a450bd4dbf9d0c453682e # v0.0.21
- name: Start minikube - name: Start minikube
if: ${{ matrix.auth-provider == 'oauth2_token' }} if: ${{ matrix.auth-provider == 'oauth2_token' }}
@ -69,26 +72,53 @@ jobs:
if: ${{ matrix.auth-provider == 'oauth2_token' }} if: ${{ matrix.auth-provider == 'oauth2_token' }}
run: | run: |
run_dir=$(mktemp -d) run_dir=$(mktemp -d)
cat <<'EOF' > $run_dir/run.yaml cat <<EOF > $run_dir/config.yaml
version: '2' version: '2'
image_name: kube image_name: kube
apis: [] apis: []
providers: {} providers: {}
storage:
backends:
kv_default:
type: kv_sqlite
db_path: $run_dir/kvstore.db
sql_default:
type: sql_sqlite
db_path: $run_dir/sql_store.db
stores:
metadata:
namespace: registry
backend: kv_default
inference:
table_name: inference_store
backend: sql_default
conversations:
table_name: openai_conversations
backend: sql_default
prompts:
namespace: prompts
backend: kv_default
server: server:
port: 8321 port: 8321
EOF EOF
yq eval '.server.auth = {"provider_type": "${{ matrix.auth-provider }}"}' -i $run_dir/run.yaml yq eval '.server.auth.provider_config.type = "${{ matrix.auth-provider }}"' -i $run_dir/config.yaml
yq eval '.server.auth.config = {"tls_cafile": "${{ env.KUBERNETES_CA_CERT_PATH }}", "issuer": "${{ env.KUBERNETES_ISSUER }}", "audience": "${{ env.KUBERNETES_AUDIENCE }}"}' -i $run_dir/run.yaml yq eval '.server.auth.provider_config.tls_cafile = "${{ env.KUBERNETES_CA_CERT_PATH }}"' -i $run_dir/config.yaml
yq eval '.server.auth.config.jwks = {"uri": "${{ env.KUBERNETES_API_SERVER_URL }}", "token": "${{ env.TOKEN }}"}' -i $run_dir/run.yaml yq eval '.server.auth.provider_config.issuer = "${{ env.KUBERNETES_ISSUER }}"' -i $run_dir/config.yaml
cat $run_dir/run.yaml yq eval '.server.auth.provider_config.audience = "${{ env.KUBERNETES_AUDIENCE }}"' -i $run_dir/config.yaml
yq eval '.server.auth.provider_config.jwks.uri = "${{ env.KUBERNETES_API_SERVER_URL }}"' -i $run_dir/config.yaml
yq eval '.server.auth.provider_config.jwks.token = "${{ env.TOKEN }}"' -i $run_dir/config.yaml
cat $run_dir/config.yaml
nohup uv run llama stack run $run_dir/run.yaml --image-type venv > server.log 2>&1 & # avoid line breaks in the server log, especially because we grep it below.
export LLAMA_STACK_LOG_WIDTH=200
nohup uv run llama stack run $run_dir/config.yaml > server.log 2>&1 &
- name: Wait for Llama Stack server to be ready - name: Wait for Llama Stack server to be ready
run: | run: |
echo "Waiting for Llama Stack server..." echo "Waiting for Llama Stack server..."
for i in {1..30}; do for i in {1..30}; do
if curl -s -L -H "Authorization: Bearer $(cat llama-stack-auth-token)" http://localhost:8321/v1/health | grep -q "OK"; then # Note: /v1/health does not require authentication
if curl -s -L http://localhost:8321/v1/health | grep -q "OK"; then
echo "Llama Stack server is up!" echo "Llama Stack server is up!"
if grep -q "Enabling authentication with provider: ${{ matrix.auth-provider }}" server.log; then if grep -q "Enabling authentication with provider: ${{ matrix.auth-provider }}" server.log; then
echo "Llama Stack server is configured to use ${{ matrix.auth-provider }} auth" echo "Llama Stack server is configured to use ${{ matrix.auth-provider }} auth"
@ -107,4 +137,40 @@ jobs:
- name: Test auth - name: Test auth
run: | run: |
curl -s -L -H "Authorization: Bearer $(cat llama-stack-auth-token)" http://127.0.0.1:8321/v1/providers|jq # Function to test API endpoint with authentication
# Usage: test_endpoint <curl_args> <user_token_file> <expected_status> [output_file]
test_endpoint() {
local curl_args="$1"
local user_token_file=$2
local expected_status=$3
local output_file=${4:-/dev/null}
local status
local extra_curl_args=(-s -L -o "$output_file" -w "%{http_code}")
if [ "$user_token_file" != "none" ]; then
extra_curl_args+=(-H "Authorization: Bearer $(cat $user_token_file)")
fi
set -x
status=$(curl $curl_args "${extra_curl_args[@]}")
set +x
if [ "$status" = "$expected_status" ]; then
echo " ✓ Status: $status (expected $expected_status)"
return 0
else
echo " ✗ Status: $status (expected $expected_status)"
exit 1
fi
}
echo "Testing /v1/version without token (should succeed)..."
test_endpoint "http://127.0.0.1:8321/v1/version" "none" "200" || exit 1
echo "Testing /v1/providers without token (should fail with 401)..."
test_endpoint "http://127.0.0.1:8321/v1/providers" "none" "401" || exit 1
echo "Testing /v1/providers with valid token (should succeed)..."
test_endpoint "http://127.0.0.1:8321/v1/providers" "llama-stack-auth-token" "200" "providers.json" || exit 1
cat providers.json | jq . > /dev/null && echo " ✓ Valid JSON response"
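
The readiness step in this workflow polls `/v1/health` up to 30 times before giving up. The hunk does not show the delay between attempts, so the sketch below takes it as a parameter. `wait_until` is a hypothetical helper, not something defined in the repository; it retries an arbitrary command a fixed number of times.

```shell
# Hypothetical helper, not part of the workflow.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example invocation (assumed 1-second delay; needs a running server, so it is
# left commented out):
#   wait_until 30 1 sh -c 'curl -s -L http://localhost:8321/v1/health | grep -q "OK"'
wait_until 3 0 true && echo "ready"
```

Keeping the probe as a separate command means the same loop could serve the unauthenticated health check and an authenticated endpoint check without duplication.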


@ -0,0 +1,76 @@
name: SqlStore Integration Tests
run-name: Run the integration test suite with SqlStore
on:
push:
branches:
- main
- 'release-[0-9]+.[0-9]+.x'
pull_request:
branches:
- main
- 'release-[0-9]+.[0-9]+.x'
paths:
- 'src/llama_stack/providers/utils/sqlstore/**'
- 'tests/integration/sqlstore/**'
- 'uv.lock'
- 'pyproject.toml'
- 'requirements.txt'
- '.github/workflows/integration-sql-store-tests.yml' # This workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
cancel-in-progress: true
jobs:
test-postgres:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.12", "3.13"]
fail-fast: false
services:
postgres:
image: postgres:15
env:
POSTGRES_USER: llamastack
POSTGRES_PASSWORD: llamastack
POSTGRES_DB: llamastack
ports:
- 5432:5432
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies
uses: ./.github/actions/setup-runner
with:
python-version: ${{ matrix.python-version }}
- name: Run SqlStore Integration Tests
env:
ENABLE_POSTGRES_TESTS: "true"
POSTGRES_HOST: localhost
POSTGRES_PORT: 5432
POSTGRES_DB: llamastack
POSTGRES_USER: llamastack
POSTGRES_PASSWORD: llamastack
run: |
uv run pytest -sv tests/integration/providers/utils/sqlstore/
- name: Upload test logs
if: ${{ always() }}
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: postgres-test-logs-${{ github.run_id }}-${{ github.run_attempt }}-${{ matrix.python-version }}
path: |
*.log
retention-days: 1
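
The concurrency group in this workflow uses the expression `github.ref == 'refs/heads/main' && github.run_id || github.ref`. The following sketch restates that expression as a plain bash function to show which runs end up sharing a group. The function name `concurrency_group`, the run IDs, and the pull request ref are made-up values for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: concurrency_group is a hypothetical helper that mirrors the
# expression `github.ref == 'refs/heads/main' && github.run_id || github.ref`.
concurrency_group() {
  local workflow=$1 ref=$2 run_id=$3
  if [ "$ref" = "refs/heads/main" ]; then
    # Each push to main gets a unique group, so no run cancels another.
    echo "${workflow}-${run_id}"
  else
    # Other refs share one group per ref, so a newer run cancels the older one.
    echo "${workflow}-${ref}"
  fi
}

# Placeholder inputs: the run IDs and the PR ref are invented for the demo.
main_group=$(concurrency_group "SqlStore Integration Tests" "refs/heads/main" "1001")
pr_group=$(concurrency_group "SqlStore Integration Tests" "refs/pull/42/merge" "1002")

echo "$main_group"
echo "$pr_group"
```

With `cancel-in-progress: true`, the effect is that repeated pushes to a pull request cancel the earlier run, while every commit on `main` is tested independently.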

@ -1,120 +1,163 @@
name: Integration Tests name: Integration Tests (Replay)
run-name: Run the integration test suites from tests/integration in replay mode
on: on:
push: push:
branches: [ main ] branches:
- main
- 'release-[0-9]+.[0-9]+.x'
pull_request: pull_request:
branches: [ main ] branches:
- main
- 'release-[0-9]+.[0-9]+.x'
types: [opened, synchronize, reopened]
paths: paths:
- 'llama_stack/**' - 'src/llama_stack/**'
- 'tests/integration/**' - '!src/llama_stack_ui/**'
- 'tests/**'
- 'uv.lock' - 'uv.lock'
- 'pyproject.toml' - 'pyproject.toml'
- 'requirements.txt'
- '.github/workflows/integration-tests.yml' # This workflow - '.github/workflows/integration-tests.yml' # This workflow
- '.github/actions/setup-ollama/action.yml'
- '.github/actions/setup-test-environment/action.yml'
- '.github/actions/run-and-record-tests/action.yml'
- 'scripts/integration-tests.sh'
- 'scripts/generate_ci_matrix.py'
schedule:
# If changing the cron schedule, update the provider in the test-matrix job
- cron: '0 0 * * *' # (test latest client) Daily at 12 AM UTC
workflow_dispatch:
inputs:
test-all-client-versions:
description: 'Test against both the latest and published versions'
type: boolean
default: false
test-setup:
description: 'Test against a specific setup'
type: string
default: 'ollama'
workflow_call:
inputs:
sdk_install_url:
required: false
type: string
description: 'URL to install Python SDK from (for testing preview builds)'
matrix_key:
required: false
type: string
default: 'default'
description: 'Matrix configuration key from ci_matrix.json (e.g., "default", "stainless")'
pr_head_sha:
required: false
type: string
description: 'The SHA of the pull request head to checkout'
pr_head_ref:
required: false
type: string
description: 'The branch name of the pull request head (for recording commits)'
is_fork_pr:
required: false
type: boolean
default: false
description: 'Whether this is a fork PR (cannot push recordings to forks)'
test-all-client-versions:
required: false
type: boolean
default: false
description: 'Test against both the latest and published versions'
concurrency: concurrency:
group: ${{ github.workflow }}-${{ github.ref }} # Skip concurrency for pushes to main - each commit should be tested independently
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
cancel-in-progress: true cancel-in-progress: true
jobs: jobs:
test-matrix: generate-matrix:
runs-on: ubuntu-latest runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
ref: ${{ inputs.pr_head_sha || github.event.pull_request.head.sha || github.sha }}
- name: Generate test matrix
id: set-matrix
run: |
# Generate matrix from CI_MATRIX in tests/integration/ci_matrix.json
# Supports schedule-based, manual input, and workflow_call overrides
MATRIX=$(PYTHONPATH=. python3 scripts/generate_ci_matrix.py \
--schedule "${{ github.event.schedule }}" \
--test-setup "${{ github.event.inputs.test-setup || '' }}" \
--matrix-key "${{ inputs.matrix_key || 'default' }}")
echo "matrix=$MATRIX" >> $GITHUB_OUTPUT
echo "Generated matrix: $MATRIX"
run-replay-mode-tests:
needs: generate-matrix
runs-on: ubuntu-latest
name: ${{ format('Integration Tests ({0}, {1}, {2}, client={3}, {4})', matrix.client, matrix.config.setup, matrix.python-version, matrix.client-version, matrix.config.suite) }}
strategy: strategy:
fail-fast: false
matrix: matrix:
# Listing tests manually since some of them currently fail client: [library, docker, server]
# TODO: generate matrix list from tests/integration when fixed # Use Python 3.13 only on nightly schedule (daily latest client test), otherwise use 3.12
test-type: [agents, inference, datasets, inspect, scoring, post_training, providers, tool_runtime, vector_io] python-version: ${{ github.event.schedule == '0 0 * * *' && fromJSON('["3.12", "3.13"]') || fromJSON('["3.12"]') }}
client-type: [library, http] node-version: [22]
python-version: ["3.10", "3.11", "3.12"] client-version: ${{ (github.event.schedule == '0 0 * * *' || github.event.inputs.test-all-client-versions == 'true' || inputs.test-all-client-versions == true) && fromJSON('["published", "latest"]') || fromJSON('["latest"]') }}
fail-fast: false # we want to run all tests regardless of failure # Test configurations: Generated from CI_MATRIX in tests/integration/ci_matrix.json
# See scripts/generate_ci_matrix.py for generation logic
config: ${{ fromJSON(needs.generate-matrix.outputs.matrix).include }}
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
ref: ${{ inputs.pr_head_sha || github.event.pull_request.head.sha || github.sha }}
- name: Install dependencies - name: Setup test environment
uses: ./.github/actions/setup-runner if: ${{ matrix.config.allowed_clients == null || contains(matrix.config.allowed_clients, matrix.client) }}
uses: ./.github/actions/setup-test-environment
with: with:
python-version: ${{ matrix.python-version }} python-version: ${{ matrix.python-version }}
client-version: ${{ matrix.client-version }}
sdk_install_url: ${{ inputs.sdk_install_url || '' }}
setup: ${{ matrix.config.setup }}
suite: ${{ matrix.config.suite }}
inference-mode: ${{ matrix.config.inference_mode || 'replay' }}
- name: Setup ollama - name: Setup Node.js for TypeScript client tests
uses: ./.github/actions/setup-ollama if: ${{ matrix.client == 'server' }}
uses: actions/setup-node@395ad3262231945c25e8478fd5baf05154b1d79f # v6.1.0
- name: Build Llama Stack
run: |
uv run llama stack build --template ollama --image-type venv
- name: Start Llama Stack server in background
if: matrix.client-type == 'http'
env:
INFERENCE_MODEL: "meta-llama/Llama-3.2-3B-Instruct"
run: |
LLAMA_STACK_LOG_FILE=server.log nohup uv run llama stack run ./llama_stack/templates/ollama/run.yaml --image-type venv --env OLLAMA_URL="http://0.0.0.0:11434" &
- name: Wait for Llama Stack server to be ready
if: matrix.client-type == 'http'
run: |
echo "Waiting for Llama Stack server..."
for i in {1..30}; do
if curl -s http://localhost:8321/v1/health | grep -q "OK"; then
echo "Llama Stack server is up!"
exit 0
fi
sleep 1
done
echo "Llama Stack server failed to start"
cat server.log
exit 1
- name: Verify Ollama status is OK
if: matrix.client-type == 'http'
run: |
echo "Verifying Ollama status..."
ollama_status=$(curl -s -L http://127.0.0.1:8321/v1/providers/ollama|jq --raw-output .health.status)
echo "Ollama status: $ollama_status"
if [ "$ollama_status" != "OK" ]; then
echo "Ollama health check failed"
exit 1
fi
- name: Check Storage and Memory Available Before Tests
if: ${{ always() }}
run: |
free -h
df -h
- name: Run Integration Tests
env:
INFERENCE_MODEL: "meta-llama/Llama-3.2-3B-Instruct"
OLLAMA_URL: "http://0.0.0.0:11434"
run: |
if [ "${{ matrix.client-type }}" == "library" ]; then
stack_config="ollama"
else
stack_config="http://localhost:8321"
fi
uv run pytest -s -v tests/integration/${{ matrix.test-type }} --stack-config=${stack_config} \
-k "not(builtin_tool or safety_with_image or code_interpreter or test_rag)" \
--text-model="meta-llama/Llama-3.2-3B-Instruct" \
--embedding-model=all-MiniLM-L6-v2
- name: Check Storage and Memory Available After Tests
if: ${{ always() }}
run: |
free -h
df -h
- name: Write ollama logs to file
if: ${{ always() }}
run: |
sudo docker logs ollama > ollama.log
- name: Upload all logs to artifacts
if: ${{ always() }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with: with:
name: logs-${{ github.run_id }}-${{ github.run_attempt }}-${{ matrix.client-type }}-${{ matrix.test-type }}-${{ matrix.python-version }} node-version: ${{matrix.node-version}}
path: | cache: 'npm'
*.log cache-dependency-path: tests/integration/client-typescript/package-lock.json
retention-days: 1
- name: Setup TypeScript client
if: ${{ matrix.client == 'server' }}
id: setup-ts-client
uses: ./.github/actions/setup-typescript-client
with:
client-version: ${{ matrix.client-version }}
- name: Run tests
if: ${{ matrix.config.allowed_clients == null || contains(matrix.config.allowed_clients, matrix.client) }}
uses: ./.github/actions/run-and-record-tests
env:
OPENAI_API_KEY: dummy
TS_CLIENT_PATH: ${{ steps.setup-ts-client.outputs.ts-client-path || '' }}
with:
stack-config: >-
${{ matrix.config.stack_config
|| (matrix.client == 'library' && 'ci-tests')
|| (matrix.client == 'server' && 'server:ci-tests')
|| 'docker:ci-tests' }}
setup: ${{ matrix.config.setup }}
inference-mode: ${{ matrix.config.inference_mode || 'replay' }}
suite: ${{ matrix.config.suite }}
target-branch: ${{ inputs.pr_head_ref || '' }}
is-fork-pr: ${{ inputs.is_fork_pr && 'true' || (github.event.pull_request.head.repo.full_name != github.repository && 'true' || 'false') }}
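
The `stack-config` input of the "Run tests" step is a fallback chain: an explicit `matrix.config.stack_config` wins, otherwise the value is derived from the client type. A rough bash restatement of that chain is sketched below. The function name `resolve_stack_config` and the sample override value are hypothetical; only the three default strings come from the workflow.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_stack_config is a hypothetical helper that mirrors the
# `||` chain used for the stack-config input.
resolve_stack_config() {
  local override=$1 client=$2
  if [ -n "$override" ]; then
    # An explicit stack_config from the matrix entry takes precedence.
    echo "$override"
  elif [ "$client" = "library" ]; then
    echo "ci-tests"
  elif [ "$client" = "server" ]; then
    echo "server:ci-tests"
  else
    # Any remaining client type (docker in this matrix) falls through to here.
    echo "docker:ci-tests"
  fi
}

library_config=$(resolve_stack_config "" "library")
server_config=$(resolve_stack_config "" "server")
docker_config=$(resolve_stack_config "" "docker")
# "custom-config" is a placeholder override, not a value from the workflow.
override_config=$(resolve_stack_config "custom-config" "server")

printf '%s\n' "$library_config" "$server_config" "$docker_config" "$override_config"
```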

@ -0,0 +1,206 @@
name: Vector IO Integration Tests
run-name: Run the integration test suite with various VectorIO providers
on:
push:
branches:
- main
- 'release-[0-9]+.[0-9]+.x'
pull_request:
branches:
- main
- 'release-[0-9]+.[0-9]+.x'
paths:
- 'src/llama_stack/**'
- '!src/llama_stack_ui/**'
- 'tests/integration/vector_io/**'
- 'uv.lock'
- 'pyproject.toml'
- 'requirements.txt'
- '.github/workflows/integration-vector-io-tests.yml' # This workflow
schedule:
- cron: '0 0 * * *' # (test on python 3.13) Daily at 12 AM UTC
concurrency:
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
cancel-in-progress: true
jobs:
test-matrix:
runs-on: ubuntu-latest
strategy:
matrix:
vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector", "remote::weaviate", "remote::qdrant"]
python-version: ${{ github.event.schedule == '0 0 * * *' && fromJSON('["3.12", "3.13"]') || fromJSON('["3.12"]') }}
fail-fast: false # we want to run all tests regardless of failure
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies
uses: ./.github/actions/setup-runner
with:
python-version: ${{ matrix.python-version }}
- name: Setup Chroma
if: matrix.vector-io-provider == 'remote::chromadb'
run: |
docker run --rm -d --pull always \
--name chromadb \
-p 8000:8000 \
-v ~/chroma:/chroma/chroma \
-e IS_PERSISTENT=TRUE \
-e ANONYMIZED_TELEMETRY=FALSE \
chromadb/chroma:latest
- name: Setup Weaviate
if: matrix.vector-io-provider == 'remote::weaviate'
run: |
docker run --rm -d --pull always \
--name weaviate \
-p 8080:8080 -p 50051:50051 \
cr.weaviate.io/semitechnologies/weaviate:1.32.0
- name: Start PGVector DB
if: matrix.vector-io-provider == 'remote::pgvector'
run: |
docker run -d \
--name pgvector \
-e POSTGRES_USER=llamastack \
-e POSTGRES_PASSWORD=llamastack \
-e POSTGRES_DB=llamastack \
-p 5432:5432 \
pgvector/pgvector:pg17
- name: Wait for PGVector to be ready
if: matrix.vector-io-provider == 'remote::pgvector'
run: |
echo "Waiting for Postgres to be ready..."
for i in {1..30}; do
if docker exec pgvector pg_isready -U llamastack > /dev/null 2>&1; then
echo "Postgres is ready!"
break
fi
echo "Not ready yet... ($i)"
sleep 1
done
- name: Enable pgvector extension
if: matrix.vector-io-provider == 'remote::pgvector'
run: |
PGPASSWORD=llamastack psql -h localhost -U llamastack -d llamastack \
-c "CREATE EXTENSION IF NOT EXISTS vector;"
- name: Setup Qdrant
if: matrix.vector-io-provider == 'remote::qdrant'
run: |
docker run --rm -d --pull always \
--name qdrant \
-p 6333:6333 \
qdrant/qdrant
- name: Wait for Qdrant to be ready
if: matrix.vector-io-provider == 'remote::qdrant'
run: |
echo "Waiting for Qdrant to be ready..."
for i in {1..30}; do
if curl -s http://localhost:6333/collections | grep -q '"status":"ok"'; then
echo "Qdrant is ready!"
exit 0
fi
sleep 2
done
echo "Qdrant failed to start"
docker logs qdrant
exit 1
- name: Wait for ChromaDB to be ready
if: matrix.vector-io-provider == 'remote::chromadb'
run: |
echo "Waiting for ChromaDB to be ready..."
for i in {1..30}; do
if curl -s http://localhost:8000/api/v2/heartbeat | grep -q "nanosecond heartbeat"; then
echo "ChromaDB is ready!"
exit 0
fi
sleep 2
done
echo "ChromaDB failed to start"
docker logs chromadb
exit 1
- name: Wait for Weaviate to be ready
if: matrix.vector-io-provider == 'remote::weaviate'
run: |
echo "Waiting for Weaviate to be ready..."
for i in {1..30}; do
if curl -s http://localhost:8080 | grep -q "https://weaviate.io/developers/weaviate/current/"; then
echo "Weaviate is ready!"
exit 0
fi
sleep 2
done
echo "Weaviate failed to start"
docker logs weaviate
exit 1
- name: Build Llama Stack
run: |
uv run --no-sync llama stack list-deps ci-tests | xargs -L1 uv pip install
- name: Check Storage and Memory Available Before Tests
if: ${{ always() }}
run: |
free -h
df -h
- name: Run Vector IO Integration Tests
env:
ENABLE_CHROMADB: ${{ matrix.vector-io-provider == 'remote::chromadb' && 'true' || '' }}
CHROMADB_URL: ${{ matrix.vector-io-provider == 'remote::chromadb' && 'http://localhost:8000' || '' }}
ENABLE_PGVECTOR: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'true' || '' }}
PGVECTOR_HOST: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'localhost' || '' }}
PGVECTOR_PORT: ${{ matrix.vector-io-provider == 'remote::pgvector' && '5432' || '' }}
PGVECTOR_DB: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
PGVECTOR_USER: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
PGVECTOR_PASSWORD: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
ENABLE_QDRANT: ${{ matrix.vector-io-provider == 'remote::qdrant' && 'true' || '' }}
QDRANT_URL: ${{ matrix.vector-io-provider == 'remote::qdrant' && 'http://localhost:6333' || '' }}
ENABLE_WEAVIATE: ${{ matrix.vector-io-provider == 'remote::weaviate' && 'true' || '' }}
WEAVIATE_CLUSTER_URL: ${{ matrix.vector-io-provider == 'remote::weaviate' && 'localhost:8080' || '' }}
run: |
uv run --no-sync \
pytest -sv --stack-config="files=inline::localfs,inference=inline::sentence-transformers,vector_io=${{ matrix.vector-io-provider }}" \
tests/integration/vector_io
- name: Check Storage and Memory Available After Tests
if: ${{ always() }}
run: |
free -h
df -h
- name: Create sanitized provider name
if: ${{ always() }}
run: |
echo "SANITIZED_PROVIDER=$(echo "${{ matrix.vector-io-provider }}" | tr ':' '_')" >> $GITHUB_ENV
- name: Write ChromaDB logs to file
if: ${{ always() && matrix.vector-io-provider == 'remote::chromadb' }}
run: |
docker logs chromadb > chromadb.log
- name: Write Qdrant logs to file
if: ${{ always() && matrix.vector-io-provider == 'remote::qdrant' }}
run: |
docker logs qdrant > qdrant.log
- name: Upload all logs to artifacts
if: ${{ always() }}
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: vector-io-logs-${{ github.run_id }}-${{ github.run_attempt }}-${{ env.SANITIZED_PROVIDER }}-${{ matrix.python-version }}
path: |
*.log
retention-days: 1
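
The Qdrant, ChromaDB, and Weaviate steps in this workflow each repeat the same poll-then-fail loop. One possible way to factor that pattern out is sketched below. The `wait_for` helper and the `probe` function are hypothetical and not part of the workflow; the probe simulates a service that becomes ready on its third check, so the sketch runs without Docker or a network.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for is a hypothetical helper, not part of the workflow.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Run the readiness command; stop as soon as it succeeds.
    if "$@"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}

# Demo probe: succeeds on the third call, tracked through a counter file.
counter_file=$(mktemp)
echo 0 > "$counter_file"
probe() {
  local n
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  [ "$n" -ge 3 ]
}

wait_for 5 0 probe
ok_status=$?

# A command that never succeeds exhausts the attempts and returns 1.
wait_for 2 0 false
fail_status=$?

rm -f "$counter_file"
echo "ok_status=$ok_status fail_status=$fail_status"
```

In a workflow step, a call might look like `wait_for 30 2 curl -sf http://localhost:6333/collections`, followed by `|| exit 1`. Returning a non-zero status on timeout also covers the case of the PGVector loop above, which as written breaks on success but has no explicit failure path when the database never becomes ready.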

@ -1,45 +1,181 @@
name: Pre-commit name: Pre-commit
run-name: Run pre-commit checks
on: on:
pull_request: pull_request:
push: push:
branches: [main] branches:
- main
- 'release-[0-9]+.[0-9]+.x'
concurrency: concurrency:
group: ${{ github.workflow }}-${{ github.ref }} group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
cancel-in-progress: true cancel-in-progress: true
jobs: jobs:
pre-commit: pre-commit:
runs-on: ubuntu-latest runs-on: ubuntu-latest
strategy:
matrix:
node-version: [22]
permissions:
contents: write
pull-requests: write
steps: steps:
- name: Checkout code - name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
# For dependabot PRs, we need to checkout with a token that can push changes
token: ${{ github.actor == 'dependabot[bot]' && secrets.GITHUB_TOKEN || github.token }}
# Fetch full history for dependabot PRs to allow commits
fetch-depth: ${{ github.actor == 'dependabot[bot]' && 0 || 1 }}
- name: Set up Python - name: Set up Python
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0 uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0
with: with:
python-version: '3.11' python-version: '3.12'
cache: pip cache: pip
cache-dependency-path: | cache-dependency-path: |
**/requirements*.txt **/requirements*.txt
.pre-commit-config.yaml .pre-commit-config.yaml
- uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1 - name: Set up Node.js
uses: actions/setup-node@395ad3262231945c25e8478fd5baf05154b1d79f # v6.1.0
with:
node-version: ${{matrix.node-version}}
cache: 'npm'
cache-dependency-path: 'src/llama_stack_ui/'
- name: Set up uv
uses: astral-sh/setup-uv@681c641aba71e4a1c380be3ab5e12ad51f415867 # v7.1.6
- name: Install npm dependencies
run: npm ci
working-directory: src/llama_stack_ui
- name: Install pre-commit
run: python -m pip install 'pre-commit>=4.4.0'
- name: Cache pre-commit
uses: actions/cache@9255dc7a253b0ccc959486e2bca901246202afeb # v4
with:
path: ~/.cache/pre-commit
key: pre-commit-3|${{ env.pythonLocation }}|${{ hashFiles('.pre-commit-config.yaml') }}
- name: Run pre-commit
id: precommit
run: |
set +e
pre-commit run --show-diff-on-failure --color=always --all-files 2>&1 | tee /tmp/precommit.log
status=${PIPESTATUS[0]}
echo "status=$status" >> $GITHUB_OUTPUT
exit 0
env: env:
SKIP: no-commit-to-branch SKIP: no-commit-to-branch,mypy
RUFF_OUTPUT_FORMAT: github RUFF_OUTPUT_FORMAT: github
- name: Verify if there are any diff files after pre-commit - name: Check pre-commit results
if: steps.precommit.outputs.status != '0'
run: | run: |
git diff --exit-code || (echo "There are uncommitted changes, run pre-commit locally and commit again" && exit 1) echo "::error::Pre-commit hooks failed. Please run 'pre-commit run --all-files' locally and commit the fixes."
echo ""
echo "Failed hooks output:"
cat /tmp/precommit.log
exit 1
- name: Debug
run: |
echo "github.ref: ${{ github.ref }}"
echo "github.actor: ${{ github.actor }}"
- name: Commit changes for dependabot PRs
if: github.actor == 'dependabot[bot]'
run: |
if ! git diff --exit-code || [ -n "$(git ls-files --others --exclude-standard)" ]; then
git config --local user.email "github-actions[bot]@users.noreply.github.com"
git config --local user.name "github-actions[bot]"
# Ensure we're on the correct branch
git checkout -B ${{ github.head_ref }}
git add -A
git commit -m "Apply pre-commit fixes"
# Pull latest changes from the PR branch and rebase our commit on top
git pull --rebase origin ${{ github.head_ref }}
# Push to the PR branch
git push origin ${{ github.head_ref }}
echo "Pre-commit fixes committed and pushed"
else
echo "No changes to commit"
fi
- name: Verify no uncommitted changes
if: github.actor != 'dependabot[bot]'
run: |
if ! git diff --exit-code; then
echo "::error::There are uncommitted changes after pre-commit. Please run 'pre-commit run --all-files' locally and commit the fixes."
echo "::warning::Files with changes:"
git diff --name-status
exit 1
fi
- name: Verify if there are any new files after pre-commit - name: Verify if there are any new files after pre-commit
if: github.actor != 'dependabot[bot]'
run: | run: |
unstaged_files=$(git ls-files --others --exclude-standard) unstaged_files=$(git ls-files --others --exclude-standard)
if [ -n "$unstaged_files" ]; then if [ -n "$unstaged_files" ]; then
echo "There are uncommitted new files, run pre-commit locally and commit again" echo "::error::There are new untracked files after pre-commit. Please run 'pre-commit run --all-files' locally and commit the fixes."
echo "::warning::New files:"
echo "$unstaged_files" echo "$unstaged_files"
exit 1 exit 1
fi fi
- name: Configure client installation
id: client-config
uses: ./.github/actions/install-llama-stack-client
- name: Sync dev + type_checking dependencies
env:
UV_EXTRA_INDEX_URL: ${{ steps.client-config.outputs.uv-extra-index-url }}
run: |
if [ -n "$UV_EXTRA_INDEX_URL" ]; then
export UV_INDEX_STRATEGY="unsafe-best-match"
fi
uv sync --group dev --group type_checking
# Install specific client version after sync if needed
if [ "${{ steps.client-config.outputs.install-after-sync }}" = "true" ]; then
echo "Installing llama-stack-client from: ${{ steps.client-config.outputs.install-source }}"
uv pip install ${{ steps.client-config.outputs.install-source }}
fi
- name: Run mypy (full type_checking)
env:
UV_EXTRA_INDEX_URL: ${{ steps.client-config.outputs.uv-extra-index-url }}
run: |
if [ -n "$UV_EXTRA_INDEX_URL" ]; then
export UV_INDEX_STRATEGY="unsafe-best-match"
fi
set +e
uv run --group dev --group type_checking mypy
status=$?
if [ $status -ne 0 ]; then
echo "::error::Full mypy failed. Reproduce locally with 'uv run pre-commit run mypy-full --hook-stage manual --all-files'."
fi
exit $status
- name: Check if any unused recordings
run: |
set -e
PYTHONPATH=$PWD uv run ./scripts/cleanup_recordings.py --delete
changes=$(git status --short tests/integration | grep 'recordings' || true)
if [ -n "$changes" ]; then
echo "::error::Unused integration recordings detected. Run 'PYTHONPATH=$(pwd) uv run ./scripts/cleanup_recordings.py --delete' locally and commit the deletions."
echo "$changes"
exit 1
fi
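
The "Run pre-commit" step pipes the hook output through `tee` and then reads `${PIPESTATUS[0]}`. The sketch below shows why: without `pipefail`, the exit status of a pipeline is that of its last command (`tee`), so the hook's own status has to be read from `PIPESTATUS`. The `fake_hook` function is a hypothetical stand-in for `pre-commit run`, and the status value 3 is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: fake_hook is a hypothetical stand-in for `pre-commit run ...`.
fake_hook() {
  echo "hook output"
  return 3
}

log_file=$(mktemp)

set +e
fake_hook 2>&1 | tee "$log_file"
# PIPESTATUS is overwritten by the next command, so copy it immediately.
statuses=("${PIPESTATUS[@]}")

hook_status=${statuses[0]}   # exit status of fake_hook
tee_status=${statuses[1]}    # exit status of tee
logged=$(cat "$log_file")
rm -f "$log_file"

echo "hook_status=$hook_status tee_status=$tee_status"
```

Recording the status as a step output and exiting 0, as the workflow does, lets a later step print the captured log and fail the job with a clearer error message.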

@ -1,69 +1,88 @@
name: Test Llama Stack Build name: Test Llama Stack Build
run-name: Test llama stack build
on: on:
push: push:
branches: branches:
- main - main
paths: paths:
- 'llama_stack/cli/stack/build.py' - 'src/llama_stack/cli/stack/build.py'
- 'llama_stack/cli/stack/_build.py' - 'src/llama_stack/cli/stack/_build.py'
- 'llama_stack/distribution/build.*' - 'src/llama_stack/core/build.*'
- 'llama_stack/distribution/*.sh' - 'src/llama_stack/core/*.sh'
- '.github/workflows/providers-build.yml' - '.github/workflows/providers-build.yml'
- 'llama_stack/templates/**' - 'src/llama_stack/distributions/**'
- 'pyproject.toml'
- 'containers/Containerfile'
- '.dockerignore'
pull_request: pull_request:
paths: paths:
- 'llama_stack/cli/stack/build.py' - 'src/llama_stack/cli/stack/build.py'
- 'llama_stack/cli/stack/_build.py' - 'src/llama_stack/cli/stack/_build.py'
- 'llama_stack/distribution/build.*' - 'src/llama_stack/core/build.*'
- 'llama_stack/distribution/*.sh' - 'src/llama_stack/core/*.sh'
- '.github/workflows/providers-build.yml' - '.github/workflows/providers-build.yml'
- 'llama_stack/templates/**' - 'src/llama_stack/distributions/**'
- 'pyproject.toml'
- 'containers/Containerfile'
- '.dockerignore'
concurrency: concurrency:
group: ${{ github.workflow }}-${{ github.ref }} group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
cancel-in-progress: true cancel-in-progress: true
jobs: jobs:
generate-matrix: generate-matrix:
runs-on: ubuntu-latest runs-on: ubuntu-latest
outputs: outputs:
templates: ${{ steps.set-matrix.outputs.templates }} distros: ${{ steps.set-matrix.outputs.distros }}
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Generate Template List - name: Generate Distribution List
id: set-matrix id: set-matrix
run: | run: |
templates=$(ls llama_stack/templates/*/*build.yaml | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]') distros=$(ls src/llama_stack/distributions/*/*build.yaml | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]')
echo "templates=$templates" >> "$GITHUB_OUTPUT" echo "distros=$distros" >> "$GITHUB_OUTPUT"
build: build:
needs: generate-matrix needs: generate-matrix
runs-on: ubuntu-latest runs-on: ubuntu-latest
strategy: strategy:
matrix: matrix:
template: ${{ fromJson(needs.generate-matrix.outputs.templates) }} distro: ${{ fromJson(needs.generate-matrix.outputs.distros) }}
image-type: [venv, container] image-type: [venv, container]
fail-fast: false # We want to run all jobs even if some fail fail-fast: false # We want to run all jobs even if some fail
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies - name: Install dependencies
uses: ./.github/actions/setup-runner uses: ./.github/actions/setup-runner
- name: Print build dependencies - name: Install distribution into venv
if: matrix.image-type == 'venv'
run: | run: |
uv run llama stack build --template ${{ matrix.template }} --image-type ${{ matrix.image-type }} --image-name test --print-deps-only uv run llama stack list-deps ${{ matrix.distro }} | xargs -L1 uv pip install
- name: Run Llama Stack Build - name: Build container image
if: matrix.image-type == 'container'
run: | run: |
# USE_COPY_NOT_MOUNT is set to true since mounting is not supported by docker buildx, we use COPY instead BUILD_ARGS="--build-arg INSTALL_MODE=editable --build-arg DISTRO_NAME=${{ matrix.distro }}"
# LLAMA_STACK_DIR is set to the current directory so we are building from the source if [ -n "${UV_EXTRA_INDEX_URL:-}" ]; then
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --template ${{ matrix.template }} --image-type ${{ matrix.image-type }} --image-name test BUILD_ARGS="$BUILD_ARGS --build-arg UV_EXTRA_INDEX_URL=$UV_EXTRA_INDEX_URL"
fi
if [ -n "${UV_INDEX_STRATEGY:-}" ]; then
BUILD_ARGS="$BUILD_ARGS --build-arg UV_INDEX_STRATEGY=$UV_INDEX_STRATEGY"
fi
docker build . \
-f containers/Containerfile \
$BUILD_ARGS \
--tag llama-stack:${{ matrix.distro }}-ci
- name: Print dependencies in the image - name: Print dependencies in the image
if: matrix.image-type == 'venv' if: matrix.image-type == 'venv'
@ -74,36 +93,51 @@ jobs:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies - name: Install dependencies
uses: ./.github/actions/setup-runner uses: ./.github/actions/setup-runner
- name: Build a single provider - name: Build a single provider
run: | run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --image-type venv --image-name test --providers inference=remote::ollama uv pip install -e .
uv run --no-sync llama stack list-deps --providers inference=remote::ollama | xargs -L1 uv pip install
build-custom-container-distribution: build-custom-container-distribution:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies - name: Install dependencies
uses: ./.github/actions/setup-runner uses: ./.github/actions/setup-runner
- name: Build a single provider - name: Build container image
run: | run: |
yq -i '.image_type = "container"' llama_stack/templates/starter/build.yaml BASE_IMAGE=$(yq -r '.distribution_spec.container_image // "python:3.12-slim"' src/llama_stack/distributions/ci-tests/config.yaml)
yq -i '.image_name = "test"' llama_stack/templates/starter/build.yaml BUILD_ARGS="--build-arg INSTALL_MODE=editable --build-arg DISTRO_NAME=ci-tests"
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --config llama_stack/templates/starter/build.yaml BUILD_ARGS="$BUILD_ARGS --build-arg BASE_IMAGE=$BASE_IMAGE"
BUILD_ARGS="$BUILD_ARGS --build-arg RUN_CONFIG_PATH=/workspace/src/llama_stack/distributions/ci-tests/config.yaml"
if [ -n "${UV_EXTRA_INDEX_URL:-}" ]; then
BUILD_ARGS="$BUILD_ARGS --build-arg UV_EXTRA_INDEX_URL=$UV_EXTRA_INDEX_URL"
fi
if [ -n "${UV_INDEX_STRATEGY:-}" ]; then
BUILD_ARGS="$BUILD_ARGS --build-arg UV_INDEX_STRATEGY=$UV_INDEX_STRATEGY"
fi
docker build . \
-f containers/Containerfile \
$BUILD_ARGS \
-t llama-stack:ci-tests
- name: Inspect the container image entrypoint - name: Inspect the container image entrypoint
run: | run: |
IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1) IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1)
if [ -z "$IMAGE_ID" ]; then
echo "No image found"
exit 1
fi
entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID) entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID)
echo "Entrypoint: $entrypoint" echo "Entrypoint: $entrypoint"
if [ "$entrypoint" != "[python -m llama_stack.distribution.server.server --config /app/run.yaml]" ]; then if [ "$entrypoint" != "[/usr/local/bin/llama-stack-entrypoint.sh]" ]; then
echo "Entrypoint is not correct" echo "Entrypoint is not correct"
exit 1 exit 1
fi fi
@@ -112,32 +146,44 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - name: Checkout repository
-uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
 - name: Install dependencies
 uses: ./.github/actions/setup-runner
-- name: Pin template to UBI9 base
+- name: Pin distribution to UBI9 base
 run: |
 yq -i '
-.image_type = "container" |
-.image_name = "ubi9-test" |
 .distribution_spec.container_image = "registry.access.redhat.com/ubi9:latest"
-' llama_stack/templates/starter/build.yaml
+' src/llama_stack/distributions/ci-tests/config.yaml
-- name: Build dev container (UBI9)
-env:
-USE_COPY_NOT_MOUNT: "true"
-LLAMA_STACK_DIR: "."
+- name: Build UBI9 container image
 run: |
-uv run llama stack build --config llama_stack/templates/starter/build.yaml
+BASE_IMAGE=$(yq -r '.distribution_spec.container_image // "registry.access.redhat.com/ubi9:latest"' src/llama_stack/distributions/ci-tests/config.yaml)
+BUILD_ARGS="--build-arg INSTALL_MODE=editable --build-arg DISTRO_NAME=ci-tests"
+BUILD_ARGS="$BUILD_ARGS --build-arg BASE_IMAGE=$BASE_IMAGE"
+BUILD_ARGS="$BUILD_ARGS --build-arg RUN_CONFIG_PATH=/workspace/src/llama_stack/distributions/ci-tests/config.yaml"
+if [ -n "${UV_EXTRA_INDEX_URL:-}" ]; then
+BUILD_ARGS="$BUILD_ARGS --build-arg UV_EXTRA_INDEX_URL=$UV_EXTRA_INDEX_URL"
+fi
+if [ -n "${UV_INDEX_STRATEGY:-}" ]; then
+BUILD_ARGS="$BUILD_ARGS --build-arg UV_INDEX_STRATEGY=$UV_INDEX_STRATEGY"
+fi
+docker build . \
+-f containers/Containerfile \
+$BUILD_ARGS \
+-t llama-stack:ci-tests-ubi9
 - name: Inspect UBI9 image
 run: |
 IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1)
+if [ -z "$IMAGE_ID" ]; then
+echo "No image found"
+exit 1
+fi
 entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID)
 echo "Entrypoint: $entrypoint"
-if [ "$entrypoint" != "[python -m llama_stack.distribution.server.server --config /app/run.yaml]" ]; then
+if [ "$entrypoint" != "[/usr/local/bin/llama-stack-entrypoint.sh]" ]; then
 echo "Entrypoint is not correct"
 exit 1
 fi


@ -0,0 +1,105 @@
name: Test llama stack list-deps
run-name: Test llama stack list-deps
on:
push:
branches:
- main
paths:
- 'src/llama_stack/cli/stack/list_deps.py'
- 'src/llama_stack/cli/stack/_list_deps.py'
- 'src/llama_stack/core/build.*'
- 'src/llama_stack/core/*.sh'
- '.github/workflows/providers-list-deps.yml'
- 'src/llama_stack/templates/**'
- 'pyproject.toml'
pull_request:
paths:
- 'src/llama_stack/cli/stack/list_deps.py'
- 'src/llama_stack/cli/stack/_list_deps.py'
- 'src/llama_stack/core/build.*'
- 'src/llama_stack/core/*.sh'
- '.github/workflows/providers-list-deps.yml'
- 'src/llama_stack/templates/**'
- 'pyproject.toml'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
generate-matrix:
runs-on: ubuntu-latest
outputs:
distros: ${{ steps.set-matrix.outputs.distros }}
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Generate Distribution List
id: set-matrix
run: |
distros=$(ls src/llama_stack/distributions/*/*build.yaml | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]')
echo "distros=$distros" >> "$GITHUB_OUTPUT"
list-deps:
needs: generate-matrix
runs-on: ubuntu-latest
strategy:
matrix:
distro: ${{ fromJson(needs.generate-matrix.outputs.distros) }}
image-type: [venv, container]
fail-fast: false # We want to run all jobs even if some fail
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Print dependencies
run: |
uv run llama stack list-deps ${{ matrix.distro }}
- name: Install Distro using llama stack list-deps
run: |
# USE_COPY_NOT_MOUNT is set to true because docker buildx does not support mounting, so COPY is used instead
# LLAMA_STACK_DIR is set to the current directory so that we build from source
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack list-deps ${{ matrix.distro }} | xargs -L1 uv pip install
- name: Print dependencies in the image
if: matrix.image-type == 'venv'
run: |
uv pip list
show-single-provider:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Show a single provider
run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack list-deps --providers inference=remote::ollama
list-deps-from-config:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: list-deps from Config
env:
USE_COPY_NOT_MOUNT: "true"
LLAMA_STACK_DIR: "."
run: |
uv run llama stack list-deps src/llama_stack/distributions/ci-tests/config.yaml
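
The `generate-matrix` job above derives the distro list from the directory that contains each `build.yaml`. A minimal sketch of that extraction follows, under some assumptions: the scratch directory layout is hypothetical, and the JSON array is assembled with `awk` instead of `jq` so the sketch runs without extra tools (it assumes directory names need no JSON escaping).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch layout standing in for src/llama_stack/distributions/.
root=$(mktemp -d)
mkdir -p "$root/distributions/ci-tests" "$root/distributions/starter" "$root/distributions/empty"
touch "$root/distributions/ci-tests/build.yaml" "$root/distributions/starter/build.yaml"

# Same extraction idea as the workflow: the second-to-last path component
# is the directory that holds each build.yaml.
names=$(ls "$root"/distributions/*/*build.yaml | awk -F'/' '{print $(NF-1)}')

# Assemble a compact JSON array without jq (assumption: names need no escaping).
distros=$(printf '%s\n' "$names" |
  awk 'BEGIN { printf "[" } { printf "%s\"%s\"", (NR > 1 ? "," : ""), $0 } END { printf "]" }')

echo "distros=$distros"
rm -rf "$root"
```

Directories without a `build.yaml` (here `empty`) never match the glob, so they do not appear in the matrix.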

.github/workflows/python-build-test.yml

@ -0,0 +1,50 @@
name: Python Package Build Test
run-name: Test building the llama-stack PyPI project
on:
push:
branches:
- main
pull_request:
branches:
- main
paths-ignore:
- 'src/llama_stack_ui/**'
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.12', '3.13']
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install uv
uses: astral-sh/setup-uv@681c641aba71e4a1c380be3ab5e12ad51f415867 # v7.1.6
with:
python-version: ${{ matrix.python-version }}
activate-environment: true
- name: Build Llama Stack API package
working-directory: src/llama_stack_api
run: uv build
- name: Build Llama Stack package
run: uv build
- name: Install Llama Stack package (with api stubs from local build)
run: |
uv pip install --find-links src/llama_stack_api/dist dist/*.whl
- name: Verify Llama Stack package
run: |
uv pip list
uv pip show llama-stack
command -v llama
llama stack list-apis
llama stack list-providers inference
llama stack list-deps starter


@ -0,0 +1,73 @@
# Run this workflow manually whenever tests need to be re-recorded, which is the case when you have
# - added a new test
# - or changed an existing test such that a new inference call is made
# Open a PR, then run this workflow on the PR branch. The workflow re-records the
# tests and commits the recordings to the PR branch.
name: Integration Tests (Record)
run-name: Run the integration test suite from tests/integration
on:
workflow_dispatch:
inputs:
test-setup:
description: 'Test against a specific setup'
type: string
default: 'ollama'
suite:
description: 'Test suite to use: base, responses, vision, etc.'
type: string
default: ''
subdirs:
description: 'Comma-separated list of test subdirectories to run; overrides suite'
type: string
default: ''
pattern:
description: 'Regex pattern to pass to pytest -k'
type: string
default: ''
jobs:
record-tests:
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- name: Echo workflow inputs
run: |
echo "::group::Workflow Inputs"
echo "branch: ${{ github.ref_name }}"
echo "test-setup: ${{ inputs.test-setup }}"
echo "suite: ${{ inputs.suite }}"
echo "subdirs: ${{ inputs.subdirs }}"
echo "pattern: ${{ inputs.pattern }}"
echo "::endgroup::"
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 0
- name: Setup test environment
uses: ./.github/actions/setup-test-environment
with:
python-version: "3.12" # Use single Python version for recording
client-version: "latest"
setup: ${{ inputs.test-setup || 'ollama' }}
suite: ${{ inputs.suite }}
inference-mode: 'record'
- name: Run and record tests
uses: ./.github/actions/run-and-record-tests
env:
# Set OPENAI_API_KEY if using gpt setup
OPENAI_API_KEY: ${{ inputs.test-setup == 'gpt' && secrets.OPENAI_API_KEY || '' }}
with:
stack-config: 'server:ci-tests' # recording must be done with server since more tests are run
setup: ${{ inputs.test-setup || 'ollama' }}
inference-mode: 'record'
suite: ${{ inputs.suite }}
subdirs: ${{ inputs.subdirs }}
pattern: ${{ inputs.pattern }}


@@ -1,5 +1,7 @@
 name: Check semantic PR titles
+run-name: Ensure that PR titles follow the conventional commit spec
 on:
 pull_request_target:
 types:
@@ -9,7 +11,7 @@ on:
 - synchronize
 concurrency:
-group: ${{ github.workflow }}-${{ github.ref }}
+group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
 cancel-in-progress: true
 permissions:
@@ -20,6 +22,6 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - name: Check PR Title's semantic conformance
-uses: amannn/action-semantic-pull-request@0723387faaf9b38adef4775cd42cfd5155ed6017 # v5.5.3
+uses: amannn/action-semantic-pull-request@48f256284bd46cdaab1048c3721360e808335d50 # v6.1.1
 env:
 GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.github/workflows/stainless-builds.yml

@ -0,0 +1,227 @@
name: Stainless SDK Builds
run-name: Build Stainless SDK from OpenAPI spec changes
# This workflow uses pull_request_target, which allows it to run on pull requests
# from forks with access to secrets. This is safe because the workflow definition
# comes from the base branch (trusted), and the action only reads OpenAPI spec
# files without executing any code from the PR.
on:
pull_request_target:
types:
- opened
- synchronize
- reopened
- closed
paths:
- "client-sdks/stainless/**"
- ".github/workflows/stainless-builds.yml" # this workflow
workflow_dispatch:
inputs:
pr_number:
description: 'PR number to run Stainless build for'
required: true
type: number
sdk_install_url:
description: 'Python SDK install URL (optional, for testing specific builds)'
required: false
type: string
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || inputs.pr_number || github.run_id }}
cancel-in-progress: true
env:
# Stainless organization name.
STAINLESS_ORG: llamastack
# Stainless project name.
STAINLESS_PROJECT: llama-stack-client
# Path to your OpenAPI spec.
OAS_PATH: ./client-sdks/stainless/openapi.yml
# Path to your Stainless config. Optional; only provide this if you prefer
# to maintain the ground truth Stainless config in your own repo.
CONFIG_PATH: ./client-sdks/stainless/config.yml
# When to fail the job based on build conclusion.
# Options: "never" | "note" | "warning" | "error" | "fatal".
FAIL_ON: error
# In your repo secrets, configure:
# - STAINLESS_API_KEY: a Stainless API key, which you can generate on the
# Stainless organization dashboard
jobs:
compute-branch:
runs-on: ubuntu-latest
outputs:
preview_branch: ${{ steps.compute.outputs.preview_branch }}
base_branch: ${{ steps.compute.outputs.base_branch }}
merge_branch: ${{ steps.compute.outputs.merge_branch }}
pr_head_repo: ${{ steps.compute.outputs.pr_head_repo }}
pr_head_ref: ${{ steps.compute.outputs.pr_head_ref }}
pr_head_sha: ${{ steps.compute.outputs.pr_head_sha }}
pr_base_sha: ${{ steps.compute.outputs.pr_base_sha }}
pr_base_ref: ${{ steps.compute.outputs.pr_base_ref }}
pr_title: ${{ steps.compute.outputs.pr_title }}
is_fork_pr: ${{ steps.compute.outputs.is_fork_pr }}
steps:
- name: Fetch PR details for workflow_dispatch
if: github.event_name == 'workflow_dispatch'
id: fetch-pr
env:
GH_TOKEN: ${{ github.token }}
run: |
PR_DATA=$(gh pr view ${{ inputs.pr_number }} --repo ${{ github.repository }} --json headRefName,headRepository,headRefOid,baseRefName,baseRefOid,headRepositoryOwner,title)
echo "pr_data=$PR_DATA" >> $GITHUB_OUTPUT
- name: Compute branch names
id: compute
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
# Extract from fetched PR data
PR_DATA='${{ steps.fetch-pr.outputs.pr_data }}'
FORK_OWNER=$(echo "$PR_DATA" | jq -r '.headRepositoryOwner.login')
REPO_NAME=$(echo "$PR_DATA" | jq -r '.headRepository.name')
HEAD_REPO="${FORK_OWNER}/${REPO_NAME}"
BRANCH_NAME=$(echo "$PR_DATA" | jq -r '.headRefName')
HEAD_SHA=$(echo "$PR_DATA" | jq -r '.headRefOid')
BASE_SHA=$(echo "$PR_DATA" | jq -r '.baseRefOid')
BASE_REF=$(echo "$PR_DATA" | jq -r '.baseRefName')
PR_TITLE=$(echo "$PR_DATA" | jq -r '.title')
else
# Use pull_request_target event data
HEAD_REPO="${{ github.event.pull_request.head.repo.full_name }}"
BRANCH_NAME="${{ github.event.pull_request.head.ref }}"
FORK_OWNER="${{ github.event.pull_request.head.repo.owner.login }}"
HEAD_SHA="${{ github.event.pull_request.head.sha }}"
BASE_SHA="${{ github.event.pull_request.base.sha }}"
BASE_REF="${{ github.event.pull_request.base.ref }}"
PR_TITLE="${{ github.event.pull_request.title }}"
fi
BASE_REPO="${{ github.repository }}"
if [ "$HEAD_REPO" != "$BASE_REPO" ]; then
# Fork PR: prefix with fork owner for isolation
if [ -z "$FORK_OWNER" ]; then
echo "Error: Fork PR detected but fork owner is empty" >&2
exit 1
fi
PREVIEW_BRANCH="preview/${FORK_OWNER}/${BRANCH_NAME}"
BASE_BRANCH="preview/base/${FORK_OWNER}/${BRANCH_NAME}"
IS_FORK_PR="true"
else
# Same-repo PR
PREVIEW_BRANCH="preview/${BRANCH_NAME}"
BASE_BRANCH="preview/base/${BRANCH_NAME}"
IS_FORK_PR="false"
fi
echo "preview_branch=${PREVIEW_BRANCH}" >> $GITHUB_OUTPUT
echo "base_branch=${BASE_BRANCH}" >> $GITHUB_OUTPUT
echo "merge_branch=${PREVIEW_BRANCH}" >> $GITHUB_OUTPUT
echo "pr_head_repo=${HEAD_REPO}" >> $GITHUB_OUTPUT
echo "pr_head_ref=${BRANCH_NAME}" >> $GITHUB_OUTPUT
echo "pr_head_sha=${HEAD_SHA}" >> $GITHUB_OUTPUT
echo "pr_base_sha=${BASE_SHA}" >> $GITHUB_OUTPUT
echo "pr_base_ref=${BASE_REF}" >> $GITHUB_OUTPUT
echo "pr_title=${PR_TITLE}" >> $GITHUB_OUTPUT
echo "is_fork_pr=${IS_FORK_PR}" >> $GITHUB_OUTPUT
preview:
needs: compute-branch
# Skip preview if workflow_dispatch provides sdk_install_url, or if PR is being closed
if: |
(github.event_name == 'workflow_dispatch' && inputs.sdk_install_url == '') ||
(github.event_name == 'pull_request_target' && github.event.action != 'closed')
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
outputs:
sdk_install_url: ${{ fromJSON(steps.run-preview.outputs.outcomes || '{}').python.install_url || '' }}
steps:
# Checkout the PR's code to access the OpenAPI spec and config files.
# This is necessary to read the spec/config from the PR (including from forks).
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
repository: ${{ needs.compute-branch.outputs.pr_head_repo }}
ref: ${{ needs.compute-branch.outputs.pr_head_sha }}
fetch-depth: 2
- name: Run preview builds
id: run-preview
uses: stainless-api/upload-openapi-spec-action/preview@11792f827da87f9411ca0b491d7514b94dcb815f # 1.9.0
env:
PR_NUMBER: ${{ inputs.pr_number || github.event.pull_request.number }}
with:
stainless_api_key: ${{ secrets.STAINLESS_API_KEY }}
org: ${{ env.STAINLESS_ORG }}
project: ${{ env.STAINLESS_PROJECT }}
oas_path: ${{ env.OAS_PATH }}
config_path: ${{ env.CONFIG_PATH }}
fail_on: ${{ env.FAIL_ON }}
base_sha: ${{ needs.compute-branch.outputs.pr_base_sha }}
base_ref: ${{ needs.compute-branch.outputs.pr_base_ref }}
head_sha: ${{ needs.compute-branch.outputs.pr_head_sha }}
branch: ${{ needs.compute-branch.outputs.preview_branch }}
base_branch: ${{ needs.compute-branch.outputs.base_branch }}
commit_message: ${{ needs.compute-branch.outputs.pr_title }}
make_comment: true
run-integration-tests:
needs: [compute-branch, preview]
if: |
always() &&
(needs.preview.result == 'success' || needs.preview.result == 'skipped') &&
(github.event_name == 'workflow_dispatch' || github.event.action != 'closed')
uses: ./.github/workflows/integration-tests.yml
with:
# Use provided sdk_install_url from workflow_dispatch, or from preview build
sdk_install_url: ${{ inputs.sdk_install_url || needs.preview.outputs.sdk_install_url }}
matrix_key: 'stainless'
test-all-client-versions: false
pr_head_sha: ${{ needs.compute-branch.outputs.pr_head_sha }}
pr_head_ref: ${{ needs.compute-branch.outputs.pr_head_ref }}
is_fork_pr: ${{ needs.compute-branch.outputs.is_fork_pr == 'true' }}
merge:
needs: compute-branch
if: github.event_name == 'pull_request_target' && github.event.action == 'closed' && github.event.pull_request.merged == true
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
steps:
# Checkout the PR's code to access the OpenAPI spec and config files.
# This is necessary to read the spec/config from the PR (including from forks).
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
repository: ${{ needs.compute-branch.outputs.pr_head_repo }}
ref: ${{ needs.compute-branch.outputs.pr_head_sha }}
fetch-depth: 2
# Note that this only merges in changes that happened on the last build on
# the computed preview branch. It's possible that there are OAS/config
# changes that haven't been built, if the preview job didn't finish
# before this step starts. In theory we want to wait for all builds
# against the preview branch to complete, but assuming that
# the preview job happens before the PR merge, it should be fine.
- name: Run merge build
uses: stainless-api/upload-openapi-spec-action/merge@11792f827da87f9411ca0b491d7514b94dcb815f # 1.9.0
with:
stainless_api_key: ${{ secrets.STAINLESS_API_KEY }}
org: ${{ env.STAINLESS_ORG }}
project: ${{ env.STAINLESS_PROJECT }}
oas_path: ${{ env.OAS_PATH }}
config_path: ${{ env.CONFIG_PATH }}
fail_on: ${{ env.FAIL_ON }}
base_sha: ${{ needs.compute-branch.outputs.pr_base_sha }}
base_ref: ${{ needs.compute-branch.outputs.pr_base_ref }}
head_sha: ${{ needs.compute-branch.outputs.pr_head_sha }}
merge_branch: ${{ needs.compute-branch.outputs.merge_branch }}
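
The branch-naming rule in the `compute-branch` job can be read in isolation as a small function. The sketch below restates that rule; the function name `compute_branches` and the example repository, owner and branch names are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# compute_branches HEAD_REPO BASE_REPO FORK_OWNER BRANCH_NAME
# Prints "preview_branch base_branch is_fork_pr" on one line.
compute_branches() {
  local head_repo=$1 base_repo=$2 fork_owner=$3 branch=$4
  if [ "$head_repo" != "$base_repo" ]; then
    # Fork PR: prefix with the fork owner so forks cannot collide on branch names.
    if [ -z "$fork_owner" ]; then
      echo "Error: fork PR detected but fork owner is empty" >&2
      return 1
    fi
    echo "preview/${fork_owner}/${branch} preview/base/${fork_owner}/${branch} true"
  else
    # Same-repo PR: no owner prefix needed.
    echo "preview/${branch} preview/base/${branch} false"
  fi
}

# Hypothetical inputs for illustration only.
same_repo=$(compute_branches "example-org/llama-stack" "example-org/llama-stack" "example-org" "fix-docs")
fork=$(compute_branches "alice/llama-stack" "example-org/llama-stack" "alice" "fix-docs")
echo "$same_repo"
echo "$fork"
```

Keeping the fork owner in the branch name is what isolates two forks that both use a branch called, say, `fix-docs`.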


@@ -1,5 +1,7 @@
 name: Close stale issues and PRs
+run-name: Run the Stale Bot action
 on:
 schedule:
 - cron: '0 0 * * *' # every day at midnight
@@ -22,7 +24,7 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - name: Stale Action
-uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0
+uses: actions/stale@997185467fa4f803885201cee163a9f38240193d # v10.1.1
 with:
 stale-issue-label: 'stale'
 stale-issue-message: >


@ -0,0 +1,86 @@
name: Test External Providers Installed via Module
run-name: Test External Provider installation via Python module
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
paths:
- 'src/llama_stack/**'
- 'tests/integration/**'
- 'uv.lock'
- 'pyproject.toml'
- 'tests/external/*'
- '.github/workflows/test-external-provider-module.yml' # This workflow
jobs:
test-external-providers-from-module:
# This workflow is disabled. See https://github.com/meta-llama/llama-stack/pull/2975#issuecomment-3138702984 for details
if: false
runs-on: ubuntu-latest
strategy:
matrix:
image-type: [venv]
# Container images are not tested yet: it is tricky to install a package from the host into the
# container and point 'uv pip install' to the correct path...
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Install Ramalama
shell: bash
run: |
uv pip install ramalama
- name: Run Ramalama
shell: bash
run: |
nohup ramalama serve llama3.2:3b-instruct-fp16 > ramalama_server.log 2>&1 &
- name: Apply image type to config file
run: |
yq -i '.image_type = "${{ matrix.image-type }}"' tests/external/ramalama-stack/config.yaml
cat tests/external/ramalama-stack/config.yaml
- name: Install distribution dependencies
run: |
uv run llama stack list-deps tests/external/ramalama-stack/build.yaml | xargs -L1 uv pip install
- name: Start Llama Stack server in background
if: ${{ matrix.image-type == 'venv' }}
env:
INFERENCE_MODEL: "llama3.2:3b-instruct-fp16"
LLAMA_STACK_LOG_FILE: "server.log"
run: |
# Use the virtual environment created by the build step (name comes from build config)
source ramalama-stack-test/bin/activate
uv pip list
nohup llama stack run tests/external/ramalama-stack/config.yaml > server.log 2>&1 &
- name: Wait for Llama Stack server to be ready
run: |
for i in {1..30}; do
if ! grep -q "successfully connected to Ramalama" server.log; then
echo "Waiting for Llama Stack server to load the provider..."
sleep 1
else
echo "Provider loaded"
exit 0
fi
done
echo "Provider failed to load"
cat server.log
exit 1
- name: Upload all logs to artifacts
if: ${{ always() }}
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: logs-${{ github.run_id }}-${{ github.run_attempt }}-external-provider-module-test
path: |
*.log
retention-days: 1
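
The readiness step above polls `server.log` for a marker line and gives up after a fixed number of attempts. A reusable sketch of that pattern is shown here; the helper name `wait_for_log` and its parameters are hypothetical, and the delay is set to zero only so the example finishes immediately.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_for_log FILE PATTERN ATTEMPTS DELAY_SECONDS
# Returns 0 as soon as PATTERN appears in FILE, 1 after ATTEMPTS failed checks.
wait_for_log() {
  local file=$1 pattern=$2 attempts=$3 delay=$4
  local i
  for ((i = 1; i <= attempts; i++)); do
    if grep -q -- "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Simulated log file; a real run would point at the server's log instead.
log=$(mktemp)
echo "INFO successfully connected to Ramalama" > "$log"

if wait_for_log "$log" "successfully connected to Ramalama" 3 0; then
  ready=yes
else
  ready=no
fi

if wait_for_log "$log" "no such line" 2 0; then
  missing=found
else
  missing=timeout
fi

echo "ready=$ready missing=$missing"
rm -f "$log"
```

Returning a status instead of calling `exit` inside the loop lets the caller decide whether to dump the log before failing.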


@ -1,73 +0,0 @@
name: Test External Providers
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
paths:
- 'llama_stack/**'
- 'tests/integration/**'
- 'uv.lock'
- 'pyproject.toml'
- 'requirements.txt'
- '.github/workflows/test-external-providers.yml' # This workflow
jobs:
test-external-providers:
runs-on: ubuntu-latest
strategy:
matrix:
image-type: [venv]
# We don't do container yet, it's tricky to install a package from the host into the
# container and point 'uv pip install' to the correct path...
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Apply image type to config file
run: |
yq -i '.image_type = "${{ matrix.image-type }}"' tests/external-provider/llama-stack-provider-ollama/custom-distro.yaml
cat tests/external-provider/llama-stack-provider-ollama/custom-distro.yaml
- name: Setup directory for Ollama custom provider
run: |
mkdir -p tests/external-provider/llama-stack-provider-ollama/src/
cp -a llama_stack/providers/remote/inference/ollama/ tests/external-provider/llama-stack-provider-ollama/src/llama_stack_provider_ollama
- name: Create provider configuration
run: |
mkdir -p /home/runner/.llama/providers.d/remote/inference
cp tests/external-provider/llama-stack-provider-ollama/custom_ollama.yaml /home/runner/.llama/providers.d/remote/inference/custom_ollama.yaml
- name: Build distro from config file
run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. llama stack build --config tests/external-provider/llama-stack-provider-ollama/custom-distro.yaml
- name: Start Llama Stack server in background
if: ${{ matrix.image-type }} == 'venv'
env:
INFERENCE_MODEL: "meta-llama/Llama-3.2-3B-Instruct"
run: |
# Use the virtual environment created by the build step (name comes from build config)
source ci-test/bin/activate
uv pip list
nohup llama stack run tests/external-provider/llama-stack-provider-ollama/run.yaml --image-type ${{ matrix.image-type }} > server.log 2>&1 &
- name: Wait for Llama Stack server to be ready
run: |
for i in {1..30}; do
if ! grep -q "Successfully loaded external provider remote::custom_ollama" server.log; then
echo "Waiting for Llama Stack server to load the provider..."
sleep 1
else
echo "Provider loaded"
exit 0
fi
done
echo "Provider failed to load"
cat server.log
exit 1

.github/workflows/test-external.yml

@ -0,0 +1,92 @@
name: Test External API and Providers
run-name: Test the External API and Provider mechanisms
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
paths:
- 'src/llama_stack/**'
- '!src/llama_stack_ui/**'
- 'tests/integration/**'
- 'uv.lock'
- 'pyproject.toml'
- 'requirements.txt'
- 'tests/external/*'
- '.github/workflows/test-external.yml' # This workflow
jobs:
test-external:
runs-on: ubuntu-latest
strategy:
matrix:
image-type: [venv]
# Container images are not tested yet: it is tricky to install a package from the host into the
# container and point 'uv pip install' to the correct path...
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Create API configuration
run: |
mkdir -p /home/runner/.llama/apis.d
cp tests/external/weather.yaml /home/runner/.llama/apis.d/weather.yaml
- name: Create provider configuration
run: |
mkdir -p /home/runner/.llama/providers.d/remote/weather
cp tests/external/kaze.yaml /home/runner/.llama/providers.d/remote/weather/kaze.yaml
- name: Print distro dependencies
run: |
uv run --no-sync llama stack list-deps tests/external/config.yaml
- name: Build distro from config file
run: |
uv venv ci-test
source ci-test/bin/activate
uv pip install -e .
LLAMA_STACK_LOGGING=all=CRITICAL llama stack list-deps tests/external/config.yaml | xargs -L1 uv pip install
- name: Start Llama Stack server in background
if: ${{ matrix.image-type == 'venv' }}
env:
INFERENCE_MODEL: "meta-llama/Llama-3.2-3B-Instruct"
LLAMA_STACK_LOG_FILE: "server.log"
run: |
# Use the virtual environment created by the build step (name comes from build config)
source ci-test/bin/activate
uv pip list
nohup llama stack run tests/external/config.yaml > server.log 2>&1 &
- name: Wait for Llama Stack server to be ready
run: |
echo "Waiting for Llama Stack server..."
for i in {1..30}; do
if curl -sSf http://localhost:8321/v1/health | grep -q "OK"; then
echo "Llama Stack server is up!"
exit 0
fi
sleep 1
done
echo "Llama Stack server failed to start"
cat server.log
exit 1
- name: Test external API
run: |
curl -sSf http://localhost:8321/v1/weather/locations
- name: Upload all logs to artifacts
if: ${{ always() }}
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: logs-${{ github.run_id }}-${{ github.run_attempt }}-external-test
path: |
*.log
retention-days: 1
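
The build step above pipes `llama stack list-deps` into `xargs -L1 uv pip install`, so each output line becomes one separate install command. The sketch below shows only that `xargs -L1` behaviour. The dependency lines are invented for illustration and are not real `list-deps` output, and `echo` stands in for the install command so nothing is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dependency listing: one group of pip arguments per line.
deps_output='fastapi uvicorn
--extra-index-url https://example.invalid/simple torch
--no-deps some-package'

# -L1 runs the command once per input line and appends that line's words as arguments.
# "echo uv pip install" prints the command that would run instead of executing it.
commands=$(printf '%s\n' "$deps_output" | xargs -L1 echo uv pip install)
printf '%s\n' "$commands"
```

Running one command per line keeps per-line options such as an extra index URL scoped to the packages on that line.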


@ -1,69 +0,0 @@
name: auto-tests
on:
# pull_request:
workflow_dispatch:
inputs:
commit_sha:
description: 'Specific Commit SHA to trigger on'
required: false
default: $GITHUB_SHA # default to the last commit of $GITHUB_REF branch
jobs:
test-llama-stack-as-library:
runs-on: ubuntu-latest
env:
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
TAVILY_SEARCH_API_KEY: ${{ secrets.TAVILY_SEARCH_API_KEY }}
strategy:
matrix:
provider: [fireworks, together]
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ github.event.inputs.commit_sha }}
- name: Echo commit SHA
run: |
echo "Triggered on commit SHA: ${{ github.event.inputs.commit_sha }}"
git rev-parse HEAD
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt pytest
pip install -e .
- name: Build providers
run: |
llama stack build --template ${{ matrix.provider }} --image-type venv
- name: Install the latest llama-stack-client & llama-models packages
run: |
pip install -e git+https://github.com/meta-llama/llama-stack-client-python.git#egg=llama-stack-client
pip install -e git+https://github.com/meta-llama/llama-models.git#egg=llama-models
- name: Run client-sdk test
working-directory: "${{ github.workspace }}"
env:
REPORT_OUTPUT: md_report.md
shell: bash
run: |
pip install --upgrade pytest-md-report
echo "REPORT_FILE=${REPORT_OUTPUT}" >> "$GITHUB_ENV"
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
LLAMA_STACK_CONFIG=./llama_stack/templates/${{ matrix.provider }}/run.yaml pytest --md-report --md-report-verbose=1 ./tests/client-sdk/inference/ --md-report-output "$REPORT_OUTPUT"
- name: Output reports to the job summary
if: always()
shell: bash
run: |
if [ -f "$REPORT_FILE" ]; then
echo "<details><summary> Test Report for ${{ matrix.provider }} </summary>" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
cat "$REPORT_FILE" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "</details>" >> $GITHUB_STEP_SUMMARY
fi

.github/workflows/ui-unit-tests.yml

@ -0,0 +1,55 @@
name: UI Tests
run-name: Run the UI test suite
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
paths:
- 'src/llama_stack_ui/**'
- '.github/workflows/ui-unit-tests.yml' # This workflow
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
cancel-in-progress: true
jobs:
ui-tests:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
node-version: [22]
steps:
- name: Checkout repository
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Setup Node.js
uses: actions/setup-node@395ad3262231945c25e8478fd5baf05154b1d79f # v6.1.0
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
cache-dependency-path: 'src/llama_stack_ui/package-lock.json'
- name: Install dependencies
working-directory: src/llama_stack_ui
run: npm ci
- name: Run linting
working-directory: src/llama_stack_ui
run: npm run lint
- name: Run format check
working-directory: src/llama_stack_ui
run: npm run format:check
- name: Run unit tests
working-directory: src/llama_stack_ui
env:
CI: true
run: npm test -- --coverage --watchAll=false --passWithNoTests
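
The `concurrency.group` expression in this workflow uses the `cond && a || b` idiom: runs on `main` get a group keyed by run ID, so they never cancel each other, while other refs share one group per ref, so a newer run cancels the older one. A sketch of the same decision in shell follows; the function name `concurrency_group` and the sample refs and run IDs are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# concurrency_group WORKFLOW REF RUN_ID
# Mirrors: github.ref == 'refs/heads/main' && github.run_id || github.ref
concurrency_group() {
  local workflow=$1 ref=$2 run_id=$3
  if [ "$ref" = "refs/heads/main" ]; then
    # Unique per run: pushes to main are never cancelled by later pushes.
    echo "${workflow}-${run_id}"
  else
    # Shared per ref: a new run for the same ref replaces the in-progress one.
    echo "${workflow}-${ref}"
  fi
}

# Hypothetical values for illustration only.
main_group=$(concurrency_group "UI Tests" "refs/heads/main" 1001)
pr_group=$(concurrency_group "UI Tests" "refs/pull/42/merge" 1002)
echo "$main_group"
echo "$pr_group"
```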


@@ -1,12 +1,19 @@
 name: Unit Tests
+run-name: Run the unit test suite
 on:
 push:
-branches: [ main ]
+branches:
+- main
+- 'release-[0-9]+.[0-9]+.x'
 pull_request:
-branches: [ main ]
+branches:
+- main
+- 'release-[0-9]+.[0-9]+.x'
 paths:
-- 'llama_stack/**'
+- 'src/llama_stack/**'
+- '!src/llama_stack_ui/**'
 - 'tests/unit/**'
 - 'uv.lock'
 - 'pyproject.toml'
@@ -15,7 +22,7 @@ on:
 workflow_dispatch:
 concurrency:
-group: ${{ github.workflow }}-${{ github.ref }}
+group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
 cancel-in-progress: true
 jobs:
@@ -25,24 +32,24 @@ jobs:
 fail-fast: false
 matrix:
 python:
-- "3.10"
-- "3.11"
 - "3.12"
 - "3.13"
 steps:
 - name: Checkout repository
-uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
 - name: Install dependencies
 uses: ./.github/actions/setup-runner
+with:
+python-version: ${{ matrix.python }}
 - name: Run unit tests
 run: |
-PYTHON_VERSION=${{ matrix.python }} ./scripts/unit-tests.sh --cov=llama_stack --junitxml=pytest-report-${{ matrix.python }}.xml --cov-report=html:htmlcov-${{ matrix.python }}
+PYTHON_VERSION=${{ matrix.python }} ./scripts/unit-tests.sh --junitxml=pytest-report-${{ matrix.python }}.xml
 - name: Upload test results
 if: always()
-uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
 with:
 name: test-results-${{ matrix.python }}
 path: |


@ -1,68 +0,0 @@
name: Update ReadTheDocs
on:
workflow_dispatch:
inputs:
branch:
description: 'RTD version to update'
required: false
default: 'latest'
push:
branches:
- main
paths:
- 'docs/**'
- 'pyproject.toml'
- '.github/workflows/update-readthedocs.yml'
tags:
- '*'
pull_request:
branches:
- main
paths:
- 'docs/**'
- 'pyproject.toml'
- '.github/workflows/update-readthedocs.yml'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
update-readthedocs:
runs-on: ubuntu-latest
env:
TOKEN: ${{ secrets.READTHEDOCS_TOKEN }}
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Build HTML
run: |
cd docs
uv run make html
- name: Trigger ReadTheDocs build
if: github.event_name != 'pull_request'
run: |
if [ -z "$TOKEN" ]; then
echo "READTHEDOCS_TOKEN is not set"
exit 1
fi
response=$(curl -X POST \
-H "Content-Type: application/json" \
-d "{
\"token\": \"$TOKEN\",
\"version\": \"$GITHUB_REF_NAME\"
}" \
https://readthedocs.org/api/v2/webhook/llama-stack/289768/)
echo "Response: $response"
if [ $(echo $response | jq -r '.build_triggered') != 'true' ]; then
echo "Failed to trigger ReadTheDocs build"
exit 1
fi

.gitignore

@@ -18,7 +18,6 @@ Package.resolved
 .venv/
 .vscode
 _build
-docs/src
 # Sample tool-calling datasets generated by NVIDIA notebooks
 docs/notebooks/nvidia/tool_calling/sample_data/
 pyrightconfig.json
@@ -26,3 +25,15 @@ venv/
 pytest-report.xml
 .coverage
 .python-version
+AGENTS.md
+server.log
+CLAUDE.md
+.claude/
+docs/.docusaurus/
+docs/node_modules/
+docs/static/imported-files/
+docs/docs/api-deprecated/
+docs/docs/api-experimental/
+docs/docs/api/
+tests/integration/client-typescript/node_modules/
+.ts-client-checkout/


@@ -1,7 +1,9 @@
 exclude: 'build/'
+minimum_pre_commit_version: 4.4.0
+x-uv-dependency: &uv-dependency "uv==0.9.15"
 default_language_version:
-python: python3
+python: python3.12
+node: "22"
 repos:
 - repo: https://github.com/pre-commit/pre-commit-hooks
@@ -14,12 +16,12 @@ repos:
 - id: check-added-large-files
 args: ['--maxkb=1000']
 - id: end-of-file-fixer
-exclude: '^(.*\.svg)$'
+exclude: '^(.*\.svg|.*\.md)$'
 - id: no-commit-to-branch
 - id: check-yaml
 args: ["--unsafe"]
+exclude: 'docs/static/openai-spec-2.3.0.yml'
 - id: detect-private-key
-- id: requirements-txt-fixer
 - id: mixed-line-ending
 args: [--fix=lf] # Forces to replace line ending by LF (line feed)
 - id: check-executables-have-shebangs
@@ -29,7 +31,7 @@ repos:
 - id: check-toml
 - repo: https://github.com/Lucas-C/pre-commit-hooks
-rev: v1.5.4
+rev: v1.5.5
 hooks:
 - id: insert-license
 files: \.py$|\.sh$
@@ -38,39 +40,26 @@ repos:
 - docs/license_header.txt
 - repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.9.4
+rev: v0.12.2
 hooks:
 - id: ruff
 args: [ --fix ]
-exclude: ^llama_stack/strong_typing/.*$
 - id: ruff-format
 - repo: https://github.com/adamchainz/blacken-docs
-rev: 1.19.0
+rev: 1.19.1
 hooks:
 - id: blacken-docs
 additional_dependencies:
 - black==24.3.0
-- repo: https://github.com/astral-sh/uv-pre-commit
-rev: 0.7.8
-hooks:
-- id: uv-lock
-- id: uv-export
-args: [
-"--frozen",
-"--no-hashes",
-"--no-emit-project",
-"--no-default-groups",
-"--output-file=requirements.txt"
-]
 - repo: https://github.com/pre-commit/mirrors-mypy
-rev: v1.15.0
+rev: v1.18.2
 hooks:
 - id: mypy
 additional_dependencies:
-- uv==0.6.2
+- *uv-dependency
 - mypy
 - pytest
 - rich
@@ -86,24 +75,48 @@ repos:
 - repo: local
hooks: hooks:
- id: uv-lock
name: uv-lock
additional_dependencies:
- *uv-dependency
entry: ./scripts/uv-run-with-index.sh lock
language: python
pass_filenames: false
require_serial: true
files: ^(pyproject\.toml|uv\.lock)$
- id: mypy-full
name: mypy (full type_checking)
entry: ./scripts/uv-run-with-index.sh run --group dev --group type_checking mypy
language: system
pass_filenames: false
stages: [manual]
- id: distro-codegen - id: distro-codegen
name: Distribution Template Codegen name: Distribution Template Codegen
additional_dependencies: additional_dependencies:
- uv==0.7.8 - *uv-dependency
entry: uv run --group codegen ./scripts/distro_codegen.py entry: ./scripts/uv-run-with-index.sh run --group codegen ./scripts/distro_codegen.py
language: python language: python
pass_filenames: false pass_filenames: false
require_serial: true require_serial: true
files: ^llama_stack/templates/.*$|^llama_stack/providers/.*/inference/.*/models\.py$ files: ^src/llama_stack/distributions/.*$|^src/llama_stack/providers/.*/inference/.*/models\.py$
- id: provider-codegen
name: Provider Codegen
additional_dependencies:
- *uv-dependency
entry: ./scripts/uv-run-with-index.sh run --group codegen ./scripts/provider_codegen.py
language: python
pass_filenames: false
require_serial: true
files: ^src/llama_stack/providers/.*$|^scripts/run_openapi_generator.sh$
- id: openapi-codegen - id: openapi-codegen
name: API Spec Codegen name: API Spec Codegen
additional_dependencies: additional_dependencies:
- uv==0.7.8 - *uv-dependency
entry: sh -c 'uv run ./docs/openapi_generator/run_openapi_generator.sh > /dev/null' entry: sh -c './scripts/uv-run-with-index.sh run scripts/run_openapi_generator.sh'
language: python language: python
pass_filenames: false pass_filenames: false
require_serial: true require_serial: true
files: ^llama_stack/apis/|^docs/openapi_generator/ files: ^src/llama_stack_api/.*$
- id: check-workflows-use-hashes - id: check-workflows-use-hashes
name: Check GitHub Actions use SHA-pinned actions name: Check GitHub Actions use SHA-pinned actions
entry: ./scripts/check-workflows-use-hashes.sh entry: ./scripts/check-workflows-use-hashes.sh
@ -112,7 +125,109 @@ repos:
require_serial: true require_serial: true
always_run: true always_run: true
files: ^\.github/workflows/.*\.ya?ml$ files: ^\.github/workflows/.*\.ya?ml$
- id: check-init-py
name: Check for missing __init__.py files
entry: ./scripts/check-init-py.sh
language: system
pass_filenames: false
require_serial: true
always_run: true
files: ^src/llama_stack/.*$
- id: forbid-pytest-asyncio
name: Block @pytest.mark.asyncio and @pytest_asyncio.fixture
entry: bash
language: system
types: [python]
pass_filenames: true
args:
- -c
- |
grep -EnH '^[^#]*@pytest\.mark\.asyncio|@pytest_asyncio\.fixture' "$@" && {
echo;
echo "❌ Do not use @pytest.mark.asyncio or @pytest_asyncio.fixture."
echo " pytest is already configured with async-mode=auto."
echo;
exit 1;
} || true
- id: generate-ci-docs
name: Generate CI documentation
additional_dependencies:
- *uv-dependency
entry: ./scripts/uv-run-with-index.sh run ./scripts/gen-ci-docs.py
language: python
pass_filenames: false
require_serial: true
files: ^.github/workflows/.*$
- id: ui-linter
name: Format & Lint UI
entry: bash ./scripts/run-ui-linter.sh
language: system
files: ^src/llama_stack_ui/.*\.(ts|tsx)$
pass_filenames: false
require_serial: true
- id: check-log-usage
name: Ensure 'llama_stack.log' usage for logging
entry: bash
language: system
types: [python]
pass_filenames: true
args:
- -c
- |
matches=$(grep -EnH '^[^#]*\b(import\s+logging|from\s+logging\b)' "$@" | grep -v -e '#\s*allow-direct-logging' || true)
if [ -n "$matches" ]; then
# GitHub Actions annotation format
while IFS=: read -r file line_num rest; do
echo "::error file=$file,line=$line_num::Do not use 'import logging' or 'from logging import' in $file. Use the custom log instead: from llama_stack.log import get_logger; logger = get_logger(). If direct logging is truly needed, add: # allow-direct-logging"
done <<< "$matches"
exit 1
fi
exit 0
- id: fips-compliance
name: Ensure llama-stack remains FIPS compliant
entry: bash
language: system
types: [python]
pass_filenames: true
exclude: '^tests/.*$' # Exclude test dir as some safety tests used MD5
args:
- -c
- |
grep -EnH '^[^#]*\b(md5|sha1|uuid3|uuid5)\b' "$@" && {
echo;
echo "❌ Do not use any of the following functions: hashlib.md5, hashlib.sha1, uuid.uuid3, uuid.uuid5"
echo " These functions are not FIPS-compliant"
echo;
exit 1;
} || true
- id: check-api-independence
name: Ensure llama_stack_api does not import llama_stack
entry: bash
language: system
pass_filenames: false
require_serial: true
always_run: true
files: ^src/llama_stack_api/.*$
args:
- -c
- |
API_DIR="src/llama_stack_api"
grep -rn --include="*.py" -E '^[^#]*(import llama_stack\b|from llama_stack\b)' "$API_DIR" 2>/dev/null && {
echo "llama_stack_api must not import llama_stack";
exit 1;
}
[ -f "$API_DIR/pyproject.toml" ] && grep -n 'llama_stack[^_]' "$API_DIR/pyproject.toml" && {
echo "llama_stack_api must not depend on llama_stack in pyproject.toml";
exit 1;
}
exit 0
ci: ci:
autofix_commit_msg: 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks autofix_commit_msg: 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
autoupdate_commit_msg: ⬆ [pre-commit.ci] pre-commit autoupdate autoupdate_commit_msg: ⬆ [pre-commit.ci] pre-commit autoupdate
autofix_prs: true
autoupdate_branch: ''
autoupdate_schedule: weekly
skip: []
submodules: false
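
The grep-based hooks in this configuration (for example `fips-compliance`) flag a line when a forbidden name appears as a whole word with no `#` earlier on the same line. The following is a minimal Python sketch of that matching rule, not the hook itself. It assumes Python's `re` word-boundary behavior is close enough to `grep -E` for these names; `find_fips_violations` and the sample source are hypothetical and exist only for illustration.

```python
import re

# Same pattern the fips-compliance hook passes to `grep -E`:
# a forbidden name as a whole word, with no '#' before it on the line.
FIPS_PATTERN = re.compile(r"^[^#]*\b(md5|sha1|uuid3|uuid5)\b")


def find_fips_violations(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that the pattern would flag."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if FIPS_PATTERN.search(line)
    ]


# Hypothetical input: two offending calls, one comment, one allowed hash.
sample = "\n".join(
    [
        "import hashlib",
        "digest = hashlib.md5(data).hexdigest()",
        "# md5 is mentioned only in this comment",
        "digest = hashlib.sha256(data).hexdigest()",
        "name = uuid.uuid5(namespace, value)",
    ]
)

for number, line in find_fips_violations(sample):
    print(f"{number}: {line}")
```

Because the pattern only inspects text before the first `#`, a forbidden name inside a trailing comment is ignored, while the same name in code on that line is still reported.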


@ -1,25 +0,0 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Build documentation in the "docs/" directory with Sphinx
sphinx:
configuration: docs/source/conf.py
# Set the OS, Python version and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.12"
jobs:
pre_create_environment:
- asdf plugin add uv
- asdf install uv latest
- asdf global uv latest
create_environment:
- uv venv "${READTHEDOCS_VIRTUALENV_PATH}"
install:
- UV_PROJECT_ENVIRONMENT="${READTHEDOCS_VIRTUALENV_PATH}" uv sync --frozen --group docs


@ -1,5 +1,155 @@
# Changelog # Changelog
# v0.2.20
Published on: 2025-08-29T22:25:32Z
Here are some key changes that are coming as part of this release.
### Build and Environment
- Environment improvements: fixed env var replacement to preserve types.
- Docker stability: fixed container startup failures for Fireworks AI provider.
- Removed absolute paths in build for better portability.
### Features
- UI Enhancements: Implemented file upload and VectorDB creation/configuration directly in UI.
- Vector Store Improvements: Added keyword, vector, and hybrid search inside vector store.
- Added S3 authorization support for file providers.
- SQL Store: Added inequality support to where clause.
### Documentation
- Fixed post-training docs.
- Added Contributor Guidelines for creating Internal vs. External providers.
### Fixes
- Removed unsupported bfcl scoring function.
- Multiple reliability and configuration fixes for providers and environment handling.
### Engineering / Chores
- Cleaner internal development setup with consistent paths.
- Incremental improvements to provider integration and vector store behavior.
### New Contributors
- @omertuc made their first contribution in #3270
- @r3v5 made their first contribution in vector store hybrid search
---
# v0.2.19
Published on: 2025-08-26T22:06:55Z
## Highlights
* feat: Add CORS configuration support for server by @skamenan7 in https://github.com/llamastack/llama-stack/pull/3201
* feat(api): introduce /rerank by @ehhuang in https://github.com/llamastack/llama-stack/pull/2940
* feat: Add S3 Files Provider by @mattf in https://github.com/llamastack/llama-stack/pull/3202
---
# v0.2.18
Published on: 2025-08-20T01:09:27Z
## Highlights
* Add moderations create API
* Hybrid search in Milvus
* Numerous Responses API improvements
* Documentation updates
---
# v0.2.17
Published on: 2025-08-05T01:51:14Z
## Highlights
* feat(tests): introduce inference record/replay to increase test reliability by @ashwinb in https://github.com/meta-llama/llama-stack/pull/2941
* fix(library_client): improve initialization error handling and prevent AttributeError by @mattf in https://github.com/meta-llama/llama-stack/pull/2944
* fix: use OLLAMA_URL to activate Ollama provider in starter by @ashwinb in https://github.com/meta-llama/llama-stack/pull/2963
* feat(UI): adding MVP playground UI by @franciscojavierarceo in https://github.com/meta-llama/llama-stack/pull/2828
* Standardization of errors (@nathan-weinberg)
* feat: Enable DPO training with HuggingFace inline provider by @Nehanth in https://github.com/meta-llama/llama-stack/pull/2825
* chore: rename templates to distributions by @ashwinb in https://github.com/meta-llama/llama-stack/pull/3035
---
# v0.2.16
Published on: 2025-07-28T23:35:23Z
## Highlights
* Automatic model registration for self-hosted providers (ollama and vllm currently). No need for `INFERENCE_MODEL` environment variables which need to be updated, etc.
* Much simplified starter distribution. Most `ENABLE_` env variables are now gone. When you set `VLLM_URL`, the `vllm` provider is auto-enabled. Similar for `MILVUS_URL`, `PGVECTOR_DB`, etc. Check the [config.yaml](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/templates/starter/config.yaml) for more details.
* All tests migrated to pytest now (thanks @Elbehery)
* DPO implementation in the post-training provider (thanks @Nehanth)
* (Huge!) Support for external APIs and providers thereof (thanks @leseb, @cdoern and others). This is a really big deal -- you can now add more APIs completely out of tree and experiment with them before (optionally) wanting to contribute back.
* `inline::vllm` provider is gone thank you very much
* several improvements to OpenAI inference implementations and LiteLLM backend (thanks @mattf)
* Chroma now supports Vector Store API (thanks @franciscojavierarceo).
* Authorization improvements: Vector Store/File APIs now support access control (thanks @franciscojavierarceo); Telemetry read APIs are gated according to the logged-in user's roles.
---
# v0.2.15
Published on: 2025-07-16T03:30:01Z
---
# v0.2.14
Published on: 2025-07-04T16:06:48Z
## Highlights
* Support for Llama Guard 4
* Added Milvus support to vector-stores API
* Documentation and zero-to-hero updates for latest APIs
---
# v0.2.13
Published on: 2025-06-28T04:28:11Z
## Highlights
* search_mode support in OpenAI vector store API
* Security fixes
---
# v0.2.12
Published on: 2025-06-20T22:52:12Z
## Highlights
* Filter support in file search
* Support auth attributes in inference and response stores
---
# v0.2.11
Published on: 2025-06-17T20:26:26Z
## Highlights
* OpenAI-compatible vector store APIs
* Hybrid Search in Sqlite-vec
* File search tool in Responses API
* Pagination in inference and response stores
* Added `suffix` to completions API for fill-in-the-middle tasks
---
# v0.2.10.1 # v0.2.10.1
Published on: 2025-06-06T20:11:02Z Published on: 2025-06-06T20:11:02Z
@ -399,7 +549,7 @@ GenAI application developers need more than just an LLM - they need to integrate
Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs by both AI developers and from partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety. Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs by both AI developers and from partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety.
With Llama Stack, you can easily build a RAG agent which can also search the web, do complex math, and custom tool calling. You can use telemetry to inspect those traces, and convert telemetry into evals datasets. And with Llama Stack's plugin architecture and prepackage distributions, you choose to run your agent anywhere - in the cloud with our partners, deploy your own environment using virtualenv, conda, or Docker, operate locally with Ollama, or even run on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience. With Llama Stack, you can easily build a RAG agent which can also search the web, do complex math, and custom tool calling. You can use telemetry to inspect those traces, and convert telemetry into evals datasets. And with Llama Stack's plugin architecture and prepackage distributions, you choose to run your agent anywhere - in the cloud with our partners, deploy your own environment using virtualenv or Docker, operate locally with Ollama, or even run on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience.
## Release ## Release
After iterating on the APIs for the last 3 months, today we're launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages (v0.1.0). We now have automated tests for providers. These tests make sure that all provider implementations are verified. Developers can now easily and reliably select distributions or providers based on their specific requirements. After iterating on the APIs for the last 3 months, today we're launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages (v0.1.0). We now have automated tests for providers. These tests make sure that all provider implementations are verified. Developers can now easily and reliably select distributions or providers based on their specific requirements.
@ -462,70 +612,3 @@ A small but important bug-fix release to update the URL datatype for the client-
--- ---
# v0.0.62
Published on: 2024-12-18T02:39:43Z
---
# v0.0.61
Published on: 2024-12-10T20:50:33Z
---
# v0.0.55
Published on: 2024-11-23T17:14:07Z
---
# v0.0.54
Published on: 2024-11-22T00:36:09Z
---
# v0.0.53
Published on: 2024-11-20T22:18:00Z
🚀 Initial Release Notes for Llama Stack!
### Added
- Resource-oriented design for models, shields, memory banks, datasets and eval tasks
- Persistence for registered objects with distribution
- Ability to persist memory banks created for FAISS
- PostgreSQL KVStore implementation
- Environment variable placeholder support in run.yaml files
- Comprehensive Zero-to-Hero notebooks and quickstart guides
- Support for quantized models in Ollama
- Vision models support for Together, Fireworks, Meta-Reference, Ollama, and vLLM
- Bedrock distribution with safety shields support
- Evals API with task registration and scoring functions
- MMLU and SimpleQA benchmark scoring functions
- Huggingface dataset provider integration for benchmarks
- Support for custom dataset registration from local paths
- Benchmark evaluation CLI tools with visualization tables
- RAG evaluation scoring functions and metrics
- Local persistence for datasets and eval tasks
### Changed
- Split safety into distinct providers (llama-guard, prompt-guard, code-scanner)
- Changed provider naming convention (`impls` → `inline`, `adapters` → `remote`)
- Updated API signatures for dataset and eval task registration
- Restructured folder organization for providers
- Enhanced Docker build configuration
- Added version prefixing for REST API routes
- Enhanced evaluation task registration workflow
- Improved benchmark evaluation output formatting
- Restructured evals folder organization for better modularity
### Removed
- `llama stack configure` command
---


@ -1,17 +1,112 @@
# Contributing to Llama-Stack # Contributing to Llama Stack
We want to make contributing to this project as easy and transparent as We want to make contributing to this project as easy and transparent as
possible. possible.
## Set up your development environment
We use [uv](https://github.com/astral-sh/uv) to manage Python dependencies and virtual environments.
You can install `uv` by following this [guide](https://docs.astral.sh/uv/getting-started/installation/).
You can install the dependencies by running:
```bash
cd llama-stack
uv venv --python 3.12
uv sync --group dev
uv pip install -e .
source .venv/bin/activate
```
```{note}
If you are making changes to Llama Stack, it is essential that you use Python 3.12 as shown above.
Llama Stack can work with Python 3.13 but the pre-commit hooks used to validate code changes only work with Python 3.12.
If you don't specify a Python version, `uv` will automatically select a Python version according to the `requires-python`
section of the `pyproject.toml`, which is fine for running Llama Stack but not for committing changes.
For more info, see the [uv docs around Python versions](https://docs.astral.sh/uv/concepts/python-versions/).
```
Note that you can create a dotenv file `.env` that includes necessary environment variables:
```
LLAMA_STACK_BASE_URL=http://localhost:8321
LLAMA_STACK_CLIENT_LOG=debug
LLAMA_STACK_PORT=8321
LLAMA_STACK_CONFIG=<provider-name>
TAVILY_SEARCH_API_KEY=
BRAVE_SEARCH_API_KEY=
```
And then use this dotenv file when running client SDK tests via the following:
```bash
uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
```
### Pre-commit Hooks
We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:
```bash
uv pip install 'pre-commit>=4.4.0'
uv run pre-commit install
```
Note that the Llama Stack pre-commit configuration sets `minimum_pre_commit_version: 4.4.0`, so it is essential that you install pre-commit `4.4.0` or later
as shown above. Once you have run these commands, pre-commit hooks will run automatically before each commit.
Alternatively, if you don't want to install the pre-commit hooks (or if you want to check if your changes are ready before committing),
you can run the checks manually by running:
```bash
uv run pre-commit run --all-files -v
```
The `-v` (verbose) parameter is optional but often helpful for getting more information about any issues that the pre-commit checks identify.
To run the expanded mypy configuration that CI enforces, use:
```bash
uv run pre-commit run mypy-full --hook-stage manual --all-files
```
or invoke mypy directly with all optional dependencies:
```bash
uv run --group dev --group type_checking mypy
```
```{caution}
Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
```
## Discussions -> Issues -> Pull Requests ## Discussions -> Issues -> Pull Requests
We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md). We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).
If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later. If in doubt, please open a [discussion](https://github.com/llamastack/llama-stack/discussions); we can always convert that to an issue later.
### Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.
Meta has a [bounty program](http://facebook.com/whitehat/info) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.
### Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.
Complete your CLA here: <https://code.facebook.com/cla>
**I'd like to contribute!** **I'd like to contribute!**
All issues are actionable (please report if they are not.) Pick one and start working on it. Thank you. If you are new to the project, start by looking at the issues tagged with "good first issue". If you're interested
If you need help or guidance, comment on the issue. Issues that are extra friendly to new contributors are tagged with "contributor friendly". leave a comment on the issue and a triager will assign it to you.
Please avoid picking up too many issues at once. This helps you stay focused and ensures that others in the community also have opportunities to contribute.
- Try to work on only 1-2 issues at a time, especially if you're still getting familiar with the codebase.
- Before taking an issue, check if it's already assigned or being actively discussed.
- If you're blocked or can't continue with an issue, feel free to unassign yourself or leave a comment so others can step in.
**I have a bug!** **I have a bug!**
@ -41,89 +136,20 @@ If you need help or guidance, comment on the issue. Issues that are extra friend
4. Make sure your code lints using `pre-commit`. 4. Make sure your code lints using `pre-commit`.
5. If you haven't already, complete the Contributor License Agreement ("CLA"). 5. If you haven't already, complete the Contributor License Agreement ("CLA").
6. Ensure your pull request follows the [conventional commits format](https://www.conventionalcommits.org/en/v1.0.0/). 6. Ensure your pull request follows the [conventional commits format](https://www.conventionalcommits.org/en/v1.0.0/).
7. Ensure your pull request follows the [coding style](#coding-style).
## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.
Complete your CLA here: <https://code.facebook.com/cla>
## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.
Meta has a [bounty program](http://facebook.com/whitehat/info) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.
## Set up your development environment Please keep pull requests (PRs) small and focused. If you have a large set of changes, consider splitting them into logically grouped, smaller PRs to facilitate review and testing.
We use [uv](https://github.com/astral-sh/uv) to manage python dependencies and virtual environments. ```{tip}
You can install `uv` by following this [guide](https://docs.astral.sh/uv/getting-started/installation/). As a general guideline:
- Experienced contributors should try to keep no more than 5 open PRs at a time.
You can install the dependencies by running: - New contributors are encouraged to have only one open PR at a time until they're familiar with the codebase and process.
```bash
cd llama-stack
uv sync --extra dev
uv pip install -e .
source .venv/bin/activate
``` ```
> [!NOTE] ## Repository guidelines
> You can use a specific version of Python with `uv` by adding the `--python <version>` flag (e.g. `--python 3.11`)
> Otherwise, `uv` will automatically select a Python version according to the `requires-python` section of the `pyproject.toml`.
> For more info, see the [uv docs around Python versions](https://docs.astral.sh/uv/concepts/python-versions/).
Note that you can create a dotenv file `.env` that includes necessary environment variables: ### Coding Style
```
LLAMA_STACK_BASE_URL=http://localhost:8321
LLAMA_STACK_CLIENT_LOG=debug
LLAMA_STACK_PORT=8321
LLAMA_STACK_CONFIG=<provider-name>
TAVILY_SEARCH_API_KEY=
BRAVE_SEARCH_API_KEY=
```
And then use this dotenv file when running client SDK tests via the following:
```bash
uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
```
## Pre-commit Hooks
We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:
```bash
uv run pre-commit install
```
After that, pre-commit hooks will run automatically before each commit.
Alternatively, if you don't want to install the pre-commit hooks, you can run the checks manually by running:
```bash
uv run pre-commit run --all-files
```
> [!CAUTION]
> Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
## Running tests
You can find the Llama Stack testing documentation [here](tests/README.md).
## Adding a new dependency to the project
To add a new dependency to the project, you can use the `uv` command. For example, to add `foo` to the project, you can run:
```bash
uv add foo
uv sync
```
## Coding Style
* Comments should provide meaningful insights into the code. Avoid filler comments that simply * Comments should provide meaningful insights into the code. Avoid filler comments that simply
describe the next step, as they create unnecessary clutter, same goes for docstrings. describe the next step, as they create unnecessary clutter, same goes for docstrings.
@ -139,39 +165,65 @@ uv sync
justification for bypassing the check. justification for bypassing the check.
* Don't use unicode characters in the codebase. ASCII-only is preferred for compatibility or * Don't use unicode characters in the codebase. ASCII-only is preferred for compatibility or
readability reasons. readability reasons.
* Provider configuration classes should declare their settings with Pydantic `Field`. Each field should have a
`description` that describes the configuration. These descriptions will be used to generate the provider
documentation.
* When possible, use keyword arguments only when calling functions.
* Llama Stack utilizes [custom Exception classes](llama_stack/apis/common/errors.py) for certain Resources that should be used where applicable.
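
The keyword-arguments guideline above can also be enforced at the definition site. The following is a minimal sketch using only the standard library; `SearchRequest` and `build_search_request` are hypothetical names chosen for illustration and are not part of the Llama Stack codebase.

```python
from dataclasses import dataclass


@dataclass
class SearchRequest:
    query: str
    limit: int
    include_metadata: bool


def build_search_request(
    *, query: str, limit: int = 10, include_metadata: bool = False
) -> SearchRequest:
    """The bare `*` makes every parameter keyword-only."""
    return SearchRequest(
        query=query, limit=limit, include_metadata=include_metadata
    )


# Call sites name every argument, so reordering parameters later is safe.
request = build_search_request(query="llama", limit=5)
print(request)

# A positional call is rejected by Python itself.
try:
    build_search_request("llama", 5)
except TypeError as error:
    print(f"positional call rejected: {error}")
```

Declaring parameters as keyword-only lets the interpreter reject positional calls, so the convention does not depend on review alone.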
### License
By contributing to Llama, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
## Common Tasks ## Common Tasks
Some tips about common tasks you work on while contributing to Llama Stack: Some tips about common tasks you work on while contributing to Llama Stack:
### Using `llama stack build` ### Installing dependencies of distributions
Building a stack image (conda / docker) will use the production version of the `llama-stack` and `llama-stack-client` packages. If you are developing with a llama-stack repository checked out and need your code to be reflected in the stack image, set `LLAMA_STACK_DIR` and `LLAMA_STACK_CLIENT_DIR` to the appropriate checked out directories when running any of the `llama` CLI commands. When installing dependencies for a distribution, you can use `llama stack list-deps` to view and install the required packages.
Example: Example:
```bash ```bash
cd work/ cd work/
git clone https://github.com/meta-llama/llama-stack.git git clone https://github.com/llamastack/llama-stack.git
git clone https://github.com/meta-llama/llama-stack-client-python.git git clone https://github.com/llamastack/llama-stack-client-python.git
cd llama-stack cd llama-stack
LLAMA_STACK_DIR=$(pwd) LLAMA_STACK_CLIENT_DIR=../llama-stack-client-python llama stack build --template <...>
# Show dependencies for a distribution
llama stack list-deps <distro-name>
# Install dependencies
llama stack list-deps <distro-name> | xargs -L1 uv pip install
``` ```
### Updating distribution configurations
### Updating Provider Configurations If you have made changes to a provider's configuration in any form (introducing a new config key, or
changing models, etc.), you should run `./scripts/distro_codegen.py` to re-generate various YAML
files as well as the documentation. You should not change `docs/source/.../distributions/` files
manually as they are auto-generated.
If you have made changes to a provider's configuration in any form (introducing a new config key, or changing models, etc.), you should run `./scripts/distro_codegen.py` to re-generate various YAML files as well as the documentation. You should not change `docs/source/.../distributions/` files manually as they are auto-generated. ### Updating the provider documentation
If you have made changes to a provider's configuration, you should run `./scripts/provider_codegen.py`
to re-generate the documentation. You should not change `docs/source/.../providers/` files manually
as they are auto-generated.
Note that the provider "description" field will be used to generate the provider documentation.
### Building the Documentation ### Building the Documentation
If you are making changes to the documentation at [https://llama-stack.readthedocs.io/en/latest/](https://llama-stack.readthedocs.io/en/latest/), you can use the following command to build the documentation and preview your changes. You will need [Sphinx](https://www.sphinx-doc.org/en/master/) and the readthedocs theme. If you are making changes to the documentation at [https://llamastack.github.io/](https://llamastack.github.io/), you can use the following command to build the documentation and preview your changes.
```bash ```bash
# This rebuilds the documentation pages. # This rebuilds the documentation pages and the OpenAPI spec.
uv run --group docs make -C docs/ html cd docs/
npm install
npm run gen-api-docs all
npm run build
# This will start a local server (usually at http://127.0.0.1:8000) that automatically rebuilds and refreshes when you make changes to the documentation. # This will start a local server (usually at http://127.0.0.1:3000).
uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all npm run serve
``` ```
### Update API Documentation ### Update API Documentation
@@ -179,11 +231,7 @@ uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all
If you modify or add new API endpoints, update the API documentation accordingly. You can do this by running the following command: If you modify or add new API endpoints, update the API documentation accordingly. You can do this by running the following command:
```bash ```bash
uv run ./docs/openapi_generator/run_openapi_generator.sh uv run ./scripts/run_openapi_generator.sh
``` ```
The generated API documentation will be available in `docs/_static/`. Make sure to review the changes before committing. The generated API schema will be available in `docs/static/`. Make sure to review the changes before committing.
## License
By contributing to Llama, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.


@@ -1,9 +1,11 @@
include pyproject.toml include pyproject.toml
include llama_stack/models/llama/llama3/tokenizer.model include src/llama_stack/models/llama/llama3/tokenizer.model
include llama_stack/models/llama/llama4/tokenizer.model include src/llama_stack/models/llama/llama4/tokenizer.model
include llama_stack/distribution/*.sh include src/llama_stack/core/*.sh
include llama_stack/cli/scripts/*.sh include src/llama_stack/cli/scripts/*.sh
include llama_stack/templates/*/*.yaml include src/llama_stack/distributions/*/*.yaml
include llama_stack/providers/tests/test_cases/inference/*.json exclude src/llama_stack/distributions/ci-tests
include llama_stack/models/llama/*/*.md include tests/integration/test_cases/inference/*.json
include llama_stack/tests/integration/*.jpg include src/llama_stack/models/llama/*/*.md
include src/llama_stack/tests/integration/*.jpg
prune src/llama_stack/distributions/ci-tests

README.md

@@ -7,82 +7,22 @@
[![Unit Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain) [![Unit Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain)
[![Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain) [![Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain)
[**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack) [**Quick Start**](https://llamastack.github.io/docs/getting_started/quickstart) | [**Documentation**](https://llamastack.github.io/docs) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)
### ✨🎉 Llama 4 Support 🎉✨
We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.
<details>
<summary>👋 Click here to see how to run Llama 4 models on Llama Stack </summary>
\
*Note you need 8xH100 GPU-host to run these models*
```bash
pip install -U llama_stack
MODEL="Llama-4-Scout-17B-16E-Instruct"
# get meta url from llama.com
llama model download --source meta --model-id $MODEL --meta-url <META_URL>
# start a llama stack server
INFERENCE_MODEL=meta-llama/$MODEL llama stack build --run --template meta-reference-gpu
# install client to interact with the server
pip install llama-stack-client
```
### CLI
```bash
# Run a chat completion
llama-stack-client --endpoint http://localhost:8321 \
inference chat-completion \
--model-id meta-llama/$MODEL \
--message "write a haiku for meta's llama 4 models"
ChatCompletionResponse(
completion_message=CompletionMessage(content="Whispers in code born\nLlama's gentle, wise heartbeat\nFuture's soft unfold", role='assistant', stop_reason='end_of_turn', tool_calls=[]),
logprobs=None,
metrics=[Metric(metric='prompt_tokens', value=21.0, unit=None), Metric(metric='completion_tokens', value=28.0, unit=None), Metric(metric='total_tokens', value=49.0, unit=None)]
)
```
### Python SDK
```python
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(base_url=f"http://localhost:8321")
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
prompt = "Write a haiku about coding"
print(f"User> {prompt}")
response = client.inference.chat_completion(
    model_id=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(f"Assistant> {response.completion_message.content}")
```
As more providers start supporting Llama 4, you can use them in Llama Stack as well. We are adding to the list. Stay tuned!
</details>
### 🚀 One-Line Installer 🚀 ### 🚀 One-Line Installer 🚀
To try Llama Stack locally, run: To try Llama Stack locally, run:
```bash ```bash
curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | bash curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash
``` ```
### Overview ### Overview
Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides Llama Stack defines and standardizes the core building blocks that simplify AI application development. It provides a unified set of APIs with implementations from leading service providers. More specifically, it provides:
- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry. - **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals.
- **Plugin architecture** to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile. - **Plugin architecture** to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
- **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment. - **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- **Multiple developer interfaces** like CLI and SDKs for Python, Typescript, iOS, and Android. - **Multiple developer interfaces** like CLI and SDKs for Python, Typescript, iOS, and Android.
@@ -97,74 +37,81 @@ Llama Stack standardizes the core building blocks that simplify AI application d
/> />
</div> </div>
### Llama Stack Benefits #### Llama Stack Benefits
- **Flexible Options**: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
- **Consistent Experience**: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
- **Robust Ecosystem**: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.
By reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications. - **Flexibility**: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
- **Consistent Experience**: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
- **Robust Ecosystem**: Llama Stack is integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.
For more information, see the [Benefits of Llama Stack](https://llamastack.github.io/docs/latest/concepts/architecture#benefits-of-llama-stack) documentation.
### API Providers ### API Providers
Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack. Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.
Please check out the [full list](https://llamastack.github.io/docs/providers).
| **API Provider Builder** | **Environments** | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** | **Post Training** | | API Provider | Environments | Agents | Inference | VectorIO | Safety | Post Training | Eval | DatasetIO |
|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|:-----------------:| |:--------------------:|:------------:|:------:|:---------:|:--------:|:------:|:-------------:|:----:|:--------:|
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | | | Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SambaNova | Hosted | | ✅ | | ✅ | | | | SambaNova | Hosted | | ✅ | | ✅ | | | |
| Cerebras | Hosted | | ✅ | | | | | | Cerebras | Hosted | | ✅ | | | | | |
| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | Fireworks | Hosted | ✅ | ✅ | ✅ | | | | |
| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | AWS Bedrock | Hosted | | ✅ | | ✅ | | | |
| Together | Hosted | ✅ | ✅ | | ✅ | | | | Together | Hosted | ✅ | ✅ | | ✅ | | | |
| Groq | Hosted | | ✅ | | | | | | Groq | Hosted | | ✅ | | | | | |
| Ollama | Single Node | | ✅ | | | | | | Ollama | Single Node | | ✅ | | | | | |
| TGI | Hosted and Single Node | | ✅ | | | | | | TGI | Hosted/Single Node | | ✅ | | | | | |
| NVIDIA NIM | Hosted and Single Node | | ✅ | | | | | | NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | |
| Chroma | Single Node | | | ✅ | | | | | ChromaDB | Hosted/Single Node | | | ✅ | | | | |
| PG Vector | Single Node | | | ✅ | | | | | Milvus | Hosted/Single Node | | | ✅ | | | | |
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | Qdrant | Hosted/Single Node | | | ✅ | | | | |
| vLLM | Hosted and Single Node | | ✅ | | | | | | Weaviate | Hosted/Single Node | | | ✅ | | | | |
| OpenAI | Hosted | | ✅ | | | | | | SQLite-vec | Single Node | | | ✅ | | | | |
| Anthropic | Hosted | | ✅ | | | | | | PG Vector | Single Node | | | ✅ | | | | |
| Gemini | Hosted | | ✅ | | | | | | PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | |
| watsonx | Hosted | | ✅ | | | | | | vLLM | Single Node | | ✅ | | | | | |
| HuggingFace | Single Node | | | | | | ✅ | | OpenAI | Hosted | | ✅ | | | | | |
| TorchTune | Single Node | | | | | | ✅ | | Anthropic | Hosted | | ✅ | | | | | |
| NVIDIA NEMO | Hosted | | | | | | ✅ | | Gemini | Hosted | | ✅ | | | | | |
| WatsonX | Hosted | | ✅ | | | | | |
| HuggingFace | Single Node | | | | | ✅ | | ✅ |
| TorchTune | Single Node | | | | | ✅ | | |
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | ✅ | ✅ | ✅ |
| NVIDIA | Hosted | | | | | ✅ | ✅ | ✅ |
> **Note**: Additional providers are available through external packages. See [External Providers](https://llamastack.github.io/docs/providers/external) documentation.
### Distributions ### Distributions
A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (eg. ollama) and seamlessly transition to production (eg. Fireworks) without changing your application code. Here are some of the distributions we support: A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario. For example, you can begin with a local setup of Ollama and seamlessly transition to production, with fireworks, without changing your application code.
Here are some of the distributions we support:
| **Distribution** | **Llama Stack Docker** | Start This Distribution | | **Distribution** | **Llama Stack Docker** | Start This Distribution |
|:---------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:| |:---------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|
| Meta Reference | [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html) | | Starter Distribution | [llamastack/distribution-starter](https://hub.docker.com/repository/docker/llamastack/distribution-starter/general) | [Guide](https://llamastack.github.io/docs/distributions/self_hosted_distro/starter) |
| SambaNova | [llamastack/distribution-sambanova](https://hub.docker.com/repository/docker/llamastack/distribution-sambanova/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/sambanova.html) | | Meta Reference | [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) | [Guide](https://llamastack.github.io/docs/distributions/self_hosted_distro/meta-reference-gpu) |
| Cerebras | [llamastack/distribution-cerebras](https://hub.docker.com/repository/docker/llamastack/distribution-cerebras/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/cerebras.html) | | PostgreSQL | [llamastack/distribution-postgres-demo](https://hub.docker.com/repository/docker/llamastack/distribution-postgres-demo/general) | |
| Ollama | [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/ollama.html) |
| TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html) |
| Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html) |
| Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html) |
| vLLM | [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) |
For full documentation on the Llama Stack distributions see the [Distributions Overview](https://llamastack.github.io/docs/distributions) page.
### Documentation ### Documentation
Please checkout our [Documentation](https://llama-stack.readthedocs.io/en/latest/index.html) page for more details. Please checkout our [Documentation](https://llamastack.github.io/docs) page for more details.
* CLI references * CLI references
* [llama (server-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html): Guide for using the `llama` CLI to work with Llama models (download, study prompts), and building/starting a Llama Stack distribution. * [llama (server-side) CLI Reference](https://llamastack.github.io/docs/references/llama_cli_reference): Guide for using the `llama` CLI to work with Llama models (download, study prompts), and building/starting a Llama Stack distribution.
* [llama (client-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_stack_client_cli_reference.html): Guide for using the `llama-stack-client` CLI, which allows you to query information about the distribution. * [llama (client-side) CLI Reference](https://llamastack.github.io/docs/references/llama_stack_client_cli_reference): Guide for using the `llama-stack-client` CLI, which allows you to query information about the distribution.
* Getting Started * Getting Started
* [Quick guide to start a Llama Stack server](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html). * [Quick guide to start a Llama Stack server](https://llamastack.github.io/docs/getting_started/quickstart).
* [Jupyter notebook](./docs/getting_started.ipynb) to walk-through how to use simple text and vision inference llama_stack_client APIs * [Jupyter notebook](./docs/getting_started.ipynb) to walk-through how to use simple text and vision inference llama_stack_client APIs
* The complete Llama Stack lesson [Colab notebook](https://colab.research.google.com/drive/1dtVmxotBsI4cGZQNsJRYPrLiDeT0Wnwt) of the new [Llama 3.2 course on Deeplearning.ai](https://learn.deeplearning.ai/courses/introducing-multimodal-llama-3-2/lesson/8/llama-stack). * The complete Llama Stack lesson [Colab notebook](https://colab.research.google.com/drive/1dtVmxotBsI4cGZQNsJRYPrLiDeT0Wnwt) of the new [Llama 3.2 course on Deeplearning.ai](https://learn.deeplearning.ai/courses/introducing-multimodal-llama-3-2/lesson/8/llama-stack).
* A [Zero-to-Hero Guide](https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide) that guide you through all the key components of llama stack with code samples. * A [Zero-to-Hero Guide](https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide) that guide you through all the key components of llama stack with code samples.
* [Contributing](CONTRIBUTING.md) * [Contributing](CONTRIBUTING.md)
* [Adding a new API Provider](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html) to walk-through how to add a new API provider. * [Adding a new API Provider](https://llamastack.github.io/docs/contributing/new_api_provider) to walk-through how to add a new API provider.
### Llama Stack Client SDKs ### Llama Stack Client SDKs
Check out our client SDKs for connecting to a Llama Stack server in your preferred language.
| **Language** | **Client SDK** | **Package** | | **Language** | **Client SDK** | **Package** |
| :----: | :----: | :----: | | :----: | :----: | :----: |
| Python | [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) | [![PyPI version](https://img.shields.io/pypi/v/llama_stack_client.svg)](https://pypi.org/project/llama_stack_client/) | Python | [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) | [![PyPI version](https://img.shields.io/pypi/v/llama_stack_client.svg)](https://pypi.org/project/llama_stack_client/)
@@ -172,6 +119,17 @@ Please checkout our [Documentation](https://llama-stack.readthedocs.io/en/latest
| Typescript | [llama-stack-client-typescript](https://github.com/meta-llama/llama-stack-client-typescript) | [![NPM version](https://img.shields.io/npm/v/llama-stack-client.svg)](https://npmjs.org/package/llama-stack-client) | Typescript | [llama-stack-client-typescript](https://github.com/meta-llama/llama-stack-client-typescript) | [![NPM version](https://img.shields.io/npm/v/llama-stack-client.svg)](https://npmjs.org/package/llama-stack-client)
| Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) | [![Maven version](https://img.shields.io/maven-central/v/com.llama.llamastack/llama-stack-client-kotlin)](https://central.sonatype.com/artifact/com.llama.llamastack/llama-stack-client-kotlin) | Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) | [![Maven version](https://img.shields.io/maven-central/v/com.llama.llamastack/llama-stack-client-kotlin)](https://central.sonatype.com/artifact/com.llama.llamastack/llama-stack-client-kotlin)
Check out our client SDKs for connecting to a Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [typescript](https://github.com/meta-llama/llama-stack-client-typescript), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo. You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
## 🌟 GitHub Star History
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=meta-llama/llama-stack&type=Date)](https://www.star-history.com/#meta-llama/llama-stack&Date)
## ✨ Contributors
Thanks to all of our amazing contributors!
<a href="https://github.com/meta-llama/llama-stack/graphs/contributors">
<img src="https://contrib.rocks/image?repo=meta-llama/llama-stack" />
</a>


@@ -0,0 +1,229 @@
# Llama Stack Benchmark Suite on Kubernetes
## Motivation
Performance benchmarking is critical for understanding the overhead and characteristics of the Llama Stack abstraction layer compared to direct inference engines like vLLM.
### Why This Benchmark Suite Exists
**Performance Validation**: The Llama Stack provides a unified API layer across multiple inference providers, but this abstraction introduces potential overhead. This benchmark suite quantifies the performance impact by comparing:
- Llama Stack inference (with vLLM backend)
- Direct vLLM inference calls
- Both under identical Kubernetes deployment conditions
**Production Readiness Assessment**: Real-world deployments require understanding performance characteristics under load. This suite simulates concurrent user scenarios with configurable parameters (duration, concurrency, request patterns) to validate production readiness.
**Regression Detection (TODO)**: As the Llama Stack evolves, this benchmark is intended to provide automated regression detection for performance changes. Once that is in place, CI/CD pipelines can use these benchmarks to catch performance degradations before production deployments.
**Resource Planning**: By measuring throughput, latency percentiles, and resource utilization patterns, teams can make informed decisions about:
- Kubernetes resource allocation (CPU, memory, GPU)
- Auto-scaling configurations
- Cost optimization strategies
### Key Metrics Captured
The benchmark suite measures critical performance indicators:
- **Throughput**: Requests per second under sustained load
- **Latency Distribution**: P50, P95, P99 response times
- **Time to First Token (TTFT)**: Critical for streaming applications
- **Inter-Token Latency (ITL)**: Token generation speed for streaming
- **Error Rates**: Request failures and timeout analysis
These metrics inform architectural decisions and performance optimization efforts.
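As a rough illustration of how the latency metrics above can be derived from raw timings, the sketch below computes nearest-rank percentiles plus TTFT and mean ITL. It is not how GuideLLM computes them; the function names and the input shapes (a list of per-request latencies, a list of token arrival timestamps) are assumptions made for this example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies):
    """Return the P50/P95/P99 values listed under Latency Distribution."""
    return {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}


def ttft_and_mean_itl(request_start, token_times):
    """TTFT is the delay until the first token; ITL is the mean gap between consecutive tokens."""
    if not token_times:
        raise ValueError("token_times must not be empty")
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, mean_itl
```

Nearest-rank is only one of several percentile definitions; interpolating variants give slightly different values on small samples.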
## Setup
**1. Deploy base k8s infrastructure:**
```bash
cd ../../docs/source/distributions/k8s
./apply.sh
```
**2. Deploy benchmark components:**
```bash
./apply.sh
```
**3. Verify deployment:**
```bash
kubectl get pods
# Should see: llama-stack-benchmark-server, vllm-server, etc.
```
## Benchmark Results
We use [GuideLLM](https://github.com/neuralmagic/guidellm) against our k8s deployment for comprehensive performance testing.
### Performance - 1 vLLM Replica
We vary the number of Llama Stack replicas with 1 vLLM replica and compare performance below.
![Performance - 1 vLLM Replica](results/vllm_replica1_benchmark_results.png)
For full results see the `benchmarking/k8s-benchmark/results/` directory.
## Quick Start
Follow the instructions below to run benchmarks similar to the ones above.
### Comprehensive Benchmark Suite
**Run all benchmarks with different cluster configurations:**
```bash
./scripts/run-all-benchmarks.sh
```
This script will automatically:
- Scale deployments to different configurations
- Run benchmarks for each setup
- Generate output files with meaningful names that include setup information
### Individual Benchmarks
**Benchmark Llama Stack (runs against current cluster setup):**
```bash
./scripts/run-guidellm-benchmark.sh --target stack
```
**Benchmark vLLM direct (runs against current cluster setup):**
```bash
./scripts/run-guidellm-benchmark.sh --target vllm
```
**Benchmark with custom parameters:**
```bash
./scripts/run-guidellm-benchmark.sh --target stack --max-seconds 120 --prompt-tokens 1024 --output-tokens 512
```
**Benchmark with custom output file:**
```bash
./scripts/run-guidellm-benchmark.sh --target stack --output-file results/my-custom-benchmark.txt
```
### Generating Charts
Once the benchmarks are run, you can generate performance charts from benchmark results:
```bash
uv run ./scripts/generate_charts.py
```
This loads runs in the `results/` directory and creates visualizations comparing different configurations and replica counts.
## Benchmark Workflow
The benchmark suite is organized into two main scripts with distinct responsibilities:
### 1. `run-all-benchmarks.sh` - Orchestration & Scaling
- **Purpose**: Manages different cluster configurations and orchestrates benchmark runs
- **Responsibilities**:
- Scales Kubernetes deployments (vLLM replicas, Stack replicas, worker counts)
- Runs benchmarks for each configuration
- Generates meaningful output filenames with setup information
- **Use case**: Running comprehensive performance testing across multiple configurations
### 2. `run-guidellm-benchmark.sh` - Single Benchmark Execution
- **Purpose**: Executes a single benchmark against the current cluster state
- **Responsibilities**:
- Runs GuideLLM benchmark with configurable parameters
- Accepts custom output file paths
- No cluster scaling - benchmarks current deployment state
- **Use case**: Testing specific configurations or custom scenarios
### Typical Workflow
1. **Comprehensive Testing**: Use `run-all-benchmarks.sh` to automatically test multiple configurations
2. **Custom Testing**: Use `run-guidellm-benchmark.sh` for specific parameter testing or manual cluster configurations
3. **Analysis**: Use `generate_charts.py` to visualize results from either approach
## Command Reference
### run-all-benchmarks.sh
Orchestrates multiple benchmark runs with different cluster configurations. This script:
- Automatically scales deployments before each benchmark
- Runs benchmarks against the configured cluster setup
- Generates meaningfully named output files
```bash
./scripts/run-all-benchmarks.sh
```
**Configuration**: Edit the `configs` array in the script to customize benchmark configurations:
```bash
# Each line: (target, stack_replicas, vllm_replicas, stack_workers)
configs=(
"stack 1 1 1"
"stack 1 1 2"
"stack 1 1 4"
"vllm 1 1 -"
)
```
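To make the field order of each `configs` entry explicit, here is a small Python sketch that parses one entry into named fields. The `BenchmarkConfig` class and `parse_config` helper are hypothetical and exist only for this illustration; the script itself handles these entries in bash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkConfig:
    target: str
    stack_replicas: int
    vllm_replicas: int
    stack_workers: Optional[int]  # None when the entry uses "-" (as in the vllm row)


def parse_config(entry):
    """Parse one 'target stack_replicas vllm_replicas stack_workers' entry."""
    fields = entry.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {entry!r}")
    target, stack_replicas, vllm_replicas, stack_workers = fields
    if target not in ("stack", "vllm"):
        raise ValueError(f"unknown target: {target!r}")
    return BenchmarkConfig(
        target=target,
        stack_replicas=int(stack_replicas),
        vllm_replicas=int(vllm_replicas),
        stack_workers=None if stack_workers == "-" else int(stack_workers),
    )
```

For example, `parse_config("stack 1 1 4")` describes one Stack replica with four workers in front of one vLLM replica.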
**Output files**: Generated with setup information in filename:
- Stack: `guidellm-benchmark-stack-s{replicas}-sw{workers}-v{vllm_replicas}-{timestamp}.txt`
- vLLM: `guidellm-benchmark-vllm-v{vllm_replicas}-{timestamp}.txt`
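When post-processing results, the setup information can be recovered from these filenames. The following is a minimal sketch, assuming replica and worker counts are plain integers and treating the timestamp as an opaque string, since its exact format is not specified here. The `parse_result_filename` helper is hypothetical and not part of the benchmark scripts.

```python
import re

# Patterns mirror the two documented filename shapes.
STACK_RE = re.compile(
    r"^guidellm-benchmark-stack-s(?P<stack_replicas>\d+)"
    r"-sw(?P<stack_workers>\d+)-v(?P<vllm_replicas>\d+)-(?P<timestamp>.+)\.txt$"
)
VLLM_RE = re.compile(
    r"^guidellm-benchmark-vllm-v(?P<vllm_replicas>\d+)-(?P<timestamp>.+)\.txt$"
)


def parse_result_filename(name):
    """Return the setup encoded in a result filename, or None if it matches neither shape."""
    for target, pattern in (("stack", STACK_RE), ("vllm", VLLM_RE)):
        match = pattern.match(name)
        if match is None:
            continue
        info = {"target": target, "timestamp": match.group("timestamp")}
        for key in ("stack_replicas", "stack_workers", "vllm_replicas"):
            if key in pattern.groupindex:
                info[key] = int(match.group(key))
        return info
    return None
```

Files written with a custom `--output-file` path will not match either pattern, so callers should handle the `None` case.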
### run-guidellm-benchmark.sh Options
Runs a single benchmark against the current cluster setup (no scaling).
```bash
./scripts/run-guidellm-benchmark.sh [options]
Options:
-t, --target <stack|vllm> Target to benchmark (default: stack)
-s, --max-seconds <seconds> Maximum duration in seconds (default: 60)
-p, --prompt-tokens <tokens> Number of prompt tokens (default: 512)
-o, --output-tokens <tokens> Number of output tokens (default: 256)
-r, --rate-type <type> Rate type (default: concurrent)
-c, --rate Rate (default: 1,2,4,8,16,32,64,128)
--output-file <path> Output file path (default: auto-generated)
--stack-deployment <name> Name of the stack deployment (default: llama-stack-benchmark-server)
--vllm-deployment <name> Name of the vllm deployment (default: vllm-server)
--stack-url <url> URL of the stack service (default: http://llama-stack-benchmark-service:8323/v1/openai)
-h, --help Show help message
Examples:
./scripts/run-guidellm-benchmark.sh --target vllm # Benchmark vLLM direct
./scripts/run-guidellm-benchmark.sh --target stack # Benchmark Llama Stack (default)
./scripts/run-guidellm-benchmark.sh -t vllm -s 60 -p 512 -o 256 # vLLM with custom parameters
./scripts/run-guidellm-benchmark.sh --output-file results/my-benchmark.txt # Specify custom output file
./scripts/run-guidellm-benchmark.sh --stack-deployment my-stack-server # Use custom stack deployment name
```
## Local Testing
### Running Benchmark Locally
For local development without Kubernetes:
**1. (Optional) Start Mock OpenAI server:**
There is a simple mock OpenAI server if you don't have an inference provider available.
The `openai-mock-server.py` provides:
- **OpenAI-compatible API** for testing without real models
- **Configurable streaming delay** via `STREAM_DELAY_SECONDS` env var
- **Consistent responses** for reproducible benchmarks
- **Lightweight testing** without GPU requirements
```bash
uv run python openai-mock-server.py --port 8080
```
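Before pointing the Stack server at the mock, it can help to confirm that the mock responds. The sketch below uses only the Python standard library and assumes the server is listening on `http://localhost:8080` with the `/v1/models` and `/v1/chat/completions` routes. The helper names are hypothetical, and the shape of the chat response is assumed to follow the OpenAI format, so the raw JSON is returned rather than unpacked.

```python
import json
import urllib.request


def build_chat_request(model, prompt, stream=False):
    """Build an OpenAI-style chat completion payload with a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def extract_model_ids(models_payload):
    """Pull the model ids out of a /v1/models response body."""
    return [model["id"] for model in models_payload.get("data", [])]


def fetch_json(url, payload=None):
    """GET the URL, or POST the payload as JSON when one is given, and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def smoke_test(base_url):
    """List the models, then send one non-streaming chat request to the first model."""
    model_ids = extract_model_ids(fetch_json(f"{base_url}/v1/models"))
    payload = build_chat_request(model_ids[0], "Hello")
    return fetch_json(f"{base_url}/v1/chat/completions", payload)
```

With the mock server running, `smoke_test("http://localhost:8080")` should return a chat completion as a dictionary.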
**2. Start Stack server:**
```bash
LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv run uvicorn llama_stack.core.server.server:create_app --port 8321 --workers 4 --factory
```
**3. Run GuideLLM benchmark:**
```bash
GUIDELLM__PREFERRED_ROUTE="chat_completions" uv run guidellm benchmark run \
--target "http://localhost:8321/v1/openai/v1" \
--model "meta-llama/Llama-3.2-3B-Instruct" \
--rate-type sweep \
--max-seconds 60 \
--data "prompt_tokens=256,output_tokens=128" --output-path='output.html'
```


@@ -0,0 +1,33 @@
#!/usr/bin/env bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Deploys the benchmark-specific components on top of the base k8s deployment (../k8s/apply.sh).
export STREAM_DELAY_SECONDS=0.005
export POSTGRES_USER=llamastack
export POSTGRES_DB=llamastack
export POSTGRES_PASSWORD=llamastack
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
export BENCHMARK_INFERENCE_MODEL=$INFERENCE_MODEL
export LLAMA_STACK_WORKERS=4
set -euo pipefail
set -x
# Deploy benchmark-specific components
kubectl create configmap llama-stack-config --from-file=stack_run_config.yaml \
--dry-run=client -o yaml > stack-configmap.yaml
kubectl apply --validate=false -f stack-configmap.yaml
# Deploy our custom llama stack server (overriding the base one)
envsubst < stack-k8s.yaml.template | kubectl apply --validate=false -f -


@@ -0,0 +1,202 @@
#!/usr/bin/env python3
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
"""
OpenAI-compatible mock server that returns:
- Hardcoded /models response for consistent validation
- Valid OpenAI-formatted chat completion responses with dynamic content
"""
import argparse
import json
import os
import random
import time
import uuid
from flask import Flask, Response, jsonify, request
app = Flask(__name__)


# Models from environment variables
def get_models():
    models_str = os.getenv("MOCK_MODELS", "meta-llama/Llama-3.2-3B-Instruct")
    model_ids = [m.strip() for m in models_str.split(",") if m.strip()]
    return {
        "object": "list",
        "data": [
            {"id": model_id, "object": "model", "created": 1234567890, "owned_by": "vllm"} for model_id in model_ids
        ],
    }


def generate_random_text(length=50):
    """Generate random but coherent text for responses."""
    words = [
        "Hello",
        "there",
        "I'm",
        "an",
        "AI",
        "assistant",
        "ready",
        "to",
        "help",
        "you",
        "with",
        "your",
        "questions",
        "and",
        "tasks",
        "today",
        "Let",
        "me",
        "know",
        "what",
        "you'd",
        "like",
        "to",
        "discuss",
        "or",
        "explore",
        "together",
        "I",
        "can",
        "assist",
        "with",
        "various",
        "topics",
        "including",
        "coding",
        "writing",
        "analysis",
        "and",
        "more",
    ]
    return " ".join(random.choices(words, k=length))


@app.route("/v1/models", methods=["GET"])
def list_models():
    models = get_models()
    print(f"[MOCK] Returning models: {[m['id'] for m in models['data']]}")
    return jsonify(models)


@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    """Return OpenAI-formatted chat completion responses."""
    data = request.get_json()
    default_model = get_models()["data"][0]["id"]
    model = data.get("model", default_model)
    messages = data.get("messages", [])
    stream = data.get("stream", False)
    print(f"[MOCK] Chat completion request - model: {model}, stream: {stream}")
    if stream:
        return handle_streaming_completion(model, messages)
    else:
        return handle_non_streaming_completion(model, messages)


def handle_non_streaming_completion(model, messages):
    response_text = generate_random_text(random.randint(20, 80))

    # Calculate realistic token counts
    prompt_tokens = sum(len(str(msg.get("content", "")).split()) for msg in messages)
    completion_tokens = len(response_text.split())

    response = {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "message": {"role": "assistant", "content": response_text}, "finish_reason": "stop"}],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
    return jsonify(response)


def handle_streaming_completion(model, messages):
    def generate_stream():
        # Generate response text
        full_response = generate_random_text(random.randint(30, 100))
        words = full_response.split()

        # Send initial chunk
        initial_chunk = {
            "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model,
            "choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}],
        }
        yield f"data: {json.dumps(initial_chunk)}\n\n"

        # Send word by word
        for i, word in enumerate(words):
            chunk = {
                "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": model,
                "choices": [{"index": 0, "delta": {"content": f"{word} " if i < len(words) - 1 else word}}],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
            # Configurable delay to simulate realistic streaming
            stream_delay = float(os.getenv("STREAM_DELAY_SECONDS", "0.005"))
            time.sleep(stream_delay)

        # Send final chunk
        final_chunk = {
            "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model,
            "choices": [{"index": 0, "delta": {"content": ""}, "finish_reason": "stop"}],
        }
        yield f"data: {json.dumps(final_chunk)}\n\n"
        yield "data: [DONE]\n\n"

    return Response(
        generate_stream(),
        mimetype="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "Access-Control-Allow-Origin": "*",
        },
    )


@app.route("/health", methods=["GET"])
def health():
    return jsonify({"status": "healthy", "type": "openai-mock"})


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="OpenAI-compatible mock server")
    parser.add_argument("--port", type=int, default=8081, help="Port to run the server on (default: 8081)")
    args = parser.parse_args()
    port = args.port
    models = get_models()
    print("Starting OpenAI-compatible mock server...")
    print(f"- /models endpoint with: {[m['id'] for m in models['data']]}")
    print("- OpenAI-formatted chat/completion responses with dynamic content")
    print("- Streaming support with valid SSE format")
    print(f"- Listening on: http://0.0.0.0:{port}")
    app.run(host="0.0.0.0", port=port, debug=False)
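
The streaming handler above emits one `data: <json>` event per word, separated by blank lines, and finishes with `data: [DONE]`. As a sketch of how a client could reassemble the text from that stream (the function name `collect_stream_text` is hypothetical, and the input is assumed to be the complete response body as a string):

```python
import json


def collect_stream_text(sse_body):
    """Concatenate delta.content from each 'data:' event until [DONE]."""
    parts = []
    for event in sse_body.split("\n\n"):
        event = event.strip()
        if not event.startswith("data: "):
            continue  # skip empty fragments or non-data lines
        payload = event[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel sent by the mock
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

The initial and final chunks carry an empty `content` string, so they contribute nothing to the joined text; only the per-word chunks do.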

@@ -0,0 +1,171 @@
Collecting uv
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.9/20.9 MB 144.3 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.8.19
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 25.2
[notice] To update, run: pip install --upgrade pip
Using Python 3.11.13 environment at: /usr/local
Resolved 61 packages in 551ms
Downloading pillow (6.3MiB)
Downloading hf-xet (3.0MiB)
Downloading tokenizers (3.1MiB)
Downloading pygments (1.2MiB)
Downloading pandas (11.8MiB)
Downloading aiohttp (1.7MiB)
Downloading pydantic-core (1.9MiB)
Downloading numpy (16.2MiB)
Downloading transformers (11.1MiB)
Downloading pyarrow (40.8MiB)
Downloading pydantic-core
Downloading aiohttp
Downloading tokenizers
Downloading hf-xet
Downloading pygments
Downloading pillow
Downloading numpy
Downloading pandas
Downloading transformers
Downloading pyarrow
Prepared 61 packages in 1.23s
Installed 61 packages in 114ms
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.12.15
+ aiosignal==1.4.0
+ annotated-types==0.7.0
+ anyio==4.10.0
+ attrs==25.3.0
+ certifi==2025.8.3
+ charset-normalizer==3.4.3
+ click==8.1.8
+ datasets==4.1.1
+ dill==0.4.0
+ filelock==3.19.1
+ frozenlist==1.7.0
+ fsspec==2025.9.0
+ ftfy==6.3.1
+ guidellm==0.3.0
+ h11==0.16.0
+ h2==4.3.0
+ hf-xet==1.1.10
+ hpack==4.1.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ huggingface-hub==0.35.0
+ hyperframe==6.1.0
+ idna==3.10
+ loguru==0.7.3
+ markdown-it-py==4.0.0
+ mdurl==0.1.2
+ multidict==6.6.4
+ multiprocess==0.70.16
+ numpy==2.3.3
+ packaging==25.0
+ pandas==2.3.2
+ pillow==11.3.0
+ propcache==0.3.2
+ protobuf==6.32.1
+ pyarrow==21.0.0
+ pydantic==2.11.9
+ pydantic-core==2.33.2
+ pydantic-settings==2.10.1
+ pygments==2.19.2
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.1.1
+ pytz==2025.2
+ pyyaml==6.0.2
+ regex==2025.9.18
+ requests==2.32.5
+ rich==14.1.0
+ safetensors==0.6.2
+ six==1.17.0
+ sniffio==1.3.1
+ tokenizers==0.22.1
+ tqdm==4.67.1
+ transformers==4.56.2
+ typing-extensions==4.15.0
+ typing-inspection==0.4.1
+ tzdata==2025.2
+ urllib3==2.5.0
+ wcwidth==0.2.14
+ xxhash==3.5.0
+ yarl==1.20.1
Using Python 3.11.13 environment at: /usr/local
Audited 1 package in 3ms
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Creating backend...
Backend openai_http connected to http://llama-stack-benchmark-service:8323/v1/openai for model meta-llama/Llama-3.2-3B-Instruct.
Creating request loader...
Created loader with 1000 unique requests from prompt_tokens=512,output_tokens=256.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [17:34:30] ⠋ 100% concurrent@1 (complete) Req: 0.3 req/s, 3.32s Lat, 1.0 Conc, 18 Comp, 1 Inc, 0 Err │
│ Tok: 74.0 gen/s, 238.6 tot/s, 40.2ms TTFT, 13.4ms ITL, 546 Prompt, 246 Gen │
│ [17:35:35] ⠋ 100% concurrent@2 (complete) Req: 0.6 req/s, 3.46s Lat, 2.0 Conc, 34 Comp, 2 Inc, 0 Err │
│ Tok: 139.6 gen/s, 454.0 tot/s, 48.0ms TTFT, 14.1ms ITL, 546 Prompt, 243 Gen │
│ [17:36:40] ⠋ 100% concurrent@4 (complete) Req: 1.1 req/s, 3.44s Lat, 3.9 Conc, 68 Comp, 4 Inc, 0 Err │
│ Tok: 273.2 gen/s, 900.4 tot/s, 50.7ms TTFT, 14.3ms ITL, 546 Prompt, 238 Gen │
│ [17:37:45] ⠋ 100% concurrent@8 (complete) Req: 2.2 req/s, 3.55s Lat, 7.7 Conc, 129 Comp, 8 Inc, 0 Err │
│ Tok: 519.1 gen/s, 1699.8 tot/s, 66.0ms TTFT, 14.6ms ITL, 547 Prompt, 240 Gen │
│ [17:38:50] ⠋ 100% concurrent@16 (complete) Req: 4.1 req/s, 3.76s Lat, 15.5 Conc, 247 Comp, 16 Inc, 0 Err │
│ Tok: 1005.5 gen/s, 3256.7 tot/s, 101.0ms TTFT, 15.0ms ITL, 547 Prompt, 244 Gen │
│ [17:39:56] ⠋ 100% concurrent@32 (complete) Req: 8.1 req/s, 3.84s Lat, 30.9 Conc, 483 Comp, 32 Inc, 0 Err │
│ Tok: 1926.3 gen/s, 6327.2 tot/s, 295.7ms TTFT, 14.8ms ITL, 547 Prompt, 239 Gen │
│ [17:41:03] ⠋ 100% concurrent@64 (complete) Req: 9.9 req/s, 6.05s Lat, 59.7 Conc, 576 Comp, 58 Inc, 0 Err │
│ Tok: 2381.0 gen/s, 7774.5 tot/s, 1196.2ms TTFT, 20.2ms ITL, 547 Prompt, 241 Gen │
│ [17:42:10] ⠋ 100% concurrent@128 (complete) Req: 9.2 req/s, 11.59s Lat, 107.2 Conc, 514 Comp, 117 Inc, 0 Err │
│ Tok: 2233.4 gen/s, 7286.3 tot/s, 2403.9ms TTFT, 38.2ms ITL, 547 Prompt, 242 Gen │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Generating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (8/8) [ 0:08:41 < 0:00:00 ]
Benchmarks Metadata:
Run id:511a14fd-ba11-4ffa-92ef-7cc23db4dd38
Duration:528.5 seconds
Profile:type=concurrent, strategies=['concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent'], streams=[1, 2, 4, 8, 16, 32, 64, 128]
Args:max_number=None, max_duration=60.0, warmup_number=None, warmup_duration=3.0, cooldown_number=None, cooldown_duration=None
Worker:type_='generative_requests_worker' backend_type='openai_http' backend_target='http://llama-stack-benchmark-service:8323/v1/openai' backend_model='meta-llama/Llama-3.2-3B-Instruct'
backend_info={'max_output_tokens': 16384, 'timeout': 300, 'http2': True, 'follow_redirects': True, 'headers': {}, 'text_completions_path': '/v1/completions', 'chat_completions_path':
'/v1/chat/completions'}
Request Loader:type_='generative_request_loader' data='prompt_tokens=512,output_tokens=256' data_args=None processor='meta-llama/Llama-3.2-3B-Instruct' processor_args=None
Extras:None
Benchmarks Info:
===================================================================================================================================================
Metadata |||| Requests Made ||| Prompt Tok/Req ||| Output Tok/Req ||| Prompt Tok Total||| Output Tok Total||
Benchmark| Start Time| End Time| Duration (s)| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err
--------------|-----------|---------|-------------|------|-----|-----|------|------|----|------|------|----|-------|------|----|-------|------|----
concurrent@1| 17:34:35| 17:35:35| 60.0| 18| 1| 0| 546.4| 512.0| 0.0| 246.0| 14.0| 0.0| 9835| 512| 0| 4428| 14| 0
concurrent@2| 17:35:40| 17:36:40| 60.0| 34| 2| 0| 546.4| 512.0| 0.0| 242.7| 80.0| 0.0| 18577| 1024| 0| 8253| 160| 0
concurrent@4| 17:36:45| 17:37:45| 60.0| 68| 4| 0| 546.4| 512.0| 0.0| 238.1| 103.2| 0.0| 37156| 2048| 0| 16188| 413| 0
concurrent@8| 17:37:50| 17:38:50| 60.0| 129| 8| 0| 546.7| 512.0| 0.0| 240.3| 180.0| 0.0| 70518| 4096| 0| 31001| 1440| 0
concurrent@16| 17:38:55| 17:39:55| 60.0| 247| 16| 0| 546.6| 512.0| 0.0| 244.1| 142.6| 0.0| 135002| 8192| 0| 60300| 2281| 0
concurrent@32| 17:40:01| 17:41:01| 60.0| 483| 32| 0| 546.5| 512.0| 0.0| 239.2| 123.2| 0.0| 263972| 16384| 0| 115540| 3944| 0
concurrent@64| 17:41:08| 17:42:08| 60.0| 576| 58| 0| 546.6| 512.0| 0.0| 241.3| 13.9| 0.0| 314817| 29696| 0| 138976| 807| 0
concurrent@128| 17:42:15| 17:43:15| 60.0| 514| 117| 0| 546.5| 512.0| 0.0| 241.6| 143.9| 0.0| 280911| 59904| 0| 124160| 16832| 0
===================================================================================================================================================
Benchmarks Stats:
=======================================================================================================================================================
Metadata | Request Stats || Out Tok/sec| Tot Tok/sec| Req Latency (sec) ||| TTFT (ms) ||| ITL (ms) ||| TPOT (ms) ||
Benchmark| Per Second| Concurrency| mean| mean| mean| median| p99| mean| median| p99| mean| median| p99| mean| median| p99
--------------|-----------|------------|------------|------------|------|-------|------|-------|-------|-------|-----|-------|-----|-----|-------|-----
concurrent@1| 0.30| 1.00| 74.0| 238.6| 3.32| 3.43| 3.61| 40.2| 39.3| 51.2| 13.4| 13.3| 14.0| 13.3| 13.2| 13.9
concurrent@2| 0.58| 1.99| 139.6| 454.0| 3.46| 3.64| 3.74| 48.0| 45.8| 72.0| 14.1| 14.1| 14.5| 14.0| 14.0| 14.4
concurrent@4| 1.15| 3.95| 273.2| 900.4| 3.44| 3.69| 3.74| 50.7| 47.2| 118.6| 14.3| 14.3| 14.4| 14.2| 14.2| 14.4
concurrent@8| 2.16| 7.67| 519.1| 1699.8| 3.55| 3.76| 3.87| 66.0| 48.8| 208.2| 14.6| 14.5| 14.8| 14.5| 14.5| 14.8
concurrent@16| 4.12| 15.48| 1005.5| 3256.7| 3.76| 3.90| 4.18| 101.0| 65.6| 396.7| 15.0| 15.0| 15.9| 15.0| 15.0| 15.9
concurrent@32| 8.05| 30.89| 1926.3| 6327.2| 3.84| 4.04| 4.39| 295.7| 265.6| 720.4| 14.8| 14.9| 15.5| 14.8| 14.8| 15.3
concurrent@64| 9.87| 59.74| 2381.0| 7774.5| 6.05| 6.18| 9.94| 1196.2| 1122.5| 4295.3| 20.2| 20.0| 25.8| 20.1| 19.9| 25.8
concurrent@128| 9.25| 107.16| 2233.4| 7286.3| 11.59| 12.04| 14.46| 2403.9| 2322.3| 4001.5| 38.2| 38.5| 53.0| 38.0| 38.3| 52.7
=======================================================================================================================================================
Saving benchmarks report...
Benchmarks report saved to /benchmarks.json
Benchmarking complete.
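
One way to sanity-check the stats table above is Little's law: the mean number of requests in flight should be roughly the request rate multiplied by the mean request latency. The sketch below applies it to three rows copied from the table. The names are hypothetical, and small gaps are expected because the reported values are rounded and each run ends with some incomplete requests.

```python
# (requests per second, mean request latency in seconds, reported mean concurrency),
# copied from the "Benchmarks Stats" table above.
STATS = {
    "concurrent@1": (0.30, 3.32, 1.00),
    "concurrent@64": (9.87, 6.05, 59.74),
    "concurrent@128": (9.25, 11.59, 107.16),
}


def implied_concurrency(req_per_sec, mean_latency_s):
    """Little's law: mean requests in flight = arrival rate * time in system."""
    return req_per_sec * mean_latency_s


def relative_gap(name):
    """Relative difference between implied and reported concurrency for one row."""
    rps, latency, reported = STATS[name]
    return abs(implied_concurrency(rps, latency) - reported) / reported
```

For the rows listed, the implied values land within about one percent of the reported concurrency, which suggests the request rate, latency, and concurrency columns are mutually consistent.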

@@ -0,0 +1,171 @@
Collecting uv
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.9/20.9 MB 149.3 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.8.19
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 25.2
[notice] To update, run: pip install --upgrade pip
Using Python 3.11.13 environment at: /usr/local
Resolved 61 packages in 494ms
Downloading pandas (11.8MiB)
Downloading tokenizers (3.1MiB)
Downloading pygments (1.2MiB)
Downloading aiohttp (1.7MiB)
Downloading transformers (11.1MiB)
Downloading numpy (16.2MiB)
Downloading pillow (6.3MiB)
Downloading pydantic-core (1.9MiB)
Downloading hf-xet (3.0MiB)
Downloading pyarrow (40.8MiB)
Downloading pydantic-core
Downloading aiohttp
Downloading tokenizers
Downloading hf-xet
Downloading pillow
Downloading pygments
Downloading numpy
Downloading pandas
Downloading pyarrow
Downloading transformers
Prepared 61 packages in 1.24s
Installed 61 packages in 126ms
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.12.15
+ aiosignal==1.4.0
+ annotated-types==0.7.0
+ anyio==4.10.0
+ attrs==25.3.0
+ certifi==2025.8.3
+ charset-normalizer==3.4.3
+ click==8.1.8
+ datasets==4.1.1
+ dill==0.4.0
+ filelock==3.19.1
+ frozenlist==1.7.0
+ fsspec==2025.9.0
+ ftfy==6.3.1
+ guidellm==0.3.0
+ h11==0.16.0
+ h2==4.3.0
+ hf-xet==1.1.10
+ hpack==4.1.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ huggingface-hub==0.35.0
+ hyperframe==6.1.0
+ idna==3.10
+ loguru==0.7.3
+ markdown-it-py==4.0.0
+ mdurl==0.1.2
+ multidict==6.6.4
+ multiprocess==0.70.16
+ numpy==2.3.3
+ packaging==25.0
+ pandas==2.3.2
+ pillow==11.3.0
+ propcache==0.3.2
+ protobuf==6.32.1
+ pyarrow==21.0.0
+ pydantic==2.11.9
+ pydantic-core==2.33.2
+ pydantic-settings==2.10.1
+ pygments==2.19.2
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.1.1
+ pytz==2025.2
+ pyyaml==6.0.2
+ regex==2025.9.18
+ requests==2.32.5
+ rich==14.1.0
+ safetensors==0.6.2
+ six==1.17.0
+ sniffio==1.3.1
+ tokenizers==0.22.1
+ tqdm==4.67.1
+ transformers==4.56.2
+ typing-extensions==4.15.0
+ typing-inspection==0.4.1
+ tzdata==2025.2
+ urllib3==2.5.0
+ wcwidth==0.2.14
+ xxhash==3.5.0
+ yarl==1.20.1
Using Python 3.11.13 environment at: /usr/local
Audited 1 package in 3ms
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Creating backend...
Backend openai_http connected to http://llama-stack-benchmark-service:8323/v1/openai for model meta-llama/Llama-3.2-3B-Instruct.
Creating request loader...
Created loader with 1000 unique requests from prompt_tokens=512,output_tokens=256.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [17:45:18] ⠋ 100% concurrent@1 (complete) Req: 0.3 req/s, 3.42s Lat, 1.0 Conc, 17 Comp, 1 Inc, 0 Err │
│ Tok: 73.9 gen/s, 233.7 tot/s, 50.2ms TTFT, 13.4ms ITL, 547 Prompt, 253 Gen │
│ [17:46:23] ⠋ 100% concurrent@2 (complete) Req: 0.6 req/s, 3.42s Lat, 2.0 Conc, 34 Comp, 2 Inc, 0 Err │
│ Tok: 134.7 gen/s, 447.4 tot/s, 50.8ms TTFT, 14.3ms ITL, 546 Prompt, 235 Gen │
│ [17:47:28] ⠋ 100% concurrent@4 (complete) Req: 1.1 req/s, 3.55s Lat, 3.9 Conc, 66 Comp, 4 Inc, 0 Err │
│ Tok: 268.7 gen/s, 873.1 tot/s, 54.9ms TTFT, 14.4ms ITL, 547 Prompt, 243 Gen │
│ [17:48:33] ⠋ 100% concurrent@8 (complete) Req: 2.2 req/s, 3.56s Lat, 7.8 Conc, 130 Comp, 8 Inc, 0 Err │
│ Tok: 526.1 gen/s, 1728.4 tot/s, 60.6ms TTFT, 14.7ms ITL, 547 Prompt, 239 Gen │
│ [17:49:38] ⠋ 100% concurrent@16 (complete) Req: 4.1 req/s, 3.79s Lat, 15.7 Conc, 246 Comp, 16 Inc, 0 Err │
│ Tok: 1006.9 gen/s, 3268.6 tot/s, 74.8ms TTFT, 15.3ms ITL, 547 Prompt, 243 Gen │
│ [17:50:44] ⠋ 100% concurrent@32 (complete) Req: 7.8 req/s, 3.95s Lat, 30.9 Conc, 467 Comp, 32 Inc, 0 Err │
│ Tok: 1912.0 gen/s, 6191.6 tot/s, 119.1ms TTFT, 15.7ms ITL, 547 Prompt, 244 Gen │
│ [17:51:50] ⠋ 100% concurrent@64 (complete) Req: 13.0 req/s, 4.75s Lat, 61.8 Conc, 776 Comp, 64 Inc, 0 Err │
│ Tok: 3154.3 gen/s, 10273.3 tot/s, 339.1ms TTFT, 18.3ms ITL, 547 Prompt, 242 Gen │
│ [17:52:58] ⠋ 100% concurrent@128 (complete) Req: 15.1 req/s, 7.82s Lat, 117.7 Conc, 898 Comp, 127 Inc, 0 Err │
│ Tok: 3617.4 gen/s, 11843.9 tot/s, 1393.8ms TTFT, 26.8ms ITL, 547 Prompt, 240 Gen │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Generating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (8/8) [ 0:08:41 < 0:00:00 ]
Benchmarks Metadata:
Run id:f73d408e-256a-4c32-aa40-05e8d7098b66
Duration:529.2 seconds
Profile:type=concurrent, strategies=['concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent'], streams=[1, 2, 4, 8, 16, 32, 64, 128]
Args:max_number=None, max_duration=60.0, warmup_number=None, warmup_duration=3.0, cooldown_number=None, cooldown_duration=None
Worker:type_='generative_requests_worker' backend_type='openai_http' backend_target='http://llama-stack-benchmark-service:8323/v1/openai' backend_model='meta-llama/Llama-3.2-3B-Instruct'
backend_info={'max_output_tokens': 16384, 'timeout': 300, 'http2': True, 'follow_redirects': True, 'headers': {}, 'text_completions_path': '/v1/completions', 'chat_completions_path':
'/v1/chat/completions'}
Request Loader:type_='generative_request_loader' data='prompt_tokens=512,output_tokens=256' data_args=None processor='meta-llama/Llama-3.2-3B-Instruct' processor_args=None
Extras:None
Benchmarks Info:
=====================================================================================================================================================
Metadata |||| Requests Made ||| Prompt Tok/Req ||| Output Tok/Req ||| Prompt Tok Total||| Output Tok Total ||
Benchmark| Start Time| End Time| Duration (s)| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err
--------------|-----------|---------|-------------|------|-----|-----|------|------|----|------|------|----|-------|------|----|--------|------|-----
concurrent@1| 17:45:23| 17:46:23| 60.0| 17| 1| 0| 546.6| 512.0| 0.0| 252.8| 136.0| 0.0| 9292| 512| 0| 4298| 136| 0
concurrent@2| 17:46:28| 17:47:28| 60.0| 34| 2| 0| 546.4| 512.0| 0.0| 235.4| 130.0| 0.0| 18577| 1024| 0| 8003| 260| 0
concurrent@4| 17:47:33| 17:48:33| 60.0| 66| 4| 0| 546.5| 512.0| 0.0| 243.0| 97.5| 0.0| 36072| 2048| 0| 16035| 390| 0
concurrent@8| 17:48:38| 17:49:38| 60.0| 130| 8| 0| 546.6| 512.0| 0.0| 239.2| 146.0| 0.0| 71052| 4096| 0| 31090| 1168| 0
concurrent@16| 17:49:43| 17:50:43| 60.0| 246| 16| 0| 546.6| 512.0| 0.0| 243.3| 112.3| 0.0| 134456| 8192| 0| 59862| 1797| 0
concurrent@32| 17:50:49| 17:51:49| 60.0| 467| 32| 0| 546.6| 512.0| 0.0| 244.2| 147.3| 0.0| 255242| 16384| 0| 114038| 4714| 0
concurrent@64| 17:51:55| 17:52:55| 60.0| 776| 64| 0| 546.5| 512.0| 0.0| 242.2| 106.1| 0.0| 424115| 32768| 0| 187916| 6788| 0
concurrent@128| 17:53:03| 17:54:03| 60.0| 898| 127| 0| 546.5| 512.0| 0.0| 240.3| 69.8| 0.0| 490789| 65024| 0| 215810| 8864| 0
=====================================================================================================================================================
Benchmarks Stats:
======================================================================================================================================================
Metadata | Request Stats || Out Tok/sec| Tot Tok/sec| Req Latency (sec)||| TTFT (ms) ||| ITL (ms) ||| TPOT (ms) ||
Benchmark| Per Second| Concurrency| mean| mean| mean| median| p99| mean| median| p99| mean| median| p99| mean| median| p99
--------------|-----------|------------|------------|------------|-----|-------|------|-------|-------|-------|-----|-------|-----|-----|-------|-----
concurrent@1| 0.29| 1.00| 73.9| 233.7| 3.42| 3.45| 3.50| 50.2| 50.9| 62.5| 13.4| 13.4| 13.5| 13.3| 13.3| 13.5
concurrent@2| 0.57| 1.96| 134.7| 447.4| 3.42| 3.67| 4.12| 50.8| 49.2| 79.8| 14.3| 14.2| 15.9| 14.3| 14.2| 15.9
concurrent@4| 1.11| 3.92| 268.7| 873.1| 3.55| 3.72| 3.80| 54.9| 51.7| 101.3| 14.4| 14.4| 14.5| 14.4| 14.4| 14.5
concurrent@8| 2.20| 7.82| 526.1| 1728.4| 3.56| 3.78| 3.93| 60.6| 49.8| 189.5| 14.7| 14.7| 14.8| 14.6| 14.6| 14.8
concurrent@16| 4.14| 15.66| 1006.9| 3268.6| 3.79| 3.94| 4.25| 74.8| 54.3| 328.4| 15.3| 15.3| 16.1| 15.2| 15.2| 16.0
concurrent@32| 7.83| 30.91| 1912.0| 6191.6| 3.95| 4.07| 4.53| 119.1| 80.5| 674.0| 15.7| 15.6| 17.4| 15.7| 15.6| 17.3
concurrent@64| 13.03| 61.85| 3154.3| 10273.3| 4.75| 4.93| 5.43| 339.1| 321.1| 1146.6| 18.3| 18.4| 19.3| 18.2| 18.3| 19.2
concurrent@128| 15.05| 117.71| 3617.4| 11843.9| 7.82| 8.58| 13.35| 1393.8| 1453.0| 5232.2| 26.8| 26.7| 36.0| 26.7| 26.6| 35.9
======================================================================================================================================================
Saving benchmarks report...
Benchmarks report saved to /benchmarks.json
Benchmarking complete.
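
To read how well output throughput scales with concurrency in the run above, one option is to compare each level against perfect linear scaling from the single-stream result. This is a sketch with a hypothetical helper name; the numbers are the mean "Out Tok/sec" values from the stats table above.

```python
# Mean output tokens per second by number of concurrent streams,
# copied from the "Benchmarks Stats" table above.
OUT_TOK_PER_SEC = {
    1: 73.9,
    2: 134.7,
    4: 268.7,
    8: 526.1,
    16: 1006.9,
    32: 1912.0,
    64: 3154.3,
    128: 3617.4,
}


def scaling_efficiency(throughput_by_streams):
    """Ratio of observed throughput to N times the single-stream throughput."""
    base = throughput_by_streams[1]
    return {n: tput / (n * base) for n, tput in throughput_by_streams.items()}
```

By this measure the run stays at roughly 0.81 of linear scaling up to 32 streams and drops to roughly 0.38 at 128 streams, which matches the rise in TTFT and ITL at the highest concurrency levels.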

@@ -0,0 +1,171 @@
Collecting uv
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.9/20.9 MB 156.8 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.8.19
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 25.2
[notice] To update, run: pip install --upgrade pip
Using Python 3.11.13 environment at: /usr/local
Resolved 61 packages in 480ms
Downloading pillow (6.3MiB)
Downloading pydantic-core (1.9MiB)
Downloading pyarrow (40.8MiB)
Downloading aiohttp (1.7MiB)
Downloading numpy (16.2MiB)
Downloading pygments (1.2MiB)
Downloading transformers (11.1MiB)
Downloading pandas (11.8MiB)
Downloading tokenizers (3.1MiB)
Downloading hf-xet (3.0MiB)
Downloading pydantic-core
Downloading aiohttp
Downloading tokenizers
Downloading hf-xet
Downloading pygments
Downloading pillow
Downloading numpy
Downloading pandas
Downloading pyarrow
Downloading transformers
Prepared 61 packages in 1.25s
Installed 61 packages in 126ms
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.12.15
+ aiosignal==1.4.0
+ annotated-types==0.7.0
+ anyio==4.10.0
+ attrs==25.3.0
+ certifi==2025.8.3
+ charset-normalizer==3.4.3
+ click==8.1.8
+ datasets==4.1.1
+ dill==0.4.0
+ filelock==3.19.1
+ frozenlist==1.7.0
+ fsspec==2025.9.0
+ ftfy==6.3.1
+ guidellm==0.3.0
+ h11==0.16.0
+ h2==4.3.0
+ hf-xet==1.1.10
+ hpack==4.1.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ huggingface-hub==0.35.0
+ hyperframe==6.1.0
+ idna==3.10
+ loguru==0.7.3
+ markdown-it-py==4.0.0
+ mdurl==0.1.2
+ multidict==6.6.4
+ multiprocess==0.70.16
+ numpy==2.3.3
+ packaging==25.0
+ pandas==2.3.2
+ pillow==11.3.0
+ propcache==0.3.2
+ protobuf==6.32.1
+ pyarrow==21.0.0
+ pydantic==2.11.9
+ pydantic-core==2.33.2
+ pydantic-settings==2.10.1
+ pygments==2.19.2
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.1.1
+ pytz==2025.2
+ pyyaml==6.0.2
+ regex==2025.9.18
+ requests==2.32.5
+ rich==14.1.0
+ safetensors==0.6.2
+ six==1.17.0
+ sniffio==1.3.1
+ tokenizers==0.22.1
+ tqdm==4.67.1
+ transformers==4.56.2
+ typing-extensions==4.15.0
+ typing-inspection==0.4.1
+ tzdata==2025.2
+ urllib3==2.5.0
+ wcwidth==0.2.14
+ xxhash==3.5.0
+ yarl==1.20.1
Using Python 3.11.13 environment at: /usr/local
Audited 1 package in 4ms
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Creating backend...
Backend openai_http connected to http://llama-stack-benchmark-service:8323/v1/openai for model meta-llama/Llama-3.2-3B-Instruct.
Creating request loader...
Created loader with 1000 unique requests from prompt_tokens=512,output_tokens=256.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [17:55:59] ⠋ 100% concurrent@1 (complete) Req: 0.3 req/s, 3.33s Lat, 1.0 Conc, 18 Comp, 1 Inc, 0 Err │
│ Tok: 74.0 gen/s, 238.0 tot/s, 49.6ms TTFT, 13.4ms ITL, 546 Prompt, 246 Gen │
│ [17:57:04] ⠋ 100% concurrent@2 (complete) Req: 0.6 req/s, 3.32s Lat, 1.9 Conc, 35 Comp, 2 Inc, 0 Err │
│ Tok: 137.1 gen/s, 457.5 tot/s, 50.6ms TTFT, 14.0ms ITL, 546 Prompt, 234 Gen │
│ [17:58:09] ⠋ 100% concurrent@4 (complete) Req: 1.2 req/s, 3.42s Lat, 4.0 Conc, 69 Comp, 4 Inc, 0 Err │
│ Tok: 276.7 gen/s, 907.2 tot/s, 52.7ms TTFT, 14.1ms ITL, 547 Prompt, 240 Gen │
│ [17:59:14] ⠋ 100% concurrent@8 (complete) Req: 2.3 req/s, 3.47s Lat, 7.8 Conc, 134 Comp, 8 Inc, 0 Err │
│ Tok: 541.4 gen/s, 1775.4 tot/s, 57.3ms TTFT, 14.3ms ITL, 547 Prompt, 240 Gen │
│ [18:00:19] ⠋ 100% concurrent@16 (complete) Req: 4.3 req/s, 3.60s Lat, 15.6 Conc, 259 Comp, 16 Inc, 0 Err │
│ Tok: 1034.8 gen/s, 3401.7 tot/s, 72.3ms TTFT, 14.8ms ITL, 547 Prompt, 239 Gen │
│ [18:01:25] ⠋ 100% concurrent@32 (complete) Req: 8.4 req/s, 3.69s Lat, 31.1 Conc, 505 Comp, 32 Inc, 0 Err │
│ Tok: 2029.7 gen/s, 6641.5 tot/s, 91.6ms TTFT, 15.0ms ITL, 547 Prompt, 241 Gen │
│ [18:02:31] ⠋ 100% concurrent@64 (complete) Req: 13.6 req/s, 4.50s Lat, 61.4 Conc, 818 Comp, 64 Inc, 0 Err │
│ Tok: 3333.9 gen/s, 10787.0 tot/s, 171.3ms TTFT, 17.8ms ITL, 547 Prompt, 244 Gen │
│ [18:03:40] ⠋ 100% concurrent@128 (complete) Req: 16.1 req/s, 7.43s Lat, 119.5 Conc, 964 Comp, 122 Inc, 0 Err │
│ Tok: 3897.0 gen/s, 12679.4 tot/s, 446.4ms TTFT, 28.9ms ITL, 547 Prompt, 243 Gen │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Generating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (8/8) [ 0:08:41 < 0:00:00 ]
Benchmarks Metadata:
Run id:5393e64f-d9f8-4548-95d8-da320bba1c24
Duration:530.1 seconds
Profile:type=concurrent, strategies=['concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent'], streams=[1, 2, 4, 8, 16, 32, 64, 128]
Args:max_number=None, max_duration=60.0, warmup_number=None, warmup_duration=3.0, cooldown_number=None, cooldown_duration=None
Worker:type_='generative_requests_worker' backend_type='openai_http' backend_target='http://llama-stack-benchmark-service:8323/v1/openai' backend_model='meta-llama/Llama-3.2-3B-Instruct'
backend_info={'max_output_tokens': 16384, 'timeout': 300, 'http2': True, 'follow_redirects': True, 'headers': {}, 'text_completions_path': '/v1/completions', 'chat_completions_path':
'/v1/chat/completions'}
Request Loader:type_='generative_request_loader' data='prompt_tokens=512,output_tokens=256' data_args=None processor='meta-llama/Llama-3.2-3B-Instruct' processor_args=None
Extras:None
Benchmarks Info:
===================================================================================================================================================
Metadata |||| Requests Made ||| Prompt Tok/Req ||| Output Tok/Req ||| Prompt Tok Total||| Output Tok Total||
Benchmark| Start Time| End Time| Duration (s)| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err
--------------|-----------|---------|-------------|------|-----|-----|------|------|----|------|------|----|-------|------|----|-------|------|----
concurrent@1| 17:56:04| 17:57:04| 60.0| 18| 1| 0| 546.4| 512.0| 0.0| 246.4| 256.0| 0.0| 9836| 512| 0| 4436| 256| 0
concurrent@2| 17:57:09| 17:58:09| 60.0| 35| 2| 0| 546.4| 512.0| 0.0| 233.9| 132.0| 0.0| 19124| 1024| 0| 8188| 264| 0
concurrent@4| 17:58:14| 17:59:14| 60.0| 69| 4| 0| 546.6| 512.0| 0.0| 239.9| 60.5| 0.0| 37715| 2048| 0| 16553| 242| 0
concurrent@8| 17:59:19| 18:00:19| 60.0| 134| 8| 0| 546.6| 512.0| 0.0| 239.8| 126.6| 0.0| 73243| 4096| 0| 32135| 1013| 0
concurrent@16| 18:00:24| 18:01:24| 60.0| 259| 16| 0| 546.6| 512.0| 0.0| 239.0| 115.7| 0.0| 141561| 8192| 0| 61889| 1851| 0
concurrent@32| 18:01:30| 18:02:30| 60.0| 505| 32| 0| 546.5| 512.0| 0.0| 240.5| 113.2| 0.0| 275988| 16384| 0| 121466| 3623| 0
concurrent@64| 18:02:37| 18:03:37| 60.0| 818| 64| 0| 546.6| 512.0| 0.0| 244.5| 132.4| 0.0| 447087| 32768| 0| 199988| 8475| 0
concurrent@128| 18:03:45| 18:04:45| 60.0| 964| 122| 0| 546.5| 512.0| 0.0| 242.5| 133.1| 0.0| 526866| 62464| 0| 233789| 16241| 0
===================================================================================================================================================
Benchmarks Stats:
=======================================================================================================================================================
Metadata | Request Stats || Out Tok/sec| Tot Tok/sec| Req Latency (sec) ||| TTFT (ms) ||| ITL (ms) ||| TPOT (ms) ||
Benchmark| Per Second| Concurrency| mean| mean| mean| median| p99| mean| median| p99| mean| median| p99| mean| median| p99
--------------|-----------|------------|------------|------------|------|--------|------|------|-------|-------|-----|-------|-----|-----|-------|-----
concurrent@1| 0.30| 1.00| 74.0| 238.0| 3.33| 3.44| 3.63| 49.6| 47.2| 66.1| 13.4| 13.3| 14.0| 13.3| 13.3| 14.0
concurrent@2| 0.59| 1.95| 137.1| 457.5| 3.32| 3.61| 3.67| 50.6| 48.6| 80.4| 14.0| 14.0| 14.2| 13.9| 13.9| 14.1
concurrent@4| 1.15| 3.95| 276.7| 907.2| 3.42| 3.61| 3.77| 52.7| 49.7| 106.9| 14.1| 14.0| 14.6| 14.0| 13.9| 14.5
concurrent@8| 2.26| 7.83| 541.4| 1775.4| 3.47| 3.70| 3.79| 57.3| 50.9| 171.3| 14.3| 14.3| 14.4| 14.2| 14.2| 14.4
concurrent@16| 4.33| 15.57| 1034.8| 3401.7| 3.60| 3.81| 4.22| 72.3| 52.0| 292.9| 14.8| 14.7| 16.3| 14.7| 14.7| 16.3
concurrent@32| 8.44| 31.12| 2029.7| 6641.5| 3.69| 3.89| 4.24| 91.6| 62.6| 504.6| 15.0| 15.0| 15.4| 14.9| 14.9| 15.4
concurrent@64| 13.64| 61.40| 3333.9| 10787.0| 4.50| 4.61| 5.67| 171.3| 101.2| 1165.6| 17.8| 17.7| 19.2| 17.7| 17.6| 19.1
concurrent@128| 16.07| 119.45| 3897.0| 12679.4| 7.43| 7.63| 9.74| 446.4| 195.8| 2533.1| 28.9| 28.9| 31.0| 28.8| 28.8| 30.9
=======================================================================================================================================================
Saving benchmarks report...
Benchmarks report saved to /benchmarks.json
Benchmarking complete.
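
A quick consistency check on the stats table above: in a fixed-concurrency load test, Little's law says mean concurrency should be close to request rate times mean request latency. The sketch below applies that to three rows copied from the table; the function and variable names are illustrative and not part of GuideLLM.

```python
# Rows copied from the "Benchmarks Stats" table above:
# (benchmark, requests per second, measured mean concurrency, mean request latency in seconds)
rows = [
    ("concurrent@1", 0.30, 1.00, 3.33),
    ("concurrent@64", 13.64, 61.40, 4.50),
    ("concurrent@128", 16.07, 119.45, 7.43),
]


def littles_law_concurrency(rps: float, latency_s: float) -> float:
    """Little's law: mean in-flight requests = throughput x mean time in system."""
    return rps * latency_s


for name, rps, measured, latency_s in rows:
    predicted = littles_law_concurrency(rps, latency_s)
    rel_err = abs(predicted - measured) / measured
    print(f"{name}: predicted={predicted:.2f} measured={measured:.2f} rel_err={rel_err:.1%}")
```

Going from 64 to 128 streams raises throughput only from 13.64 to 16.07 req/s while mean latency grows from 4.50 s to 7.43 s, which suggests the server is close to saturation at that point.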

View file

@ -0,0 +1,170 @@
Collecting uv
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.8.19-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.9/20.9 MB 126.9 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.8.19
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 25.2
[notice] To update, run: pip install --upgrade pip
Using Python 3.11.13 environment at: /usr/local
Resolved 61 packages in 561ms
Downloading hf-xet (3.0MiB)
Downloading pillow (6.3MiB)
Downloading transformers (11.1MiB)
Downloading pyarrow (40.8MiB)
Downloading numpy (16.2MiB)
Downloading pandas (11.8MiB)
Downloading tokenizers (3.1MiB)
Downloading pydantic-core (1.9MiB)
Downloading pygments (1.2MiB)
Downloading aiohttp (1.7MiB)
Downloading pydantic-core
Downloading aiohttp
Downloading tokenizers
Downloading hf-xet
Downloading pygments
Downloading pillow
Downloading numpy
Downloading pandas
Downloading transformers
Downloading pyarrow
Prepared 61 packages in 1.25s
Installed 61 packages in 114ms
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.12.15
+ aiosignal==1.4.0
+ annotated-types==0.7.0
+ anyio==4.10.0
+ attrs==25.3.0
+ certifi==2025.8.3
+ charset-normalizer==3.4.3
+ click==8.1.8
+ datasets==4.1.1
+ dill==0.4.0
+ filelock==3.19.1
+ frozenlist==1.7.0
+ fsspec==2025.9.0
+ ftfy==6.3.1
+ guidellm==0.3.0
+ h11==0.16.0
+ h2==4.3.0
+ hf-xet==1.1.10
+ hpack==4.1.0
+ httpcore==1.0.9
+ httpx==0.28.1
+ huggingface-hub==0.35.0
+ hyperframe==6.1.0
+ idna==3.10
+ loguru==0.7.3
+ markdown-it-py==4.0.0
+ mdurl==0.1.2
+ multidict==6.6.4
+ multiprocess==0.70.16
+ numpy==2.3.3
+ packaging==25.0
+ pandas==2.3.2
+ pillow==11.3.0
+ propcache==0.3.2
+ protobuf==6.32.1
+ pyarrow==21.0.0
+ pydantic==2.11.9
+ pydantic-core==2.33.2
+ pydantic-settings==2.10.1
+ pygments==2.19.2
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.1.1
+ pytz==2025.2
+ pyyaml==6.0.2
+ regex==2025.9.18
+ requests==2.32.5
+ rich==14.1.0
+ safetensors==0.6.2
+ six==1.17.0
+ sniffio==1.3.1
+ tokenizers==0.22.1
+ tqdm==4.67.1
+ transformers==4.56.2
+ typing-extensions==4.15.0
+ typing-inspection==0.4.1
+ tzdata==2025.2
+ urllib3==2.5.0
+ wcwidth==0.2.14
+ xxhash==3.5.0
+ yarl==1.20.1
Using Python 3.11.13 environment at: /usr/local
Audited 1 package in 3ms
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Creating backend...
Backend openai_http connected to http://vllm-server:8000 for model meta-llama/Llama-3.2-3B-Instruct.
Creating request loader...
Created loader with 1000 unique requests from prompt_tokens=512,output_tokens=256.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [18:11:47] ⠋ 100% concurrent@1 (complete) Req: 0.3 req/s, 3.35s Lat, 1.0 Conc, 17 Comp, 1 Inc, 0 Err │
│ Tok: 76.4 gen/s, 239.4 tot/s, 29.6ms TTFT, 13.0ms ITL, 547 Prompt, 256 Gen │
│ [18:12:52] ⠋ 100% concurrent@2 (complete) Req: 0.6 req/s, 3.53s Lat, 2.0 Conc, 32 Comp, 2 Inc, 0 Err │
│ Tok: 145.0 gen/s, 454.5 tot/s, 36.9ms TTFT, 13.7ms ITL, 546 Prompt, 256 Gen │
│ [18:13:57] ⠋ 100% concurrent@4 (complete) Req: 1.1 req/s, 3.59s Lat, 4.0 Conc, 64 Comp, 4 Inc, 0 Err │
│ Tok: 284.8 gen/s, 892.7 tot/s, 59.0ms TTFT, 13.9ms ITL, 546 Prompt, 256 Gen │
│ [18:15:02] ⠋ 100% concurrent@8 (complete) Req: 2.2 req/s, 3.70s Lat, 8.0 Conc, 128 Comp, 7 Inc, 0 Err │
│ Tok: 553.5 gen/s, 1735.2 tot/s, 79.8ms TTFT, 14.2ms ITL, 547 Prompt, 256 Gen │
│ [18:16:08] ⠋ 100% concurrent@16 (complete) Req: 4.2 req/s, 3.83s Lat, 16.0 Conc, 240 Comp, 16 Inc, 0 Err │
│ Tok: 1066.9 gen/s, 3344.6 tot/s, 97.5ms TTFT, 14.6ms ITL, 547 Prompt, 256 Gen │
│ [18:17:13] ⠋ 100% concurrent@32 (complete) Req: 8.1 req/s, 3.94s Lat, 31.8 Conc, 480 Comp, 31 Inc, 0 Err │
│ Tok: 2069.7 gen/s, 6488.4 tot/s, 120.8ms TTFT, 15.0ms ITL, 547 Prompt, 256 Gen │
│ [18:18:20] ⠋ 100% concurrent@64 (complete) Req: 13.6 req/s, 4.60s Lat, 62.3 Conc, 813 Comp, 57 Inc, 0 Err │
│ Tok: 3472.1 gen/s, 10884.9 tot/s, 190.9ms TTFT, 17.3ms ITL, 547 Prompt, 256 Gen │
│ [18:19:28] ⠋ 100% concurrent@128 (complete) Req: 16.8 req/s, 7.37s Lat, 123.5 Conc, 1005 Comp, 126 Inc, 0 Err │
│ Tok: 4289.1 gen/s, 13445.8 tot/s, 356.4ms TTFT, 27.5ms ITL, 547 Prompt, 256 Gen │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Generating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (8/8) [ 0:08:43 < 0:00:00 ]
Benchmarks Metadata:
Run id:8ccb6da1-83f4-4624-8d84-07c723b0b2a5
Duration:530.4 seconds
Profile:type=concurrent, strategies=['concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent', 'concurrent'], streams=[1, 2, 4, 8, 16, 32, 64, 128]
Args:max_number=None, max_duration=60.0, warmup_number=None, warmup_duration=3.0, cooldown_number=None, cooldown_duration=None
Worker:type_='generative_requests_worker' backend_type='openai_http' backend_target='http://vllm-server:8000' backend_model='meta-llama/Llama-3.2-3B-Instruct' backend_info={'max_output_tokens':
16384, 'timeout': 300, 'http2': True, 'follow_redirects': True, 'headers': {}, 'text_completions_path': '/v1/completions', 'chat_completions_path': '/v1/chat/completions'}
Request Loader:type_='generative_request_loader' data='prompt_tokens=512,output_tokens=256' data_args=None processor='meta-llama/Llama-3.2-3B-Instruct' processor_args=None
Extras:None
Benchmarks Info:
=====================================================================================================================================================
Metadata |||| Requests Made ||| Prompt Tok/Req ||| Output Tok/Req ||| Prompt Tok Total||| Output Tok Total ||
Benchmark| Start Time| End Time| Duration (s)| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err| Comp| Inc| Err
--------------|-----------|---------|-------------|------|-----|-----|------|------|----|------|------|----|-------|------|----|--------|------|-----
concurrent@1| 18:11:52| 18:12:52| 60.0| 17| 1| 0| 546.5| 512.0| 0.0| 256.0| 231.0| 0.0| 9291| 512| 0| 4352| 231| 0
concurrent@2| 18:12:57| 18:13:57| 60.0| 32| 2| 0| 546.5| 512.0| 0.0| 256.0| 251.0| 0.0| 17488| 1024| 0| 8192| 502| 0
concurrent@4| 18:14:02| 18:15:02| 60.0| 64| 4| 0| 546.4| 512.0| 0.0| 256.0| 175.2| 0.0| 34972| 2048| 0| 16384| 701| 0
concurrent@8| 18:15:07| 18:16:07| 60.0| 128| 7| 0| 546.6| 512.0| 0.0| 256.0| 50.7| 0.0| 69966| 3584| 0| 32768| 355| 0
concurrent@16| 18:16:13| 18:17:13| 60.0| 240| 16| 0| 546.5| 512.0| 0.0| 256.0| 166.0| 0.0| 131170| 8192| 0| 61440| 2656| 0
concurrent@32| 18:17:18| 18:18:18| 60.0| 480| 31| 0| 546.5| 512.0| 0.0| 256.0| 47.4| 0.0| 262339| 15872| 0| 122880| 1468| 0
concurrent@64| 18:18:25| 18:19:25| 60.0| 813| 57| 0| 546.5| 512.0| 0.0| 256.0| 110.7| 0.0| 444341| 29184| 0| 208128| 6311| 0
concurrent@128| 18:19:33| 18:20:33| 60.0| 1005| 126| 0| 546.5| 512.0| 0.0| 256.0| 65.8| 0.0| 549264| 64512| 0| 257280| 8296| 0
=====================================================================================================================================================
Benchmarks Stats:
=======================================================================================================================================================
Metadata | Request Stats || Out Tok/sec| Tot Tok/sec| Req Latency (sec) ||| TTFT (ms) ||| ITL (ms) ||| TPOT (ms) ||
Benchmark| Per Second| Concurrency| mean| mean| mean| median| p99| mean| median| p99| mean| median| p99| mean| median| p99
--------------|-----------|------------|------------|------------|------|--------|------|------|-------|-------|-----|-------|-----|-----|-------|-----
concurrent@1| 0.30| 1.00| 76.4| 239.4| 3.35| 3.35| 3.38| 29.6| 29.0| 38.9| 13.0| 13.0| 13.1| 13.0| 13.0| 13.0
concurrent@2| 0.57| 2.00| 145.0| 454.5| 3.53| 3.53| 3.55| 36.9| 39.0| 59.6| 13.7| 13.7| 13.8| 13.6| 13.7| 13.7
concurrent@4| 1.11| 4.00| 284.8| 892.7| 3.59| 3.59| 3.65| 59.0| 65.7| 88.2| 13.9| 13.8| 14.1| 13.8| 13.8| 14.0
concurrent@8| 2.16| 7.99| 553.5| 1735.2| 3.70| 3.69| 3.76| 79.8| 80.7| 152.6| 14.2| 14.2| 14.5| 14.1| 14.1| 14.4
concurrent@16| 4.17| 15.97| 1066.9| 3344.6| 3.83| 3.82| 3.99| 97.5| 96.3| 283.9| 14.6| 14.6| 14.9| 14.6| 14.6| 14.8
concurrent@32| 8.08| 31.84| 2069.7| 6488.4| 3.94| 3.90| 4.31| 120.8| 101.7| 564.3| 15.0| 14.9| 15.9| 14.9| 14.8| 15.9
concurrent@64| 13.56| 62.34| 3472.1| 10884.9| 4.60| 4.54| 5.43| 190.9| 133.9| 1113.2| 17.3| 17.2| 18.2| 17.2| 17.2| 18.2
concurrent@128| 16.75| 123.45| 4289.1| 13445.8| 7.37| 7.21| 9.21| 356.4| 161.9| 2319.9| 27.5| 27.5| 28.8| 27.4| 27.4| 28.7
=======================================================================================================================================================
Saving benchmarks report...
Benchmarks report saved to /benchmarks.json
Benchmarking complete.
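
To make the column layout of the "Benchmarks Stats" rows explicit, here is a minimal sketch that splits one row on `|` and names each field. The `COLUMNS` names are made up for this example; the order follows the two header rows of the table above, so the mean of request latency, TTFT, and ITL sit at indices 5, 8, and 11.

```python
# Field names are illustrative; the order mirrors the table header rows.
COLUMNS = [
    "benchmark", "rps", "concurrency", "out_tok_per_s", "tot_tok_per_s",
    "req_latency_mean_s", "req_latency_median_s", "req_latency_p99_s",
    "ttft_mean_ms", "ttft_median_ms", "ttft_p99_ms",
    "itl_mean_ms", "itl_median_ms", "itl_p99_ms",
    "tpot_mean_ms", "tpot_median_ms", "tpot_p99_ms",
]


def parse_stats_row(line: str) -> dict:
    """Split a pipe-delimited stats row into a dict keyed by column name."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(parts)}")
    row = {"benchmark": parts[0]}
    for name, value in zip(COLUMNS[1:], parts[1:]):
        row[name] = float(value)
    return row


# Row copied from the concurrent@128 line of the table above.
row = parse_stats_row(
    "concurrent@128| 16.75| 123.45| 4289.1| 13445.8| 7.37| 7.21| 9.21| "
    "356.4| 161.9| 2319.9| 27.5| 27.5| 28.8| 27.4| 27.4| 28.7"
)
print(row["req_latency_mean_s"], row["ttft_mean_ms"], row["ttft_median_ms"], row["itl_mean_ms"])
```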

Binary file not shown.


View file

@ -0,0 +1,294 @@
#!/usr/bin/env python3
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# /// script
# dependencies = [
# "matplotlib",
# ]
# ///
"""
Script to generate benchmark charts from guidellm text results.
Creates 2x2 grid charts with RPS, Request Latency, TTFT, and ITL metrics against concurrent@x values.
Outputs one chart file per vLLM replica group, with each line representing one benchmark run.
"""
import glob
import os
import re
import matplotlib.pyplot as plt


def extract_setup_name(filename: str) -> str:
    """Extract setup name from filename and format legend appropriately."""
    basename = os.path.basename(filename)

    # Try new pattern: guidellm-benchmark-stack-s{stack_replicas}-sw{workers}-v{vllm_replicas}-{timestamp}.txt
    match = re.search(r"guidellm-benchmark-stack-s(\d+)-sw(\d+)-v(\d+)-(\d{8})-(\d{6})\.txt", basename)
    if match:
        stack_replicas = match.group(1)
        workers = match.group(2)
        vllm_replicas = match.group(3)
        date = match.group(4)
        time = match.group(5)
        return f"stack-s{stack_replicas}-sw{workers}-v{vllm_replicas}"

    # Try new vLLM pattern: guidellm-benchmark-vllm-v{vllm_replicas}-{timestamp}.txt
    match = re.search(r"guidellm-benchmark-vllm-v(\d+)-(\d{8})-(\d{6})\.txt", basename)
    if match:
        vllm_replicas = match.group(1)
        date = match.group(2)
        time = match.group(3)
        return f"vllm-v{vllm_replicas}"

    # Fall back to old pattern: guidellm-benchmark-{target}-{stack_replicas}-w{workers}-{vllm_replicas}-{timestamp}.txt
    match = re.search(r"guidellm-benchmark-([^-]+)-(\d+)-w(\d+)-(\d+)-(\d+)-(\d+)\.txt", basename)
    if match:
        target = match.group(1)
        stack_replicas = match.group(2)
        workers = match.group(3)
        vllm_replicas = match.group(4)
        date = match.group(5)
        time = match.group(6)

        if target == "vllm":
            return f"vllm-{vllm_replicas}-w{workers}-{vllm_replicas}"
        else:
            return f"stack-replicas{stack_replicas}-w{workers}-vllm-replicas{vllm_replicas}-{date}-{time}"

    # Fall back to older pattern: guidellm-benchmark-{target}-{stack_replicas}-{vllm_replicas}-{timestamp}.txt
    match = re.search(r"guidellm-benchmark-([^-]+)-(\d+)-(\d+)-(\d+)-(\d+)\.txt", basename)
    if match:
        target = match.group(1)
        stack_replicas = match.group(2)
        vllm_replicas = match.group(3)
        date = match.group(4)
        time = match.group(5)

        if target == "vllm":
            return f"vllm-{vllm_replicas}-w1-{vllm_replicas}"
        else:
            return f"stack-replicas{stack_replicas}-vllm-replicas{vllm_replicas}-{date}-{time}"

    return basename.replace("guidellm-benchmark-", "").replace(".txt", "")


def parse_txt_file(filepath: str) -> list[tuple[float, float, float, float, float, str]]:
    """
    Parse a text benchmark file and extract concurrent@x, RPS, TTFT, ITL, and request latency data.
    Returns list of (concurrency, rps_mean, ttft_mean, itl_mean, req_latency_mean, setup_name) tuples.
    """
    setup_name = extract_setup_name(filepath)
    data_points = []

    try:
        with open(filepath) as f:
            content = f.read()

        # Find the benchmark stats table
        lines = content.split("\n")
        in_stats_table = False
        header_lines_seen = 0

        for line in lines:
            line_stripped = line.strip()

            # Look for the start of the stats table
            if "Benchmarks Stats:" in line:
                in_stats_table = True
                continue

            if in_stats_table:
                # Skip the separator lines; the third one ("=" after the data rows) ends the table
                if line_stripped.startswith("=") or line_stripped.startswith("-"):
                    header_lines_seen += 1
                    if header_lines_seen >= 3:
                        if line_stripped.startswith("=") and "concurrent@" not in line_stripped:
                            break
                    continue

            # Parse concurrent@ lines in the stats table (may have leading spaces)
            if in_stats_table and "concurrent@" in line:
                parts = [part.strip() for part in line.split("|")]

                if len(parts) >= 12:  # parts[11] (ITL mean) is the highest index read below
                    try:
                        # Extract concurrency from benchmark name (e.g., concurrent@1 -> 1)
                        concurrent_match = re.search(r"concurrent@(\d+)", parts[0])
                        if not concurrent_match:
                            continue
                        concurrency = float(concurrent_match.group(1))

                        # Column layout after splitting on "|" (see the table's two header rows):
                        #   0 Benchmark | 1 Per Second | 2 Concurrency | 3 Out Tok/sec | 4 Tot Tok/sec
                        #   5-7 Req Latency (sec) mean/median/p99 | 8-10 TTFT (ms) | 11-13 ITL (ms) | 14-16 TPOT (ms)
                        rps_mean = float(parts[1])  # Per Second (RPS)
                        req_latency_mean = float(parts[5]) * 1000  # Request latency mean (convert from sec to ms)
                        ttft_mean = float(parts[8])  # TTFT mean column
                        itl_mean = float(parts[11])  # ITL mean column

                        data_points.append((concurrency, rps_mean, ttft_mean, itl_mean, req_latency_mean, setup_name))

                    except (ValueError, IndexError) as e:
                        print(f"Warning: Could not parse line '{line}' in {filepath}: {e}")
                        continue

    except (OSError, FileNotFoundError) as e:
        print(f"Error reading {filepath}: {e}")

    return data_points


def generate_charts(benchmark_dir: str = "results"):
    """Generate 2x2 grid charts (RPS, Request Latency, TTFT, ITL) from benchmark text files."""
    # Find all text result files instead of JSON
    txt_pattern = os.path.join(benchmark_dir, "guidellm-benchmark-*.txt")
    txt_files = glob.glob(txt_pattern)

    if not txt_files:
        print(f"No text files found matching pattern: {txt_pattern}")
        return

    print(f"Found {len(txt_files)} text files")

    # Parse all files and collect data
    all_data = {}  # setup_name -> [(concurrency, rps, ttft, itl, req_latency), ...]

    for txt_file in txt_files:
        print(f"Processing {txt_file}")
        data_points = parse_txt_file(txt_file)

        for concurrency, rps, ttft, itl, req_latency, setup_name in data_points:
            if setup_name not in all_data:
                all_data[setup_name] = []
            all_data[setup_name].append((concurrency, rps, ttft, itl, req_latency))

    if not all_data:
        print("No data found to plot")
        return

    # Sort data points by concurrency for each setup
    for setup_name in all_data:
        all_data[setup_name].sort(key=lambda x: x[0])  # Sort by concurrency

    # Group setups by vLLM replica number (original approach)
    replica_groups = {}  # vllm_replica_count -> {setup_name: points}

    for setup_name, points in all_data.items():
        # Extract vLLM replica number from setup name
        # Expected formats:
        # - New stack format: "stack-s{X}-sw{W}-v{Y}"
        # - New vLLM format: "vllm-v{Y}"
        # - Old formats: "stack-replicas{X}-w{W}-vllm-replicas{Y}" or "vllm-{Y}-w{W}-{Y}"

        # Try new formats first
        vllm_match = re.search(r"-v(\d+)$", setup_name)  # Matches both "stack-s1-sw2-v3" and "vllm-v1"
        if not vllm_match:
            # Try old stack format
            vllm_match = re.search(r"vllm-replicas(\d+)", setup_name)
        if not vllm_match:
            # Try old vLLM format: "vllm-{Y}-w{W}-{Y}"
            vllm_match = re.search(r"vllm-(\d+)-w\d+-\d+", setup_name)

        if vllm_match:
            vllm_replica_num = int(vllm_match.group(1))
            if vllm_replica_num not in replica_groups:
                replica_groups[vllm_replica_num] = {}
            replica_groups[vllm_replica_num][setup_name] = points
        else:
            print(f"Warning: Could not extract vLLM replica count from setup name: {setup_name}")

    def create_charts(data_dict, prefix, title_prefix):
        """Create a 2x2 grid with RPS, Request Latency, TTFT, and ITL charts."""
        if not data_dict:
            print(f"No data found for {prefix}")
            return

        # Create 2x2 subplot grid
        fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
        fig.suptitle(f"{title_prefix} Benchmark Results", fontsize=16, fontweight="bold")

        # Collect all unique concurrency values for tick setting
        all_concurrency_values = set()
        for points in data_dict.values():
            all_concurrency_values.update([p[0] for p in points])
        all_concurrency_values = sorted(all_concurrency_values)

        # Plot data for each setup in alphabetical order
        for setup_name in sorted(data_dict.keys()):
            points = data_dict[setup_name]
            if not points:
                continue

            concurrency_values = [p[0] for p in points]
            rps_values = [p[1] for p in points]
            ttft_values = [p[2] for p in points]
            itl_values = [p[3] for p in points]
            req_latency_values = [p[4] for p in points]

            # RPS chart (top-left)
            ax1.plot(concurrency_values, rps_values, marker="o", label=setup_name, linewidth=2, markersize=6)

            # Request Latency chart (top-right)
            ax2.plot(concurrency_values, req_latency_values, marker="o", label=setup_name, linewidth=2, markersize=6)

            # TTFT chart (bottom-left)
            ax3.plot(concurrency_values, ttft_values, marker="o", label=setup_name, linewidth=2, markersize=6)

            # ITL chart (bottom-right)
            ax4.plot(concurrency_values, itl_values, marker="o", label=setup_name, linewidth=2, markersize=6)

        # Configure all charts after plotting data
        axes = [ax1, ax2, ax3, ax4]
        titles = ["RPS", "Request Latency", "TTFT", "ITL"]
        ylabels = [
            "Requests Per Second (RPS)",
            "Request Latency (ms)",
            "Time to First Token (ms)",
            "Inter Token Latency (ms)",
        ]

        for ax, title, ylabel in zip(axes, titles, ylabels, strict=False):
            ax.set_xlabel("Concurrency", fontsize=12)
            ax.set_ylabel(ylabel, fontsize=12)
            ax.set_title(title, fontsize=14, fontweight="bold")
            ax.set_xscale("log", base=2)
            ax.set_xticks(all_concurrency_values)
            ax.set_xticklabels([str(int(x)) for x in all_concurrency_values])
            ax.grid(True, alpha=0.3)

        # Add legend to the right-most subplot (top-right)
        ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left")

        plt.tight_layout()

        # Save the combined chart
        combined_filename = os.path.join(benchmark_dir, f"{prefix}_benchmark_results.png")
        plt.savefig(combined_filename, dpi=300, bbox_inches="tight")
        plt.close()
        print(f"Combined benchmark chart saved to {combined_filename}")

    # Print grouping information
    for replica_count, data_dict in replica_groups.items():
        print(f"vLLM Replica {replica_count} setups: {list(data_dict.keys())}")

    # Create separate charts for each replica group
    for replica_count, data_dict in replica_groups.items():
        prefix = f"vllm_replica{replica_count}"
        title = f"vLLM Replicas={replica_count}"
        create_charts(data_dict, prefix, title)

    # Print summary
    print("\nSummary:")
    for setup_name, points in all_data.items():
        print(f"{setup_name}: {len(points)} data points")


if __name__ == "__main__":
    generate_charts()

View file

@ -0,0 +1,103 @@
#!/usr/bin/env bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Define benchmark configurations: (target, stack_replicas, vllm_replicas, stack_workers)
configs=(
"stack 1 1 1"
"stack 1 1 2"
"stack 1 1 4"
"vllm 1 1 -"
)
set -euo pipefail
# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "Running comprehensive GuideLLM benchmark suite..."
echo "Start time: $(date)"
# Default deployment names
STACK_DEPLOYMENT="llama-stack-benchmark-server"
VLLM_DEPLOYMENT="vllm-server"
# Scaling function
scale_deployments() {
local stack_replicas=$1
local vllm_replicas=$2
local workers=$3
echo "Scaling deployments..."
if [[ "$vllm_replicas" != "-" ]]; then
echo "Scaling $VLLM_DEPLOYMENT to $vllm_replicas replicas..."
kubectl scale deployment $VLLM_DEPLOYMENT --replicas=$vllm_replicas
kubectl rollout status deployment $VLLM_DEPLOYMENT --timeout=600s
fi
if [[ "$target" == "stack" ]]; then
if [[ "$stack_replicas" != "-" ]]; then
echo "Scaling $STACK_DEPLOYMENT to $stack_replicas replicas..."
kubectl scale deployment $STACK_DEPLOYMENT --replicas=$stack_replicas
kubectl rollout status deployment $STACK_DEPLOYMENT --timeout=600s
fi
if [[ "$workers" != "-" ]]; then
echo "Updating $STACK_DEPLOYMENT to use $workers workers..."
kubectl set env deployment/$STACK_DEPLOYMENT LLAMA_STACK_WORKERS=$workers
kubectl rollout status deployment $STACK_DEPLOYMENT --timeout=600s
fi
fi
echo "All scaling operations completed. Waiting additional 30s for services to stabilize..."
sleep 30
}
for config in "${configs[@]}"; do
read -r target stack_replicas vllm_replicas workers <<< "$config"
echo ""
echo "=========================================="
if [[ "$workers" != "-" ]]; then
echo "Running benchmark: $target (stack=$stack_replicas, vllm=$vllm_replicas, workers=$workers)"
else
echo "Running benchmark: $target (stack=$stack_replicas, vllm=$vllm_replicas)"
fi
echo "Start: $(date)"
echo "=========================================="
# Scale deployments before running benchmark
scale_deployments "$stack_replicas" "$vllm_replicas" "$workers"
# Generate output filename with setup info
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
if [[ "$target" == "stack" ]]; then
OUTPUT_FILE="results/guidellm-benchmark-${target}-s${stack_replicas}-sw${workers}-v${vllm_replicas}-${TIMESTAMP}.txt"
else
OUTPUT_FILE="results/guidellm-benchmark-${target}-v${vllm_replicas}-${TIMESTAMP}.txt"
fi
# Run the benchmark with the cluster as configured
"$SCRIPT_DIR/run-guidellm-benchmark.sh" \
--target "$target" \
--output-file "$OUTPUT_FILE"
echo "Completed: $(date)"
echo "Waiting 30 seconds before next benchmark..."
sleep 30
done
echo ""
echo "=========================================="
echo "All benchmarks completed!"
echo "End time: $(date)"
echo "=========================================="
echo ""
echo "Results files generated:"
ls -la results/guidellm-*.txt results/guidellm-*.json 2>/dev/null || echo "No result files found"
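
The output file names built in the loop above are what the chart script later parses, so the two have to agree. The sketch below mirrors the naming logic in Python and checks it against the two "new pattern" regular expressions from the chart script. The function name `output_filename` and the fixed timestamp are assumptions made for the example.

```python
import re
from datetime import datetime

# Regexes copied from the chart script's "new pattern" branches.
STACK_RE = r"guidellm-benchmark-stack-s(\d+)-sw(\d+)-v(\d+)-(\d{8})-(\d{6})\.txt"
VLLM_RE = r"guidellm-benchmark-vllm-v(\d+)-(\d{8})-(\d{6})\.txt"


def output_filename(target: str, stack_replicas: str, vllm_replicas: str, workers: str, now: datetime) -> str:
    """Mirror the shell logic: stack runs encode replicas and workers, vLLM runs only replicas."""
    timestamp = now.strftime("%Y%m%d-%H%M%S")  # same layout as `date +%Y%m%d-%H%M%S`
    if target == "stack":
        return f"results/guidellm-benchmark-{target}-s{stack_replicas}-sw{workers}-v{vllm_replicas}-{timestamp}.txt"
    return f"results/guidellm-benchmark-{target}-v{vllm_replicas}-{timestamp}.txt"


# Same entries as the configs array: (target, stack_replicas, vllm_replicas, stack_workers).
configs = ["stack 1 1 1", "stack 1 1 2", "stack 1 1 4", "vllm 1 1 -"]
fixed_now = datetime(2025, 1, 2, 3, 4, 5)  # arbitrary timestamp so the output is reproducible

filenames = [output_filename(*config.split(), now=fixed_now) for config in configs]
for name in filenames:
    matched = bool(re.search(STACK_RE, name) or re.search(VLLM_RE, name))
    print(name, "matches chart pattern:", matched)
```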

View file

@ -0,0 +1,219 @@
#!/usr/bin/env bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
set -euo pipefail
# Default values
TARGET="stack"
MAX_SECONDS=60
PROMPT_TOKENS=512
OUTPUT_TOKENS=256
RATE_TYPE="concurrent"
RATE="1,2,4,8,16,32,64,128"
STACK_DEPLOYMENT="llama-stack-benchmark-server"
STACK_URL="http://llama-stack-benchmark-service:8323/v1/openai"
VLLM_DEPLOYMENT="vllm-server"
OUTPUT_FILE=""
# Parse command line arguments
usage() {
echo "Usage: $0 [options]"
echo "Options:"
echo " -t, --target <stack|vllm> Target to benchmark (default: stack)"
echo " -s, --max-seconds <seconds> Maximum duration in seconds (default: 60)"
echo " -p, --prompt-tokens <tokens> Number of prompt tokens (default: 512)"
echo " -o, --output-tokens <tokens> Number of output tokens (default: 256)"
echo " -r, --rate-type <type> Rate type (default: concurrent)"
echo " -c, --rate <rates> Comma-separated rate values (default: 1,2,4,8,16,32,64,128)"
echo " --output-file <path> Output file path (default: auto-generated)"
echo " --stack-deployment <name> Name of the stack deployment (default: llama-stack-benchmark-server)"
echo " --vllm-deployment <name> Name of the vllm deployment (default: vllm-server)"
echo " --stack-url <url> URL of the stack service (default: http://llama-stack-benchmark-service:8323/v1/openai)"
echo " -h, --help Show this help message"
echo ""
echo "Examples:"
echo " $0 --target vllm # Benchmark vLLM direct"
echo " $0 --target stack # Benchmark Llama Stack (default)"
echo " $0 -t vllm -s 60 -p 512 -o 256 # vLLM with custom parameters"
echo " $0 --output-file results/my-benchmark.txt # Specify custom output file"
echo " $0 --stack-deployment my-stack-server # Use custom stack deployment name"
}
while [[ $# -gt 0 ]]; do
case $1 in
-t|--target)
TARGET="$2"
shift 2
;;
-s|--max-seconds)
MAX_SECONDS="$2"
shift 2
;;
-p|--prompt-tokens)
PROMPT_TOKENS="$2"
shift 2
;;
-o|--output-tokens)
OUTPUT_TOKENS="$2"
shift 2
;;
-r|--rate-type)
RATE_TYPE="$2"
shift 2
;;
-c|--rate)
RATE="$2"
shift 2
;;
--output-file)
OUTPUT_FILE="$2"
shift 2
;;
--stack-deployment)
STACK_DEPLOYMENT="$2"
shift 2
;;
--vllm-deployment)
VLLM_DEPLOYMENT="$2"
shift 2
;;
--stack-url)
STACK_URL="$2"
shift 2
;;
-h|--help)
usage
exit 0
;;
*)
echo "Unknown option: $1"
usage
exit 1
;;
esac
done
# Validate target
if [[ "$TARGET" != "stack" && "$TARGET" != "vllm" ]]; then
echo "Error: Target must be 'stack' or 'vllm'"
usage
exit 1
fi
# Set configuration based on target
if [[ "$TARGET" == "vllm" ]]; then
BASE_URL="http://${VLLM_DEPLOYMENT}:8000"
JOB_NAME="guidellm-vllm-benchmark-job"
echo "Benchmarking vLLM direct with GuideLLM..."
else
BASE_URL="$STACK_URL"
JOB_NAME="guidellm-stack-benchmark-job"
echo "Benchmarking Llama Stack with GuideLLM..."
fi
echo "Configuration:"
echo " Target: $TARGET"
echo " Base URL: $BASE_URL"
echo " Max seconds: ${MAX_SECONDS}s"
echo " Prompt tokens: $PROMPT_TOKENS"
echo " Output tokens: $OUTPUT_TOKENS"
echo " Rate type: $RATE_TYPE"
if [[ "$TARGET" == "vllm" ]]; then
echo " vLLM deployment: $VLLM_DEPLOYMENT"
else
echo " Stack deployment: $STACK_DEPLOYMENT"
fi
echo ""
# Create temporary job yaml
TEMP_YAML="/tmp/guidellm-benchmark-job-temp-$(date +%s).yaml"
cat > "$TEMP_YAML" << EOF
apiVersion: batch/v1
kind: Job
metadata:
name: $JOB_NAME
namespace: default
spec:
template:
spec:
containers:
- name: guidellm-benchmark
image: python:3.11-slim
command: ["/bin/bash"]
args:
- "-c"
- |
# Install uv and guidellm
pip install uv &&
uv pip install --system guidellm &&
# Login to HuggingFace
uv pip install --system huggingface_hub &&
python -c "from huggingface_hub import login; login(token='\$HF_TOKEN')" &&
# Run GuideLLM benchmark and save output
export COLUMNS=200
GUIDELLM__PREFERRED_ROUTE="chat_completions" uv run guidellm benchmark run \\
--target "$BASE_URL" \\
--rate-type "$RATE_TYPE" \\
--max-seconds $MAX_SECONDS \\
--data "prompt_tokens=$PROMPT_TOKENS,output_tokens=$OUTPUT_TOKENS" \\
--model "\$INFERENCE_MODEL" \\
--rate "$RATE" \\
--warmup-percent 0.05 \\
2>&1
env:
- name: INFERENCE_MODEL
value: "meta-llama/Llama-3.2-3B-Instruct"
- name: HF_TOKEN
valueFrom:
secretKeyRef:
name: hf-token-secret
key: token
resources:
requests:
memory: "4Gi"
cpu: "500m"
limits:
memory: "8Gi"
cpu: "2000m"
restartPolicy: Never
backoffLimit: 3
EOF
echo "Cleaning up any existing GuideLLM benchmark job..."
kubectl delete job $JOB_NAME 2>/dev/null || true
echo "Deploying GuideLLM benchmark Job..."
kubectl apply -f "$TEMP_YAML"
echo "Waiting for job to start..."
kubectl wait --for=condition=Ready pod -l job-name=$JOB_NAME --timeout=120s
# Prepare file names and create results directory
mkdir -p results
if [[ -z "$OUTPUT_FILE" ]]; then
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
OUTPUT_FILE="results/guidellm-benchmark-${TARGET}-${TIMESTAMP}.txt"
fi
echo "Following GuideLLM benchmark logs..."
kubectl logs -f job/$JOB_NAME
echo "Job completed. Checking final status..."
kubectl get job $JOB_NAME
# Save benchmark results using kubectl logs
echo "Saving benchmark results..."
kubectl logs job/$JOB_NAME > "$OUTPUT_FILE"
echo "Benchmark output saved to: $OUTPUT_FILE"
# Clean up temporary file
rm -f "$TEMP_YAML"
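
For readers who want to see the GuideLLM invocation without the heredoc escaping, here is a rough Python sketch that assembles the same argument list from the script's defaults. The flags and default values are copied from the script above and are not checked against any particular GuideLLM release. The helper names `base_url` and `guidellm_argv` are hypothetical, and the `uv run` prefix and the `GUIDELLM__PREFERRED_ROUTE` variable are left out.

```python
import shlex

# Defaults copied from the script above.
STACK_URL = "http://llama-stack-benchmark-service:8323/v1/openai"
VLLM_DEPLOYMENT = "vllm-server"


def base_url(target: str) -> str:
    """Pick the endpoint the same way the script does for each target."""
    if target == "vllm":
        return f"http://{VLLM_DEPLOYMENT}:8000"
    if target == "stack":
        return STACK_URL
    raise ValueError("Target must be 'stack' or 'vllm'")


def guidellm_argv(
    target: str,
    model: str,
    max_seconds: int = 60,
    prompt_tokens: int = 512,
    output_tokens: int = 256,
    rate_type: str = "concurrent",
    rate: str = "1,2,4,8,16,32,64,128",
) -> list[str]:
    """Build the argument list in the same flag order as the job's shell command."""
    return [
        "guidellm", "benchmark", "run",
        "--target", base_url(target),
        "--rate-type", rate_type,
        "--max-seconds", str(max_seconds),
        "--data", f"prompt_tokens={prompt_tokens},output_tokens={output_tokens}",
        "--model", model,
        "--rate", rate,
        "--warmup-percent", "0.05",
    ]


argv = guidellm_argv("vllm", "meta-llama/Llama-3.2-3B-Instruct")
print(shlex.join(argv))
```

Building the command as a list and joining it with `shlex.join` avoids the nested quoting that the heredoc needs.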

View file

@ -0,0 +1,142 @@
apiVersion: v1
data:
stack_run_config.yaml: |
version: '2'
image_name: kubernetes-benchmark-demo
apis:
- agents
- files
- inference
- safety
- tool_runtime
- vector_io
providers:
inference:
- provider_id: vllm-inference
provider_type: remote::vllm
config:
url: ${env.VLLM_URL:=http://localhost:8000/v1}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
- provider_id: sentence-transformers
provider_type: inline::sentence-transformers
config: {}
files:
- provider_id: meta-reference-files
provider_type: inline::localfs
config:
storage_dir: ${env.FILES_STORAGE_DIR:=~/.llama/distributions/starter/files}
metadata_store:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/files_metadata.db
vector_io:
- provider_id: ${env.ENABLE_CHROMADB:+chromadb}
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL:=}
kvstore:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
safety:
- provider_id: llama-guard
provider_type: inline::llama-guard
config:
excluded_categories: []
agents:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
responses_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
tool_runtime:
- provider_id: brave-search
provider_type: remote::brave-search
config:
api_key: ${env.BRAVE_SEARCH_API_KEY:+}
max_results: 3
- provider_id: tavily-search
provider_type: remote::tavily-search
config:
api_key: ${env.TAVILY_SEARCH_API_KEY:+}
max_results: 3
- provider_id: rag-runtime
provider_type: inline::rag-runtime
config: {}
- provider_id: model-context-protocol
provider_type: remote::model-context-protocol
config: {}
storage:
backends:
kv_default:
type: kv_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
table_name: ${env.POSTGRES_TABLE_NAME:=llamastack_kvstore}
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
prompts:
backend: kv_default
namespace: prompts
models:
- metadata:
embedding_dimension: 768
model_id: nomic-embed-text-v1.5
provider_id: sentence-transformers
model_type: embedding
- model_id: ${env.INFERENCE_MODEL}
provider_id: vllm-inference
model_type: llm
shields:
- shield_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
vector_dbs: []
datasets: []
scoring_fns: []
benchmarks: []
tool_groups:
- toolgroup_id: builtin::websearch
provider_id: tavily-search
- toolgroup_id: builtin::rag
provider_id: rag-runtime
server:
port: 8323
kind: ConfigMap
metadata:
name: llama-stack-config


@ -0,0 +1,94 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama-benchmark-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-stack-benchmark-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: llama-stack-benchmark
      app.kubernetes.io/component: server
  template:
    metadata:
      labels:
        app.kubernetes.io/name: llama-stack-benchmark
        app.kubernetes.io/component: server
    spec:
      containers:
      - name: llama-stack-benchmark
        image: llamastack/distribution-starter:latest
        imagePullPolicy: Always # since we have specified latest instead of a version
        env:
        - name: ENABLE_CHROMADB
          value: "true"
        - name: CHROMADB_URL
          value: http://chromadb.default.svc.cluster.local:6000
        - name: POSTGRES_HOST
          value: postgres-server.default.svc.cluster.local
        - name: POSTGRES_PORT
          value: "5432"
        - name: INFERENCE_MODEL
          value: "${INFERENCE_MODEL}"
        - name: SAFETY_MODEL
          value: "${SAFETY_MODEL}"
        - name: TAVILY_SEARCH_API_KEY
          value: "${TAVILY_SEARCH_API_KEY}"
        - name: VLLM_URL
          value: http://vllm-server.default.svc.cluster.local:8000/v1
        - name: VLLM_MAX_TOKENS
          value: "3072"
        - name: VLLM_SAFETY_URL
          value: http://vllm-server-safety.default.svc.cluster.local:8001/v1
        - name: VLLM_TLS_VERIFY
          value: "false"
        - name: LLAMA_STACK_LOGGING
          value: "all=WARNING"
        - name: LLAMA_STACK_CONFIG
          value: "/etc/config/stack_run_config.yaml"
        - name: LLAMA_STACK_WORKERS
          value: "${LLAMA_STACK_WORKERS}"
        command: ["uvicorn", "llama_stack.core.server.server:create_app", "--host", "0.0.0.0", "--port", "8323", "--workers", "$(LLAMA_STACK_WORKERS)", "--factory"]
        ports:
          - containerPort: 8323
        resources:
          requests:
            cpu: "4"
          limits:
            cpu: "4"
        volumeMounts:
        - name: llama-storage
          mountPath: /root/.llama
        - name: llama-config
          mountPath: /etc/config
      volumes:
      - name: llama-storage
        persistentVolumeClaim:
          claimName: llama-benchmark-pvc
      - name: llama-config
        configMap:
          name: llama-stack-config
---
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-benchmark-service
spec:
  selector:
    app.kubernetes.io/name: llama-stack-benchmark
    app.kubernetes.io/component: server
  ports:
  - name: http
    port: 8323
    targetPort: 8323
  type: ClusterIP


@ -0,0 +1,133 @@
version: '2'
image_name: kubernetes-benchmark-demo
apis:
- agents
- files
- inference
- files
- safety
- tool_runtime
- vector_io
providers:
  inference:
  - provider_id: vllm-inference
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL:=http://localhost:8000/v1}
      max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
      api_token: ${env.VLLM_API_TOKEN:=fake}
      tls_verify: ${env.VLLM_TLS_VERIFY:=true}
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  files:
  - provider_id: meta-reference-files
    provider_type: inline::localfs
    config:
      storage_dir: ${env.FILES_STORAGE_DIR:=~/.llama/distributions/starter/files}
      metadata_store:
        table_name: files_metadata
        backend: sql_default
  vector_io:
  - provider_id: ${env.ENABLE_CHROMADB:+chromadb}
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL:=}
      persistence:
        namespace: vector_io::chroma_remote
        backend: kv_default
  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []
  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence:
        agent_state:
          namespace: agents
          backend: kv_default
        responses:
          table_name: responses
          backend: sql_default
          max_write_queue_size: 10000
          num_writers: 4
  tool_runtime:
  - provider_id: brave-search
    provider_type: remote::brave-search
    config:
      api_key: ${env.BRAVE_SEARCH_API_KEY:+}
      max_results: 3
  - provider_id: tavily-search
    provider_type: remote::tavily-search
    config:
      api_key: ${env.TAVILY_SEARCH_API_KEY:+}
      max_results: 3
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
  - provider_id: model-context-protocol
    provider_type: remote::model-context-protocol
    config: {}
storage:
  backends:
    kv_default:
      type: kv_postgres
      host: ${env.POSTGRES_HOST:=localhost}
      port: ${env.POSTGRES_PORT:=5432}
      db: ${env.POSTGRES_DB:=llamastack}
      user: ${env.POSTGRES_USER:=llamastack}
      password: ${env.POSTGRES_PASSWORD:=llamastack}
      table_name: ${env.POSTGRES_TABLE_NAME:=llamastack_kvstore}
    sql_default:
      type: sql_postgres
      host: ${env.POSTGRES_HOST:=localhost}
      port: ${env.POSTGRES_PORT:=5432}
      db: ${env.POSTGRES_DB:=llamastack}
      user: ${env.POSTGRES_USER:=llamastack}
      password: ${env.POSTGRES_PASSWORD:=llamastack}
  stores:
    metadata:
      namespace: registry
      backend: kv_default
    inference:
      table_name: inference_store
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      table_name: openai_conversations
      backend: sql_default
    prompts:
      namespace: prompts
      backend: kv_default
registered_resources:
  models:
  - metadata:
      embedding_dimension: 768
    model_id: nomic-embed-text-v1.5
    provider_id: sentence-transformers
    model_type: embedding
  - model_id: ${env.INFERENCE_MODEL}
    provider_id: vllm-inference
    model_type: llm
  shields:
  - shield_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
  vector_dbs: []
  datasets: []
  scoring_fns: []
  benchmarks: []
  tool_groups:
  - toolgroup_id: builtin::websearch
    provider_id: tavily-search
  - toolgroup_id: builtin::rag
    provider_id: rag-runtime
server:
  port: 8323
vector_stores:
  default_provider_id: chromadb
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5


@ -0,0 +1,11 @@
These are the source-of-truth configuration files used to generate the client SDKs via Stainless.
- `openapi.yml`: this is the OpenAPI specification for the Llama Stack API.
- `config.yml`: this is the Stainless _configuration_ which instructs Stainless how to generate the client SDKs.
Note the `.yml` suffixes: Stainless typically uses that suffix for its configuration files.
These files go hand-in-hand. Both `openapi.yml` and `config.yml` are generated by `scripts/run_openapi_generator.sh`:
- `openapi.yml` comes from the FastAPI-based generator.
- `config.yml` is rendered from `scripts/openapi_generator/stainless_config/config_data.py` so the Stainless config stays in lock-step with the spec.
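
To illustrate what "lock-step" means here, the following is a minimal sketch of a consistency check between the two files. It assumes both have already been loaded into plain dictionaries; the function name `find_missing_endpoints` and the toy data are hypothetical and are not part of `scripts/run_openapi_generator.sh`.

```python
def find_missing_endpoints(openapi_paths, stainless_methods):
    """Return the names of methods whose endpoint is absent from the spec.

    openapi_paths:     {"/v1/models": {"get": {...}, "post": {...}}, ...}
    stainless_methods: {"list": "get /v1/models", ...}
    """
    missing = []
    for method_name, endpoint in stainless_methods.items():
        # Stainless method entries have the form "<verb> <path>".
        verb, path = endpoint.split(" ", 1)
        if verb.lower() not in openapi_paths.get(path, {}):
            missing.append(method_name)
    return missing


# Toy stand-ins for the parsed openapi.yml paths and config.yml methods.
openapi_paths = {
    "/v1/models": {"get": {}, "post": {}},
    "/v1/health": {"get": {}},
}
stainless_methods = {
    "list": "get /v1/models",
    "register": "post /v1/models",
    "health": "get /v1/health",
    "unregister": "delete /v1/models/{model_id}",
}

print(find_missing_endpoints(openapi_paths, stainless_methods))
```

In this toy data only `unregister` is reported, because its path is not present in the spec dictionary. Generating both files from the same script avoids this kind of drift in the first place.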


@ -0,0 +1,494 @@
# yaml-language-server: $schema=https://app.stainlessapi.com/config-internal.schema.json
organization:
name: llama-stack-client
docs: https://llama-stack.readthedocs.io/en/latest/
contact: llamastack@meta.com
security:
- {}
- BearerAuth: []
security_schemes:
BearerAuth:
type: http
scheme: bearer
targets:
node:
package_name: llama-stack-client
production_repo: llamastack/llama-stack-client-typescript
publish:
npm: false
python:
package_name: llama_stack_client
production_repo: llamastack/llama-stack-client-python
options:
use_uv: true
publish:
pypi: true
project_name: llama_stack_client
kotlin:
reverse_domain: com.llama_stack_client.api
production_repo: null
publish:
maven: false
go:
package_name: llama-stack-client
production_repo: llamastack/llama-stack-client-go
options:
enable_v2: true
back_compat_use_shared_package: false
client_settings:
default_env_prefix: LLAMA_STACK_CLIENT
opts:
api_key:
type: string
read_env: LLAMA_STACK_CLIENT_API_KEY
auth:
security_scheme: BearerAuth
nullable: true
environments:
production: http://any-hosted-llama-stack.com
pagination:
- name: datasets_iterrows
type: offset
request:
dataset_id:
type: string
start_index:
type: integer
x-stainless-pagination-property:
purpose: offset_count_param
limit:
type: integer
response:
data:
type: array
items:
type: object
next_index:
type: integer
x-stainless-pagination-property:
purpose: offset_count_start_field
- name: openai_cursor_page
type: cursor
request:
limit:
type: integer
after:
type: string
x-stainless-pagination-property:
purpose: next_cursor_param
response:
data:
type: array
items: {}
has_more:
type: boolean
last_id:
type: string
x-stainless-pagination-property:
purpose: next_cursor_field
settings:
license: MIT
unwrap_response_fields:
- data
file_header: 'Copyright (c) Meta Platforms, Inc. and affiliates.
All rights reserved.
This source code is licensed under the terms described in the LICENSE file in
the root directory of this source tree.
'
openapi:
transformations:
- command: mergeObject
reason: Better return_type using enum
args:
target:
- $.components.schemas
object:
ReturnType:
additionalProperties: false
properties:
type:
enum:
- string
- number
- boolean
- array
- object
- json
- union
- chat_completion_input
- completion_input
- agent_turn_input
required:
- type
type: object
- command: replaceProperties
reason: Replace return type properties with better model (see above)
args:
filter:
only:
- $.components.schemas.ScoringFn.properties.return_type
- $.components.schemas.RegisterScoringFunctionRequest.properties.return_type
value:
$ref: '#/components/schemas/ReturnType'
- command: oneOfToAnyOf
reason: Prism (mock server) doesn't like one of our requests as it technically
matches multiple variants
readme:
example_requests:
default:
type: request
endpoint: post /v1/chat/completions
params: {}
headline:
type: request
endpoint: get /v1/models
params: {}
pagination:
type: request
endpoint: post /v1/chat/completions
params: {}
resources:
$shared:
models:
interleaved_content_item: InterleavedContentItem
interleaved_content: InterleavedContent
param_type: ParamType
safety_violation: SafetyViolation
sampling_params: SamplingParams
scoring_result: ScoringResult
system_message: SystemMessage
health_info: HealthInfo
provider_info: ProviderInfo
list_providers_response: ListProvidersResponse
route_info: RouteInfo
list_routes_response: ListRoutesResponse
version_info: VersionInfo
toolgroups:
models:
tool_group: ToolGroup
list_tool_groups_response: ListToolGroupsResponse
methods:
register: post /v1/toolgroups
get: get /v1/toolgroups/{toolgroup_id}
list: get /v1/toolgroups
unregister: delete /v1/toolgroups/{toolgroup_id}
tools:
methods:
get: get /v1/tools/{tool_name}
list:
paginated: false
endpoint: get /v1/tools
tool_runtime:
models:
tool_def: ToolDef
tool_invocation_result: ToolInvocationResult
methods:
list_tools:
paginated: false
endpoint: get /v1/tool-runtime/list-tools
invoke_tool: post /v1/tool-runtime/invoke
responses:
models:
response_object_stream: OpenAIResponseObjectStream
response_object: OpenAIResponseObject
methods:
create:
type: http
streaming:
stream_event_model: responses.response_object_stream
param_discriminator: stream
endpoint: post /v1/responses
retrieve: get /v1/responses/{response_id}
list:
type: http
endpoint: get /v1/responses
delete:
type: http
endpoint: delete /v1/responses/{response_id}
subresources:
input_items:
methods:
list:
type: http
paginated: false
endpoint: get /v1/responses/{response_id}/input_items
prompts:
models:
prompt: Prompt
list_prompts_response: ListPromptsResponse
methods:
create: post /v1/prompts
list:
paginated: false
endpoint: get /v1/prompts
retrieve: get /v1/prompts/{prompt_id}
update: post /v1/prompts/{prompt_id}
delete: delete /v1/prompts/{prompt_id}
set_default_version: post /v1/prompts/{prompt_id}/set-default-version
subresources:
versions:
methods:
list:
paginated: false
endpoint: get /v1/prompts/{prompt_id}/versions
conversations:
models:
conversation_object: Conversation
methods:
create:
type: http
endpoint: post /v1/conversations
retrieve: get /v1/conversations/{conversation_id}
update:
type: http
endpoint: post /v1/conversations/{conversation_id}
delete:
type: http
endpoint: delete /v1/conversations/{conversation_id}
subresources:
items:
methods:
get:
type: http
endpoint: get /v1/conversations/{conversation_id}/items/{item_id}
list:
type: http
endpoint: get /v1/conversations/{conversation_id}/items
create:
type: http
endpoint: post /v1/conversations/{conversation_id}/items
delete:
type: http
endpoint: delete /v1/conversations/{conversation_id}/items/{item_id}
inspect:
methods:
health: get /v1/health
version: get /v1/version
embeddings:
models:
create_embeddings_response: OpenAIEmbeddingsResponse
methods:
create: post /v1/embeddings
chat:
models:
chat_completion_chunk: OpenAIChatCompletionChunk
subresources:
completions:
methods:
create:
type: http
streaming:
stream_event_model: chat.chat_completion_chunk
param_discriminator: stream
endpoint: post /v1/chat/completions
list:
type: http
paginated: false
endpoint: get /v1/chat/completions
retrieve:
type: http
endpoint: get /v1/chat/completions/{completion_id}
completions:
methods:
create:
type: http
streaming:
param_discriminator: stream
endpoint: post /v1/completions
vector_io:
models:
queryChunksResponse: QueryChunksResponse
methods:
insert: post /v1/vector-io/insert
query: post /v1/vector-io/query
vector_stores:
models:
vector_store: VectorStoreObject
list_vector_stores_response: VectorStoreListResponse
vector_store_delete_response: VectorStoreDeleteResponse
vector_store_search_response: VectorStoreSearchResponsePage
methods:
create: post /v1/vector_stores
list: get /v1/vector_stores
retrieve: get /v1/vector_stores/{vector_store_id}
update: post /v1/vector_stores/{vector_store_id}
delete: delete /v1/vector_stores/{vector_store_id}
search: post /v1/vector_stores/{vector_store_id}/search
subresources:
files:
models:
vector_store_file: VectorStoreFileObject
methods:
list: get /v1/vector_stores/{vector_store_id}/files
retrieve: get /v1/vector_stores/{vector_store_id}/files/{file_id}
update: post /v1/vector_stores/{vector_store_id}/files/{file_id}
delete: delete /v1/vector_stores/{vector_store_id}/files/{file_id}
create: post /v1/vector_stores/{vector_store_id}/files
content: get /v1/vector_stores/{vector_store_id}/files/{file_id}/content
file_batches:
models:
vector_store_file_batches: VectorStoreFileBatchObject
list_vector_store_files_in_batch_response: VectorStoreFilesListInBatchResponse
methods:
create: post /v1/vector_stores/{vector_store_id}/file_batches
retrieve: get /v1/vector_stores/{vector_store_id}/file_batches/{batch_id}
list_files: get /v1/vector_stores/{vector_store_id}/file_batches/{batch_id}/files
cancel: post /v1/vector_stores/{vector_store_id}/file_batches/{batch_id}/cancel
models:
models:
model: OpenAIModel
list_models_response: OpenAIListModelsResponse
methods:
list:
paginated: false
endpoint: get /v1/models
retrieve: get /v1/models/{model_id}
register: post /v1/models
unregister: delete /v1/models/{model_id}
subresources:
openai:
methods:
list:
paginated: false
endpoint: get /v1/models
providers:
methods:
list:
paginated: false
endpoint: get /v1/providers
retrieve: get /v1/providers/{provider_id}
routes:
methods:
list:
paginated: false
endpoint: get /v1/inspect/routes
moderations:
models:
create_response: ModerationObject
methods:
create: post /v1/moderations
safety:
models:
run_shield_response: RunShieldResponse
methods:
run_shield: post /v1/safety/run-shield
shields:
models:
shield: Shield
list_shields_response: ListShieldsResponse
methods:
retrieve: get /v1/shields/{identifier}
list:
paginated: false
endpoint: get /v1/shields
register: post /v1/shields
delete: delete /v1/shields/{identifier}
scoring:
methods:
score: post /v1/scoring/score
score_batch: post /v1/scoring/score-batch
scoring_functions:
models:
scoring_fn: ScoringFn
scoring_fn_params: ScoringFnParams
list_scoring_functions_response: ListScoringFunctionsResponse
methods:
retrieve: get /v1/scoring-functions/{scoring_fn_id}
list:
paginated: false
endpoint: get /v1/scoring-functions
register: post /v1/scoring-functions
unregister: delete /v1/scoring-functions/{scoring_fn_id}
files:
models:
file: OpenAIFileObject
list_files_response: ListOpenAIFileResponse
delete_file_response: OpenAIFileDeleteResponse
methods:
create: post /v1/files
list: get /v1/files
retrieve: get /v1/files/{file_id}
delete: delete /v1/files/{file_id}
content: get /v1/files/{file_id}/content
batches:
methods:
create: post /v1/batches
list: get /v1/batches
retrieve: get /v1/batches/{batch_id}
cancel: post /v1/batches/{batch_id}/cancel
alpha:
subresources:
inference:
methods:
rerank: post /v1alpha/inference/rerank
post_training:
models:
algorithm_config: AlgorithmConfig
post_training_job: PostTrainingJob
list_post_training_jobs_response: ListPostTrainingJobsResponse
methods:
preference_optimize: post /v1alpha/post-training/preference-optimize
supervised_fine_tune: post /v1alpha/post-training/supervised-fine-tune
subresources:
job:
methods:
artifacts: get /v1alpha/post-training/job/artifacts
cancel: post /v1alpha/post-training/job/cancel
status: get /v1alpha/post-training/job/status
list:
paginated: false
endpoint: get /v1alpha/post-training/jobs
benchmarks:
models:
benchmark: Benchmark
list_benchmarks_response: ListBenchmarksResponse
methods:
retrieve: get /v1alpha/eval/benchmarks/{benchmark_id}
list:
paginated: false
endpoint: get /v1alpha/eval/benchmarks
register: post /v1alpha/eval/benchmarks
unregister: delete /v1alpha/eval/benchmarks/{benchmark_id}
eval:
models:
evaluate_response: EvaluateResponse
benchmark_config: BenchmarkConfig
job: Job
methods:
evaluate_rows: post /v1alpha/eval/benchmarks/{benchmark_id}/evaluations
run_eval: post /v1alpha/eval/benchmarks/{benchmark_id}/jobs
evaluate_rows_alpha: post /v1alpha/eval/benchmarks/{benchmark_id}/evaluations
run_eval_alpha: post /v1alpha/eval/benchmarks/{benchmark_id}/jobs
subresources:
jobs:
methods:
cancel: delete /v1alpha/eval/benchmarks/{benchmark_id}/jobs/{job_id}
status: get /v1alpha/eval/benchmarks/{benchmark_id}/jobs/{job_id}
retrieve: get /v1alpha/eval/benchmarks/{benchmark_id}/jobs/{job_id}/result
admin:
methods:
list_providers: get /v1alpha/admin/providers
inspect_provider: get /v1alpha/admin/providers/{provider_id}
list_routes: get /v1alpha/admin/inspect/routes
health: get /v1alpha/admin/health
version: get /v1alpha/admin/version
beta:
subresources:
datasets:
models:
list_datasets_response: ListDatasetsResponse
methods:
register: post /v1beta/datasets
retrieve: get /v1beta/datasets/{dataset_id}
list:
paginated: false
endpoint: get /v1beta/datasets
unregister: delete /v1beta/datasets/{dataset_id}
iterrows: get /v1beta/datasetio/iterrows/{dataset_id}
appendrows: post /v1beta/datasetio/append-rows/{dataset_id}


163
containers/Containerfile Normal file

@ -0,0 +1,163 @@
# syntax=docker/dockerfile:1.6
#
# This Dockerfile is used to build the Llama Stack container image.
# Example:
# docker build \
# -f containers/Containerfile \
# --build-arg DISTRO_NAME=starter \
# --tag llama-stack:starter .
ARG BASE_IMAGE=python:3.12-slim
FROM ${BASE_IMAGE}
ARG INSTALL_MODE="pypi"
ARG LLAMA_STACK_DIR="/workspace"
ARG LLAMA_STACK_CLIENT_DIR=""
ARG PYPI_VERSION=""
ARG TEST_PYPI_VERSION=""
ARG KEEP_WORKSPACE=""
ARG DISTRO_NAME="starter"
ARG RUN_CONFIG_PATH=""
ARG UV_HTTP_TIMEOUT=500
ARG UV_EXTRA_INDEX_URL=""
ARG UV_INDEX_STRATEGY=""
ENV UV_HTTP_TIMEOUT=${UV_HTTP_TIMEOUT}
ENV PYTHONDONTWRITEBYTECODE=1
ENV PIP_DISABLE_PIP_VERSION_CHECK=1
WORKDIR /app
RUN set -eux; \
if command -v dnf >/dev/null 2>&1; then \
dnf -y update && \
dnf install -y iputils git net-tools wget \
vim-minimal python3.12 python3.12-pip python3.12-wheel \
python3.12-setuptools python3.12-devel gcc gcc-c++ make && \
ln -sf /usr/bin/pip3.12 /usr/local/bin/pip && \
ln -sf /usr/bin/python3.12 /usr/local/bin/python && \
dnf clean all; \
elif command -v apt-get >/dev/null 2>&1; then \
apt-get update && \
apt-get install -y --no-install-recommends \
iputils-ping net-tools iproute2 dnsutils telnet \
curl wget git procps psmisc lsof traceroute bubblewrap \
gcc g++ && \
rm -rf /var/lib/apt/lists/*; \
else \
echo "Unsupported base image: expected dnf or apt-get" >&2; \
exit 1; \
fi
RUN pip install --no-cache uv
ENV UV_SYSTEM_PYTHON=1
ENV INSTALL_MODE=${INSTALL_MODE}
ENV LLAMA_STACK_DIR=${LLAMA_STACK_DIR}
ENV LLAMA_STACK_CLIENT_DIR=${LLAMA_STACK_CLIENT_DIR}
ENV PYPI_VERSION=${PYPI_VERSION}
ENV TEST_PYPI_VERSION=${TEST_PYPI_VERSION}
ENV KEEP_WORKSPACE=${KEEP_WORKSPACE}
ENV DISTRO_NAME=${DISTRO_NAME}
ENV RUN_CONFIG_PATH=${RUN_CONFIG_PATH}
# Copy the repository so editable installs and run configurations are available.
COPY . /workspace
# Install the client package if it is provided
# NOTE: this is installed before llama-stack since llama-stack depends on llama-stack-client-python
# Unset UV index env vars to ensure we only use PyPI for the client
RUN set -eux; \
unset UV_EXTRA_INDEX_URL UV_INDEX_STRATEGY; \
if [ -n "$LLAMA_STACK_CLIENT_DIR" ]; then \
if [ ! -d "$LLAMA_STACK_CLIENT_DIR" ]; then \
echo "LLAMA_STACK_CLIENT_DIR is set but $LLAMA_STACK_CLIENT_DIR does not exist" >&2; \
exit 1; \
fi; \
uv pip install --no-cache -e "$LLAMA_STACK_CLIENT_DIR"; \
fi;
# Install llama-stack
# Use UV_EXTRA_INDEX_URL inline only for editable install with RC dependencies
RUN set -eux; \
SAVED_UV_EXTRA_INDEX_URL="${UV_EXTRA_INDEX_URL:-}"; \
SAVED_UV_INDEX_STRATEGY="${UV_INDEX_STRATEGY:-}"; \
unset UV_EXTRA_INDEX_URL UV_INDEX_STRATEGY; \
if [ "$INSTALL_MODE" = "editable" ]; then \
if [ ! -d "$LLAMA_STACK_DIR" ]; then \
echo "INSTALL_MODE=editable requires LLAMA_STACK_DIR to point to a directory inside the build context" >&2; \
exit 1; \
fi; \
if [ -n "$SAVED_UV_EXTRA_INDEX_URL" ] && [ -n "$SAVED_UV_INDEX_STRATEGY" ]; then \
UV_EXTRA_INDEX_URL="$SAVED_UV_EXTRA_INDEX_URL" UV_INDEX_STRATEGY="$SAVED_UV_INDEX_STRATEGY" \
uv pip install --no-cache -e "$LLAMA_STACK_DIR"; \
else \
uv pip install --no-cache -e "$LLAMA_STACK_DIR"; \
fi; \
elif [ "$INSTALL_MODE" = "test-pypi" ]; then \
uv pip install --no-cache fastapi libcst; \
if [ -n "$TEST_PYPI_VERSION" ]; then \
uv pip install --no-cache --extra-index-url https://test.pypi.org/simple/ --index-strategy unsafe-best-match "llama-stack==$TEST_PYPI_VERSION"; \
else \
uv pip install --no-cache --extra-index-url https://test.pypi.org/simple/ --index-strategy unsafe-best-match llama-stack; \
fi; \
else \
if [ -n "$PYPI_VERSION" ]; then \
uv pip install --no-cache "llama-stack==$PYPI_VERSION"; \
else \
uv pip install --no-cache llama-stack; \
fi; \
fi;
# Install the dependencies for the distribution
# Explicitly unset UV index env vars to ensure we only use PyPI for distribution deps
RUN set -eux; \
unset UV_EXTRA_INDEX_URL UV_INDEX_STRATEGY; \
if [ -z "$DISTRO_NAME" ]; then \
echo "DISTRO_NAME must be provided" >&2; \
exit 1; \
fi; \
deps="$(llama stack list-deps "$DISTRO_NAME")"; \
if [ -n "$deps" ]; then \
printf '%s\n' "$deps" | xargs -L1 uv pip install --no-cache; \
fi
# Install OpenTelemetry auto-instrumentation support
RUN set -eux; \
pip install --no-cache opentelemetry-distro opentelemetry-exporter-otlp; \
opentelemetry-bootstrap -a install
# Cleanup
RUN set -eux; \
pip uninstall -y uv; \
should_remove=1; \
if [ -n "$KEEP_WORKSPACE" ]; then should_remove=0; fi; \
if [ "$INSTALL_MODE" = "editable" ]; then should_remove=0; fi; \
case "$RUN_CONFIG_PATH" in \
/workspace*) should_remove=0 ;; \
esac; \
if [ "$should_remove" -eq 1 ] && [ -d /workspace ]; then rm -rf /workspace; fi
RUN cat <<'EOF' >/usr/local/bin/llama-stack-entrypoint.sh
#!/bin/sh
set -e
# Enable OpenTelemetry auto-instrumentation if any OTEL_* variable is set
CMD_PREFIX=""
if env | grep -q '^OTEL_'; then
CMD_PREFIX="opentelemetry-instrument"
fi
if [ -n "$RUN_CONFIG_PATH" ] && [ -f "$RUN_CONFIG_PATH" ]; then
exec $CMD_PREFIX llama stack run "$RUN_CONFIG_PATH" "$@"
fi
if [ -n "$DISTRO_NAME" ]; then
exec $CMD_PREFIX llama stack run "$DISTRO_NAME" "$@"
fi
exec $CMD_PREFIX llama stack run "$@"
EOF
RUN chmod +x /usr/local/bin/llama-stack-entrypoint.sh
RUN mkdir -p /.llama /.cache && chmod -R g+rw /app /.llama /.cache
ENTRYPOINT ["/usr/local/bin/llama-stack-entrypoint.sh"]

21
coverage.svg Normal file

@ -0,0 +1,21 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" width="99" height="20">
<linearGradient id="b" x2="0" y2="100%">
<stop offset="0" stop-color="#bbb" stop-opacity=".1"/>
<stop offset="1" stop-opacity=".1"/>
</linearGradient>
<mask id="a">
<rect width="99" height="20" rx="3" fill="#fff"/>
</mask>
<g mask="url(#a)">
<path fill="#555" d="M0 0h63v20H0z"/>
<path fill="#fe7d37" d="M63 0h36v20H63z"/>
<path fill="url(#b)" d="M0 0h99v20H0z"/>
</g>
<g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,sans-serif" font-size="11">
<text x="31.5" y="15" fill="#010101" fill-opacity=".3">coverage</text>
<text x="31.5" y="14">coverage</text>
<text x="80" y="15" fill="#010101" fill-opacity=".3">44%</text>
<text x="80" y="14">44%</text>
</g>
</svg>



@ -1,20 +0,0 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

58
docs/README.md Normal file

@ -0,0 +1,58 @@
# Llama Stack Documentation
Here's a collection of comprehensive guides, examples, and resources for building AI applications with Llama Stack. For the complete documentation, visit our [GitHub page](https://llamastack.github.io/getting_started/quickstart).
## Render locally
From the llama-stack `docs/` directory, run the following commands to render the docs locally:
```bash
npm install
npm run gen-api-docs all
npm run build
npm run serve
```
You can open up the docs in your browser at http://localhost:3000
## File Import System
This documentation uses `remark-code-import` to import files directly from the repository, eliminating copy-paste maintenance. Files are automatically embedded during build time.
### Importing Code Files
To import Python code (or any code files) with syntax highlighting, use this syntax in `.mdx` files:
````markdown
```python file=./demo_script.py title="demo_script.py"
```
````
This automatically imports the file content and displays it as a formatted code block with Python syntax highlighting.
**Note:** Paths are relative to the current `.mdx` file location, not the repository root.
### Importing Markdown Files as Content
For importing and rendering markdown files (like CONTRIBUTING.md), use the raw-loader approach:
```jsx
import Contributing from '!!raw-loader!../../../CONTRIBUTING.md';
import ReactMarkdown from 'react-markdown';
<ReactMarkdown>{Contributing}</ReactMarkdown>
```
**Requirements:**
- Install dependencies: `npm install --save-dev raw-loader react-markdown`
**Path Resolution:**
- For both `remark-code-import` and `raw-loader`: Paths are relative to the current `.mdx` file location
- Use `../` to navigate up directories as needed
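
As a rough illustration of the path rule above, this sketch resolves an import path against the directory of the importing `.mdx` file. The `.mdx` locations used here are hypothetical examples, and `resolve_import` is an illustrative helper, not part of either plugin.

```python
import posixpath


def resolve_import(mdx_file, relative_path):
    """Resolve an import path relative to the directory of the .mdx file."""
    base_dir = posixpath.dirname(mdx_file)
    return posixpath.normpath(posixpath.join(base_dir, relative_path))


# Hypothetical .mdx locations, chosen only to show the resolution rule.
print(resolve_import("docs/docs/contributing/index.mdx", "../../../CONTRIBUTING.md"))
print(resolve_import("docs/docs/getting_started/quickstart.mdx", "./demo_script.py"))
```

Each `../` climbs one directory from the `.mdx` file's folder, so three levels up from `docs/docs/contributing/` lands at the repository root.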
## Content
Try out Llama Stack's capabilities through our detailed Jupyter notebooks:
* [Building AI Applications Notebook](./getting_started.ipynb) - A comprehensive guide to building production-ready AI applications using Llama Stack
* [Benchmark Evaluations Notebook](./notebooks/Llama_Stack_Benchmark_Evals.ipynb) - Detailed performance evaluations and benchmarking results
* [Zero-to-Hero Guide](./zero_to_hero_guide) - Step-by-step guide for getting started with Llama Stack


@ -1,35 +0,0 @@
@import url("theme.css");
.wy-nav-content {
max-width: 90%;
}
.wy-nav-side {
/* background: linear-gradient(45deg, #2980B9, #16A085); */
background: linear-gradient(90deg, #332735, #1b263c);
}
.wy-side-nav-search {
background-color: transparent !important;
}
.hide-title h1 {
display: none;
}
h2, h3, h4 {
font-weight: normal;
}
html[data-theme="dark"] .rst-content div[class^="highlight"] {
background-color: #0b0b0b;
}
pre {
white-space: pre-wrap !important;
word-break: break-all;
}
[data-theme="dark"] .mermaid {
background-color: #f4f4f6 !important;
border-radius: 6px;
padding: 0.5em;
}


@ -1,32 +0,0 @@
document.addEventListener("DOMContentLoaded", function () {
const prefersDark = window.matchMedia("(prefers-color-scheme: dark)").matches;
const htmlElement = document.documentElement;
// Check if theme is saved in localStorage
const savedTheme = localStorage.getItem("sphinx-rtd-theme");
if (savedTheme) {
// Use the saved theme preference
htmlElement.setAttribute("data-theme", savedTheme);
document.body.classList.toggle("dark", savedTheme === "dark");
} else {
// Fall back to system preference
const theme = prefersDark ? "dark" : "light";
htmlElement.setAttribute("data-theme", theme);
document.body.classList.toggle("dark", theme === "dark");
// Save initial preference
localStorage.setItem("sphinx-rtd-theme", theme);
}
// Listen for theme changes from the existing toggle
const observer = new MutationObserver(function(mutations) {
mutations.forEach(function(mutation) {
if (mutation.attributeName === "data-theme") {
const currentTheme = htmlElement.getAttribute("data-theme");
localStorage.setItem("sphinx-rtd-theme", currentTheme);
}
});
});
observer.observe(htmlElement, { attributes: true });
});



@ -1,24 +0,0 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
import os
import time


def pytest_collection_modifyitems(items):
    for item in items:
        item.name = item.name.replace(' ', '_')


def pytest_runtest_teardown(item):
    interval_seconds = os.getenv("LLAMA_STACK_TEST_INTERVAL_SECONDS")
    if interval_seconds:
        time.sleep(float(interval_seconds))


def pytest_configure(config):
    config.option.tbstyle = "short"
    config.option.disable_warnings = True


@ -1,7 +0,0 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
sphinx-autobuild --write-all source build/html --watch source/


@ -0,0 +1,163 @@
# Evaluation
## Evaluation Concepts
The Llama Stack Evaluation flow allows you to run evaluations on your GenAI application datasets or pre-registered benchmarks.
Llama Stack provides a set of APIs for running evaluations of LLM applications:
- `/datasetio` + `/datasets` API
- `/scoring` + `/scoring_functions` API
- `/eval` + `/benchmarks` API
This guide walks through these APIs and the developer workflow for running evaluations with Llama Stack across different use cases. Check out our [Colab notebook](https://colab.research.google.com/drive/10CHyykee9j2OigaIcRv47BKG9mrNm0tJ?usp=sharing) for working examples of evaluations.
The Evaluation APIs are associated with a set of Resources. See the Resources section in our [Core Concepts](../concepts/index.mdx) guide for a high-level overview.
- **DatasetIO**: defines the interface to datasets and data loaders.
  - Associated with `Dataset` resource.
- **Scoring**: evaluates outputs of the system.
  - Associated with `ScoringFunction` resource. We provide a suite of out-of-the-box scoring functions, and you can also add custom evaluators. These scoring functions are the core of an evaluation task: they produce the evaluation metrics.
- **Eval**: generates outputs (via Inference or Agents) and performs scoring.
  - Associated with `Benchmark` resource.
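
The division of labor between the three roles can be sketched with a toy, self-contained example. Everything below (`dataset_rows`, `toy_generate`, `exact_match`, `run_eval`) is a hypothetical stand-in written for illustration; it does not use or mirror the actual Llama Stack client API.

```python
from statistics import mean

# DatasetIO role: supply the rows to evaluate.
dataset_rows = [
    {"input_query": "2 + 2", "expected_answer": "4"},
    {"input_query": "capital of France", "expected_answer": "Paris"},
]


def toy_generate(query):
    """Stand-in for the generation step (Inference or Agents)."""
    canned = {"2 + 2": "4", "capital of France": "Lyon"}
    return canned.get(query, "")


def exact_match(generated, expected):
    """Scoring role: compare one generated output with the expected answer."""
    return 1.0 if generated.strip() == expected.strip() else 0.0


def run_eval(rows, generate, score):
    """Eval role: generate an output for each row, then score it."""
    scores = [score(generate(row["input_query"]), row["expected_answer"]) for row in rows]
    return {"scores": scores, "accuracy": mean(scores)}


result = run_eval(dataset_rows, toy_generate, exact_match)
print(result)
```

Keeping the scoring function separate from the eval loop is what allows a custom evaluator to be swapped in without touching the dataset or generation steps.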
## Evaluation Providers
Llama Stack provides multiple evaluation providers:
- **Meta Reference** (`inline::meta-reference`) - Meta's reference implementation with multi-language support
- **NVIDIA** (`remote::nvidia`) - NVIDIA's evaluation platform integration
### Meta Reference
Meta's reference implementation of evaluation tasks with support for multiple languages and evaluation metrics.
#### Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `RedisKVStoreConfig \| SqliteKVStoreConfig \| PostgresKVStoreConfig \| MongoDBKVStoreConfig` | No | sqlite | Key-value store configuration |
#### Sample Configuration
```yaml
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/meta_reference_eval.db
```
#### Features
- Multi-language evaluation support
- Comprehensive evaluation metrics
- Integration with various key-value stores (SQLite, Redis, PostgreSQL, MongoDB)
- Built-in support for popular benchmarks
### NVIDIA
NVIDIA's evaluation provider for running evaluation tasks on NVIDIA's platform.
#### Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `evaluator_url` | `str` | No | http://0.0.0.0:7331 | The url for accessing the evaluator service |
#### Sample Configuration
```yaml
evaluator_url: ${env.NVIDIA_EVALUATOR_URL:=http://localhost:7331}
```
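The sample configurations in this guide use placeholders of the form `${env.NAME:=default}`. The sketch below illustrates how such a placeholder could be resolved: use the environment variable when it is set, otherwise fall back to the default. It is an illustration only, not Llama Stack's actual resolver; the function name `resolve_env_placeholders` and the regular expression are assumptions made for this example.

```python
import os
import re

# Matches ${env.NAME} and ${env.NAME:=default}. The pattern is an assumption
# derived from the sample configurations in this guide.
_ENV_PATTERN = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::=([^}]*))?\}")


def resolve_env_placeholders(value, env=None):
    """Replace ${env.NAME:=default} placeholders with environment values."""
    env = os.environ if env is None else env

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _ENV_PATTERN.sub(_substitute, value)


if __name__ == "__main__":
    template = "${env.NVIDIA_EVALUATOR_URL:=http://localhost:7331}"
    # With no variable set, the default after ":=" is used.
    print(resolve_env_placeholders(template, env={}))
    # With the variable set (hypothetical URL), its value wins.
    print(resolve_env_placeholders(template, env={"NVIDIA_EVALUATOR_URL": "http://evaluator.example:9000"}))
```

Whether an empty variable counts as "unset" is not specified here, so this sketch treats any present key as set.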
#### Features
- Integration with NVIDIA's evaluation platform
- Remote evaluation capabilities
- Scalable evaluation processing
## Open-benchmark Eval
### List of open-benchmarks Llama Stack supports
Llama Stack pre-registers several popular open-benchmarks so that you can easily evaluate model performance via the CLI.
The open-benchmarks we currently support:
- [MMLU-COT](https://arxiv.org/abs/2009.03300) (Measuring Massive Multitask Language Understanding): Benchmark designed to comprehensively evaluate the breadth and depth of a model's academic and professional understanding
- [GPQA-COT](https://arxiv.org/abs/2311.12022) (A Graduate-Level Google-Proof Q&A Benchmark): A challenging benchmark of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
- [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to assess a model's ability to answer short, fact-seeking questions.
- [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): Benchmark designed to evaluate multimodal models.
You can follow this [contributing guide](../references/evals_reference/index.mdx#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack.
### Run evaluation on open-benchmarks via CLI
Llama Stack has built-in functionality for running the supported open-benchmarks with the `llama-stack-client` CLI.
#### Spin up Llama Stack server
Spin up the Llama Stack server with the `open-benchmark` template:
```
llama stack run llama_stack/distributions/open-benchmark/config.yaml
```
#### Run eval CLI
Running a benchmark eval requires three inputs:
- `list of benchmark_ids`: The benchmark IDs to run the evaluation on
- `model_id`: The ID of the model to evaluate
- `output_dir`: Path of the directory where the evaluation results are stored
```
llama-stack-client eval run-benchmark <benchmark_id_1> <benchmark_id_2> ... \
  --model_id <model id to evaluate on> \
  --output_dir <directory to store the evaluation results>
```
You can run
```
llama-stack-client eval run-benchmark help
```
to see a description of every flag that `eval run-benchmark` accepts.
The output log includes the path of the file that holds your evaluation results. Open that file to see the aggregate evaluation results.
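When benchmark runs are scripted, it can help to assemble the command programmatically. The sketch below builds the argument list for the `run-benchmark` command shown above, using only the `--model_id` and `--output_dir` flags from this guide. The helper name `build_run_benchmark_command` and the benchmark IDs, model ID, and output directory are hypothetical placeholders.

```python
import shlex


def build_run_benchmark_command(benchmark_ids, model_id, output_dir):
    """Assemble the argument list for `llama-stack-client eval run-benchmark`."""
    if not benchmark_ids:
        raise ValueError("at least one benchmark id is required")
    return [
        "llama-stack-client", "eval", "run-benchmark",
        *benchmark_ids,
        "--model_id", model_id,
        "--output_dir", output_dir,
    ]


# Hypothetical benchmark ids, model id, and output directory.
command = build_run_benchmark_command(
    ["benchmark-a", "benchmark-b"], "my-model", "./eval-results"
)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems if a model ID or path contains spaces.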
## Usage Example
Here's a basic example of using the evaluation API:
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a dataset for evaluation
client.datasets.register(
    purpose="evaluation",
    source={
        "type": "uri",
        "uri": "huggingface://datasets/llamastack/evaluation_dataset"
    },
    dataset_id="my_eval_dataset"
)

# Run evaluation
eval_result = client.eval.run_evaluation(
    dataset_id="my_eval_dataset",
    scoring_functions=["accuracy", "bleu"],
    model_id="my_model"
)
print(f"Evaluation completed: {eval_result}")
```
## Best Practices
- **Choose appropriate providers**: Use Meta Reference for comprehensive evaluation, NVIDIA for platform-specific needs
- **Configure storage properly**: Ensure your key-value store configuration matches your performance requirements
- **Monitor evaluation progress**: Large evaluations can take time - implement proper monitoring
- **Use appropriate scoring functions**: Select scoring metrics that align with your evaluation goals
## What's Next?
- Check out our Colab notebook for working examples of running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
- Check out our [Building Applications - Evaluation](../building_applications/evals.mdx) guide for more details on how to use the Evaluation APIs to evaluate your applications.
- Check out our [Evaluation Reference](../references/evals_reference/index.mdx) for more details on the APIs.
- Explore the [Scoring](./scoring.mdx) documentation for available scoring functions.


@ -0,0 +1,305 @@
# Post-Training
Post-training in Llama Stack allows you to fine-tune models using various providers and frameworks. This section covers all available post-training providers and how to use them effectively.
## Overview
Llama Stack provides multiple post-training providers:
- **HuggingFace SFTTrainer** (`inline::huggingface`) - Fine-tuning using HuggingFace ecosystem
- **TorchTune** (`inline::torchtune`) - Fine-tuning using Meta's TorchTune framework
- **NVIDIA** (`remote::nvidia`) - Fine-tuning using NVIDIA's platform
## HuggingFace SFTTrainer
[HuggingFace SFTTrainer](https://huggingface.co/docs/trl/en/sft_trainer) is an inline post-training provider for Llama Stack. It allows you to run supervised fine-tuning on a variety of models using many datasets.
### Features
- Simple access through the post_training API
- Fully integrated with Llama Stack
- GPU support, CPU support, and MPS support (macOS Metal Performance Shaders)
### Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `device` | `str` | No | cuda | |
| `distributed_backend` | `Literal['fsdp', 'deepspeed']` | No | | |
| `checkpoint_format` | `Literal['full_state', 'huggingface']` | No | huggingface | |
| `chat_template` | `str` | No | | |
| `model_specific_config` | `dict` | No | `{'trust_remote_code': True, 'attn_implementation': 'sdpa'}` | |
| `max_seq_length` | `int` | No | 2048 | |
| `gradient_checkpointing` | `bool` | No | False | |
| `save_total_limit` | `int` | No | 3 | |
| `logging_steps` | `int` | No | 10 | |
| `warmup_ratio` | `float` | No | 0.1 | |
| `weight_decay` | `float` | No | 0.01 | |
| `dataloader_num_workers` | `int` | No | 4 | |
| `dataloader_pin_memory` | `bool` | No | True | |
### Sample Configuration
```yaml
checkpoint_format: huggingface
distributed_backend: null
device: cpu
```
### Setup
You can access the HuggingFace trainer via the `starter` distribution:
```bash
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```
### Usage Example
```python
import time
import uuid

from llama_stack_client.types import (
    post_training_supervised_fine_tune_params,
    algorithm_config_param,
)


def create_http_client():
    from llama_stack_client import LlamaStackClient

    return LlamaStackClient(base_url="http://localhost:8321")


client = create_http_client()

# Example Dataset
client.datasets.register(
    purpose="post-training/messages",
    source={
        "type": "uri",
        "uri": "huggingface://datasets/llamastack/simpleqa?split=train",
    },
    dataset_id="simpleqa",
)

training_config = post_training_supervised_fine_tune_params.TrainingConfig(
    data_config=post_training_supervised_fine_tune_params.TrainingConfigDataConfig(
        batch_size=32,
        data_format="instruct",
        dataset_id="simpleqa",
        shuffle=True,
    ),
    gradient_accumulation_steps=1,
    max_steps_per_epoch=0,
    max_validation_steps=1,
    n_epochs=4,
)

algorithm_config = algorithm_config_param.LoraFinetuningConfig(
    alpha=1,
    apply_lora_to_mlp=True,
    apply_lora_to_output=False,
    lora_attn_modules=["q_proj"],
    rank=1,
    type="LoRA",
)

job_uuid = f"test-job{uuid.uuid4()}"

# Example Model
training_model = "ibm-granite/granite-3.3-8b-instruct"

start_time = time.time()
response = client.post_training.supervised_fine_tune(
    job_uuid=job_uuid,
    logger_config={},
    model=training_model,
    hyperparam_search_config={},
    training_config=training_config,
    algorithm_config=algorithm_config,
    checkpoint_dir="output",
)
print("Job: ", job_uuid)

# Wait for the job to complete!
while True:
    status = client.post_training.job.status(job_uuid=job_uuid)
    if not status:
        print("Job not found")
        break

    print(status)
    if status.status == "completed":
        break

    print("Waiting for job to complete...")
    time.sleep(5)

end_time = time.time()
print("Job completed in", end_time - start_time, "seconds!")

print("Artifacts:")
print(client.post_training.job.artifacts(job_uuid=job_uuid))
```
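The polling loop above exits only when the status is `"completed"`, so a job that ends in any other way would keep it waiting. A minimal sketch of a more defensive wait is shown below. It assumes that `"failed"` and `"cancelled"` are also terminal states (only `"completed"` is confirmed by the example above), and the helper `wait_for_job` is hypothetical. It takes a callable, so it runs here against a fake status source instead of a live server.

```python
import time

# Assumed terminal states; only "completed" appears in the example above.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(get_status, poll_interval=5.0, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout} seconds")
        sleep(poll_interval)


if __name__ == "__main__":
    # Stand-in for:
    #   lambda: client.post_training.job.status(job_uuid=job_uuid).status
    fake_states = iter(["scheduled", "in_progress", "in_progress", "completed"])
    print(wait_for_job(lambda: next(fake_states), poll_interval=0.0))
```

Returning the final state lets the caller decide whether to fetch artifacts or report a failure.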
## TorchTune
[TorchTune](https://github.com/pytorch/torchtune) is an inline post-training provider for Llama Stack. It provides a simple and efficient way to fine-tune language models using PyTorch.
### Features
- Simple access through the post_training API
- Fully integrated with Llama Stack
- GPU support and single device capabilities
- Support for LoRA
### Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `torch_seed` | `int \| None` | No | | |
| `checkpoint_format` | `Literal['meta', 'huggingface']` | No | meta | |
### Sample Configuration
```yaml
checkpoint_format: meta
```
### Setup
You can access the TorchTune trainer by writing your own YAML configuration that points to the provider:
```yaml
post_training:
  - provider_id: torchtune
    provider_type: inline::torchtune
    config: {}
```
You can then build and run your own stack with this provider.
### Usage Example
```python
import time
import uuid

from llama_stack_client.types import (
    post_training_supervised_fine_tune_params,
    algorithm_config_param,
)


def create_http_client():
    from llama_stack_client import LlamaStackClient

    return LlamaStackClient(base_url="http://localhost:8321")


client = create_http_client()

# Example Dataset
client.datasets.register(
    purpose="post-training/messages",
    source={
        "type": "uri",
        "uri": "huggingface://datasets/llamastack/simpleqa?split=train",
    },
    dataset_id="simpleqa",
)

training_config = post_training_supervised_fine_tune_params.TrainingConfig(
    data_config=post_training_supervised_fine_tune_params.TrainingConfigDataConfig(
        batch_size=32,
        data_format="instruct",
        dataset_id="simpleqa",
        shuffle=True,
    ),
    gradient_accumulation_steps=1,
    max_steps_per_epoch=0,
    max_validation_steps=1,
    n_epochs=4,
)

algorithm_config = algorithm_config_param.LoraFinetuningConfig(
    alpha=1,
    apply_lora_to_mlp=True,
    apply_lora_to_output=False,
    lora_attn_modules=["q_proj"],
    rank=1,
    type="LoRA",
)

job_uuid = f"test-job{uuid.uuid4()}"

# Example Model
training_model = "meta-llama/Llama-2-7b-hf"

start_time = time.time()
response = client.post_training.supervised_fine_tune(
    job_uuid=job_uuid,
    logger_config={},
    model=training_model,
    hyperparam_search_config={},
    training_config=training_config,
    algorithm_config=algorithm_config,
    checkpoint_dir="output",
)
print("Job: ", job_uuid)

# Wait for the job to complete!
while True:
    status = client.post_training.job.status(job_uuid=job_uuid)
    if not status:
        print("Job not found")
        break

    print(status)
    if status.status == "completed":
        break

    print("Waiting for job to complete...")
    time.sleep(5)

end_time = time.time()
print("Job completed in", end_time - start_time, "seconds!")

print("Artifacts:")
print(client.post_training.job.artifacts(job_uuid=job_uuid))
```
## NVIDIA
NVIDIA's post-training provider for fine-tuning models on NVIDIA's platform.
### Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The NVIDIA API key. |
| `dataset_namespace` | `str \| None` | No | default | The NVIDIA dataset namespace. |
| `project_id` | `str \| None` | No | test-example-model@v1 | The NVIDIA project ID. |
| `customizer_url` | `str \| None` | No | | Base URL for the NeMo Customizer API |
| `timeout` | `int` | No | 300 | Timeout for the NVIDIA Post Training API |
| `max_retries` | `int` | No | 3 | Maximum number of retries for the NVIDIA Post Training API |
| `output_model_dir` | `str` | No | test-example-model@v1 | Directory to save the output model |
### Sample Configuration
```yaml
api_key: ${env.NVIDIA_API_KEY:=}
dataset_namespace: ${env.NVIDIA_DATASET_NAMESPACE:=default}
project_id: ${env.NVIDIA_PROJECT_ID:=test-project}
customizer_url: ${env.NVIDIA_CUSTOMIZER_URL:=http://nemo.test}
```
## Best Practices
- **Choose the right provider**: Use HuggingFace for broader compatibility, TorchTune for Meta models, or NVIDIA for their ecosystem
- **Configure hardware appropriately**: Ensure your configuration matches your available hardware (CPU, GPU, MPS)
- **Monitor jobs**: Always monitor job status and handle completion appropriately
- **Use appropriate datasets**: Ensure your dataset format matches the expected input format for your chosen provider
## Next Steps
- Check out the [Building Applications - Fine-tuning](../building_applications/index.mdx) guide for application-level examples
- See the [Providers](../providers/post_training/index.mdx) section for detailed provider documentation
- Review the [API Reference](../advanced_apis/post_training.mdx) for complete API documentation


@ -0,0 +1,193 @@
# Scoring
The Scoring API in Llama Stack allows you to evaluate outputs of your GenAI system using various scoring functions and metrics. This section covers all available scoring providers and their configuration.
## Overview
Llama Stack provides multiple scoring providers:
- **Basic** (`inline::basic`) - Simple evaluation metrics and scoring functions
- **Braintrust** (`inline::braintrust`) - Advanced evaluation using the Braintrust platform
- **LLM-as-Judge** (`inline::llm-as-judge`) - Uses language models to evaluate responses
The Scoring API is associated with `ScoringFunction` resources and provides a suite of out-of-the-box scoring functions. You can also add custom evaluators to meet specific evaluation needs.
## Basic Scoring
Basic scoring provider for simple evaluation metrics and scoring functions. This provider offers fundamental scoring capabilities without external dependencies.
### Configuration
No configuration required - this provider works out of the box.
```yaml
{}
```
### Features
- Simple evaluation metrics (accuracy, precision, recall, F1-score)
- String matching and similarity metrics
- Basic statistical scoring functions
- No external dependencies required
- Fast execution for standard metrics
### Use Cases
- Quick evaluation of basic accuracy metrics
- String similarity comparisons
- Statistical analysis of model outputs
- Development and testing scenarios
## Braintrust
Braintrust scoring provider for evaluation and scoring using the [Braintrust platform](https://braintrustdata.com/). Braintrust provides advanced evaluation capabilities and experiment tracking.
### Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `openai_api_key` | `str \| None` | No | | The OpenAI API Key for LLM-powered evaluations |
### Sample Configuration
```yaml
openai_api_key: ${env.OPENAI_API_KEY:=}
```
### Features
- Advanced evaluation metrics
- Experiment tracking and comparison
- LLM-powered evaluation functions
- Integration with Braintrust's evaluation suite
- Detailed scoring analytics and insights
### Use Cases
- Production evaluation pipelines
- A/B testing of model versions
- Advanced scoring with custom metrics
- Detailed evaluation reporting and analysis
## LLM-as-Judge
LLM-as-judge scoring provider that uses language models to evaluate and score responses. This approach leverages the reasoning capabilities of large language models to assess quality, relevance, and other subjective metrics.
### Configuration
No configuration required - this provider works out of the box.
```yaml
{}
```
### Features
- Subjective quality evaluation using LLMs
- Flexible evaluation criteria definition
- Natural language evaluation explanations
- Support for complex evaluation scenarios
- Contextual understanding of responses
### Use Cases
- Evaluating response quality and relevance
- Assessing creativity and coherence
- Subjective metric evaluation
- Human-like judgment for complex tasks
## Usage Examples
### Basic Scoring Example
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a basic accuracy scoring function
client.scoring_functions.register(
    scoring_function_id="basic_accuracy",
    provider_id="basic",
    provider_scoring_function_id="accuracy"
)

# Use the scoring function
result = client.scoring.score(
    input_rows=[
        {"expected": "Paris", "actual": "Paris"},
        {"expected": "London", "actual": "Paris"}
    ],
    scoring_function_id="basic_accuracy"
)
print(f"Accuracy: {result.results[0].score}")
```
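To make the metric concrete, the sketch below computes exact-match accuracy over the same two rows in plain Python. It illustrates what an accuracy score means and is not the Basic provider's implementation; the function name `exact_match_accuracy` and the whitespace stripping are assumptions made for this example.

```python
def exact_match_accuracy(rows):
    """Fraction of rows whose "actual" value equals the "expected" value."""
    if not rows:
        raise ValueError("rows must not be empty")
    matches = sum(
        1 for row in rows if row["actual"].strip() == row["expected"].strip()
    )
    return matches / len(rows)


rows = [
    {"expected": "Paris", "actual": "Paris"},
    {"expected": "London", "actual": "Paris"},
]
# One of the two rows matches, so the result is 0.5.
print(exact_match_accuracy(rows))
```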
### LLM-as-Judge Example
```python
# Register an LLM-as-judge scoring function
client.scoring_functions.register(
    scoring_function_id="quality_judge",
    provider_id="llm_judge",
    provider_scoring_function_id="response_quality",
    params={
        "criteria": "Evaluate response quality, relevance, and helpfulness",
        "scale": "1-10"
    }
)

# Score responses using LLM judgment
result = client.scoring.score(
    input_rows=[{
        "query": "What is machine learning?",
        "response": "Machine learning is a subset of AI that enables computers to learn patterns from data..."
    }],
    scoring_function_id="quality_judge"
)
```
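Conceptually, an LLM-as-judge scorer does two things: it formats the criteria and the response into a prompt for the judge model, then extracts a numeric score from the judge's reply. The sketch below shows both steps without calling a model. The prompt template, the `Score: <number>` reply convention, and the helper names are all assumptions made for this example, not the provider's actual format.

```python
import re

# Hypothetical prompt template; the real provider's template may differ.
JUDGE_TEMPLATE = (
    "Criteria: {criteria}\n"
    "Scale: {scale}\n\n"
    "Query: {query}\n"
    "Response: {response}\n\n"
    "Reply with a line of the form 'Score: <number>' followed by a short explanation."
)

_SCORE_PATTERN = re.compile(r"Score:\s*(\d+)")


def build_judge_prompt(criteria, scale, query, response):
    """Fill the template with the evaluation criteria and the row to judge."""
    return JUDGE_TEMPLATE.format(
        criteria=criteria, scale=scale, query=query, response=response
    )


def parse_judge_score(judge_output, low=1, high=10):
    """Return the first 'Score: N' value if it lies on the scale, else None."""
    match = _SCORE_PATTERN.search(judge_output)
    if match is None:
        return None
    score = int(match.group(1))
    return score if low <= score <= high else None


prompt = build_judge_prompt(
    criteria="Evaluate response quality, relevance, and helpfulness",
    scale="1-10",
    query="What is machine learning?",
    response="Machine learning is a subset of AI...",
)
print(prompt)
# A made-up judge reply, used only to exercise the parser.
print(parse_judge_score("Score: 8\nClear and accurate, but brief."))
```

Returning `None` for unparseable or out-of-range replies keeps malformed judge output from silently skewing aggregate scores.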
### Braintrust Integration Example
```python
# Register a Braintrust scoring function
client.scoring_functions.register(
    scoring_function_id="braintrust_eval",
    provider_id="braintrust",
    provider_scoring_function_id="semantic_similarity"
)

# Run evaluation with Braintrust
result = client.scoring.score(
    input_rows=[{
        "reference": "The capital of France is Paris",
        "candidate": "Paris is the capital city of France"
    }],
    scoring_function_id="braintrust_eval"
)
```
## Best Practices
- **Choose appropriate providers**: Use Basic for simple metrics, Braintrust for advanced analytics, LLM-as-Judge for subjective evaluation
- **Define clear criteria**: When using LLM-as-Judge, provide specific evaluation criteria and scales
- **Validate scoring functions**: Test your scoring functions with known examples before production use
- **Monitor performance**: Track scoring performance and adjust thresholds based on results
- **Combine multiple metrics**: Use different scoring providers together for comprehensive evaluation
## Integration with Evaluation
The Scoring API works closely with the [Evaluation](./evaluation.mdx) API to provide comprehensive evaluation workflows:
1. **Datasets** are loaded via the DatasetIO API
2. **Evaluation** generates model outputs using the Eval API
3. **Scoring** evaluates the quality of outputs using various scoring functions
4. **Results** are aggregated and reported for analysis
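The four steps above can be sketched end to end with stand-in functions. Everything in this example is hypothetical: the in-memory dataset, the fake model that always answers "Paris", and the exact-match scorer stand in for the DatasetIO, Eval, and Scoring APIs so that the data flow is visible without a running server.

```python
from statistics import mean


def load_rows():
    # Stand-in for step 1 (DatasetIO): a tiny in-memory dataset.
    return [
        {"input": "Capital of France?", "expected": "Paris"},
        {"input": "Capital of Japan?", "expected": "Tokyo"},
    ]


def generate(row):
    # Stand-in for step 2 (Eval): a fake model that always answers "Paris".
    return {**row, "generated": "Paris"}


def score(row):
    # Stand-in for step 3 (Scoring): exact-match comparison.
    return 1.0 if row["generated"] == row["expected"] else 0.0


def run_pipeline():
    scored = [score(generate(row)) for row in load_rows()]
    # Step 4: aggregate per-row scores into a single report.
    return {"num_rows": len(scored), "mean_score": mean(scored)}


print(run_pipeline())
```

Keeping generation and scoring as separate functions mirrors the split between the Eval and Scoring APIs: the same generated outputs can be rescored with a different function without regenerating them.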
## Next Steps
- Check out the [Evaluation](./evaluation.mdx) guide for running complete evaluations
- See the [Building Applications - Evaluation](../building_applications/evals.mdx) guide for application examples
- Review the [Evaluation Reference](../references/evals_reference/) for comprehensive scoring function usage
- Explore the [Evaluation Concepts](../concepts/evaluation_concepts) for detailed conceptual information


@ -0,0 +1,62 @@
---
title: Deprecated APIs
description: Legacy APIs that are being phased out
sidebar_label: Deprecated
sidebar_position: 1
---
# Deprecated APIs
This section contains APIs that are being phased out in favor of newer, more standardized implementations. These APIs are maintained for backward compatibility but are not recommended for new projects.
:::warning Deprecation Notice
These APIs are deprecated and will be removed in future versions. Please migrate to the recommended alternatives listed below.
:::
## Migration Guide
When using deprecated APIs, please refer to the migration guides provided for each API to understand how to transition to the supported alternatives.
## Deprecated API List
### Legacy Inference APIs
Some older inference endpoints that have been superseded by the standardized Inference API.
**Migration Path:** Use the [Inference API](../api/) instead.
### Legacy Vector Operations
Older vector database operations that have been replaced by the Vector IO API.
**Migration Path:** Use the [Vector IO API](../api/) instead.
### Legacy File Operations
Older file management endpoints that have been replaced by the Files API.
**Migration Path:** Use the [Files API](../api/) instead.
## Support Timeline
Deprecated APIs will be supported according to the following timeline:
- **Current Version**: Full support with deprecation warnings
- **Next Major Version**: Limited support with migration notices
- **Following Major Version**: Removal of deprecated APIs
## Getting Help
If you need assistance migrating from deprecated APIs:
1. Check the specific migration guides for each API
2. Review the [API Reference](../api/) for current alternatives
3. Consult the [Community Forums](https://github.com/llamastack/llama-stack/discussions) for migration support
4. Open an issue on GitHub for specific migration questions
## Contributing
If you find issues with deprecated APIs or have suggestions for improving the migration process, please contribute by:
1. Opening an issue describing the problem
2. Submitting a pull request with improvements
3. Updating migration documentation
For more information on contributing, see our [Contributing Guide](../contributing/).


@ -0,0 +1,128 @@
---
title: Experimental APIs
description: APIs in development with limited support
sidebar_label: Experimental
sidebar_position: 1
---
# Experimental APIs
This section contains APIs that are currently in development and may have limited support or stability. These APIs are available for testing and feedback but should not be used in production environments.
:::warning Experimental Notice
These APIs are experimental and may change without notice. Use with caution and provide feedback to help improve them.
:::
## Current Experimental APIs
### Batch Inference API
Run inference on a dataset of inputs in batch mode for improved efficiency.
**Status:** In Development
**Provider Support:** Limited
**Use Case:** Large-scale inference operations
**Features:**
- Batch processing of multiple inputs
- Optimized resource utilization
- Progress tracking and monitoring
### Batch Agents API
Run agentic workflows on a dataset of inputs in batch mode.
**Status:** In Development
**Provider Support:** Limited
**Use Case:** Large-scale agent operations
**Features:**
- Batch agent execution
- Parallel processing capabilities
- Result aggregation and analysis
### Synthetic Data Generation API
Generate synthetic data for model development and testing.
**Status:** Early Development
**Provider Support:** Very Limited
**Use Case:** Training data augmentation
**Features:**
- Automated data generation
- Quality control mechanisms
- Customizable generation parameters
### Batches API (OpenAI-compatible)
OpenAI-compatible batch management for inference operations.
**Status:** In Development
**Provider Support:** Limited
**Use Case:** OpenAI batch processing compatibility
**Features:**
- OpenAI batch API compatibility
- Job scheduling and management
- Status tracking and monitoring
## Getting Started with Experimental APIs
### Prerequisites
- Llama Stack server running with experimental features enabled
- Appropriate provider configurations
- Understanding of API limitations
### Configuration
Experimental APIs may require special configuration flags or provider settings. Check the specific API documentation for setup requirements.
### Usage Guidelines
1. **Testing Only**: Use experimental APIs for testing and development only
2. **Monitor Changes**: Watch for updates and breaking changes
3. **Provide Feedback**: Report issues and suggest improvements
4. **Backup Data**: Always backup important data when using experimental features
## Feedback and Contribution
We encourage feedback on experimental APIs to help improve them:
### Reporting Issues
- Use GitHub issues with the "experimental" label
- Include detailed error messages and reproduction steps
- Specify the API version and provider being used
### Feature Requests
- Submit feature requests through GitHub discussions
- Provide use cases and expected behavior
- Consider contributing implementations
### Testing
- Test experimental APIs in your environment
- Report performance issues and optimization opportunities
- Share success stories and use cases
## Migration to Stable APIs
As experimental APIs mature, they will be moved to the stable API section. When this happens:
1. **Announcement**: We'll announce the promotion in release notes
2. **Migration Guide**: Detailed migration instructions will be provided
3. **Deprecation Timeline**: Experimental versions will be deprecated with notice
4. **Support**: Full support will be available for stable versions
## Provider Support
Experimental APIs may have limited provider support. Check the specific API documentation for:
- Supported providers
- Configuration requirements
- Known limitations
- Performance characteristics
## Roadmap
Experimental APIs are part of our ongoing development roadmap:
- **Q1 2024**: Batch Inference API stabilization
- **Q2 2024**: Batch Agents API improvements
- **Q3 2024**: Synthetic Data Generation API expansion
- **Q4 2024**: Batches API full OpenAI compatibility
For the latest updates, follow our [GitHub releases](https://github.com/llamastack/llama-stack/releases) and [roadmap discussions](https://github.com/llamastack/llama-stack/discussions).


@ -0,0 +1,287 @@
---
title: OpenAI API Compatibility
description: OpenAI-compatible APIs and features in Llama Stack
sidebar_label: OpenAI Compatibility
sidebar_position: 1
---
# OpenAI API Compatibility
Llama Stack provides comprehensive OpenAI API compatibility, allowing you to use existing OpenAI API clients and tools with Llama Stack providers. This compatibility layer ensures seamless migration and interoperability.
## Overview
OpenAI API compatibility in Llama Stack includes:
- **OpenAI-compatible endpoints** for all major APIs
- **Request/response format compatibility** with OpenAI standards
- **Authentication and authorization** using OpenAI-style API keys
- **Error handling** with OpenAI-compatible error codes and messages
- **Rate limiting** and usage tracking compatible with OpenAI patterns
## Supported OpenAI APIs
### Chat Completions API
OpenAI-compatible chat completions for conversational AI applications.
**Endpoint:** `/v1/chat/completions`
**Compatibility:** Full OpenAI API compatibility
**Providers:** All inference providers
**Features:**
- Message-based conversations
- System prompts and user messages
- Function calling support
- Streaming responses
- Temperature and other parameter controls
### Completions API
OpenAI-compatible text completions for general text generation.
**Endpoint:** `/v1/completions`
**Compatibility:** Full OpenAI API compatibility
**Providers:** All inference providers
**Features:**
- Text completion generation
- Prompt engineering support
- Customizable parameters
- Batch processing capabilities
### Embeddings API
OpenAI-compatible embeddings for vector operations.
**Endpoint:** `/v1/embeddings`
**Compatibility:** Full OpenAI API compatibility
**Providers:** All embedding providers
**Features:**
- Text embedding generation
- Multiple embedding models
- Batch embedding processing
- Vector similarity operations
### Files API
OpenAI-compatible file management for document processing.
**Endpoint:** `/v1/files`
**Compatibility:** Full OpenAI API compatibility
**Providers:** Local Filesystem, S3
**Features:**
- File upload and management
- Document processing
- File metadata tracking
- Secure file access
### Vector Store Files API
OpenAI-compatible vector store file operations for RAG applications.
**Endpoint:** `/v1/vector_stores/{vector_store_id}/files`
**Compatibility:** Full OpenAI API compatibility
**Providers:** FAISS, SQLite-vec, Milvus, ChromaDB, Qdrant, Weaviate, Postgres (PGVector)
**Features:**
- Automatic document processing
- Vector store integration
- File chunking and indexing
- Search and retrieval operations
### Batches API
OpenAI-compatible batch processing for large-scale operations.
**Endpoint:** `/v1/batches`
**Compatibility:** OpenAI API compatibility (experimental)
**Providers:** Limited support
**Features:**
- Batch job creation and management
- Progress tracking
- Result retrieval
- Error handling
## Migration from OpenAI
### Step 1: Update API Endpoint
Change your API endpoint from OpenAI to your Llama Stack server:
```python
# Before (OpenAI)
import openai
client = openai.OpenAI(api_key="your-openai-key")
# After (Llama Stack)
import openai
client = openai.OpenAI(
    api_key="your-llama-stack-key",
    base_url="http://localhost:8000/v1"  # Your Llama Stack server
)
```
### Step 2: Configure Providers
Set up your preferred providers in the Llama Stack configuration:
```yaml
# stack-config.yaml
inference:
  providers:
    - name: "meta-reference"
      type: "inline"
      model: "llama-3.1-8b"
```
### Step 3: Test Compatibility
Verify that your existing code works with Llama Stack:
```python
# Test chat completions
response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ]
)
print(response.choices[0].message.content)
```
## Provider-Specific Features
### Meta Reference Provider
- Full OpenAI API compatibility
- Local model execution
- Custom model support
### Remote Providers
- OpenAI API compatibility
- Cloud-based execution
- Scalable infrastructure
### Vector Store Providers
- OpenAI vector store API compatibility
- Automatic document processing
- Advanced search capabilities
## Authentication
Llama Stack supports OpenAI-style authentication:
### API Key Authentication
```python
client = openai.OpenAI(
    api_key="your-api-key",
    base_url="http://localhost:8000/v1"
)
```
### Environment Variables
```bash
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="http://localhost:8000/v1"
```
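To show what these two variables control, the sketch below builds (but does not send) the HTTP request an OpenAI-style client would issue for a chat completion: the base URL is joined with `/chat/completions` and the key is sent as a bearer token. It uses only the standard library; the helper name `build_chat_request` is hypothetical, and whether your server validates the key depends on its authentication configuration.

```python
import json
import os
import urllib.request


def build_chat_request(model, messages, env=None):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    env = os.environ if env is None else env
    base_url = env["OPENAI_BASE_URL"].rstrip("/")
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {env['OPENAI_API_KEY']}",
        },
        method="POST",
    )


# The values mirror the placeholders exported above; substitute your own.
request = build_chat_request(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello, world!"}],
    env={
        "OPENAI_API_KEY": "your-api-key",
        "OPENAI_BASE_URL": "http://localhost:8000/v1",
    },
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```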
## Error Handling
Llama Stack provides OpenAI-compatible error responses:
```python
try:
    response = client.chat.completions.create(...)
except openai.RateLimitError as e:
    # Catch the specific subclasses first: both derive from openai.APIError
    print(f"Rate Limit Error: {e}")
except openai.APIConnectionError as e:
    print(f"Connection Error: {e}")
except openai.APIError as e:
    # Catch-all for any other API error
    print(f"API Error: {e}")
```
## Rate Limiting
OpenAI-compatible rate limiting is supported:
- **Requests per minute** limits
- **Tokens per minute** limits
- **Concurrent request** limits
- **Usage tracking** and monitoring
## Monitoring and Observability
Track your API usage with OpenAI-compatible monitoring:
- **Request/response logging**
- **Usage metrics** and analytics
- **Performance monitoring**
- **Error tracking** and alerting
## Best Practices
### 1. Provider Selection
Choose providers based on your requirements:
- **Local development**: Meta Reference, Ollama
- **Production**: Cloud providers (Fireworks, Together, NVIDIA)
- **Specialized use cases**: Custom providers
### 2. Model Configuration
Configure models for optimal performance:
- **Model selection** based on task requirements
- **Parameter tuning** for specific use cases
- **Resource allocation** for performance
### 3. Error Handling
Implement robust error handling:
- **Retry logic** for transient failures
- **Fallback providers** for high availability
- **Monitoring** and alerting for issues
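The retry and fallback points above can be sketched as follows. This is a minimal illustration under stated assumptions, not Llama Stack's own mechanism: `TransientError`, `call_with_retry`, and `call_with_fallback` are hypothetical names, and each "provider" is just a zero-argument callable that would wrap a real client call.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure (timeout, rate limit, dropped connection)."""


def call_with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide what to do
            sleep(base_delay * 2**attempt)


def call_with_fallback(providers, attempts=3, sleep=time.sleep):
    """Try each provider callable in order; return (index, result) of the first success."""
    last_error = None
    for index, provider in enumerate(providers):
        try:
            return index, call_with_retry(provider, attempts=attempts, sleep=sleep)
        except TransientError as exc:
            last_error = exc  # this provider is exhausted, move on to the next one
    raise RuntimeError("all providers failed") from last_error
```

In real code the `except` clause would name the client's own retryable exceptions (for example rate-limit and connection errors) instead of the placeholder class.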
### 4. Security
Follow security best practices:
- **API key management** and rotation
- **Access control** and authorization
- **Data privacy** and compliance
## Implementation Examples
For detailed code examples and implementation guides, see our [OpenAI Implementation Guide](../providers/openai.mdx).
## Known Limitations
### Responses API Limitations
The Responses API is still in active development. For detailed information about current limitations and implementation status, see our [OpenAI Responses API Limitations](../providers/openai_responses_limitations.mdx).
## Troubleshooting
### Common Issues
**Connection Errors**
- Verify server is running
- Check network connectivity
- Validate API endpoint URL
**Authentication Errors**
- Verify API key is correct
- Check key permissions
- Ensure proper authentication headers
**Model Errors**
- Verify model is available
- Check provider configuration
- Validate model parameters
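For the connection checks above, a quick way to confirm that something is listening at the configured endpoint is a plain TCP probe. The sketch below uses only the standard library; `server_reachable` is a hypothetical helper, and a successful connection only shows that the port is open, not that the server behind it is healthy.

```python
import socket
from urllib.parse import urlparse


def server_reachable(base_url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `server_reachable("http://localhost:8000/v1")` probes the base URL used in the authentication examples.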
### Getting Help
For OpenAI compatibility issues:
1. **Check Documentation**: Review provider-specific documentation
2. **Community Support**: Ask questions in GitHub discussions
3. **Issue Reporting**: Open GitHub issues for bugs
4. **Professional Support**: Contact support for enterprise issues
## Roadmap
Upcoming OpenAI compatibility features:
- **Enhanced batch processing** support
- **Advanced function calling** capabilities
- **Improved error handling** and diagnostics
- **Performance optimizations** for large-scale deployments
For the latest updates, follow our [GitHub releases](https://github.com/llamastack/llama-stack/releases) and [roadmap discussions](https://github.com/llamastack/llama-stack/discussions).

49
docs/docs/api-overview.md Normal file

@ -0,0 +1,49 @@
# API Reference Overview
Llama Stack provides a comprehensive set of APIs organized by stability level to help you choose the right endpoints for your use case.
## 🟢 Stable APIs
**Production-ready APIs with backward compatibility guarantees.**
These APIs are fully tested, documented, and stable. They follow semantic versioning principles and maintain backward compatibility within major versions. Recommended for production applications.
[**Browse Stable APIs →**](./api/llama-stack-specification)
**Key Features:**
- ✅ Backward compatibility guaranteed
- ✅ Comprehensive testing and validation
- ✅ Production-ready reliability
- ✅ Long-term support
---
## 🟡 Experimental APIs
**Preview APIs that may change before becoming stable.**
These APIs include v1alpha and v1beta endpoints that are feature-complete but may undergo changes based on feedback. Great for exploring new capabilities and providing feedback.
[**Browse Experimental APIs →**](./api-experimental/llama-stack-specification-experimental-apis)
**Key Features:**
- 🧪 Latest features and capabilities
- 🧪 May change based on user feedback
- 🧪 Active development and iteration
- 🧪 Opportunity to influence final design
---
## 🔴 Deprecated APIs
**Legacy APIs for migration reference.**
These APIs are deprecated and will be removed in future versions. They are provided for migration purposes and to help transition to newer, stable alternatives.
[**Browse Deprecated APIs →**](./api-deprecated/llama-stack-specification-deprecated-apis)
**Key Features:**
- ⚠️ Will be removed in future versions
- ⚠️ Migration guidance provided
- ⚠️ Use for compatibility during transition
- ⚠️ Not recommended for new projects

144
docs/docs/api/index.mdx Normal file

@ -0,0 +1,144 @@
---
title: API Reference
description: Complete reference for Llama Stack APIs
sidebar_label: Overview
sidebar_position: 1
---
# API Reference
Llama Stack provides a comprehensive set of APIs for building generative AI applications. All APIs follow OpenAI-compatible standards and can be used interchangeably across different providers.
## Core APIs
### Inference API
Run inference with Large Language Models (LLMs) and embedding models.
**Supported Providers:**
- Meta Reference (Single Node)
- Ollama (Single Node)
- Fireworks (Hosted)
- Together (Hosted)
- NVIDIA NIM (Hosted and Single Node)
- vLLM (Hosted and Single Node)
- TGI (Hosted and Single Node)
- AWS Bedrock (Hosted)
- Cerebras (Hosted)
- Groq (Hosted)
- SambaNova (Hosted)
- PyTorch ExecuTorch (On-device iOS, Android)
- OpenAI (Hosted)
- Anthropic (Hosted)
- Gemini (Hosted)
- WatsonX (Hosted)
### Agents API
Run multi-step agentic workflows with LLMs, including tool usage, memory (RAG), and complex reasoning.
**Supported Providers:**
- Meta Reference (Single Node)
- Fireworks (Hosted)
- Together (Hosted)
- PyTorch ExecuTorch (On-device iOS)
### Vector IO API
Perform operations on vector stores, including adding documents, searching, and deleting documents.
**Supported Providers:**
- FAISS (Single Node)
- SQLite-Vec (Single Node)
- Chroma (Hosted and Single Node)
- Milvus (Hosted and Single Node)
- Postgres (PGVector) (Hosted and Single Node)
- Weaviate (Hosted)
- Qdrant (Hosted and Single Node)
### Files API (OpenAI-compatible)
Manage file uploads, storage, and retrieval with OpenAI-compatible endpoints.
**Supported Providers:**
- Local Filesystem (Single Node)
- S3 (Hosted)
### Vector Store Files API (OpenAI-compatible)
Integrate file operations with vector stores for automatic document processing and search.
**Supported Providers:**
- FAISS (Single Node)
- SQLite-vec (Single Node)
- Milvus (Single Node)
- ChromaDB (Hosted and Single Node)
- Qdrant (Hosted and Single Node)
- Weaviate (Hosted)
- Postgres (PGVector) (Hosted and Single Node)
### Safety API
Apply safety policies to outputs at a systems level, not just model level.
**Supported Providers:**
- Llama Guard (Depends on Inference Provider)
- Prompt Guard (Single Node)
- Code Scanner (Single Node)
- AWS Bedrock (Hosted)
### Post Training API
Fine-tune models for specific use cases and domains.
**Supported Providers:**
- Meta Reference (Single Node)
- HuggingFace (Single Node)
- TorchTune (Single Node)
- NVIDIA NEMO (Hosted)
### Eval API
Generate outputs and perform scoring to evaluate system performance.
**Supported Providers:**
- Meta Reference (Single Node)
- NVIDIA NEMO (Hosted)
### Telemetry API
Collect telemetry data from the system for monitoring and observability.
**Supported Providers:**
- Meta Reference (Single Node)
### Tool Runtime API
Interact with various tools and protocols to extend LLM capabilities.
**Supported Providers:**
- Brave Search (Hosted)
- RAG Runtime (Single Node)
## API Compatibility
All Llama Stack APIs are designed to be OpenAI-compatible, allowing you to:
- Use existing OpenAI API clients and tools
- Migrate from OpenAI to other providers seamlessly
- Maintain consistent API contracts across different environments
## Getting Started
To get started with Llama Stack APIs:
1. **Choose a Distribution**: Select a pre-configured distribution that matches your environment
2. **Configure Providers**: Set up the providers you want to use for each API
3. **Start the Server**: Launch the Llama Stack server with your configuration
4. **Use the APIs**: Make requests to the API endpoints using your preferred client
For detailed setup instructions, see our [Getting Started Guide](../getting_started/quickstart).
## Provider Details
For complete provider compatibility and setup instructions, see our [Providers Documentation](../providers/).
## API Stability
Llama Stack APIs are organized by stability level:
- **[Stable APIs](./index.mdx)** - Production-ready APIs with full support
- **[Experimental APIs](../api-experimental/)** - APIs in development with limited support
- **[Deprecated APIs](../api-deprecated/)** - Legacy APIs being phased out
## OpenAI Integration
For specific OpenAI API compatibility features, see our [OpenAI Compatibility Guide](../api-openai/).


@ -0,0 +1,112 @@
---
title: Agents
description: Build powerful AI applications with the Llama Stack agent framework
sidebar_label: Agents
sidebar_position: 3
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Agents
An Agent in Llama Stack is a powerful abstraction that allows you to build complex AI applications.
The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI applications. This document explains the key components and how they work together.
## Core Concepts
### 1. Agent Configuration
Agents are configured using the `AgentConfig` class, which includes:
- **Model**: The underlying LLM to power the agent
- **Instructions**: System prompt that defines the agent's behavior
- **Tools**: Capabilities the agent can use to interact with external systems
- **Safety Shields**: Guardrails to ensure responsible AI behavior
```python
from llama_stack_client import Agent
# Create the agent
agent = Agent(
    llama_stack_client,
    model="meta-llama/Llama-3-70b-chat",
    instructions="You are a helpful assistant that can use tools to answer questions.",
    tools=["builtin::code_interpreter", "builtin::rag/knowledge_search"],
)
```
### 2. Sessions
Agents maintain state through sessions, which represent a conversation thread:
```python
# Create a session
session_id = agent.create_session(session_name="My conversation")
```
### 3. Turns
Each interaction with an agent is called a "turn" and consists of:
- **Input Messages**: What the user sends to the agent
- **Steps**: The agent's internal processing (inference, tool execution, etc.)
- **Output Message**: The agent's response
<Tabs>
<TabItem value="streaming" label="Streaming Response">
```python
from llama_stack_client import AgentEventLogger
# Create a turn with streaming response
turn_response = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Tell me about Llama models"}],
)
for log in AgentEventLogger().log(turn_response):
    log.print()
```
</TabItem>
<TabItem value="non-streaming" label="Non-Streaming Response">
```python
from rich.pretty import pprint
# Non-streaming API
response = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Tell me about Llama models"}],
    stream=False,
)
print("Inputs:")
pprint(response.input_messages)
print("Output:")
pprint(response.output_message.content)
print("Steps:")
pprint(response.steps)
```
</TabItem>
</Tabs>
### 4. Steps
Each turn consists of multiple steps that represent the agent's thought process:
- **Inference Steps**: The agent generating text responses
- **Tool Execution Steps**: The agent using tools to gather information
- **Shield Call Steps**: Safety checks being performed
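To see how a turn was spent, the steps can be tallied by type. The sketch below is a minimal illustration: it assumes each step exposes a `step_type` attribute, and the values `"inference"` and `"shield_call"` are assumed names for the step kinds listed above. `summarize_steps` and `example_steps` are hypothetical, with `SimpleNamespace` objects standing in for a real `response.steps`.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_steps(steps):
    """Count how many steps of each type a turn contains."""
    return Counter(step.step_type for step in steps)


# Hypothetical steps standing in for `response.steps` from a non-streaming turn.
example_steps = [
    SimpleNamespace(step_type="shield_call"),
    SimpleNamespace(step_type="inference"),
    SimpleNamespace(step_type="tool_execution"),
    SimpleNamespace(step_type="inference"),
    SimpleNamespace(step_type="shield_call"),
]
summary = summarize_steps(example_steps)
print(dict(summary))
```

A turn with many tool execution steps relative to inference steps is often a sign that the instructions or tool descriptions need tightening.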
## Agent Execution Loop
Refer to the [Agent Execution Loop](./agent_execution_loop) for more details on what happens within an agent turn.
## Related Resources
- **[Agent Execution Loop](./agent_execution_loop)** - Understanding the internal processing flow
- **[RAG (Retrieval Augmented Generation)](./rag)** - Building knowledge-enhanced agents
- **[Tools Integration](./tools)** - Extending agent capabilities with external tools
- **[Safety Guardrails](./safety)** - Implementing responsible AI practices


@ -0,0 +1,185 @@
---
title: Agent Execution Loop
description: Understanding the internal processing flow of Llama Stack agents
sidebar_label: Agent Execution Loop
sidebar_position: 4
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Agent Execution Loop
Agents are the heart of Llama Stack applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.
## Steps in the Agent Workflow
Each agent turn follows these key steps:
1. **Initial Safety Check**: The user's input is first screened through configured safety shields
2. **Context Retrieval**:
   - If RAG is enabled, the agent can choose to query relevant documents from memory banks. You can use the `instructions` field to steer the agent.
   - New documents are first inserted into the memory bank.
   - Retrieved context is provided to the LLM as a tool response in the message history.
3. **Inference Loop**: The agent enters its main execution loop:
   - The LLM receives a user prompt (with previous tool outputs)
   - The LLM generates a response, potentially with [tool calls](./tools)
   - If tool calls are present:
     - Tool inputs are safety-checked
     - Tools are executed (e.g., web search, code execution)
     - Tool responses are fed back to the LLM for synthesis
   - The loop continues until:
     - The LLM provides a final response without tool calls
     - Maximum iterations are reached
     - Token limit is exceeded
4. **Final Safety Check**: The agent's final response is screened through safety shields
## Execution Flow Diagram
```mermaid
sequenceDiagram
participant U as User
participant E as Executor
participant M as Memory Bank
participant L as LLM
participant T as Tools
participant S as Safety Shield
Note over U,S: Agent Turn Start
U->>S: 1. Submit Prompt
activate S
S->>E: Input Safety Check
deactivate S
loop Inference Loop
E->>L: 2.1 Augment with Context
L-->>E: 2.2 Response (with/without tool calls)
alt Has Tool Calls
E->>S: Check Tool Input
S->>T: 3.1 Execute Tool
T-->>E: 3.2 Tool Response
E->>L: 4.1 Tool Response
L-->>E: 4.2 Synthesized Response
end
opt Stop Conditions
Note over E: Break if:
Note over E: - No tool calls
Note over E: - Max iterations reached
Note over E: - Token limit exceeded
end
end
E->>S: Output Safety Check
S->>U: 5. Final Response
```
Each step in this process can be monitored and controlled through configurations.
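The control flow described above can be sketched in plain Python. This is a simplified model for illustration, not the Llama Stack implementation: `run_turn` is a hypothetical function, the LLM, tools, and shield are passed in as ordinary callables, and the token-limit stop condition is left out.

```python
def run_turn(prompt, llm, tools, is_safe, max_iters=5):
    """Simplified agent turn: shield the input, loop over inference and tool calls, shield the output."""
    if not is_safe(prompt):
        return "[blocked by input shield]"

    messages = [{"role": "user", "content": prompt}]
    reply = {"content": "", "tool_calls": []}
    for _ in range(max_iters):
        # The stub LLM returns {"content": str, "tool_calls": [{"name", "arguments"}, ...]}.
        reply = llm(messages)
        if not reply["tool_calls"]:
            break  # final answer: no more tools requested
        for call in reply["tool_calls"]:
            if not is_safe(call["arguments"]):
                result = "[tool input blocked]"
            else:
                result = tools[call["name"]](call["arguments"])
            # Tool output goes back into the history for the next inference pass.
            messages.append({"role": "tool", "content": result})

    answer = reply["content"]
    return answer if is_safe(answer) else "[blocked by output shield]"
```

Here `max_iters` plays the role of `max_infer_iters`: it bounds how many times the model is consulted before the turn ends, even if it keeps requesting tools.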
## Agent Execution Example
Here's an example that demonstrates monitoring the agent's execution:
<Tabs>
<TabItem value="streaming" label="Streaming Execution">
```python
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
# Replace host and port
client = LlamaStackClient(base_url=f"http://{HOST}:{PORT}")
agent = Agent(
    client,
    # Check with `llama-stack-client models list`
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    # Enable both RAG and tool usage
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": ["my_docs"]},
        },
        "builtin::code_interpreter",
    ],
    # Configure safety (optional)
    input_shields=["llama_guard"],
    output_shields=["llama_guard"],
    # Control the inference loop
    max_infer_iters=5,
    sampling_params={
        "strategy": {"type": "top_p", "temperature": 0.7, "top_p": 0.95},
        "max_tokens": 2048,
    },
)
session_id = agent.create_session("monitored_session")
# Stream the agent's execution steps
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    documents=[
        {
            "content": "https://raw.githubusercontent.com/example/code.py",
            "mime_type": "text/plain",
        }
    ],
    session_id=session_id,
)
# Monitor each step of execution
for log in AgentEventLogger().log(response):
    log.print()
```
</TabItem>
<TabItem value="non-streaming" label="Non-Streaming Execution">
```python
from rich.pretty import pprint
# Using non-streaming API, the response contains input, steps, and output.
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    documents=[
        {
            "content": "https://raw.githubusercontent.com/example/code.py",
            "mime_type": "text/plain",
        }
    ],
    session_id=session_id,
    stream=False,
)
pprint(f"Input: {response.input_messages}")
pprint(f"Output: {response.output_message.content}")
pprint(f"Steps: {response.steps}")
```
</TabItem>
</Tabs>
## Key Configuration Options
### Loop Control
- **max_infer_iters**: Maximum number of inference iterations (default: 5)
- **max_tokens**: Token limit for responses
- **temperature**: Controls response randomness
### Safety Configuration
- **input_shields**: Safety checks for user input
- **output_shields**: Safety checks for agent responses
### Tool Integration
- **tools**: List of available tools for the agent
- **tool_choice**: Control over when tools are used
## Related Resources
- **[Agents](./agent)** - Understanding agent fundamentals
- **[Tools Integration](./tools)** - Adding capabilities to agents
- **[Safety Guardrails](./safety)** - Implementing safety measures
- **[RAG (Retrieval Augmented Generation)](./rag)** - Building knowledge-enhanced workflows


@ -0,0 +1,256 @@
---
title: Evaluations
description: Evaluate LLM applications with Llama Stack's comprehensive evaluation framework
sidebar_label: Evaluations
sidebar_position: 7
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
This guide walks you through the process of evaluating an LLM application built using Llama Stack. For detailed API reference, check out the [Evaluation Reference](../references/evals_reference/) guide that covers the complete set of APIs and developer experience flow.
:::tip[Interactive Examples]
Check out our [Colab notebook](https://colab.research.google.com/drive/10CHyykee9j2OigaIcRv47BKG9mrNm0tJ?usp=sharing) for working examples with evaluations, or try the [Getting Started notebook](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb).
:::
## Application Evaluation Example
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)
Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.
In this example, we will show you how to:
1. **Build an Agent** with Llama Stack
2. **Query the agent's sessions, turns, and steps** to analyze execution
3. **Evaluate the results** using scoring functions
## Step-by-Step Evaluation Process
### 1. Building a Search Agent
First, let's create an agent that can search the web to answer questions:
```python
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
client = LlamaStackClient(base_url=f"http://{HOST}:{PORT}")
agent = Agent(
    client,
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful assistant. Use search tool to answer the questions.",
    tools=["builtin::websearch"],
)
# Test prompts for evaluation
user_prompts = [
    "Which teams played in the NBA Western Conference Finals of 2024. Search the web for the answer.",
    "In which episode and season of South Park does Bill Cosby (BSM-471) first appear? Give me the number and title. Search the web for the answer.",
    "What is the British-American kickboxer Andrew Tate's kickboxing name? Search the web for the answer.",
]
session_id = agent.create_session("test-session")
# Execute all prompts in the session
for prompt in user_prompts:
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
    )
    for log in AgentEventLogger().log(response):
        log.print()
```
### 2. Query Agent Execution Steps
Now, let's analyze the agent's execution steps to understand its performance:
<Tabs>
<TabItem value="session-analysis" label="Session Analysis">
```python
from rich.pretty import pprint
# Query the agent's session to get detailed execution data
session_response = client.agents.session.retrieve(
    session_id=session_id,
    agent_id=agent.agent_id,
)
pprint(session_response)
```
</TabItem>
<TabItem value="tool-validation" label="Tool Usage Validation">
```python
# Sanity check: Verify that all user prompts are followed by tool calls
num_tool_call = 0
for turn in session_response.turns:
    for step in turn.steps:
        if (
            step.step_type == "tool_execution"
            and step.tool_calls[0].tool_name == "brave_search"
        ):
            num_tool_call += 1
print(
    f"{num_tool_call}/{len(session_response.turns)} user prompts are followed by a tool call to `brave_search`"
)
```
</TabItem>
</Tabs>
### 3. Evaluate Agent Responses
Now we'll evaluate the agent's responses using Llama Stack's scoring API:
<Tabs>
<TabItem value="data-preparation" label="Data Preparation">
```python
# Process agent execution history into evaluation rows
eval_rows = []
# Define expected answers for our test prompts
expected_answers = [
    "Dallas Mavericks and the Minnesota Timberwolves",
    "Season 4, Episode 12",
    "King Cobra",
]
# Create evaluation dataset from agent responses
for i, turn in enumerate(session_response.turns):
    eval_rows.append(
        {
            "input_query": turn.input_messages[0].content,
            "generated_answer": turn.output_message.content,
            "expected_answer": expected_answers[i],
        }
    )
pprint(eval_rows)
```
</TabItem>
<TabItem value="scoring" label="Scoring & Evaluation">
```python
# Configure scoring parameters
scoring_params = {
    "basic::subset_of": None,  # Check if generated answer contains expected answer
}
# Run evaluation using Llama Stack's scoring API
scoring_response = client.scoring.score(
    input_rows=eval_rows,
    scoring_functions=scoring_params,
)
pprint(scoring_response)
# Analyze results. Assumes `results` maps each scoring function name to a result
# whose `score_rows` list holds one {"score": ...} dict per input row.
score_rows = scoring_response.results["basic::subset_of"].score_rows
for i, row in enumerate(score_rows):
    print(f"Query {i+1}: {row['score']}")
    print(f" Generated: {eval_rows[i]['generated_answer'][:100]}...")
    print(f" Expected: {expected_answers[i]}")
    print()
```
</TabItem>
</Tabs>
## Available Scoring Functions
Llama Stack provides several built-in scoring functions:
### Basic Scoring Functions
- **`basic::subset_of`**: Checks if the expected answer is contained in the generated response
- **`basic::exact_match`**: Performs exact string matching between expected and generated answers
- **`basic::regex_match`**: Uses regular expressions to match patterns in responses
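To make the difference between the three basic checks concrete, here is a conceptual sketch of what each one computes. It is an illustration only, not the library's implementation: the function names mirror the scoring function IDs, the 1.0/0.0 score convention is assumed, and trimming whitespace in `exact_match` is an assumption.

```python
import re


def subset_of(expected, generated):
    """1.0 if the expected answer appears anywhere in the generated text."""
    return 1.0 if expected in generated else 0.0


def exact_match(expected, generated):
    """1.0 only if both strings are identical after trimming whitespace."""
    return 1.0 if expected.strip() == generated.strip() else 0.0


def regex_match(pattern, generated):
    """1.0 if the regular expression matches somewhere in the generated text."""
    return 1.0 if re.search(pattern, generated) else 0.0
```

For free-form agent answers, a containment check such as `subset_of` is usually the more forgiving choice, since a correct answer wrapped in a full sentence fails an exact comparison.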
### Advanced Scoring Functions
- **`llm_as_judge::accuracy`**: Uses an LLM to judge response accuracy
- **`llm_as_judge::helpfulness`**: Evaluates how helpful the response is
- **`llm_as_judge::safety`**: Assesses response safety and appropriateness
### Custom Scoring Functions
You can also create custom scoring functions for domain-specific evaluation needs.
## Evaluation Workflow Best Practices
### 🎯 **Dataset Preparation**
- Use diverse test cases that cover edge cases and common scenarios
- Include clear expected answers or success criteria
- Balance your dataset across different difficulty levels
### 📊 **Metrics Selection**
- Choose appropriate scoring functions for your use case
- Combine multiple metrics for comprehensive evaluation
- Consider both automated and human evaluation metrics
### 🔄 **Iterative Improvement**
- Run evaluations regularly during development
- Use evaluation results to identify areas for improvement
- Track performance changes over time
### 📈 **Analysis & Reporting**
- Analyze failures to understand model limitations
- Generate comprehensive evaluation reports
- Share results with stakeholders for informed decision-making
## Advanced Evaluation Scenarios
### Batch Evaluation
For evaluating large datasets efficiently:
```python
# Prepare large evaluation dataset
large_eval_dataset = [
    {"input_query": query, "expected_answer": answer}
    for query, answer in zip(queries, expected_answers)
]
# Run batch evaluation
batch_results = client.scoring.score(
    input_rows=large_eval_dataset,
    scoring_functions={
        "basic::subset_of": None,
        "llm_as_judge::accuracy": {"judge_model": "meta-llama/Llama-3.3-70B-Instruct"},
    },
)
```
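When a dataset is too large to send in one request, one option is to split the rows into fixed-size batches and score each batch in turn. The sketch below shows only the chunking logic: `batched` and `score_in_batches` are hypothetical helpers, and `score_fn` is an assumed callable that would wrap a call such as `client.scoring.score(...)` and return one result per input row.

```python
def batched(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def score_in_batches(rows, score_fn, batch_size=100):
    """Score rows batch by batch and return the per-row results in input order."""
    results = []
    for batch in batched(rows, batch_size):
        results.extend(score_fn(batch))
    return results
```

Keeping the results in input order means they can still be paired with the original rows by index afterwards.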
### Multi-Metric Evaluation
Combining different scoring approaches:
```python
# Keys are scoring function IDs, as in the earlier examples; None uses each function's default parameters
comprehensive_scoring = {
    "basic::exact_match": None,
    "basic::subset_of": None,
    "llm_as_judge::accuracy": None,
    "llm_as_judge::safety": None,
}
results = client.scoring.score(
    input_rows=eval_rows,
    scoring_functions=comprehensive_scoring,
)
```
## Related Resources
- **[Agents](./agent)** - Building agents for evaluation
- **[Tools Integration](./tools)** - Using tools in evaluated agents
- **[Evaluation Reference](../references/evals_reference/)** - Complete API reference for evaluations
- **[Getting Started Notebook](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Interactive examples
- **[Evaluation Examples](https://colab.research.google.com/drive/10CHyykee9j2OigaIcRv47BKG9mrNm0tJ?usp=sharing)** - Additional evaluation scenarios


@ -0,0 +1,80 @@
---
title: Building Applications
description: Comprehensive guides for building AI applications with Llama Stack
sidebar_label: Overview
sidebar_position: 5
---
# AI Application Examples
Llama Stack provides all the building blocks needed to create sophisticated AI applications.
## Getting Started
The best way to get started is to work through this notebook, which covers the various APIs, from basic inference to RAG agents, and shows how to use them.
**📓 [Building AI Applications Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)**
## Core Topics
Here are the key topics that will help you build effective AI applications:
### 🤖 **Agent Development**
- **[Agent Framework](./agent.mdx)** - Understand the components and design patterns of the Llama Stack agent framework
- **[Agent Execution Loop](./agent_execution_loop.mdx)** - How agents process information, make decisions, and execute actions
- **[Agents vs Responses API](./responses_vs_agents.mdx)** - Learn when to use each API for different use cases
### 📚 **Knowledge Integration**
- **[RAG (Retrieval-Augmented Generation)](./rag.mdx)** - Enhance your agents with external knowledge through retrieval mechanisms
### 🛠️ **Capabilities & Extensions**
- **[Tools](./tools.mdx)** - Extend your agents' capabilities by integrating with external tools and APIs
### 📊 **Quality & Monitoring**
- **[Evaluations](./evals.mdx)** - Evaluate your agents' effectiveness and identify areas for improvement
- **[Telemetry](./telemetry.mdx)** - Monitor and analyze your agents' performance and behavior
- **[Safety](./safety.mdx)** - Implement guardrails and safety measures to ensure responsible AI behavior
## Application Patterns
### 🤖 **Conversational Agents**
Build intelligent chatbots and assistants that can:
- Maintain context across conversations
- Access external knowledge bases
- Execute actions through tool integrations
- Apply safety filters and guardrails
### 📖 **RAG Applications**
Create knowledge-augmented applications that:
- Retrieve relevant information from documents
- Generate contextually accurate responses
- Handle large knowledge bases efficiently
- Provide source attribution
### 🔧 **Tool-Enhanced Systems**
Develop applications that can:
- Search the web for real-time information
- Interact with databases and APIs
- Perform calculations and analysis
- Execute complex multi-step workflows
### 🛡️ **Enterprise Applications**
Build production-ready systems with:
- Comprehensive safety measures
- Performance monitoring and analytics
- Scalable deployment configurations
- Evaluation and quality assurance
## Next Steps
1. **📖 Start with the Notebook** - Work through the complete tutorial
2. **🎯 Choose Your Pattern** - Pick the application type that matches your needs
3. **🏗️ Build Your Foundation** - Set up your [providers](/docs/providers/) and [distributions](/docs/distributions/)
4. **🚀 Deploy & Monitor** - Use our [deployment guides](/docs/deploying/) for production
## Related Resources
- **[Getting Started](/docs/getting_started/quickstart)** - Basic setup and concepts
- **[Providers](/docs/providers/)** - Available AI service providers
- **[Distributions](/docs/distributions/)** - Pre-configured deployment packages
- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation


@ -0,0 +1,87 @@
---
title: Admin UI & Chat Playground
description: Web-based admin interface and chat playground for Llama Stack
sidebar_label: Playground
sidebar_position: 10
---
# Admin UI & Chat Playground
The Llama Stack UI provides a comprehensive web-based admin interface for managing your Llama Stack server, with an integrated chat playground for interactive testing. This admin interface is the primary way to monitor, manage, and debug your Llama Stack applications.
## Quick Start
Launch the admin UI with:
```bash
npx llama-stack-ui
```
Then visit `http://localhost:8322` to access the interface.
## Admin Interface Features
The Llama Stack UI is organized into the following main sections:
### 🎯 Create
**Chat Playground** - Interactive testing environment
- Real-time chat interface for testing agents and models
- Multi-turn conversations with tool calling support
- Agent SDK integration (will be migrated to Responses API)
- Custom system prompts and model parameter adjustment
### 📊 Manage
**Logs & Resource Management** - Monitor and manage your stack
- **Responses Logs**: View and analyze agent responses and interactions
- **Chat Completions Logs**: Monitor chat completion requests and responses
- **Vector Stores**: Create, manage, and monitor vector databases for RAG workflows
- **Prompts**: Full CRUD operations for prompt templates and management
- **Files**: Forthcoming file management capabilities
## Key Capabilities for Application Development
### Real-time Monitoring
- **Response Tracking**: Monitor all agent responses and tool calls
- **Completion Analysis**: View chat completion performance and patterns
- **Vector Store Activity**: Track RAG operations and document processing
- **Prompt Usage**: Analyze prompt template performance
### Resource Management
- **Vector Store CRUD**: Create, update, and delete vector databases
- **Prompt Library**: Organize and version control your prompts
- **File Operations**: Manage documents and assets (forthcoming)
### Interactive Testing
- **Chat Playground**: Test conversational flows before production deployment
- **Agent Prototyping**: Validate agent behaviors and tool integrations
## Development Workflow Integration
The admin UI supports your development lifecycle:
1. **Development**: Use chat playground to prototype and test features
2. **Monitoring**: Track system performance through logs and metrics
3. **Management**: Organize prompts, vector stores, and other resources
4. **Debugging**: Analyze logs to identify and resolve issues
## Architecture Notes
- **Current**: Chat playground uses Agents SDK
- **Future**: Migration to Responses API for improved performance and consistency
- **Admin Focus**: Primary emphasis on monitoring, logging, and resource management
## Getting Started
1. **Launch the UI**: Run `npx llama-stack-ui`
2. **Explore Logs**: Start with Responses and Chat Completions logs to understand your system activity
3. **Test in Playground**: Use the chat interface to validate your agent configurations
4. **Manage Resources**: Create vector stores and organize prompts through the UI
For detailed setup and configuration, see the [Llama Stack UI documentation](/docs/distributions/llama_stack_ui).
## Next Steps
- Set up your [first agent](/docs/building_applications/agent)
- Implement [RAG functionality](/docs/building_applications/rag)
- Add [evaluation metrics](/docs/building_applications/evals)
- Configure [safety measures](/docs/building_applications/safety)


@ -0,0 +1,222 @@
---
title: Retrieval Augmented Generation (RAG)
description: Build knowledge-enhanced AI applications with external document retrieval
sidebar_label: RAG (Retrieval Augmented Generation)
sidebar_position: 2
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Retrieval Augmented Generation (RAG)
RAG enables your applications to reference and recall information from external documents. Llama Stack makes Agentic RAG available through OpenAI's Responses API.
## Quick Start
### 1. Start the Server
In one terminal, start the Llama Stack server:
```bash
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```
### 2. Choose Your Approach
Llama Stack supports various approaches for building RAG applications. The server provides two APIs (Responses and Chat Completions), plus a high-level client wrapper (Agent class):
#### Approach 1: Agent Class (Client-Side)
The **Agent class** is a high-level client wrapper around the Responses API with automatic tool execution and session management. Best for conversational agents and multi-turn RAG.
```python
from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient
import requests
from io import BytesIO
client = LlamaStackClient(base_url="http://localhost:8321")
# Create vector store
vs = client.vector_stores.create(name="my_vector_db")
# Upload document
url = "https://www.paulgraham.com/greatwork.html"
response = requests.get(url)
file_buffer = BytesIO(response.content)
file_buffer.name = "greatwork.html"
file = client.files.create(file=file_buffer, purpose="assistants")
client.vector_stores.files.create(vector_store_id=vs.id, file_id=file.id)
# Create agent with file_search tool (client-side wrapper)
agent = Agent(
    client,
    model="ollama/llama3.2:3b",
    instructions="You are a helpful assistant",
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": [vs.id],  # Agent searches this automatically
        }
    ],
)
# Just ask - agent handles retrieval automatically
response = agent.create_turn(
    messages=[{"role": "user", "content": "How do you do great work?"}],
    session_id=agent.create_session("my_session"),
    stream=True,
)
for log in AgentEventLogger().log(response):
    print(log, end="")
```
**How it works:**
- Client-side `Agent` class wraps the Responses API
- Agent automatically decides when to search the vector store
- Uses internal Python API for vector search (no HTTP overhead)
- Maintains conversation context across turns
- Best for: Interactive applications, chatbots, multi-turn conversations
#### Approach 2: Responses API
```python
import io, requests
from openai import OpenAI
url = "https://www.paulgraham.com/greatwork.html"
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
# Create vector store
vs = client.vector_stores.create()
response = requests.get(url)
# Pass the raw bytes through; str(bytes) would embed the b'...' repr in the file
pseudo_file = io.BytesIO(response.content)
file_id = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants").id
client.vector_stores.files.create(vector_store_id=vs.id, file_id=file_id)
# Automatic tool calling (calls Responses API directly)
resp = client.responses.create(
model="gpt-4o",
input="How do you do great work?",
tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
include=["file_search_call.results"],
)
print(resp.output[-1].content[-1].text)
```
**How it works:**
- Server-side API with automatic tool calling
- Uses internal Python API for vector search
- No built-in session management (stateless by default)
- Best for: Single-turn queries, OpenAI-compatible applications
#### Approach 3: Chat Completions API
The **Chat Completions API** is a server-side API that gives you explicit control over retrieval and generation. Best for custom RAG pipelines and batch processing.
```python
import io, requests
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
# Create vector store and add documents
vs = client.vector_stores.create()
# ... upload and add files ...
# Explicitly search vector store via REST API
query = "How do you do great work?"
search_results = client.vector_stores.search(
vector_store_id=vs.id,
query=query,
limit=3
)
# Manually extract context
# Each result's `content` is a list of parts; join the text of every part
context = "\n\n".join(part.text for r in search_results.data for part in r.content)
# Manually construct prompt with context
completion = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "Use the provided context to answer questions."},
{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
]
)
print(completion.choices[0].message.content)
```
Example output (the exact wording will vary by model):
Doing great work is about more than just hard work and ambition; it involves combining several elements:
1. **Pursue What Excites You**: Engage in projects that are both ambitious and exciting to you. It's important to work on something you have a natural aptitude for and a deep interest in.
2. **Explore and Discover**: Great work often feels like a blend of discovery and creation. Focus on seeing possibilities and let ideas take their natural shape, rather than just executing a plan.
3. **Be Bold Yet Flexible**: Take bold steps in your work without over-planning. An adaptable approach that evolves with new ideas can often lead to breakthroughs.
4. **Work on Your Own Projects**: Develop a habit of working on projects of your own choosing, as these often lead to great achievements. These should be projects you find exciting and that challenge you intellectually.
5. **Be Earnest and Authentic**: Approach your work with earnestness and authenticity. Trying to impress others with affectation can be counterproductive, as genuine effort and intellectual honesty lead to better work outcomes.
6. **Build a Supportive Environment**: Work alongside great colleagues who inspire you and enhance your work. Surrounding yourself with motivating individuals creates a fertile environment for great work.
7. **Maintain High Morale**: High morale significantly impacts your ability to do great work. Stay optimistic and protect your mental well-being to maintain progress and momentum.
8. **Balance**: While hard work is essential, overworking can lead to diminishing returns. Balance periods of intensive work with rest to sustain productivity over time.
This approach shows that great work is less about following a strict formula and more about aligning your interests, ambition, and environment to foster creativity and innovation.
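The manual context-extraction step can be factored into a small helper that also caps how much text reaches the prompt. This is a minimal sketch, assuming each search hit exposes `content` as a list of parts with a `text` attribute; `build_context` and `max_chars` are illustrative names, not part of any Llama Stack or OpenAI API.

```python
from types import SimpleNamespace


def build_context(results, max_chars=4000):
    """Join the text of every content part, stopping once max_chars of text is collected."""
    pieces = []
    used = 0
    for hit in results:
        for part in hit.content or []:
            text = getattr(part, "text", "")
            if not text:
                continue
            remaining = max_chars - used
            if remaining <= 0:
                return "\n\n".join(pieces)
            pieces.append(text[:remaining])
            used += len(pieces[-1])
    return "\n\n".join(pieces)


# Stand-in objects shaped like search hits (made-up data for illustration)
hits = [
    SimpleNamespace(content=[SimpleNamespace(text="Pick work you are curious about.")]),
    SimpleNamespace(content=[]),
    SimpleNamespace(content=[SimpleNamespace(text="Stay persistent.")]),
]
context = build_context(hits)
print(context)
```

Capping the context this way is one simple guard against overflowing a small model's context window; a token-based budget would be more precise.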
## Architecture Overview
Llama Stack provides OpenAI-compatible RAG capabilities through:
- **Vector Stores API**: OpenAI-compatible vector storage with automatic embedding model detection
- **Files API**: Document upload and processing using OpenAI's file format
- **Responses API**: Enhanced chat completions with agentic tool calling via file search
## Configuring Default Embedding Models
To enable automatic vector store creation without specifying embedding models, configure a default embedding model in your config.yaml like so:
```yaml
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```
With this configuration:
- `client.vector_stores.create()` works without requiring embedding model or provider parameters
- The system automatically uses the default vector store provider (`faiss`) when multiple providers are available
- The system automatically uses the default embedding model (`sentence-transformers/nomic-ai/nomic-embed-text-v1.5`) for any newly created vector store
- The `default_provider_id` specifies which vector storage backend to use
- The `default_embedding_model` specifies both the inference provider and model for embeddings
## Vector Store Operations
### Creating Vector Stores
You can create vector stores with automatic or explicit embedding model selection:
```python
# Automatic - uses default configured embedding model and vector store provider
vs = client.vector_stores.create()
# Explicit - specify embedding model and/or provider when you need specific ones
vs = client.vector_stores.create(
extra_body={
"provider_id": "faiss", # Optional: specify vector store provider
"embedding_model": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
"embedding_dimension": 768 # Optional: will be auto-detected if not provided
}
)
```


@ -0,0 +1,221 @@
---
title: Agents vs OpenAI Responses API
description: Compare the Agents API and OpenAI Responses API for building AI applications with tool calling capabilities
sidebar_label: Agents vs Responses API
sidebar_position: 5
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Agents vs OpenAI Responses API
Llama Stack (LLS) provides two different APIs for building AI applications with tool calling capabilities: the **Agents API** and the **OpenAI Responses API**. While both enable AI systems to use tools and maintain full conversation history, they serve different use cases and have distinct characteristics.
:::note
For simple inference, you may want to use the [Chat Completions API](../providers/openai#chat-completions) directly before progressing to the Agents or Responses API.
:::
## Overview
### LLS Agents API
The Agents API is a full-featured, stateful system designed for complex, multi-turn conversations. It maintains conversation state through persistent sessions identified by a unique session ID. The API supports comprehensive agent lifecycle management, detailed execution tracking, and rich metadata about each interaction through a structured session/turn/step hierarchy. The API can orchestrate multiple tool calls within a single turn.
### OpenAI Responses API
The OpenAI Responses API is a full-featured, stateful system designed for complex, multi-turn conversations, with direct compatibility with OpenAI's conversational patterns enhanced by Llama Stack's tool calling capabilities. It maintains conversation state by chaining responses through a `previous_response_id`, allowing interactions to branch or continue from any prior point. Each response can perform multiple tool calls within a single turn.
### Key Differences
The LLS Agents API uses the Chat Completions API on the backend for inference as it's the industry standard for building AI applications and most LLM providers are compatible with this API. For a detailed comparison between Responses and Chat Completions, see [OpenAI's documentation](https://platform.openai.com/docs/guides/responses-vs-chat-completions).
Additionally, Agents let you specify input/output shields whereas Responses do not (though support is planned). Agents use a linear conversation model referenced by a single session ID. Responses, on the other hand, support branching, where each response can serve as a fork point, and conversations are tracked by the latest response ID. Responses also let you dynamically choose the model, vector store, files, MCP servers, and more on each inference call, enabling more complex workflows, whereas Agents require a static configuration for these components at the start of the session.
Today the Agents and Responses APIs can be used independently, depending on the use case, but it is also useful to treat them as complementary. Although not yet supported, the LLS Agents API is planned to optionally use the Responses API as its backend instead of the default Chat Completions API, which would combine the safety features of Agents with the dynamic configuration and branching capabilities of Responses.
## Feature Comparison
| Feature | LLS Agents API | OpenAI Responses API |
|---------|------------|---------------------|
| **Conversation Management** | Linear persistent sessions | Can branch from any previous response ID |
| **Input/Output Safety Shields** | Supported | Not yet supported |
| **Per-call Flexibility** | Static per-session configuration | Dynamic per-call configuration |
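The difference between the two conversation models can be pictured with a toy data structure. This sketch is not how Llama Stack stores state; `ResponseLog` and its methods are hypothetical names that only illustrate how chaining through a previous-response ID yields a tree, whereas a session is a single linear list.

```python
import itertools


class ResponseLog:
    """Toy store: each response records the ID of the response it continues."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._nodes = {}  # response_id -> (previous_response_id, text)

    def create(self, text, previous_response_id=None):
        response_id = f"resp_{next(self._ids)}"
        self._nodes[response_id] = (previous_response_id, text)
        return response_id

    def history(self, response_id):
        """Walk parent links back to the root and return texts oldest-first."""
        texts = []
        while response_id is not None:
            response_id, text = self._nodes[response_id]
            texts.append(text)
        return texts[::-1]


log = ResponseLog()
r1 = log.create("web search: sorting algorithms")
r2 = log.create("file search: local implementations", previous_response_id=r1)
r3 = log.create("web search: Python best practices", previous_response_id=r1)

# r2 and r3 share r1 as their starting point but never see each other
print(log.history(r2))
print(log.history(r3))
```

A session in the Agents model corresponds to the special case where every new entry continues the most recent one, so no forks can occur.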
## Use Case Example: Research with Multiple Search Methods
Let's compare how both APIs handle a research task where we need to:
1. Search for current information and examples
2. Access different information sources dynamically
3. Continue the conversation based on search results
<Tabs>
<TabItem value="agents" label="Agents API">
### Session-based Configuration with Safety Shields
```python
# Create agent with static session configuration
agent = Agent(
client,
model="Llama3.2-3B-Instruct",
instructions="You are a helpful coding assistant",
tools=[
{
"name": "builtin::rag/knowledge_search",
"args": {"vector_db_ids": ["code_docs"]},
},
"builtin::code_interpreter",
],
input_shields=["llama_guard"],
output_shields=["llama_guard"],
)
session_id = agent.create_session("code_session")
# First turn: Search and execute
response1 = agent.create_turn(
messages=[
{
"role": "user",
"content": "Find examples of sorting algorithms and run a bubble sort on [3,1,4,1,5]",
},
],
session_id=session_id,
)
# Continue conversation in same session
response2 = agent.create_turn(
messages=[
{
"role": "user",
"content": "Now optimize that code and test it with a larger dataset",
},
],
session_id=session_id, # Same session, maintains full context
)
# Agents API benefits:
# ✅ Safety shields protect against malicious code execution
# ✅ Session maintains context between code executions
# ✅ Consistent tool configuration throughout conversation
print(f"First result: {response1.output_message.content}")
print(f"Optimization: {response2.output_message.content}")
```
</TabItem>
<TabItem value="responses" label="Responses API">
### Dynamic Per-call Configuration with Branching
```python
# First response: Use web search for latest algorithms
response1 = client.responses.create(
model="Llama3.2-3B-Instruct",
input="Search for the latest efficient sorting algorithms and their performance comparisons",
tools=[
{
"type": "web_search",
},
], # Web search for current information
)
# Continue conversation: Switch to file search for local docs
response2 = client.responses.create(
model="Llama3.2-1B-Instruct", # Switch to faster model
input="Now search my uploaded files for existing sorting implementations",
tools=[
{ # Using Responses API built-in tools
"type": "file_search",
"vector_store_ids": ["vs_abc123"], # Vector store containing uploaded files
},
],
previous_response_id=response1.id,
)
# Branch from first response: Try different search approach
response3 = client.responses.create(
model="Llama3.2-3B-Instruct",
input="Instead, search the web for Python-specific sorting best practices",
tools=[{"type": "web_search"}], # Different web search query
previous_response_id=response1.id, # Branch from response1
)
# Responses API benefits:
# ✅ Dynamic tool switching (web search ↔ file search per call)
# ✅ OpenAI-compatible tool patterns (web_search, file_search)
# ✅ Branch conversations to explore different information sources
# ✅ Model flexibility per search type
# Responses objects expose the generated text via `output_text`
print(f"Web search results: {response1.output_text}")
print(f"File search results: {response2.output_text}")
print(f"Alternative web search: {response3.output_text}")
```
</TabItem>
</Tabs>
Both APIs have distinct strengths that make each valuable on its own for different scenarios. The Agents API excels at providing structured, safety-conscious workflows with persistent session management, while the Responses API offers flexibility through dynamic configuration and OpenAI-compatible tool patterns.
## Use Case Examples
### 1. Research and Analysis with Safety Controls
**Best Choice: Agents API**
**Scenario:** You're building a research assistant for a financial institution that needs to analyze market data, execute code to process financial models, and search through internal compliance documents. The system must ensure all interactions are logged for regulatory compliance and protected by safety shields to prevent malicious code execution or data leaks.
**Why Agents API?** The Agents API provides persistent session management for iterative research workflows, built-in safety shields to protect against malicious code in financial models, and structured execution logs (session/turn/step) required for regulatory compliance. The static tool configuration ensures consistent access to your knowledge base and code interpreter throughout the entire research session.
### 2. Dynamic Information Gathering with Branching Exploration
**Best Choice: Responses API**
**Scenario:** You're building a competitive intelligence tool that helps businesses research market trends. Users need to dynamically switch between web search for current market data and file search through uploaded industry reports. They also want to branch conversations to explore different market segments simultaneously and experiment with different models for various analysis types.
**Why Responses API?** The Responses API's branching capability lets users explore multiple market segments from any research point. Dynamic per-call configuration allows switching between web search and file search as needed, while experimenting with different models (faster models for quick searches, more powerful models for deep analysis). The OpenAI-compatible tool patterns make integration straightforward.
### 3. OpenAI Migration with Advanced Tool Capabilities
**Best Choice: Responses API**
**Scenario:** You have an existing application built with OpenAI's Assistants API that uses file search and web search capabilities. You want to migrate to Llama Stack for better performance and cost control while maintaining the same tool calling patterns and adding new capabilities like dynamic vector store selection.
**Why Responses API?** The Responses API provides full OpenAI tool compatibility (`web_search`, `file_search`) with identical syntax, making migration seamless. The dynamic per-call configuration enables advanced features like switching vector stores per query or changing models based on query complexity - capabilities that extend beyond basic OpenAI functionality while maintaining compatibility.
### 4. Educational Programming Tutor
**Best Choice: Agents API**
**Scenario:** You're building a programming tutor that maintains student context across multiple sessions, safely executes code exercises, and tracks learning progress with audit trails for educators.
**Why Agents API?** Persistent sessions remember student progress across multiple interactions, safety shields prevent malicious code execution while allowing legitimate programming exercises, and structured execution logs help educators track learning patterns.
### 5. Advanced Software Debugging Assistant
**Best Choice: Agents API with Responses Backend**
**Scenario:** You're building a debugging assistant that helps developers troubleshoot complex issues. It needs to maintain context throughout a debugging session, safely execute diagnostic code, switch between different analysis tools dynamically, and branch conversations to explore multiple potential causes simultaneously.
**Why Agents + Responses?** The Agent provides safety shields for code execution and session management for the overall debugging workflow. The underlying Responses API enables dynamic model selection and flexible tool configuration per query, while branching lets you explore different theories (memory leak vs. concurrency issue) from the same debugging point and compare results.
:::info[Future Enhancement]
The ability to use Responses API as the backend for Agents is not yet implemented but is planned for a future release. Currently, Agents use Chat Completions API as their backend by default.
:::
## Decision Framework
Use this framework to choose the right API for your use case:
### Choose Agents API when:
- ✅ You need **safety shields** for input/output validation
- ✅ Your application requires **linear conversation flow** with persistent context
- ✅ You need **audit trails** and structured execution logs
- ✅ Your tool configuration is **static** throughout the session
- ✅ You're building **educational, financial, or enterprise** applications with compliance requirements
### Choose Responses API when:
- ✅ You need **conversation branching** to explore multiple paths
- ✅ You want **dynamic per-call configuration** (models, tools, vector stores)
- ✅ You're **migrating from OpenAI** and want familiar tool patterns
- ✅ You need **OpenAI compatibility** for existing workflows
- ✅ Your application benefits from **flexible, experimental** interactions
## Related Resources
- **[Agents](./agent)** - Understanding the Agents API fundamentals
- **[Agent Execution Loop](./agent_execution_loop)** - How agents process turns and steps
- **[Tools Integration](./tools)** - Adding capabilities to both APIs
- **[OpenAI Compatibility](../providers/openai)** - Using OpenAI-compatible endpoints
- **[Safety Guardrails](./safety)** - Implementing safety measures in agents


@ -0,0 +1,394 @@
---
title: Safety Guardrails
description: Implement safety measures and content moderation in Llama Stack applications
sidebar_label: Safety
sidebar_position: 9
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Safety Guardrails
Safety is a critical component of any AI application. Llama Stack provides a comprehensive Shield system that can be applied at multiple touchpoints to ensure responsible AI behavior and content moderation.
## Shield System Overview
The Shield system in Llama Stack provides:
- **Content filtering** for both input and output messages
- **Multi-touchpoint protection** across your application flow
- **Configurable safety policies** tailored to your use case
- **Integration with agents** for automated safety enforcement
## Basic Shield Usage
### Registering a Safety Shield
<Tabs>
<TabItem value="registration" label="Shield Registration">
```python
# Register a safety shield
shield_id = "content_safety"
client.shields.register(
shield_id=shield_id,
provider_shield_id="llama-guard-basic"
)
```
</TabItem>
<TabItem value="manual-check" label="Manual Safety Check">
```python
# Run content through shield manually
response = client.safety.run_shield(
    shield_id=shield_id,
    messages=[{"role": "user", "content": "User message here"}],
)
if response.violation:
    print(f"Safety violation detected: {response.violation.user_message}")
    # Handle violation appropriately
else:
    print("Content passed safety checks")
```
</TabItem>
</Tabs>
## Agent Integration
Shields can be automatically applied to agent interactions for seamless safety enforcement:
<Tabs>
<TabItem value="input-shields" label="Input Shields">
```python
from llama_stack_client import Agent
# Create agent with input safety shields
agent = Agent(
client,
model="meta-llama/Llama-3.2-3B-Instruct",
instructions="You are a helpful assistant",
input_shields=["content_safety"], # Shield user inputs
tools=["builtin::websearch"],
)
session_id = agent.create_session("safe_session")
# All user inputs will be automatically screened
response = agent.create_turn(
messages=[{"role": "user", "content": "Tell me about AI safety"}],
session_id=session_id,
)
```
</TabItem>
<TabItem value="output-shields" label="Output Shields">
```python
# Create agent with output safety shields
agent = Agent(
client,
model="meta-llama/Llama-3.2-3B-Instruct",
instructions="You are a helpful assistant",
output_shields=["content_safety"], # Shield agent outputs
tools=["builtin::websearch"],
)
session_id = agent.create_session("safe_session")
# All agent responses will be automatically screened
response = agent.create_turn(
messages=[{"role": "user", "content": "Help me with my research"}],
session_id=session_id,
)
```
</TabItem>
<TabItem value="both-shields" label="Input & Output Shields">
```python
# Create agent with comprehensive safety coverage
agent = Agent(
client,
model="meta-llama/Llama-3.2-3B-Instruct",
instructions="You are a helpful assistant",
input_shields=["content_safety"], # Screen user inputs
output_shields=["content_safety"], # Screen agent outputs
tools=["builtin::websearch"],
)
session_id = agent.create_session("fully_protected_session")
# Both input and output are automatically protected
response = agent.create_turn(
messages=[{"role": "user", "content": "Research question here"}],
session_id=session_id,
)
```
</TabItem>
</Tabs>
## Available Shield Types
### Llama Guard Shields
Llama Guard provides state-of-the-art content safety classification:
<Tabs>
<TabItem value="basic" label="Basic Llama Guard">
```python
# Basic Llama Guard for general content safety
client.shields.register(
shield_id="llama_guard_basic",
provider_shield_id="llama-guard-basic"
)
```
**Use Cases:**
- General content moderation
- Harmful content detection
- Basic safety compliance
</TabItem>
<TabItem value="advanced" label="Advanced Llama Guard">
```python
# Advanced Llama Guard with custom categories
client.shields.register(
shield_id="llama_guard_advanced",
provider_shield_id="llama-guard-advanced",
config={
"categories": [
"violence", "hate_speech", "sexual_content",
"self_harm", "illegal_activity"
],
"threshold": 0.8
}
)
```
**Use Cases:**
- Fine-tuned safety policies
- Domain-specific content filtering
- Enterprise compliance requirements
</TabItem>
</Tabs>
### Custom Safety Shields
Create domain-specific safety shields for specialized use cases:
```python
# Register custom safety shield
client.shields.register(
shield_id="financial_compliance",
provider_shield_id="custom-financial-shield",
config={
"detect_pii": True,
"financial_advice_warning": True,
"regulatory_compliance": "FINRA"
}
)
```
## Safety Response Handling
When safety violations are detected, handle them appropriately:
<Tabs>
<TabItem value="basic-handling" label="Basic Handling">
```python
import logging
logger = logging.getLogger(__name__)
def check_message(client, text):
    """Return a refusal message if the shield flags the text, otherwise None."""
    response = client.safety.run_shield(
        shield_id="content_safety",
        messages=[{"role": "user", "content": text}],
    )
    if response.violation:
        violation = response.violation
        print(f"Violation Type: {violation.violation_type}")
        print(f"User Message: {violation.user_message}")
        print(f"Metadata: {violation.metadata}")
        # Log the violation for audit purposes
        logger.warning(f"Safety violation detected: {violation.violation_type}")
        # Provide appropriate user feedback
        return "I can't help with that request. Please try asking something else."
    return None
```
</TabItem>
<TabItem value="advanced-handling" label="Advanced Handling">
```python
import logging
from datetime import datetime
logger = logging.getLogger(__name__)
def handle_safety_response(safety_response, user_message):
    """Advanced safety response handling with logging and user feedback"""
    if not safety_response.violation:
        return {"safe": True, "message": "Content passed safety checks"}
    violation = safety_response.violation
    # Log violation details
    audit_log = {
        "timestamp": datetime.now().isoformat(),
        "violation_type": violation.violation_type,
        "original_message": user_message,
        "shield_response": violation.user_message,
        "metadata": violation.metadata,
    }
    logger.warning(f"Safety violation: {audit_log}")
    # Determine appropriate response based on violation type
    if violation.violation_type == "hate_speech":
        user_feedback = "I can't engage with content that contains hate speech. Let's keep our conversation respectful."
    elif violation.violation_type == "violence":
        user_feedback = "I can't provide information that could promote violence. How else can I help you today?"
    else:
        user_feedback = "I can't help with that request. Please try asking something else."
    return {
        "safe": False,
        "user_feedback": user_feedback,
        "violation_details": audit_log,
    }
# Usage: `response` is the result of client.safety.run_shield and `user_input` is the raw user text
safety_result = handle_safety_response(response, user_input)
if not safety_result["safe"]:
    print(safety_result["user_feedback"])
```
</TabItem>
</Tabs>
## Safety Configuration Best Practices
### 🛡️ **Multi-Layer Protection**
- Use both input and output shields for comprehensive coverage
- Combine multiple shield types for different threat categories
- Implement fallback mechanisms when shields fail
### 📊 **Monitoring & Auditing**
- Log all safety violations for compliance and analysis
- Monitor false positive rates to tune shield sensitivity
- Track safety metrics across different use cases
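Tracking the false positive rate requires human-reviewed labels on logged shield decisions. The sketch below assumes a hypothetical audit record with two boolean fields, `flagged` (the shield's decision) and `harmful` (the reviewer's label); neither field comes from the Llama Stack API.

```python
def false_positive_rate(records):
    """Share of benign messages that the shield flagged anyway."""
    benign = [r for r in records if not r["harmful"]]
    if not benign:
        return 0.0
    flagged_benign = sum(1 for r in benign if r["flagged"])
    return flagged_benign / len(benign)


# Made-up review data for illustration
audit_records = [
    {"flagged": True, "harmful": True},
    {"flagged": True, "harmful": False},  # false positive
    {"flagged": False, "harmful": False},
    {"flagged": False, "harmful": False},
    {"flagged": False, "harmful": False},
]
rate = false_positive_rate(audit_records)
print(f"False positive rate: {rate:.2%}")
```

A rising rate after a shield or policy update is a signal to revisit the shield's configuration before users start working around it.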
### ⚙️ **Configuration Management**
- Use environment-specific safety configurations
- Implement A/B testing for shield effectiveness
- Regularly update shield models and policies
### 🔧 **Integration Patterns**
- Integrate shields early in the development process
- Test safety measures with adversarial inputs
- Provide clear user feedback for violations
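One way to implement a fallback when the shield call itself fails is to fail closed: treat an error as a block instead of letting unchecked content through. This is a minimal sketch, assuming `run_shield` is any callable you supply that returns an object with a `violation` attribute; `screen_message`, the retry count, and the feedback strings are illustrative choices rather than Llama Stack features.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

BLOCKED_MESSAGE = "I can't help with that request. Please try asking something else."
UNAVAILABLE_MESSAGE = "Safety checks are temporarily unavailable. Please try again later."


def screen_message(run_shield, text, retries=1):
    """Return (allowed, feedback). Errors from the shield block the message."""
    for attempt in range(retries + 1):
        try:
            result = run_shield(text)
        except Exception as exc:  # network errors, provider outages, etc.
            logger.warning("Shield call failed (attempt %d): %s", attempt + 1, exc)
            continue
        if result.violation:
            return False, BLOCKED_MESSAGE
        return True, None
    # Every attempt failed: fail closed instead of passing unchecked content
    return False, UNAVAILABLE_MESSAGE


# Stand-in shields for illustration
def passing_shield(text):
    return SimpleNamespace(violation=None)


def broken_shield(text):
    raise TimeoutError("shield provider unreachable")


print(screen_message(passing_shield, "Tell me about AI safety"))
print(screen_message(broken_shield, "Tell me about AI safety"))
```

Whether to fail closed or fail open is a product decision; failing closed favors safety at the cost of availability.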
## Advanced Safety Scenarios
### Context-Aware Safety
```python
# Safety shields that consider conversation context
agent = Agent(
client,
model="meta-llama/Llama-3.2-3B-Instruct",
instructions="You are a healthcare assistant",
input_shields=["medical_safety"],
output_shields=["medical_safety"],
# Context helps shields make better decisions
safety_context={
"domain": "healthcare",
"user_type": "patient",
"compliance_level": "HIPAA"
}
)
```
### Dynamic Shield Selection
```python
def select_shield_for_user(user_profile):
    """Select appropriate safety shield based on user context"""
    if user_profile.age < 18:
        return "child_safety_shield"
    elif user_profile.context == "enterprise":
        return "enterprise_compliance_shield"
    else:
        return "general_safety_shield"
# Use dynamic shield selection
shield_id = select_shield_for_user(current_user)
response = client.safety.run_shield(
    shield_id=shield_id,
    messages=messages,
)
```
## Compliance and Regulations
### Industry-Specific Safety
<Tabs>
<TabItem value="healthcare" label="Healthcare (HIPAA)">
```python
# Healthcare-specific safety configuration
client.shields.register(
shield_id="hipaa_compliance",
provider_shield_id="healthcare-safety-shield",
config={
"detect_phi": True, # Protected Health Information
"medical_advice_warning": True,
"regulatory_framework": "HIPAA"
}
)
```
</TabItem>
<TabItem value="financial" label="Financial (FINRA)">
```python
# Financial services safety configuration
client.shields.register(
shield_id="finra_compliance",
provider_shield_id="financial-safety-shield",
config={
"detect_financial_advice": True,
"investment_disclaimers": True,
"regulatory_framework": "FINRA"
}
)
```
</TabItem>
<TabItem value="education" label="Education (COPPA)">
```python
# Educational platform safety for minors
client.shields.register(
shield_id="coppa_compliance",
provider_shield_id="educational-safety-shield",
config={
"child_protection": True,
"educational_content_only": True,
"regulatory_framework": "COPPA"
}
)
```
</TabItem>
</Tabs>
## Related Resources
- **[Agents](./agent)** - Integrating safety shields with intelligent agents
- **[Agent Execution Loop](./agent_execution_loop)** - Understanding safety in the execution flow
- **[Evaluations](./evals)** - Evaluating safety shield effectiveness
- **[Llama Guard Documentation](https://github.com/meta-llama/PurpleLlama/tree/main/Llama-Guard3)** - Advanced safety model details


@ -0,0 +1,43 @@
---
title: Telemetry
description: Monitor and observe Llama Stack applications with comprehensive telemetry capabilities
sidebar_label: Telemetry
sidebar_position: 8
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Telemetry
The preferred way to instrument Llama Stack is with OpenTelemetry. Llama Stack enriches the data
collected by OpenTelemetry to capture helpful information about the performance and behavior of your
application. Here is an example of how to forward your telemetry to an OTLP collector from Llama Stack:
```sh
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME="llama-stack-server"
uv pip install opentelemetry-distro opentelemetry-exporter-otlp
uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement -
uv run opentelemetry-instrument llama stack run config.yaml
```
### Known issues
Some database instrumentation libraries have a known bug where spans get wrapped twice, or do not get connected to a trace.
To prevent this, you can disable database-specific tracing and rely on the SQLAlchemy tracing alone. If you are using
`sqlite3` as your database, for example, you can disable the additional tracing like this:
```sh
export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3"
```
## Related Resources
- **[OpenTelemetry Documentation](https://opentelemetry.io/)** - Comprehensive observability framework
- **[Jaeger Documentation](https://www.jaegertracing.io/)** - Distributed tracing visualization


@ -0,0 +1,333 @@
---
title: Tools
description: Extend agent capabilities with external tools and function calling
sidebar_label: Tools
sidebar_position: 6
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Tools
Tools are functions that an agent can invoke to perform tasks. They are organized into tool groups and registered with specific providers. Each tool group represents a collection of related tools from a single provider. Grouping tools this way lets their state be externalized, since the tools in a group typically operate on the same state.
An example of this would be a "db_access" tool group that contains tools for interacting with a database. "list_tables", "query_table", "insert_row" could be examples of tools in this group.
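To make the shared-state idea concrete, here is a plain-Python sketch of such a "db_access" group. It does not use Llama Stack's tool registration API; the `DbAccessTools` class is hypothetical and only shows three tools operating on one database connection.

```python
import sqlite3


class DbAccessTools:
    """Three related tools that all act on the same database connection."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def list_tables(self):
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in rows]

    def insert_row(self, table, values):
        # Table names cannot be bound as parameters, so check against known tables
        if table not in self.list_tables():
            raise ValueError(f"unknown table: {table}")
        placeholders = ", ".join("?" for _ in values)
        self.conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
        self.conn.commit()

    def query_table(self, table, limit=10):
        if table not in self.list_tables():
            raise ValueError(f"unknown table: {table}")
        return self.conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,)).fetchall()


tools = DbAccessTools()
tools.conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
tools.insert_row("users", (1, "Ada"))
print(tools.list_tables())
print(tools.query_table("users"))
```

Because the connection lives on the group rather than on any single tool, a row written by `insert_row` is immediately visible to `query_table`.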
Tools are treated like any other resource in Llama Stack, such as models. You can register them, configure providers for them, and so on.
When instantiating an agent, you can provide it with a list of tool groups that it has access to. The agent gets the corresponding tool definitions for the specified tool groups and passes them along to the model.
Refer to the [Building AI Applications](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) notebook for more examples on how to use tools.
## Server-side vs. Client-side Tool Execution
Llama Stack allows you to use both server-side and client-side tools. With server-side tools, `agent.create_turn` transparently executes the tool calls emitted by the model and gives the user the final answer. If client-side tools are provided, the tool call is sent back to the user for execution, and the turn can optionally be continued with the `agent.resume_turn` method.
## Server-side Tools
Llama Stack provides built-in providers for some common tools. These include web search, math, and RAG capabilities.
### Web Search
You have three providers to execute the web search tool calls generated by a model: Brave Search, Bing Search, and Tavily Search.
To indicate that the web search tool calls should be executed by brave-search, you can point the "builtin::websearch" toolgroup to the "brave-search" provider.
```python
client.toolgroups.register(
    toolgroup_id="builtin::websearch",
    provider_id="brave-search",
    args={"max_results": 5},
)
```
The tool requires an API key which can be provided either in the configuration or through the request header `X-LlamaStack-Provider-Data`. The format of the header is:
```
{"<provider_name>_api_key": <your api key>}
```
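As a rough illustration of the header format above, the following sketch builds the `X-LlamaStack-Provider-Data` header value with Python's standard `json` module. The helper name `provider_data_header` is hypothetical, and `tavily_search` is used as the provider name only because it matches the `tavily_search_api_key` field that appears elsewhere on this page.

```python
import json


def provider_data_header(provider_name: str, api_key: str) -> dict[str, str]:
    """Hypothetical helper: build the provider-data header for one provider."""
    # The header value is a JSON object keyed by "<provider_name>_api_key".
    payload = {f"{provider_name}_api_key": api_key}
    return {"X-LlamaStack-Provider-Data": json.dumps(payload)}


headers = provider_data_header("tavily_search", "your-api-key")
print(headers)
```

The resulting dictionary can be merged into the headers of any HTTP request sent to the server.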
### Math
The WolframAlpha tool provides access to computational knowledge through the WolframAlpha API.
```python
client.toolgroups.register(
    toolgroup_id="builtin::wolfram_alpha",
    provider_id="wolfram-alpha",
)
```
Example usage:
```python
result = client.tool_runtime.invoke_tool(
    tool_name="wolfram_alpha",
    kwargs={"query": "solve x^2 + 2x + 1 = 0"},
)
```
### RAG
The RAG tool enables retrieval of context from various types of memory banks (vector, key-value, keyword, and graph).
```python
# Register Memory tool group
client.toolgroups.register(
    toolgroup_id="builtin::rag",
    provider_id="faiss",
    args={"max_chunks": 5, "max_tokens_in_context": 4096},
)
```
Features:
- Support for multiple memory bank types
- Configurable query generation
- Context retrieval with token limits
:::note[Default Configuration]
By default, the Llama Stack `config.yaml` defines tool groups for web search, WolframAlpha, and RAG, which are provided by the tavily-search, wolfram-alpha, and rag providers, respectively.
:::
## Model Context Protocol (MCP)
[MCP](https://github.com/modelcontextprotocol) is an emerging, increasingly popular standard for tool discovery and execution. The protocol allows tools to be dynamically discovered from an MCP endpoint and used to extend the agent's capabilities.
### Using Remote MCP Servers
You can find some popular remote MCP servers [here](https://github.com/jaw9c/awesome-remote-mcp-servers). You can register them as toolgroups in the same way as local providers.
```python
client.toolgroups.register(
    toolgroup_id="mcp::deepwiki",
    provider_id="model-context-protocol",
    mcp_endpoint=URL(uri="https://mcp.deepwiki.com/sse"),
)
```
Note that most of the more useful MCP servers require authentication, and many of them use OAuth 2.0. You can provide the authorization token when creating the Agent:
```python
agent = Agent(
    ...,
    tools=[
        {
            "type": "mcp",
            "server_url": "https://mcp.deepwiki.com/sse",
            "server_label": "mcp::deepwiki",
            "authorization": "<your_access_token>",  # OAuth token (without "Bearer " prefix)
        }
    ],
)
agent.create_turn(...)
```
### Running Your Own MCP Server
Here's an example of how to run a simple MCP server that exposes a file system as a set of tools to the Llama Stack agent.
<Tabs>
<TabItem value="setup" label="Server Setup">
```shell
# Start your MCP server
mkdir /tmp/content
touch /tmp/content/foo
touch /tmp/content/bar
npx -y supergateway --port 8000 --stdio 'npx -y @modelcontextprotocol/server-filesystem /tmp/content'
```
</TabItem>
<TabItem value="register" label="Registration">
```python
# Register the MCP server as a tool group
client.toolgroups.register(
    toolgroup_id="mcp::filesystem",
    provider_id="model-context-protocol",
    mcp_endpoint=URL(uri="http://localhost:8000/sse"),
)
```
</TabItem>
</Tabs>
## Adding Custom (Client-side) Tools
When you want to use tools other than the built-in ones, you only need to implement a Python function with a docstring. The docstring is used to describe the tool and its parameters, and is passed along to the generative model.
```python
# Example tool definition
def my_tool(input: int) -> int:
    """
    Runs my awesome tool.
    :param input: some int parameter
    """
    return input * 2
```
:::tip[Documentation Best Practices]
We use Python docstrings to describe the tool and its parameters. Document both carefully so that the model can use the tool correctly, and experiment with different docstrings to see how they affect the model's behavior.
:::
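To make the role of the docstring concrete, here is a minimal sketch of how a description and parameter list could be derived from a function like `my_tool`. It is not the parser that `Agent` uses internally; `describe_tool` is a hypothetical helper built only on the standard `inspect` and `re` modules, and the output shape is an assumption chosen for illustration.

```python
import inspect
import re


def describe_tool(func) -> dict:
    """Hypothetical helper: derive a tool description from a docstring."""
    doc = inspect.getdoc(func) or ""
    # Text before the first ":param" entry is treated as the tool description.
    description = doc.split(":param", 1)[0].strip()
    # Collect ":param <name>: <text>" entries into a name -> text mapping.
    param_docs = dict(re.findall(r":param (\w+):\s*(.+)", doc))
    parameters = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        parameters[name] = {
            "type": getattr(annotation, "__name__", str(annotation)),
            "description": param_docs.get(name, ""),
        }
    return {
        "name": func.__name__,
        "description": description,
        "parameters": parameters,
    }


def my_tool(input: int) -> int:
    """
    Runs my awesome tool.

    :param input: some int parameter
    """
    return input * 2


print(describe_tool(my_tool))
```

A parameter with no `:param` entry ends up with an empty description in this sketch, which is the kind of gap that makes a tool harder for a model to call correctly.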
Once defined, simply pass the tool to the agent config. `Agent` will take care of the rest (calling the model with the tool definition, executing the tool, and returning the result to the model for the next iteration).
```python
# Example agent config with client provided tools
agent = Agent(client, ..., tools=[my_tool])
```
Refer to [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/) for an example of how to use client provided tools.
## Tool Invocation
Tools can be invoked using the `invoke_tool` method:
```python
result = client.tool_runtime.invoke_tool(
    tool_name="web_search",
    kwargs={"query": "What is the capital of France?"},
)
```
The result contains:
- `content`: The tool's output
- `error_message`: Optional error message if the tool failed
- `error_code`: Optional error code if the tool failed
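The three fields above suggest a simple handling pattern: check the error fields first, then use `content`. The sketch below uses a hypothetical `ToolResult` dataclass as a stand-in for the real response object, so that it runs without a server; only the field names come from the list above, and the field types are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    """Hypothetical stand-in for the object returned by invoke_tool."""

    content: Any = None
    error_message: Optional[str] = None
    error_code: Optional[int] = None


def unwrap(result: ToolResult) -> Any:
    """Return the tool output, or raise if the tool reported a failure."""
    if result.error_code is not None or result.error_message:
        raise RuntimeError(
            f"tool failed (code={result.error_code}): {result.error_message}"
        )
    return result.content


print(unwrap(ToolResult(content="Paris")))
```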
## Listing Available Tools
You can list all available tools or filter by tool group:
```python
# List all tools
all_tools = client.tools.list_tools()
# List tools in a specific group
group_tools = client.tools.list_tools(toolgroup_id="search_tools")
```
## Complete Examples
### Web Search Agent
<Tabs>
<TabItem value="setup" label="Setup & Configuration">
1. Start by registering a Tavily API key at [Tavily](https://tavily.com/).
2. [Optional] Set the API key in your environment before starting the Llama Stack server:
```bash
export TAVILY_SEARCH_API_KEY="your key"
```
</TabItem>
<TabItem value="implementation" label="Implementation">
```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
client = LlamaStackClient(
    base_url="http://localhost:8321",
    # Set this from the client side. No need to provide it if it has
    # already been configured on the Llama Stack server.
    provider_data={"tavily_search_api_key": "your_TAVILY_SEARCH_API_KEY"},
)
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions=(
        "You are a web search assistant, must use websearch tool to look up the most current and precise information available. "
    ),
    tools=["builtin::websearch"],
)
session_id = agent.create_session("websearch-session")
response = agent.create_turn(
    messages=[
        {"role": "user", "content": "How did the USA perform in the last Olympics?"}
    ],
    session_id=session_id,
)
# Print each event of the turn as it arrives
for log in EventLogger().log(response):
    log.print()
```
</TabItem>
</Tabs>
### WolframAlpha Math Agent
<Tabs>
<TabItem value="setup" label="Setup & Configuration">
1. Start by registering for a WolframAlpha API key at [WolframAlpha Developer Portal](https://developer.wolframalpha.com/access).
2. Provide the API key either by setting it in your environment before starting the Llama Stack server:
```bash
export WOLFRAM_ALPHA_API_KEY="your key"
```
or from the client side:
```python
client = LlamaStackClient(
    base_url="http://localhost:8321",
    provider_data={"wolfram_alpha_api_key": wolfram_api_key},
)
```
</TabItem>
<TabItem value="implementation" label="Implementation">
```python
# Configure the tools in the Agent by setting tools=["builtin::wolfram_alpha"]
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions="You are a mathematical assistant that can solve complex equations.",
    tools=["builtin::wolfram_alpha"],
)
session_id = agent.create_session("math-session")
# Example user query
response = agent.create_turn(
    messages=[{"role": "user", "content": "Solve x^2 + 2x + 1 = 0 using WolframAlpha"}],
    session_id=session_id,
)
```
</TabItem>
</Tabs>
## Best Practices
### 🛠️ **Tool Selection**
- Use **server-side tools** for production applications requiring reliability and security
- Use **client-side tools** for development, prototyping, or specialized integrations
- Combine multiple tool types for comprehensive functionality
### 📝 **Documentation**
- Write clear, detailed docstrings for custom tools
- Include parameter descriptions and expected return types
- Test tool descriptions with the model to ensure proper usage
### 🔐 **Security**
- Store API keys securely using environment variables or secure configuration
- Use the `X-LlamaStack-Provider-Data` header for dynamic authentication
- Validate tool inputs and outputs for security
### 🔄 **Error Handling**
- Implement proper error handling in custom tools
- Use structured error responses with meaningful messages
- Monitor tool performance and reliability
## Related Resources
- **[Agents](./agent)** - Building intelligent agents with tools
- **[RAG (Retrieval Augmented Generation)](./rag)** - Using knowledge retrieval tools
- **[Agent Execution Loop](./agent_execution_loop)** - Understanding tool execution flow
- **[Building AI Applications Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)** - Comprehensive examples
- **[Llama Stack Apps Examples](https://github.com/meta-llama/llama-stack-apps)** - Real-world tool implementations


@ -0,0 +1,105 @@
---
title: API Stability Leveling
description: Understanding API stability levels and versioning in Llama Stack
sidebar_label: API Stability
sidebar_position: 4
---
# Llama Stack API Stability Leveling
To provide a stable experience in Llama Stack, the various APIs need different stability levels that indicate the level of support, backwards compatibility, and overall production readiness.
## Different Levels
### v1alpha
- Little to no expectation of support between versions
- Breaking changes are permitted
- Datatypes and parameters can break
- Routes can be added and removed
#### Graduation Criteria
- An API can graduate from `v1alpha` to `v1beta` if the team has identified the extent of the non-optional routes and the shape of their parameters and return types for the API, e.g., `/v1/openai/chat/completions`. Optional types can change.
- CRUD must stay stable once in `v1beta`. This is a commitment to backward compatibility, guaranteeing that most code you write against the v1beta version will not break during future updates. We may make additive changes (like adding a new, optional field to a response), but we will not make breaking changes (like renaming an existing "modelName" field to "name", changing an ID's data type from an integer to a string, or altering an endpoint URL).
- For OpenAI APIs, a comparison to the OpenAI spec for the specific API can be done to ensure completeness.
### v1beta
- API routes remain consistent between versions
- Parameters and return types are not ensured between versions
- API, besides minor fixes and adjustments, should be _almost_ v1. Changes should not be drastic.
#### Graduation Criteria
- An API can graduate from `v1beta` to `v1` if the API surface and datatypes are complete as identified by the team. The parameters and return types that are mandatory for each route are stable. All aspects of graduating from `v1alpha` to `v1beta` apply as well.
- Optional parameters, routes, or parts of the return type can be added after graduating to `v1`
### v1 (stable)
- Considered stable
- Backwards compatible between Z-streams
- Y-stream breaking changes must go through the proper approval and announcement process.
- Datatypes for a route and its return types cannot change between Z-streams
- Y-stream datatype changes should be sparing, unless the changes are additional net-new parameters
- Must have proper conformance testing as outlined in https://github.com/llamastack/llama-stack/issues/3237
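The Z-stream and Y-stream rules for `v1` can be read as a check on which part of an `x.y.z` version changed. The following sketch is one interpretation of those rules, not project tooling: `bump_stream` and `v1_breaking_change_possible` are hypothetical names, and a `True` result only means a breaking change is not ruled out by the stream, since approval and announcement are still required.

```python
def bump_stream(old: str, new: str) -> str:
    """Classify a release bump as an "x", "y", or "z" stream change."""
    old_parts = tuple(int(part) for part in old.split("."))
    new_parts = tuple(int(part) for part in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form x.y.z")
    if new_parts <= old_parts:
        raise ValueError("new version must be greater than the old one")
    if new_parts[0] != old_parts[0]:
        return "x"
    if new_parts[1] != old_parts[1]:
        return "y"
    return "z"


def v1_breaking_change_possible(old: str, new: str) -> bool:
    """For a stable v1 API, breaking changes are ruled out between Z-streams."""
    return bump_stream(old, new) in {"x", "y"}


print(bump_stream("0.2.3", "0.2.4"))
print(v1_breaking_change_possible("0.2.3", "0.3.0"))
```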
### v2+ (Major Versions)
Introducing a new major version like `/v2` is a significant and disruptive event that should be treated as a last resort. It is reserved for essential changes to a stable `/v1` API that are fundamentally backward-incompatible and cannot be implemented through additive, non-breaking changes or breaking changes across X/Y-Stream releases (x.y.z).
If a `/v2` version is deemed absolutely necessary, it must adhere to the following protocol to ensure a sane and predictable transition for users:
#### Lifecycle Progression
A new major version must follow the same stability lifecycle as `/v1`. It will be introduced as `/v2alpha`, mature to `/v2beta`, and finally become stable as `/v2`.
#### Coexistence
The new `/v2` API must be introduced alongside the existing `/v1` API and run in parallel. It must not replace the `/v1` API immediately.
#### Deprecation Policy
When a `/v2` API is introduced, a clear and generous deprecation policy for the `/v1` API must be published simultaneously. This policy must outline the timeline for the eventual removal of the `/v1` API, giving users ample time to migrate.
### Deprecated APIs
Deprecated APIs are those that are no longer actively maintained or supported. Deprecated APIs are marked with the flag `deprecated = True` in the OpenAPI spec. These APIs will be removed in a future release.
### API Stability vs. Provider Stability
The leveling introduced in this document relates to the stability of the API and not specifically the providers within the API.
Providers can iterate as much as they want on functionality as long as they work within the bounds of an API. If they need to change the API, then the API should not be `/v1`, or those breaking changes can only happen on a y-stream release basis.
### Approval and Announcement Process for Breaking Changes
- **PR Labeling**: Any pull request that introduces a breaking API change must be clearly labeled with `breaking-change`.
- **PR Title/Commit**: Any pull request that introduces a breaking API change must contain `BREAKING CHANGE` in the title and commit footer. Alternatively, the commit can include `!`, e.g., `feat(api)!: title goes here`. This is outlined in the [conventional commits documentation](https://www.conventionalcommits.org/en/v1.0.0/#specification).
- **Maintainer Review**: At least one maintainer must explicitly acknowledge the breaking change during review by applying the `breaking-change` label. An approval must come with this label or the acknowledgement that this label has already been applied.
- **Announcement**: Breaking changes require inclusion in release notes and, if applicable, a separate communication (e.g., Discord, GitHub Issues, or GitHub Discussions) prior to release.
If a PR has proper approvals, labels, and commit/title hygiene, the failing API conformance tests will be bypassed.
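The commit hygiene rule can be checked mechanically. The sketch below follows the Conventional Commits 1.0.0 markers (a `!` before the colon in the header, or a `BREAKING CHANGE:` footer); `is_breaking` is a hypothetical helper for illustration and not the check that the project's CI actually runs.

```python
import re

# Header such as "feat(api)!: title goes here" or "feat!: title goes here".
_BANG_HEADER = re.compile(r"^[A-Za-z]+(\([^)]+\))?!: ")
# Footer tokens that Conventional Commits treats as breaking-change markers.
_FOOTER_TOKENS = ("BREAKING CHANGE:", "BREAKING-CHANGE:")


def is_breaking(commit_message: str) -> bool:
    """Return True if the commit message is marked as a breaking change."""
    header, _, body = commit_message.partition("\n")
    if _BANG_HEADER.match(header):
        return True
    return any(line.startswith(_FOOTER_TOKENS) for line in body.splitlines())


print(is_breaking("feat(api)!: title goes here"))
```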
## Enforcement
### Migration of API routes under `/v1alpha`, `/v1beta`, and `/v1`
Instead of placing every API under `/v1`, any API that is not fully stable or complete should go under `/v1alpha` or `/v1beta`. For example, at the time of this writing, `post_training` belongs here, as well as any OpenAI-compatible API whose surface does not exactly match the upstream OpenAI API it mimics.
This migration is crucial as we get Llama Stack in the hands of users who intend to productize various APIs. A clear view of what is stable and what is actively being developed will enable users to pick and choose various APIs to build their products on.
This migration will be a breaking change for any API moving out of `/v1`. Ideally, this should happen before 0.3.0 and especially 1.0.0.
### `x-stability` tags in the OpenAPI spec for oasdiff
`x-stability` tags allow tools like oasdiff to enforce different rules for different stability levels; these tags should match the routes: [oasdiff stability](https://github.com/oasdiff/oasdiff/blob/main/docs/STABILITY.md)
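A small consistency check illustrates what "these tags should match the routes" could mean in practice. The mapping used here is an assumption, not something this document specifies: `/v1alpha` maps to `alpha`, `/v1beta` to `beta`, and an unsuffixed version prefix to `stable`, using level names from oasdiff's stability documentation. Both function names are hypothetical.

```python
import re

# Assumed convention: the route prefix carries the stability level.
_ROUTE_PREFIX = re.compile(r"^/v\d+(alpha|beta)?/")


def expected_stability(route: str) -> str:
    """Derive the stability level implied by a route's version prefix."""
    match = _ROUTE_PREFIX.match(route)
    if match is None:
        raise ValueError(f"route has no version prefix: {route!r}")
    # No alpha/beta suffix means the route lives under a stable version.
    return match.group(1) or "stable"


def tag_matches_route(route: str, stability_tag: str) -> bool:
    """Check that a route's stability tag agrees with its prefix."""
    return expected_stability(route) == stability_tag


print(expected_stability("/v1alpha/post-training/jobs"))
```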
### Testing
The testing of each stable API is already outlined in [issue #3237](https://github.com/llamastack/llama-stack/issues/3237) and is being worked on. These sorts of conformance tests should apply primarily to `/v1` APIs only, with `/v1alpha` and `/v1beta` having any tests the maintainers see fit as well as basic testing to ensure the routing works properly.
### New APIs going forward
Any subsequently introduced APIs should be introduced as `/v1alpha`.


@ -0,0 +1,19 @@
---
title: API Providers
description: Understanding remote vs inline provider implementations
sidebar_label: API Providers
sidebar_position: 2
---
# API Providers
The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples of these include:
- LLM inference providers (e.g., Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, etc.),
- Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)
Providers come in two flavors:
- **Remote**: the provider runs as a separate service external to the Llama Stack codebase. Llama Stack contains a small amount of adapter code.
- **Inline**: the provider is fully specified and implemented within the Llama Stack codebase. It may be a simple wrapper around an existing library, or a full-fledged implementation within Llama Stack.
Most importantly, Llama Stack always strives to provide at least one fully inline provider for each API so you can iterate on a fully featured environment locally.
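Provider type strings used elsewhere in these docs, such as `inline::darksky` and `remote::kaze`, carry the flavor as a prefix. As a minimal sketch of that naming convention (the helper and tuple names are hypothetical, not part of Llama Stack):

```python
from typing import NamedTuple


class ProviderType(NamedTuple):
    flavor: str  # "inline" or "remote"
    name: str


def parse_provider_type(provider_type: str) -> ProviderType:
    """Split a "<flavor>::<name>" string into its two parts."""
    flavor, separator, name = provider_type.partition("::")
    if not separator or flavor not in {"inline", "remote"} or not name:
        raise ValueError(f"unexpected provider type: {provider_type!r}")
    return ProviderType(flavor, name)


print(parse_provider_type("remote::kaze"))
```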


@ -0,0 +1,393 @@
---
title: External APIs
description: Understanding external APIs in Llama Stack
sidebar_label: External APIs
sidebar_position: 3
---
# External APIs
Llama Stack supports external APIs that live outside of the main codebase. This allows you to:
- Create and maintain your own APIs independently
- Share APIs with others without contributing to the main codebase
- Keep API-specific code separate from the core Llama Stack code
## Configuration
To enable external APIs, you need to configure the `external_apis_dir` in your Llama Stack configuration. This directory should contain your external API specifications:
```yaml
external_apis_dir: ~/.llama/apis.d/
```
## Directory Structure
The external APIs directory should follow this structure:
```
apis.d/
  custom_api1.yaml
  custom_api2.yaml
```
Each YAML file in this directory defines an API specification.
## API Specification
Here's an example of an external API specification for a weather API:
```yaml
module: weather
api_dependencies:
- inference
protocol: WeatherAPI
name: weather
pip_packages:
- llama-stack-api-weather
```
### API Specification Fields
- `module`: Python module containing the API implementation
- `api_dependencies`: List of other APIs that this API depends on
- `protocol`: Name of the protocol class for the API
- `name`: Name of the API
- `pip_packages`: List of pip packages to install the API, typically a single package
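A quick pre-flight check on a specification can catch missing fields before the server loads it. The sketch below works on an already-parsed dictionary, so it needs no YAML library; treating `module`, `protocol`, `name`, and `pip_packages` as required is an assumption made for illustration, and `missing_spec_fields` is a hypothetical helper.

```python
# Assumed set of required fields; the docs do not state which are mandatory.
REQUIRED_FIELDS = ("module", "protocol", "name", "pip_packages")


def missing_spec_fields(spec: dict) -> list[str]:
    """Return the assumed-required fields that are absent from the spec."""
    return [field for field in REQUIRED_FIELDS if field not in spec]


# The weather example from above, as it would look after YAML parsing.
weather_spec = {
    "module": "weather",
    "api_dependencies": ["inference"],
    "protocol": "WeatherAPI",
    "name": "weather",
    "pip_packages": ["llama-stack-api-weather"],
}

print(missing_spec_fields(weather_spec))
```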
## Required Implementation
External APIs must expose an `available_providers()` function in their module that returns a list of provider specifications:
```python
# llama_stack_api_weather/api.py
from llama_stack_api import Api, InlineProviderSpec, ProviderSpec
def available_providers() -> list[ProviderSpec]:
    return [
        InlineProviderSpec(
            api=Api.weather,
            provider_type="inline::darksky",
            pip_packages=[],
            module="llama_stack_provider_darksky",
            config_class="llama_stack_provider_darksky.DarkSkyWeatherImplConfig",
        ),
    ]
```
They must also define a Protocol class, for example:
```python
# llama_stack_api_weather/api.py
from typing import Protocol
from llama_stack_api import webmethod
class WeatherAPI(Protocol):
    """
    A protocol for the Weather API.
    """
    @webmethod(route="/locations", method="GET")
    async def get_available_locations(self) -> dict[str, list[str]]:
        """
        Get the available locations.
        """
        ...
```
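Because the API is declared as a `typing.Protocol`, any class with a matching method satisfies it structurally. The following sketch demonstrates this with a plain stand-in protocol that omits the `webmethod` decorator so it runs without Llama Stack installed; `StaticWeather` and `list_locations` are hypothetical names.

```python
import asyncio
from typing import Protocol


class WeatherAPI(Protocol):
    """Stand-in for the protocol above, without the webmethod decorator."""

    async def get_available_locations(self) -> dict[str, list[str]]: ...


class StaticWeather:
    """Hypothetical implementation; it satisfies WeatherAPI structurally."""

    async def get_available_locations(self) -> dict[str, list[str]]:
        return {"locations": ["Paris", "Tokyo"]}


async def list_locations(api: WeatherAPI) -> list[str]:
    """Call the protocol method and extract the list of locations."""
    result = await api.get_available_locations()
    return result["locations"]


print(asyncio.run(list_locations(StaticWeather())))
```

Note that `StaticWeather` does not inherit from `WeatherAPI`; a type checker accepts it because the method signatures match.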
## Example: Custom API
Here's a complete example of creating and using a custom API:
1. First, create the API package:
```bash
mkdir -p llama-stack-api-weather
cd llama-stack-api-weather
mkdir -p src/llama_stack_api_weather
git init
uv init
```
2. Edit `pyproject.toml`:
```toml
[project]
name = "llama-stack-api-weather"
version = "0.1.0"
description = "Weather API for Llama Stack"
readme = "README.md"
requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic"]
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[tool.setuptools.packages.find]
where = ["src"]
include = ["llama_stack_api_weather", "llama_stack_api_weather.*"]
```
3. Create the initial files:
```bash
touch src/llama_stack_api_weather/__init__.py
touch src/llama_stack_api_weather/api.py
```
```python
# llama-stack-api-weather/src/llama_stack_api_weather/__init__.py
"""Weather API for Llama Stack."""
from .api import WeatherProvider, available_providers
__all__ = ["WeatherProvider", "available_providers"]
```
4. Create the API implementation:
```python
# llama-stack-api-weather/src/llama_stack_api_weather/api.py
from typing import Protocol
from llama_stack_api import (
    Api,
    ProviderSpec,
    RemoteProviderSpec,
    webmethod,
)
def available_providers() -> list[ProviderSpec]:
    return [
        RemoteProviderSpec(
            api=Api.weather,
            provider_type="remote::kaze",
            adapter_type="kaze",
            module="llama_stack_provider_kaze",
            pip_packages=["llama_stack_provider_kaze"],
            config_class="llama_stack_provider_kaze.KazeProviderConfig",
        ),
    ]
class WeatherProvider(Protocol):
    """
    A protocol for the Weather API.
    """
    @webmethod(route="/weather/locations", method="GET")
    async def get_available_locations(self) -> dict[str, list[str]]:
        """
        Get the available locations.
        """
        ...
```
5. Create the API specification:
```yaml
# ~/.llama/apis.d/weather.yaml
module: llama_stack_api_weather
name: weather
pip_packages: ["llama-stack-api-weather"]
protocol: WeatherProvider
```
6. Install the API package:
```bash
uv pip install -e .
```
7. Configure Llama Stack to use external APIs:
```yaml
version: "2"
image_name: "llama-stack-api-weather"
apis:
- weather
providers: {}
external_apis_dir: ~/.llama/apis.d
```
The API will now be available at `/v1/weather/locations`.
## Example: custom provider for the weather API
1. Create the provider package:
```bash
mkdir -p llama-stack-provider-kaze
cd llama-stack-provider-kaze
mkdir -p src/llama_stack_provider_kaze
uv init
```
2. Edit `pyproject.toml`:
```toml
[project]
name = "llama-stack-provider-kaze"
version = "0.1.0"
description = "Kaze weather provider for Llama Stack"
readme = "README.md"
requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic", "aiohttp"]
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[tool.setuptools.packages.find]
where = ["src"]
include = ["llama_stack_provider_kaze", "llama_stack_provider_kaze.*"]
```
3. Create the initial files:
```bash
touch src/llama_stack_provider_kaze/__init__.py
touch src/llama_stack_provider_kaze/config.py
touch src/llama_stack_provider_kaze/kaze.py
```
4. Create the provider implementation:
Initialization function:
```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/__init__.py
"""Kaze weather provider for Llama Stack."""
from .config import KazeProviderConfig
from .kaze import WeatherKazeAdapter
__all__ = ["KazeProviderConfig", "WeatherKazeAdapter"]
async def get_adapter_impl(config: KazeProviderConfig, _deps):
    from .kaze import WeatherKazeAdapter
    impl = WeatherKazeAdapter(config)
    await impl.initialize()
    return impl
```
Configuration:
```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/config.py
from pydantic import BaseModel, Field
class KazeProviderConfig(BaseModel):
    """Configuration for the Kaze weather provider."""
    base_url: str = Field(
        "https://api.kaze.io/v1",
        description="Base URL for the Kaze weather API",
    )
```
Main implementation:
```python
# llama-stack-provider-kaze/src/llama_stack_provider_kaze/kaze.py
from llama_stack_api_weather.api import WeatherProvider
from .config import KazeProviderConfig
class WeatherKazeAdapter(WeatherProvider):
    """Kaze weather provider implementation."""
    def __init__(
        self,
        config: KazeProviderConfig,
    ) -> None:
        self.config = config
    async def initialize(self) -> None:
        pass
    async def get_available_locations(self) -> dict[str, list[str]]:
        """Get available weather locations."""
        return {"locations": ["Paris", "Tokyo"]}
```
5. Create the provider specification:
```yaml
# ~/.llama/providers.d/remote/weather/kaze.yaml
adapter_type: kaze
pip_packages: ["llama_stack_provider_kaze"]
config_class: llama_stack_provider_kaze.config.KazeProviderConfig
module: llama_stack_provider_kaze
optional_api_dependencies: []
```
6. Install the provider package:
```bash
uv pip install -e .
```
7. Configure Llama Stack to use the provider:
```yaml
# ~/.llama/config.yaml
version: "2"
image_name: "llama-stack-api-weather"
apis:
- weather
providers:
  weather:
  - provider_id: kaze
    provider_type: remote::kaze
    config: {}
external_apis_dir: ~/.llama/apis.d
external_providers_dir: ~/.llama/providers.d
server:
  port: 8321
```
8. Run the server:
```bash
llama stack run ~/.llama/config.yaml
```
9. Test the API:
```bash
curl -sSf http://127.0.0.1:8321/v1/weather/locations
{"locations":["Paris","Tokyo"]}
```
## Best Practices
1. **Package Naming**: Use a clear and descriptive name for your API package.
2. **Version Management**: Keep your API package versioned and compatible with the Llama Stack version you're using.
3. **Dependencies**: Only include the minimum required dependencies in your API package.
4. **Documentation**: Include clear documentation in your API package about:
- Installation requirements
- Configuration options
- API endpoints and usage
- Any limitations or known issues
5. **Testing**: Include tests in your API package to ensure it works correctly with Llama Stack.
## Troubleshooting
If your external API isn't being loaded:
1. Check that the `external_apis_dir` path is correct and accessible.
2. Verify that the YAML files are properly formatted.
3. Ensure all required Python packages are installed.
4. Check the Llama Stack server logs for any error messages - turn on debug logging to get more information using `LLAMA_STACK_LOGGING=all=debug`.
5. Verify that the API package is installed in your Python environment.


@ -0,0 +1,40 @@
---
title: APIs
description: Available REST APIs and planned capabilities in Llama Stack
sidebar_label: APIs
sidebar_position: 1
---
# APIs
A Llama Stack API is described as a collection of REST endpoints following OpenAI API standards. We currently support the following APIs:
- **Inference**: run inference with an LLM
- **Safety**: apply safety policies to the output at a system (not only model) level
- **Agents**: run multi-step agentic workflows with LLMs with tool usage, memory (RAG), etc.
- **DatasetIO**: interface with datasets and data loaders
- **Scoring**: evaluate outputs of the system
- **Eval**: generate outputs (via Inference or Agents) and perform scoring
- **VectorIO**: perform operations on vector stores, such as adding documents, searching, and deleting documents
- **Files**: manage file uploads, storage, and retrieval
- **Post Training**: fine-tune a model
- **Tool Runtime**: interact with various tools and protocols
- **Responses**: generate responses from an LLM
We are working on adding a few more APIs to complete the application lifecycle. These will include:
- **Batch Inference**: run inference on a dataset of inputs
- **Batch Agents**: run agents on a dataset of inputs
- **Batches**: OpenAI-compatible batch management for inference
## OpenAI API Compatibility
We are working on adding OpenAI API compatibility to Llama Stack. This will allow you to use Llama Stack with OpenAI API clients and tools.
### File Operations and Vector Store Integration
The Files API and Vector Store APIs work together through file operations, enabling automatic document processing and search. This integration implements the [OpenAI Vector Store Files API specification](https://platform.openai.com/docs/api-reference/vector-stores-files) and allows you to:
- Upload documents through the Files API
- Automatically process and chunk documents into searchable vectors
- Store processed content in vector databases based on the availability of [our providers](../../providers/index.mdx)
- Search through documents using natural language queries
For detailed information about this integration, see [File Operations and Vector Store Integration](../file_operations_vector_stores.md).


@ -0,0 +1,74 @@
---
title: Llama Stack Architecture
description: Understanding Llama Stack's service-oriented design and benefits
sidebar_label: Architecture
sidebar_position: 2
---
# Llama Stack architecture
Llama Stack allows you to build different layers of distributions for your AI workloads using various SDKs and API providers.
<img src="/img/llama-stack.png" alt="Llama Stack" width="400" />
## Benefits of Llama Stack
### Current challenges in custom AI applications
Building production AI applications today requires solving multiple challenges:
**Infrastructure Complexity**
- Running large language models efficiently requires specialized infrastructure.
- Different deployment scenarios (local development, cloud, edge) need different solutions.
- Moving from development to production often requires significant rework.
**Essential Capabilities**
- Safety guardrails and content filtering are necessary in an enterprise setting.
- Model inference alone is not enough; knowledge retrieval and RAG capabilities are required.
- Nearly any application needs composable multi-step workflows.
- Without monitoring, observability and evaluation, you end up operating in the dark.
**Lack of Flexibility and Choice**
- Directly integrating with multiple providers creates tight coupling.
- Different providers have different APIs and abstractions.
- Changing providers requires significant code changes.
### Our Solution: A Universal Stack
Llama Stack addresses these challenges through a service-oriented, API-first approach:
**Develop Anywhere, Deploy Everywhere**
- Start locally with CPU-only setups
- Move to GPU acceleration when needed
- Deploy to cloud or edge without code changes
- Same APIs and developer experience everywhere
**Production-Ready Building Blocks**
- Pre-built safety guardrails and content filtering
- Built-in RAG and agent capabilities
- Comprehensive evaluation toolkit
- Full observability and monitoring
**True Provider Independence**
- Swap providers without application changes
- Mix and match best-in-class implementations
- Federation and fallback support
- No vendor lock-in
**Robust Ecosystem**
- Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies).
- Ecosystem offers tailored infrastructure, software, and services for deploying a variety of models.
## Our Philosophy
- **Service-Oriented**: REST APIs enforce clean interfaces and enable seamless transitions across different environments.
- **Composability**: Every component is independent but works together seamlessly
- **Production Ready**: Built for real-world applications, not just demos
- **Turnkey Solutions**: Easy-to-deploy, built-in solutions for popular deployment scenarios
With Llama Stack, you can focus on building your application while we handle the infrastructure complexity, essential capabilities, and provider integrations.

Some files were not shown because too many files have changed in this diff.