Compare commits

185 commits

Author SHA1 Message Date
dependabot[bot]
58e164b8bc
chore(github-deps): bump astral-sh/setup-uv from 6.4.3 to 6.5.0 (#3179)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.4.3 to 6.5.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v6.5.0 🌈 Better error messages, bug fixes and copilot agent
settings</h2>
<h2>Changes</h2>
<p>This release brings better error messages in case the GitHub API is
impacted, fixes a few bugs and allows to disable <a
href="https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md">problem
matchers</a> for better use in Copilot Agent workspaces.</p>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Improve error messages on GitHub API errors <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/518">#518</a>)</li>
<li>Ignore backslashes and whitespace in requirements <a
href="https://github.com/axm2"><code>@​axm2</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/501">#501</a>)</li>
</ul>
<h2>🚀 Enhancements</h2>
<ul>
<li>Add input add-problem-matchers <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/517">#517</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>chore: update known versions for 0.8.9 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/512">#512</a>)</li>
<li>chore: update known versions for 0.8.6-0.8.8 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/510">#510</a>)</li>
<li>chore: update known versions for 0.8.5 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/509">#509</a>)</li>
<li>chore: update known versions for 0.8.4 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/505">#505</a>)</li>
<li>chore: update known versions for 0.8.3 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/502">#502</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>add note on caching to read disable-cache-pruning <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/506">#506</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump actions/checkout from 4 to 5 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/514">#514</a>)</li>
<li>bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/516">#516</a>)</li>
<li>Bump biome to v2 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/515">#515</a>)</li>
</ul>
</blockquote>
</details>
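The release notes above add an `add-problem-matchers` input (#517) for use in Copilot Agent workspaces. The following is a minimal workflow sketch of disabling it. It assumes a typical job layout, and the job name `test` and the `uv run pytest` command are illustrative placeholders, not taken from this repository. It also assumes that problem matchers stay enabled unless the input is set to `false`.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - name: Install uv
        uses: astral-sh/setup-uv@v6.5.0
        with:
          # Input introduced in setup-uv v6.5.0 (#517); assumed enabled by default,
          # so it is set to false here to turn problem matchers off.
          add-problem-matchers: false
      # Hypothetical test command, shown only to complete the example job.
      - run: uv run pytest
```

Pinning by tag is shown for readability. Workflows that pin actions by commit SHA would keep the SHA and change only the input.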
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d9e0f98d3f"><code>d9e0f98</code></a>
Improve error messages on GitHub API errors (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/518">#518</a>)</li>
<li><a
href="e5d42a2b46"><code>e5d42a2</code></a>
Add input add-problem-matchers (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/517">#517</a>)</li>
<li><a
href="d664c2a1d1"><code>d664c2a</code></a>
Bump actions/checkout from 4 to 5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/514">#514</a>)</li>
<li><a
href="c35b8eac36"><code>c35b8ea</code></a>
bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/516">#516</a>)</li>
<li><a
href="4109b4033f"><code>4109b40</code></a>
Bump biome to v2 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/515">#515</a>)</li>
<li><a
href="1463845d3c"><code>1463845</code></a>
chore: update known versions for 0.8.9 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/512">#512</a>)</li>
<li><a
href="ad5ded2d63"><code>ad5ded2</code></a>
chore: update known versions for 0.8.6-0.8.8 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/510">#510</a>)</li>
<li><a
href="142240426d"><code>1422404</code></a>
chore: update known versions for 0.8.5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/509">#509</a>)</li>
<li><a
href="632449003a"><code>6324490</code></a>
add note on caching to read disable-cache-pruning (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/506">#506</a>)</li>
<li><a
href="2a967c9b97"><code>2a967c9</code></a>
chore: update known versions for 0.8.4 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/505">#505</a>)</li>
<li>Additional commits viewable in <a
href="e92bafb625...d9e0f98d3f">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.4.3&new-version=6.5.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:51:53 -07:00
dependabot[bot]
6a719716f2
chore(github-deps): bump actions/checkout from 4.2.2 to 5.0.0 (#3178)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2
to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2238">actions/checkout#2238</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v5.0.0">https://github.com/actions/checkout/compare/v4...v5.0.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
<li>Prepare release v4.3.0 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2237">actions/checkout#2237</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/motss"><code>@​motss</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li><a href="https://github.com/mouismail"><code>@​mouismail</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v4.3.0">https://github.com/actions/checkout/compare/v4...v4.3.0</a></p>
</blockquote>
</details>
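The following sketch shows what the upgraded step might look like. Per the release notes above, checkout v5.0.0 runs on Node 24 and requires runner v2.327.1 or newer. GitHub-hosted runners are typically kept current, so the constraint mainly matters for self-hosted runners. The `runs-on` labels below are hypothetical.

```yaml
jobs:
  build:
    # Hypothetical self-hosted labels; the runner must be v2.327.1 or newer
    # to execute actions/checkout v5.0.0 (Node 24).
    runs-on: [self-hosted, linux]
    steps:
      - uses: actions/checkout@v5.0.0
```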
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
<li>README: Suggest <code>user.email</code> to be
<code>41898282+github-actions[bot]@users.noreply.github.com</code> by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1707">actions/checkout#1707</a></li>
</ul>
<h2>v4.1.4</h2>
<ul>
<li>Disable <code>extensions.worktreeConfig</code> when disabling
<code>sparse-checkout</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1692">actions/checkout#1692</a></li>
<li>Add dependabot config by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1688">actions/checkout#1688</a></li>
<li>Bump the minor-actions-dependencies group with 2 updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1693">actions/checkout#1693</a></li>
<li>Bump word-wrap from 1.2.3 to 1.2.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1643">actions/checkout#1643</a></li>
</ul>
<h2>v4.1.3</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="08c6903cd8"><code>08c6903</code></a>
Prepare v5.0.0 release (<a
href="https://redirect.github.com/actions/checkout/issues/2238">#2238</a>)</li>
<li><a
href="9f265659d3"><code>9f26565</code></a>
Update actions checkout to use node 24 (<a
href="https://redirect.github.com/actions/checkout/issues/2226">#2226</a>)</li>
<li><a
href="08eba0b27e"><code>08eba0b</code></a>
Prepare release v4.3.0 (<a
href="https://redirect.github.com/actions/checkout/issues/2237">#2237</a>)</li>
<li><a
href="631c7dc4f8"><code>631c7dc</code></a>
Update package dependencies (<a
href="https://redirect.github.com/actions/checkout/issues/2236">#2236</a>)</li>
<li><a
href="8edcb1bdb4"><code>8edcb1b</code></a>
Update CODEOWNERS for actions (<a
href="https://redirect.github.com/actions/checkout/issues/2224">#2224</a>)</li>
<li><a
href="09d2acae67"><code>09d2aca</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/2194">#2194</a>)</li>
<li><a
href="85e6279cec"><code>85e6279</code></a>
Adjust positioning of user email note and permissions heading (<a
href="https://redirect.github.com/actions/checkout/issues/2044">#2044</a>)</li>
<li><a
href="009b9ae9e4"><code>009b9ae</code></a>
Documentation update - add recommended permissions to Readme (<a
href="https://redirect.github.com/actions/checkout/issues/2043">#2043</a>)</li>
<li><a
href="cbb722410c"><code>cbb7224</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1977">#1977</a>)</li>
<li><a
href="3b9b8c884f"><code>3b9b8c8</code></a>
docs: update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1971">#1971</a>)</li>
<li>See full diff in <a
href="11bd71901b...08c6903cd8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=4.2.2&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:51:40 -07:00
dependabot[bot]
bd1a794add
chore(python-deps): bump llama-api-client from 0.1.2 to 0.2.0 (#3173)
Bumps [llama-api-client](https://github.com/meta-llama/llama-api-python)
from 0.1.2 to 0.2.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/meta-llama/llama-api-python/releases">llama-api-client's
releases</a>.</em></p>
<blockquote>
<h2>v0.2.0</h2>
<h2>0.2.0 (2025-08-07)</h2>
<p>Full Changelog: <a
href="https://github.com/meta-llama/llama-api-python/compare/v0.1.2...v0.2.0">v0.1.2...v0.2.0</a></p>
<h3>Features</h3>
<ul>
<li>clean up environment call outs (<a
href="4afbd01ed7">4afbd01</a>)</li>
<li><strong>client:</strong> support file upload requests (<a
href="ec42e80b62">ec42e80</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>api:</strong> remove chat completion request model (<a
href="94c4e9fd50">94c4e9f</a>)</li>
<li><strong>client:</strong> don't send Content-Type header on GET
requests (<a
href="efec88aa51">efec88a</a>)</li>
<li><strong>parsing:</strong> correctly handle nested discriminated
unions (<a
href="b6276863be">b627686</a>)</li>
<li><strong>parsing:</strong> ignore empty metadata (<a
href="d6ee85101e">d6ee851</a>)</li>
<li><strong>parsing:</strong> parse extra field types (<a
href="f03ca22860">f03ca22</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>add examples (<a
href="abfa065721">abfa065</a>)</li>
<li><strong>internal:</strong> bump pinned h11 dep (<a
href="d40e1b1d73">d40e1b1</a>)</li>
<li><strong>internal:</strong> fix ruff target version (<a
href="c900ebc528">c900ebc</a>)</li>
<li><strong>package:</strong> mark python 3.13 as supported (<a
href="ef5bc36693">ef5bc36</a>)</li>
<li><strong>project:</strong> add settings file for vscode (<a
href="e3103801d6">e310380</a>)</li>
<li><strong>readme:</strong> fix version rendering on pypi (<a
href="786f9fbdb7">786f9fb</a>)</li>
<li>sync repo (<a
href="7e697f6550">7e697f6</a>)</li>
<li>update SDK settings (<a
href="de22c0ece7">de22c0e</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>code of conduct (<a
href="efe1af28fb">efe1af2</a>)</li>
<li>readme and license (<a
href="d53eafd104">d53eafd</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7a8c5838af"><code>7a8c583</code></a>
release: 0.2.0</li>
<li><a
href="4f1a04e5c1"><code>4f1a04e</code></a>
chore(internal): fix ruff target version</li>
<li><a
href="06485e995a"><code>06485e9</code></a>
feat(client): support file upload requests</li>
<li><a
href="131b474ad1"><code>131b474</code></a>
chore(project): add settings file for vscode</li>
<li><a
href="ef4cee6d8b"><code>ef4cee6</code></a>
fix(parsing): parse extra field types</li>
<li><a
href="fcbc699718"><code>fcbc699</code></a>
fix(parsing): ignore empty metadata</li>
<li><a
href="b6656cd0b8"><code>b6656cd</code></a>
fix(api): remove chat completion request model</li>
<li><a
href="0deda5590c"><code>0deda55</code></a>
feat: clean up environment call outs</li>
<li><a
href="ecf91026ac"><code>ecf9102</code></a>
fix(client): don't send Content-Type header on GET requests</li>
<li><a
href="0ac6285cbe"><code>0ac6285</code></a>
chore(readme): fix version rendering on pypi</li>
<li>Additional commits viewable in <a
href="https://github.com/meta-llama/llama-api-python/compare/v0.1.2...v0.2.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=llama-api-client&package-manager=uv&previous-version=0.1.2&new-version=0.2.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:50:34 -07:00
dependabot[bot]
886af85e0c
chore(github-deps): bump amannn/action-semantic-pull-request from 5.5.3 to 6.1.0 (#3215)
Bumps
[amannn/action-semantic-pull-request](https://github.com/amannn/action-semantic-pull-request)
from 5.5.3 to 6.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/amannn/action-semantic-pull-request/releases">amannn/action-semantic-pull-request's
releases</a>.</em></p>
<blockquote>
<h2>v6.1.0</h2>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.1...v6.1.0">6.1.0</a>
(2025-08-19)</h2>
<h3>Features</h3>
<ul>
<li>Support providing regexps for types (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/292">#292</a>)
(<a
href="a30288bf13">a30288b</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Remove trailing whitespace from &quot;unknown release type&quot;
error message (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/291">#291</a>)
(<a
href="afa4edb1c4">afa4edb</a>)</li>
</ul>
<h2>v6.0.1</h2>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.0...v6.0.1">6.0.1</a>
(2025-08-13)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Actually execute action (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/289">#289</a>)
(<a
href="58e4ab40f5">58e4ab4</a>)</li>
</ul>
<h2>v6.0.0</h2>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.3...v6.0.0">6.0.0</a>
(2025-08-13)</h2>
<h3>⚠ BREAKING CHANGES</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)</li>
</ul>
<h3>Features</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)
(<a
href="bc0c9a79ab">bc0c9a7</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/amannn/action-semantic-pull-request/blob/main/CHANGELOG.md">amannn/action-semantic-pull-request's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.1...v6.1.0">6.1.0</a>
(2025-08-19)</h2>
<h3>Features</h3>
<ul>
<li>Support providing regexps for types (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/292">#292</a>)
(<a
href="a30288bf13">a30288b</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Remove trailing whitespace from &quot;unknown release type&quot;
error message (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/291">#291</a>)
(<a
href="afa4edb1c4">afa4edb</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v6.0.0...v6.0.1">6.0.1</a>
(2025-08-13)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Actually execute action (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/289">#289</a>)
(<a
href="58e4ab40f5">58e4ab4</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.3...v6.0.0">6.0.0</a>
(2025-08-13)</h2>
<h3>⚠ BREAKING CHANGES</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)</li>
</ul>
<h3>Features</h3>
<ul>
<li>Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)
(<a
href="bc0c9a79ab">bc0c9a7</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.2...v5.5.3">5.5.3</a>
(2024-06-28)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump <code>braces</code> dependency (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/269">#269</a>
by <a href="https://github.com/EelcoLos"><code>@​EelcoLos</code></a>)
(<a
href="2d952a1bf9">2d952a1</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.1...v5.5.2">5.5.2</a>
(2024-04-24)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump tar from 6.1.11 to 6.2.1 (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/262">#262</a>
by <a href="https://github.com/EelcoLos"><code>@​EelcoLos</code></a>)
(<a
href="9a90d5a5ac">9a90d5a</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.5.0...v5.5.1">5.5.1</a>
(2024-04-24)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Bump ip from 2.0.0 to 2.0.1 (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/263">#263</a>
by <a href="https://github.com/EelcoLos"><code>@​EelcoLos</code></a>)
(<a
href="5e7e9acca3">5e7e9ac</a>)</li>
</ul>
<h2><a
href="https://github.com/amannn/action-semantic-pull-request/compare/v5.4.0...v5.5.0">5.5.0</a>
(2024-04-23)</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7f33ba7922"><code>7f33ba7</code></a>
chore: Release 6.1.0 [skip ci]</li>
<li><a
href="afa4edb1c4"><code>afa4edb</code></a>
fix: Remove trailing whitespace from &quot;unknown release type&quot;
error message (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/291">#291</a>)</li>
<li><a
href="a30288bf13"><code>a30288b</code></a>
feat: Support providing regexps for types (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/292">#292</a>)</li>
<li><a
href="a46a7c8dc4"><code>a46a7c8</code></a>
build: Move Vitest to <code>devDependencies</code> (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/290">#290</a>)</li>
<li><a
href="fdd4d3ddf6"><code>fdd4d3d</code></a>
chore: Release 6.0.1 [skip ci]</li>
<li><a
href="58e4ab40f5"><code>58e4ab4</code></a>
fix: Actually execute action (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/289">#289</a>)</li>
<li><a
href="04a8d177d9"><code>04a8d17</code></a>
chore: Release 6.0.0 [skip ci]</li>
<li><a
href="bc0c9a79ab"><code>bc0c9a7</code></a>
feat!: Upgrade action to use Node.js 24 and ESM (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/287">#287</a>)</li>
<li><a
href="631ffdc028"><code>631ffdc</code></a>
build(deps): bump the github-action-workflows group with 2 updates (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/286">#286</a>)</li>
<li><a
href="c1807ceb58"><code>c1807ce</code></a>
build: configure Dependabot (<a
href="https://redirect.github.com/amannn/action-semantic-pull-request/issues/231">#231</a>)</li>
<li>Additional commits viewable in <a
href="0723387faa...7f33ba7922">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=amannn/action-semantic-pull-request&package-manager=github_actions&previous-version=5.5.3&new-version=6.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:50:00 -07:00
dependabot[bot]
2fa189fe04
chore(github-deps): bump actions/setup-node from 4.1.0 to 4.4.0 (#3214)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from
4.1.0 to 4.4.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-node/releases">actions/setup-node's
releases</a>.</em></p>
<blockquote>
<h2>v4.4.0</h2>
<h2>What's Changed</h2>
<h3>Bug fixes:</h3>
<ul>
<li>Make eslint-compact matcher compatible with Stylelint by <a
href="https://github.com/FloEdelmann"><code>@​FloEdelmann</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/98">actions/setup-node#98</a></li>
<li>Add support for indented eslint output by <a
href="https://github.com/fregante"><code>@​fregante</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1245">actions/setup-node#1245</a></li>
</ul>
<h3>Enhancement:</h3>
<ul>
<li>Support private mirrors by <a
href="https://github.com/marco-ippolito"><code>@​marco-ippolito</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1240">actions/setup-node#1240</a></li>
</ul>
<h3>Dependency update:</h3>
<ul>
<li>Upgrade <code>@​action/cache</code> from 4.0.2 to 4.0.3 by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1262">actions/setup-node#1262</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/FloEdelmann"><code>@​FloEdelmann</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/98">actions/setup-node#98</a></li>
<li><a href="https://github.com/fregante"><code>@​fregante</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1245">actions/setup-node#1245</a></li>
<li><a
href="https://github.com/marco-ippolito"><code>@​marco-ippolito</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1240">actions/setup-node#1240</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.4.0">https://github.com/actions/setup-node/compare/v4...v4.4.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<h3>Dependency updates</h3>
<ul>
<li>Upgrade <code>@​actions/glob</code> from 0.4.0 to 0.5.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1200">actions/setup-node#1200</a></li>
<li>Upgrade <code>@​action/cache</code> from 4.0.0 to 4.0.2 by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/setup-node/pull/1251">actions/setup-node#1251</a></li>
<li>Upgrade <code>@​vercel/ncc</code> from 0.38.1 to 0.38.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1203">actions/setup-node#1203</a></li>
<li>Upgrade <code>@​actions/tool-cache</code> from 2.0.1 to 2.0.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1220">actions/setup-node#1220</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1251">actions/setup-node#1251</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.3.0">https://github.com/actions/setup-node/compare/v4...v4.3.0</a></p>
<h2>v4.2.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Enhance workflows and upgrade publish-actions from 0.2.2 to 0.3.0 by
<a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1174">actions/setup-node#1174</a></li>
<li>Add recommended permissions section to readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1193">actions/setup-node#1193</a></li>
<li>Configure Dependabot settings by <a
href="https://github.com/HarithaVattikuti"><code>@​HarithaVattikuti</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1192">actions/setup-node#1192</a></li>
<li>Upgrade <code>@actions/cache</code> to <code>^4.0.0</code> by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1191">actions/setup-node#1191</a></li>
<li>Upgrade pnpm/action-setup from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1194">actions/setup-node#1194</a></li>
<li>Upgrade actions/publish-immutable-action from 0.0.3 to 0.0.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1195">actions/setup-node#1195</a></li>
<li>Upgrade semver from 7.6.0 to 7.6.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1196">actions/setup-node#1196</a></li>
<li>Upgrade <code>@​types/jest</code> from 29.5.12 to 29.5.14 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1201">actions/setup-node#1201</a></li>
<li>Upgrade undici from 5.28.4 to 5.28.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-node/pull/1205">actions/setup-node#1205</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-node/pull/1193">actions/setup-node#1193</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v4...v4.2.0">https://github.com/actions/setup-node/compare/v4...v4.2.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="49933ea528"><code>49933ea</code></a>
Bump <code>@​action/cache</code> from 4.0.2 to 4.0.3 (<a
href="https://redirect.github.com/actions/setup-node/issues/1262">#1262</a>)</li>
<li><a
href="e3ce749e20"><code>e3ce749</code></a>
feat: support private mirrors (<a
href="https://redirect.github.com/actions/setup-node/issues/1240">#1240</a>)</li>
<li><a
href="40337cb8f7"><code>40337cb</code></a>
Add support for indented eslint output (<a
href="https://redirect.github.com/actions/setup-node/issues/1245">#1245</a>)</li>
<li><a
href="1ccdddc9b8"><code>1ccdddc</code></a>
Make eslint-compact matcher compatible with Stylelint (<a
href="https://redirect.github.com/actions/setup-node/issues/98">#98</a>)</li>
<li><a
href="cdca7365b2"><code>cdca736</code></a>
Bump <code>@​actions/tool-cache</code> from 2.0.1 to 2.0.2 (<a
href="https://redirect.github.com/actions/setup-node/issues/1220">#1220</a>)</li>
<li><a
href="22c0e7494f"><code>22c0e74</code></a>
Bump <code>@​vercel/ncc</code> from 0.38.1 to 0.38.3 (<a
href="https://redirect.github.com/actions/setup-node/issues/1203">#1203</a>)</li>
<li><a
href="a7c2d9473e"><code>a7c2d94</code></a>
actions/cache upgrade (<a
href="https://redirect.github.com/actions/setup-node/issues/1251">#1251</a>)</li>
<li><a
href="802632921f"><code>8026329</code></a>
Bump <code>@​actions/glob</code> from 0.4.0 to 0.5.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1200">#1200</a>)</li>
<li><a
href="1d0ff469b7"><code>1d0ff46</code></a>
Bump undici from 5.28.4 to 5.28.5 (<a
href="https://redirect.github.com/actions/setup-node/issues/1205">#1205</a>)</li>
<li><a
href="574f09a9fa"><code>574f09a</code></a>
Bump <code>@​types/jest</code> from 29.5.12 to 29.5.14 (<a
href="https://redirect.github.com/actions/setup-node/issues/1201">#1201</a>)</li>
<li>Additional commits viewable in <a
href="39370e3970...49933ea528">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-node&package-manager=github_actions&previous-version=4.1.0&new-version=4.4.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:49:43 -07:00
dependabot[bot]
2cc0051ae5
chore(ui-deps): bump typescript from 5.8.3 to 5.9.2 in /llama_stack/ui (#3216)
Bumps [typescript](https://github.com/microsoft/TypeScript) from 5.8.3
to 5.9.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/microsoft/TypeScript/releases">typescript's
releases</a>.</em></p>
<blockquote>
<h2>TypeScript 5.9</h2>
<p>For release notes, check out the <a
href="https://devblogs.microsoft.com/typescript/announcing-typescript-5-9/">release
announcement</a></p>
<ul>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.0%22+is%3Aclosed+">fixed
issues query for Typescript 5.9.0 (Beta)</a>.</li>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.1%22+is%3Aclosed+">fixed
issues query for Typescript 5.9.1 (RC)</a>.</li>
<li><em>No specific changes for TypeScript 5.9.2 (Stable)</em></li>
</ul>
<p>Downloads are available on:</p>
<ul>
<li><a href="https://www.npmjs.com/package/typescript">npm</a></li>
</ul>
<h2>TypeScript 5.9 RC</h2>
<p>For release notes, check out the <a
href="https://devblogs.microsoft.com/typescript/announcing-typescript-5-9-rc/">release
announcement</a></p>
<ul>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.0%22+is%3Aclosed+">fixed
issues query for Typescript 5.9.0 (Beta)</a>.</li>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.1%22+is%3Aclosed+">fixed
issues query for Typescript 5.9.1 (RC)</a>.</li>
</ul>
<p>Downloads are available on:</p>
<ul>
<li><a href="https://www.npmjs.com/package/typescript">npm</a></li>
</ul>
<h2>TypeScript 5.9 Beta</h2>
<p>For release notes, check out the <a
href="https://devblogs.microsoft.com/typescript/announcing-typescript-5-9-beta/">release
announcement</a>.</p>
<ul>
<li><a
href="https://github.com/Microsoft/TypeScript/issues?utf8=%E2%9C%93&amp;q=milestone%3A%22TypeScript+5.9.0%22+is%3Aclosed+">fixed
issues query for Typescript 5.9.0 (Beta)</a>.</li>
</ul>
<p>Downloads are available on:</p>
<ul>
<li><a href="https://www.npmjs.com/package/typescript">npm</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="be86783155"><code>be86783</code></a>
Give more specific errors for <code>verbatimModuleSyntax</code> (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/62113">#62113</a>)</li>
<li><a
href="22ef57786f"><code>22ef577</code></a>
LEGO: Pull request from
lego/hb_5378966c-b857-470a-8675-daebef4a6da1_20250714...</li>
<li><a
href="d5a414cd1d"><code>d5a414c</code></a>
Don't use <code>noErrorTruncation</code> when printing types with
<code>maximumLength</code> set (#...</li>
<li><a
href="f14b5c8a2f"><code>f14b5c8</code></a>
Remove unused and confusing dom.iterable.d.ts file (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/62037">#62037</a>)</li>
<li><a
href="2778e84ed8"><code>2778e84</code></a>
Restore AbortSignal.abort (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/62086">#62086</a>)</li>
<li><a
href="65cb4bd2d5"><code>65cb4bd</code></a>
LEGO: Pull request from
lego/hb_5378966c-b857-470a-8675-daebef4a6da1_20250710...</li>
<li><a
href="9e20e032ef"><code>9e20e03</code></a>
Clear out checker-level stacks on pop (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/62016">#62016</a>)</li>
<li><a
href="87740bc7fe"><code>87740bc</code></a>
Fix for Issue 61081 (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/61221">#61221</a>)</li>
<li><a
href="833a8d492c"><code>833a8d4</code></a>
Fix Symbol completion priority and cursor positioning (<a
href="https://redirect.github.com/microsoft/TypeScript/issues/61945">#61945</a>)</li>
<li><a
href="0018c9ff12"><code>0018c9f</code></a>
LEGO: Pull request from
lego/hb_5378966c-b857-470a-8675-daebef4a6da1_20250702...</li>
<li>Additional commits viewable in <a
href="https://github.com/microsoft/TypeScript/compare/v5.8.3...v5.9.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=typescript&package-manager=npm_and_yarn&previous-version=5.8.3&new-version=5.9.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:49:28 -07:00
dependabot[bot]
bf3b201d61
chore(python-deps): bump chromadb from 1.0.16 to 1.0.20 (#3217)
Bumps [chromadb](https://github.com/chroma-core/chroma) from 1.0.16 to
1.0.20.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/chroma-core/chroma/releases">chromadb's
releases</a>.</em></p>
<blockquote>
<h2>1.0.20</h2>
<p>Version: <code>1.0.20</code>
Git ref: <code>refs/tags/1.0.20</code>
Build Date: <code>2025-08-18T17:04</code>
PIP Package: <code>chroma-1.0.20.tar.gz</code>
Github Container Registry Image: <code>:1.0.20</code>
DockerHub Image: <code>:1.0.20</code></p>
<h2>What's Changed</h2>
<ul>
<li>[RELEASE] 1.0.20 by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5303">chroma-core/chroma#5303</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/chroma-core/chroma/compare/1.0.19...1.0.20">https://github.com/chroma-core/chroma/compare/1.0.19...1.0.20</a></p>
<h2>1.0.18</h2>
<p>Version: <code>1.0.18</code>
Git ref: <code>refs/tags/1.0.18</code>
Build Date: <code>2025-08-18T08:09</code>
PIP Package: <code>chroma-1.0.18.tar.gz</code>
Github Container Registry Image: <code>:1.0.18</code>
DockerHub Image: <code>:1.0.18</code></p>
<h2>What's Changed</h2>
<ul>
<li>[CHORE]: Added short descriptions to CLI commands by <a
href="https://github.com/tazarov"><code>@​tazarov</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5217">chroma-core/chroma#5217</a></li>
<li>[ENH] Use AVX in distance calculations by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5258">chroma-core/chroma#5258</a></li>
<li>[ENH] Auto-set tenant, scoped database in python CloudClient by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5026">chroma-core/chroma#5026</a></li>
<li>[PERF]: Modify get_range to return an iterator by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5256">chroma-core/chroma#5256</a></li>
<li>[BUG] Mark dirty on rollback of cursor to guarantee compaction picks
it up. by <a href="https://github.com/rescrv"><code>@​rescrv</code></a>
in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5265">chroma-core/chroma#5265</a></li>
<li>[ENH]: add metric for component queue depth &amp; change dispatcher
queue depth metric buckets by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5261">chroma-core/chroma#5261</a></li>
<li>[ENH]: add garbage collection CLI for manual garbage collection by
<a href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5250">chroma-core/chroma#5250</a></li>
<li>[DOC] Clean up DEVELOP.md by <a
href="https://github.com/kylediaz"><code>@​kylediaz</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5270">chroma-core/chroma#5270</a></li>
<li>[ENH]: Further optimize query on getCollections when databases pkey
is fully specified by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5268">chroma-core/chroma#5268</a></li>
<li>[ENH] Update Rust to allow build with AVX when flag is set by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5269">chroma-core/chroma#5269</a></li>
<li>[ENH]: Fix test_add flake by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5272">chroma-core/chroma#5272</a></li>
<li>[BUG]: Revert &quot;[ENH]: Further optimize query on getCollections
when databases pkey is fully specified (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5268">#5268</a>)&quot;
by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5273">chroma-core/chroma#5273</a></li>
<li>[BLD] Add maturin to dev dependencies by <a
href="https://github.com/kylediaz"><code>@​kylediaz</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5271">chroma-core/chroma#5271</a></li>
<li>[ENH]: Optimize GetCollections and remove usage of raw gorm by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5274">chroma-core/chroma#5274</a></li>
<li>[ENH]: add config param to garbage collector to control how many
collections are fetched from SysDb by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5275">chroma-core/chroma#5275</a></li>
<li>[ENH] Reject version files without paths. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5267">chroma-core/chroma#5267</a></li>
<li>[ENH] Enable getting a collection by CRN by <a
href="https://github.com/drewkim"><code>@​drewkim</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5244">chroma-core/chroma#5244</a></li>
<li>[BUG] CompactionError did not proxy should_trace_error by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5282">chroma-core/chroma#5282</a></li>
<li>[BUG] Resolve deadlock in system crate? by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5283">chroma-core/chroma#5283</a></li>
<li>[ENH] Complete the NAC metrics for the write half. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5278">chroma-core/chroma#5278</a></li>
<li>[BUG]: fix missing node in constructed version graph for garbage
collection by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5284">chroma-core/chroma#5284</a></li>
<li>[BUG] Fix test flake from 5283. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5287">chroma-core/chroma#5287</a></li>
<li>[BUG]: Don't GC hnsw if it is empty by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5295">chroma-core/chroma#5295</a></li>
<li>[ENH] Sync before flushing by <a
href="https://github.com/HammadB"><code>@​HammadB</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5296">chroma-core/chroma#5296</a></li>
<li>[DOC] update quota limits by <a
href="https://github.com/philipithomas"><code>@​philipithomas</code></a>
in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5297">chroma-core/chroma#5297</a></li>
<li>[BUG] Fix CLI copy offset by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5288">chroma-core/chroma#5288</a></li>
<li>[ENH] Add support for default space in create coll config by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5293">chroma-core/chroma#5293</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b6b059dfd7"><code>b6b059d</code></a>
[RELEASE] 1.0.20 (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5303">#5303</a>)</li>
<li><a
href="1993cd4a51"><code>1993cd4</code></a>
[RELEASE] CLI 1.1.8, Python 1.0.19, JS 3.0.14 (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5302">#5302</a>)</li>
<li><a
href="19600af279"><code>19600af</code></a>
[BUG] Fix CLI copy arg number types (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5301">#5301</a>)</li>
<li><a
href="d3602cd776"><code>d3602cd</code></a>
[CHORE] Update JS binding deps in the client (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5300">#5300</a>)</li>
<li><a
href="2570b471ed"><code>2570b47</code></a>
[RELEASE] CLI 1.1.7, Python 1.0.18, JS 3.0.13 (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5299">#5299</a>)</li>
<li><a
href="51a7d1625b"><code>51a7d16</code></a>
[ENH] Add support for default space in create coll config (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5293">#5293</a>)</li>
<li><a
href="163133aacc"><code>163133a</code></a>
[BUG] Fix CLI copy offset (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5288">#5288</a>)</li>
<li><a
href="2f06586503"><code>2f06586</code></a>
[DOC] update quota limits (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5297">#5297</a>)</li>
<li><a
href="983728076d"><code>9837280</code></a>
[ENH] Sync before flushing (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5296">#5296</a>)</li>
<li><a
href="649e14c530"><code>649e14c</code></a>
[BUG]: Don't GC hnsw if it is empty (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5295">#5295</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/chroma-core/chroma/compare/1.0.16...1.0.20">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=chromadb&package-manager=uv&previous-version=1.0.16&new-version=1.0.20)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:49:11 -07:00
dependabot[bot]
620212e920
chore(ui-deps): bump @radix-ui/react-collapsible from 1.1.11 to 1.1.12 in /llama_stack/ui (#3218)
Bumps
[@radix-ui/react-collapsible](https://github.com/radix-ui/primitives)
from 1.1.11 to 1.1.12.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-collapsible&package-manager=npm_and_yarn&previous-version=1.1.11&new-version=1.1.12)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:48:53 -07:00
dependabot[bot]
65d09c442d
chore(ui-deps): bump eslint-config-prettier from 10.1.5 to 10.1.8 in /llama_stack/ui (#3220)
Bumps
[eslint-config-prettier](https://github.com/prettier/eslint-config-prettier)
from 10.1.5 to 10.1.8.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/prettier/eslint-config-prettier/releases">eslint-config-prettier's
releases</a>.</em></p>
<blockquote>
<h2>v10.1.8</h2>
<p>republish latest version</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/prettier/eslint-config-prettier/compare/v10.1.5...v10.1.8">https://github.com/prettier/eslint-config-prettier/compare/v10.1.5...v10.1.8</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/prettier/eslint-config-prettier/blob/main/CHANGELOG.md">eslint-config-prettier's
changelog</a>.</em></p>
<blockquote>
<h1>eslint-config-prettier</h1>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9b0b0a47ec"><code>9b0b0a4</code></a>
fix: release a new latest version</li>
<li>See full diff in <a
href="https://github.com/prettier/eslint-config-prettier/compare/v10.1.5...v10.1.8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=eslint-config-prettier&package-manager=npm_and_yarn&previous-version=10.1.5&new-version=10.1.8)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:48:35 -07:00
dependabot[bot]
90b7c2317e
chore(ui-deps): bump @radix-ui/react-separator from 1.1.6 to 1.1.7 in /llama_stack/ui (#3222)
Bumps
[@radix-ui/react-separator](https://github.com/radix-ui/primitives) from
1.1.6 to 1.1.7.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/radix-ui/primitives/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@radix-ui/react-separator&package-manager=npm_and_yarn&previous-version=1.1.6&new-version=1.1.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:48:20 -07:00
dependabot[bot]
0473a32619
chore(ui-deps): bump tailwind-merge from 3.3.0 to 3.3.1 in /llama_stack/ui (#3223)
Bumps [tailwind-merge](https://github.com/dcastil/tailwind-merge) from
3.3.0 to 3.3.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dcastil/tailwind-merge/releases">tailwind-merge's
releases</a>.</em></p>
<blockquote>
<h2>v3.3.1</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Fix arbitrary value using <code>color-mix()</code> not being
detected as color by <a
href="https://github.com/dcastil"><code>@​dcastil</code></a> in <a
href="https://redirect.github.com/dcastil/tailwind-merge/pull/591">dcastil/tailwind-merge#591</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/dcastil/tailwind-merge/compare/v3.3.0...v3.3.1">https://github.com/dcastil/tailwind-merge/compare/v3.3.0...v3.3.1</a></p>
<p>Thanks to <a
href="https://github.com/brandonmcconnell"><code>@​brandonmcconnell</code></a>,
<a href="https://github.com/manavm1990"><code>@​manavm1990</code></a>,
<a href="https://github.com/langy"><code>@​langy</code></a>, <a
href="https://github.com/roboflow"><code>@​roboflow</code></a>, <a
href="https://github.com/syntaxfm"><code>@​syntaxfm</code></a>, <a
href="https://github.com/getsentry"><code>@​getsentry</code></a>, <a
href="https://github.com/codecov"><code>@​codecov</code></a>, <a
href="https://github.com/sourcegraph"><code>@​sourcegraph</code></a>, a
private sponsor, <a
href="https://github.com/block"><code>@​block</code></a> and <a
href="https://github.com/shawt3000"><code>@​shawt3000</code></a> for
sponsoring tailwind-merge! ❤️</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="40d8feed6a"><code>40d8fee</code></a>
v3.3.1</li>
<li><a
href="429ea54ac8"><code>429ea54</code></a>
add changelog for v3.3.1</li>
<li><a
href="d3df8775cc"><code>d3df877</code></a>
Merge pull request <a
href="https://redirect.github.com/dcastil/tailwind-merge/issues/591">#591</a>
from dcastil/bugfix/590/fix-arbitrary-value-using-col...</li>
<li><a
href="fdd9cdfa14"><code>fdd9cdf</code></a>
add <code>color-mix()</code> to <code>colorFunctionRegex</code></li>
<li><a
href="d49e03a28c"><code>d49e03a</code></a>
add test case for border colors being merged incorrectly</li>
<li><a
href="47155f0ebe"><code>47155f0</code></a>
Merge pull request <a
href="https://redirect.github.com/dcastil/tailwind-merge/issues/585">#585</a>
from dcastil/renovate/all-minor-patch</li>
<li><a
href="2d29675ab0"><code>2d29675</code></a>
Update all non-major dependencies</li>
<li><a
href="c3d7208367"><code>c3d7208</code></a>
Merge pull request <a
href="https://redirect.github.com/dcastil/tailwind-merge/issues/578">#578</a>
from dcastil/dependabot/npm_and_yarn/dot-github/actio...</li>
<li><a
href="527214bf13"><code>527214b</code></a>
Bump undici from 5.28.5 to 5.29.0 in
/.github/actions/metrics-report</li>
<li>See full diff in <a
href="https://github.com/dcastil/tailwind-merge/compare/v3.3.0...v3.3.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tailwind-merge&package-manager=npm_and_yarn&previous-version=3.3.0&new-version=3.3.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:48:05 -07:00
dependabot[bot]
09bee51d6b
chore(python-deps): bump locust from 2.38.0 to 2.39.0 (#3221)
Bumps [locust](https://github.com/locustio/locust) from 2.38.0 to
2.39.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/locustio/locust/releases">locust's
releases</a>.</em></p>
<blockquote>
<h2>2.39.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add MilvusUser and example by <a
href="https://github.com/zhuwenxing"><code>@​zhuwenxing</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3168">locustio/locust#3168</a></li>
<li>Add SocketIOUser by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3189">locustio/locust#3189</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/zhuwenxing"><code>@​zhuwenxing</code></a> made
their first contribution in <a
href="https://redirect.github.com/locustio/locust/pull/3168">locustio/locust#3168</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/locustio/locust/compare/2.38.1...2.39.0">https://github.com/locustio/locust/compare/2.38.1...2.39.0</a></p>
<h2>2.38.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix test flakyness and update error message by <a
href="https://github.com/amadeuppereira"><code>@​amadeuppereira</code></a>
in <a
href="https://redirect.github.com/locustio/locust/pull/3187">locustio/locust#3187</a></li>
<li>FastHttpUser: Dont send zstd in Accept-Encoding header by <a
href="https://github.com/cyberw"><code>@​cyberw</code></a> in <a
href="https://redirect.github.com/locustio/locust/pull/3188">locustio/locust#3188</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/locustio/locust/compare/2.38.0...2.38.1">https://github.com/locustio/locust/compare/2.38.0...2.38.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/locustio/locust/blob/master/CHANGELOG.md">locust's
changelog</a>.</em></p>
<blockquote>
<h1>Detailed changelog</h1>
<p>The most important changes can also be found in <a
href="https://docs.locust.io/en/latest/changelog.html">the
documentation</a>.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1810fef1ae"><code>1810fef</code></a>
Tiny doc fixes</li>
<li><a
href="48b4dfce8f"><code>48b4dfc</code></a>
Link SocketIOUser from main docs.</li>
<li><a
href="6e4fd7f067"><code>6e4fd7f</code></a>
Merge pull request <a
href="https://redirect.github.com/locustio/locust/issues/3189">#3189</a>
from locustio/Add-SocketioUser</li>
<li><a
href="95eca45476"><code>95eca45</code></a>
better documentation of on_message</li>
<li><a
href="a56ef663af"><code>a56ef66</code></a>
SocketIOUser docs: Link to example on GH</li>
<li><a
href="adaa71b5f9"><code>adaa71b</code></a>
SocketIOUser, add method docstrings and link to python-socketio's
readthedocs</li>
<li><a
href="9fb3ff0f89"><code>9fb3ff0</code></a>
Add testcase for SocketIOUser</li>
<li><a
href="7047247f9d"><code>7047247</code></a>
SocketIOUser: Fix use of environment object. Remove SocketIOClient.</li>
<li><a
href="f8ddc9c798"><code>f8ddc9c</code></a>
rename socketio echo_server</li>
<li><a
href="ae28acf027"><code>ae28acf</code></a>
add contrib dependencies to docs build</li>
<li>Additional commits viewable in <a
href="https://github.com/locustio/locust/compare/2.38.0...2.39.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=locust&package-manager=uv&previous-version=2.38.0&new-version=2.39.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:47:46 -07:00
dependabot[bot]
eff97f122b
chore(python-deps): bump weaviate-client from 4.16.5 to 4.16.9 (#3219)
Bumps
[weaviate-client](https://github.com/weaviate/weaviate-python-client)
from 4.16.5 to 4.16.9.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/releases">weaviate-client's
releases</a>.</em></p>
<blockquote>
<h2>v4.16.9</h2>
<h2>What's Changed</h2>
<ul>
<li>Deprecate broken method by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1795">weaviate/weaviate-python-client#1795</a></li>
<li>Improve user create docstring by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1796">weaviate/weaviate-python-client#1796</a></li>
<li>Fixup dependencies for package test by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1791">weaviate/weaviate-python-client#1791</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.8...v4.16.9">https://github.com/weaviate/weaviate-python-client/compare/v4.16.8...v4.16.9</a></p>
<h2>v4.16.8</h2>
<h2>What's Changed</h2>
<ul>
<li>Add backup list endpoint by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1785">weaviate/weaviate-python-client#1785</a></li>
<li>Attempt further fix of protobuf runtime stub incompatibilities by <a
href="https://github.com/tsmith023"><code>@​tsmith023</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1788">weaviate/weaviate-python-client#1788</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.7...v4.16.8">https://github.com/weaviate/weaviate-python-client/compare/v4.16.7...v4.16.8</a></p>
<h2>v4.16.6</h2>
<h2>What's Changed</h2>
<ul>
<li>rq: Add bits to the update method by <a
href="https://github.com/rlmanrique"><code>@​rlmanrique</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1766">weaviate/weaviate-python-client#1766</a></li>
<li>Deprecate contextionar, add model2vec and dimension parameter for
transformers by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1773">weaviate/weaviate-python-client#1773</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.5...v4.16.6">https://github.com/weaviate/weaviate-python-client/compare/v4.16.5...v4.16.6</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/blob/main/docs/changelog.rst">weaviate-client's
changelog</a>.</em></p>
<blockquote>
<h2>Version 4.16.9</h2>
<p>This patch version includes:
- Explicitly depend on protobuf package</p>
<h2>Version 4.16.8</h2>
<p>This patch version includes:
- Further attempted fixes for <code>protobuf</code> compatability issues
- Introduction of the <code>backups.list()</code> method</p>
<h2>Version 4.16.7</h2>
<p>This patch version includes:
- Fixes compatability issues between the built gRPC stubs and differing
protobuf versions depending on the version of <code>grpcio</code> used
to build the stubs
- Add <code>text2vec-model2vec</code> module to
<code>Configure.NamedVectors</code>
- Deprecated <code>min_occurrences</code> in <code>Metrics.text</code>
in favour of <code>limit</code></p>
<h2>Version 4.16.6</h2>
<p>This patch version includes:
- Add <code>dimensions</code> property to
<code>text2vec-transformers</code> vectorizers in
<code>Configure.Vectors</code>
- Add <code>text2vec-model2vec</code> vectorizer in
<code>Configure.Vectors</code>
- Deprecate <code>text2vec-contextionary</code> vectorizer</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c69cfa124e"><code>c69cfa1</code></a>
Fixup dependencies for package test (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1791">#1791</a>)</li>
<li><a
href="334380b6d4"><code>334380b</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1796">#1796</a>
from weaviate/docstring_user_create</li>
<li><a
href="c7b8c75893"><code>c7b8c75</code></a>
Improve user create docstring</li>
<li><a
href="93c865a23e"><code>93c865a</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1795">#1795</a>
from weaviate/deprecate_broken_method</li>
<li><a
href="ba05f5f1ad"><code>ba05f5f</code></a>
Deprecate broken method</li>
<li><a
href="4bef4b8210"><code>4bef4b8</code></a>
Update changelog (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1789">#1789</a>)</li>
<li><a
href="c370bf5fa2"><code>c370bf5</code></a>
Attempt further fix of protobuf runtime stub incompatibilities (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1788">#1788</a>)</li>
<li><a
href="98db3b1187"><code>98db3b1</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1785">#1785</a>
from weaviate/add_list_response</li>
<li><a
href="ebf2b30252"><code>ebf2b30</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1782">#1782</a>
from weaviate/dependabot/pip/ruff-0.12.8</li>
<li><a
href="88ad1c113b"><code>88ad1c1</code></a>
Fix version in CI</li>
<li>Additional commits viewable in <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.5...v4.16.9">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=weaviate-client&package-manager=uv&previous-version=4.16.5&new-version=4.16.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-20 16:47:33 -07:00
Ashwin Bharambe
f328ff6e98 fix(ci): dependabot update had a bug 2025-08-20 16:34:50 -07:00
Francisco Arceo
49060c3020
chore: Update dependabot to capture package-lock.json (#3212)
# What does this PR do?
This should fix Dependabot, following the approach discussed in this thread:
https://stackoverflow.com/questions/60201543/dependabot-only-updates-lock-file



## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-20 15:05:12 -07:00
grs
14082b22af
fix: handle mcp tool calls in previous response correctly (#3155)
# What does this PR do?

Handles MCP tool calls in a previous response

Closes #3105

## Test Plan
Created a response with a tool call, then made a second call linked to the
first through `previous_response_id`. No error occurred.

Also added unit test.

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-08-20 14:12:15 -07:00
Omer Tuchfeld
00a67da449
fix: Use pool_pre_ping=True in SQLAlchemy engine creation (#3208)
# What does this PR do?

We noticed that when llama-stack is running for a long time, we would
run into database errors when trying to run messages through the agent
(which we configured to persist against postgres), seemingly due to the
database connections being stale or disconnected. This commit adds
`pool_pre_ping=True` to the SQLAlchemy engine creation to help mitigate
this issue by checking the connection before using it, and
re-establishing it if necessary.
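In SQLAlchemy the fix is a single keyword argument: `create_engine(url, pool_pre_ping=True)`. The sketch below uses only the stdlib `sqlite3` module to show what that flag does conceptually: ping a pooled connection on checkout and transparently replace it if it is dead. `PrePingPool` is a hypothetical toy class, not SQLAlchemy's implementation. Unlike SQLAlchemy, it treats any `sqlite3.Error` raised during the ping as a disconnect.

```python
import sqlite3
from typing import Callable


class PrePingPool:
    """Toy pool illustrating the pool_pre_ping idea (hypothetical, not SQLAlchemy)."""

    def __init__(self, factory: Callable[[], sqlite3.Connection]) -> None:
        self._factory = factory
        self._idle: list[sqlite3.Connection] = []
        self.reconnects = 0  # number of stale connections discarded

    def checkout(self) -> sqlite3.Connection:
        while self._idle:
            conn = self._idle.pop()
            try:
                conn.execute("SELECT 1")  # the cheap "pre-ping"
                return conn
            except sqlite3.Error:
                # Stale or closed connection: drop it and try the next one.
                self.reconnects += 1
        # No healthy idle connection left, so open a fresh one.
        return self._factory()

    def checkin(self, conn: sqlite3.Connection) -> None:
        self._idle.append(conn)


pool = PrePingPool(lambda: sqlite3.connect(":memory:"))
conn = pool.checkout()
pool.checkin(conn)
conn.close()  # simulate the server dropping an idle connection
fresh = pool.checkout()
print(fresh.execute("SELECT 1").fetchone(), pool.reconnects)  # (1,) 1
```

The trade-off is one extra round trip per checkout in exchange for not surfacing stale-connection errors to the agent.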

More information:

https://docs.sqlalchemy.org/en/20/core/pooling.html#dealing-with-disconnects

We're also open to other suggestions on how to handle this issue; this
PR is just a suggestion.

## Test Plan

We have not tested it yet (we're in the process of doing that) and we're
hoping it's going to resolve our issue.
2025-08-20 13:52:05 -07:00
Francisco Arceo
e195ee3091
fix: Fix broken package-lock.json (#3209)
# What does this PR do?
Fix the broken `package-lock.json` that was not caught by the [GitHub bot in this
commit](7f0b2a8764).

## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-20 13:11:44 -07:00
Matthew Farrellee
c2c859a6b0
chore(files tests): update files integration tests and fix inline::localfs (#3195)
- update files=inline::localfs to raise ResourceNotFoundError instead of
ValueError
- only skip tests when no files provider is available
- directly use openai_client and llama_stack_client where appropriate
- check for correct behavior of non-existent file
- xfail the isolation test, no implementation supports it
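The following sketch shows why a dedicated "not found" error is preferable to a bare `ValueError`. An HTTP layer can map it to a 404, while generic validation errors stay 400. `ResourceNotFoundError`, `InMemoryFiles`, and `status_for` are hypothetical stand-ins written for this example. The real llama-stack error class and server mapping may be defined differently.

```python
class ResourceNotFoundError(Exception):
    """Hypothetical stand-in for a dedicated 'not found' error type."""

    def __init__(self, resource_name: str, resource_type: str) -> None:
        super().__init__(f"{resource_type} '{resource_name}' not found")
        self.resource_name = resource_name
        self.resource_type = resource_type


class InMemoryFiles:
    """Toy files store; a real provider would persist metadata and content."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def upload(self, file_id: str, content: bytes) -> None:
        self._files[file_id] = content

    def retrieve(self, file_id: str) -> bytes:
        try:
            return self._files[file_id]
        except KeyError:
            # A specific type lets callers distinguish "missing" from "invalid".
            raise ResourceNotFoundError(file_id, "File") from None


def status_for(exc: Exception) -> int:
    """Hypothetical error-to-HTTP-status mapping used only in this sketch."""
    if isinstance(exc, ResourceNotFoundError):
        return 404
    if isinstance(exc, ValueError):
        return 400
    return 500
```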

test plan -

```
$ uv run ./scripts/integration-tests.sh --stack-config server:ci-tests --provider ollama --test-subdirs files
...

tests/integration/files/test_files.py::test_openai_client_basic_operations PASSED               [ 25%]
tests/integration/files/test_files.py::test_files_authentication_isolation XFAIL                [ 50%]
tests/integration/files/test_files.py::test_files_authentication_shared_attributes PASSED       [ 75%]
tests/integration/files/test_files.py::test_files_authentication_anonymous_access PASSED        [100%]

==================================== 3 passed, 1 xfailed in 1.03s =====================================
```

previously -

```
$ uv run llama stack build --image-type venv --providers files=inline::localfs --run &
...
$ ./scripts/integration-tests.sh --stack-config http://localhost:8321 --provider ollama --test-subdirs files
...

tests/integration/files/test_files.py::test_openai_client_basic_operations[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] PASSED [ 12%]
tests/integration/files/test_files.py::test_files_authentication_isolation[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 25%]
tests/integration/files/test_files.py::test_files_authentication_shared_attributes[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 37%]
tests/integration/files/test_files.py::test_files_authentication_anonymous_access[openai_client-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 50%]
tests/integration/files/test_files.py::test_openai_client_basic_operations[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] PASSED [ 62%]
tests/integration/files/test_files.py::test_files_authentication_isolation[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 75%]
tests/integration/files/test_files.py::test_files_authentication_shared_attributes[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [ 87%]
tests/integration/files/test_files.py::test_files_authentication_anonymous_access[client_with_models-ollama/llama3.2:3b-instruct-fp16-None-sentence-transformers/all-MiniLM-L6-v2-None-384] SKIPPED [100%]

========================================================= 2 passed, 6 skipped in 1.31s ==========================================================
```
2025-08-20 14:22:40 -04:00
Jiayi Ni
55e9959f62
fix: fix ``openai_embeddings`` for asymmetric embedding NIMs (#3205)
# What does this PR do?
NVIDIA asymmetric embedding models (e.g.,
`nvidia/llama-3.2-nv-embedqa-1b-v2`) require an `input_type` parameter
that is not present in the standard OpenAI embeddings API. This PR sets
`input_type="query"` as the default and updates the documentation to suggest
using the `embedding` API for passage embeddings.

Resolves #2892 

## Test Plan
```
pytest -s -v tests/integration/inference/test_openai_embeddings.py   --stack-config="inference=nvidia"   --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2"   --env NVIDIA_API_KEY={nvidia_api_key}   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-08-20 08:06:25 -04:00
Mustafa Elbehery
3f8df167f3
chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061)
# What does this PR do?

This PR adds a step in pre-commit to enforce using `llama_stack` logger.

Currently, various parts of the code base use different loggers. Since a
custom `llama_stack` logger exists and is already used in the codebase, it is better
to standardize on it.
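A check of this kind can be sketched as a small line scanner that flags direct use of the stdlib `logging` module. The regex patterns and the `find_violations`/`main` helpers are hypothetical; the actual hook added in this PR may match different patterns. The llama-stack logger is assumed here to be obtained via `llama_stack.log.get_logger`.

```python
import re
from pathlib import Path

# Hypothetical patterns for direct stdlib logging usage.
DISALLOWED = [
    re.compile(r"^\s*import\s+logging\b"),
    re.compile(r"^\s*from\s+logging\s+import\b"),
    re.compile(r"\blogging\.getLogger\("),
]


def find_violations(source: str) -> list[int]:
    """Return 1-based line numbers that use the stdlib logger directly."""
    hits: list[int] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in DISALLOWED):
            hits.append(lineno)
    return hits


def main(paths: list[str]) -> int:
    """Pre-commit style entry point: returns 1 if any file violates the rule."""
    status = 0
    for path in paths:
        for lineno in find_violations(Path(path).read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: use the llama_stack logger instead of stdlib logging")
            status = 1
    return status
```

A pre-commit hook would call `main()` with the staged file paths and use its return value as the exit code.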

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-08-20 07:15:35 -04:00
Matthew Farrellee
5f151ddf45
fix: disable ui-prettier & ui-eslint (#3207) 2025-08-20 06:42:43 -04:00
Francisco Arceo
5f6d5072b6
chore: Faster npm pre-commit (#3206)
# What does this PR do?
Adds npm to the pre-commit.yml installation step and caches the UI.
Removes the Node installation during pre-commit.

## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-19 16:38:38 -07:00
github-actions[bot]
7f0b2a8764 build: Bump version to 0.2.18 2025-08-19 22:38:23 +00:00
Matthew Farrellee
e7a812f5de
chore: Fixup main pre commit (#3204) 2025-08-19 14:52:38 -04:00
Varsha
8cc4925f7d
chore: Enable keyword search for Milvus inline (#3073)
# What does this PR do?
With https://github.com/milvus-io/milvus-lite/pull/294, Milvus Lite
supports keyword search using BM25. When we introduced keyword search, we
explicitly disabled it for inline Milvus. This PR removes the need
for that check and enables `inline::milvus` for tests.
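For readers unfamiliar with BM25, here is a textbook Okapi BM25 scorer. It assumes whitespace tokenization and the common defaults `k1=1.5` and `b=0.75`. It is a sketch of the ranking function, not Milvus Lite's implementation, whose tokenizer and parameters are not described here.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores: list[float] = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF keeps scores non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            f = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores
```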

## Test Plan
Run llama stack with `inline::milvus` enabled:

```
pytest tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes --stack-config=http://localhost:8321 --embedding-model=all-MiniLM-L6-v2 -v
```

```
INFO     2025-08-07 17:06:20,932 tests.integration.conftest:64 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS                                        
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0
asyncio: mode=Mode.AUTO
collected 3 items                                                                                                                                                                                          

tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-vector] PASSED                                                   [ 33%]
tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-keyword] PASSED                                                  [ 66%]
tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-hybrid] PASSED                                                   [100%]

============================================================================================ 3 passed in 4.75s =============================================================================================
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-19 13:01:23 -04:00
Ashwin Bharambe
eb07a0f86a
fix(ci, tests): ensure uv environments in CI are kosher, record tests (#3193)
I started this PR trying to unbreak a newly broken test
`test_agent_name`. This test was broken all along but did not show up
because during testing we were pulling the "non-updated" llama stack
client. See this comment:
https://github.com/llamastack/llama-stack/pull/3119#discussion_r2270988205

While fixing this, I encountered a large amount of badness in our CI
workflow definitions.

- We weren't passing `LLAMA_STACK_DIR` or `LLAMA_STACK_CLIENT_DIR`
overrides to `llama stack build` at all in some cases.
- Even when we did, we used `uv run` liberally. The first thing `uv run`
does is "sync" the project environment, which undoes any mutations we
might have made ourselves. But we make many mutations to these
environments in our CI runners, the most important of which is
`llama stack build`, where we install distro dependencies. As a
result, when you tried to run the integration tests, you would see old,
strange versions.


## Test Plan

Re-record using:

```
sh scripts/integration-tests.sh --stack-config ci-tests \
  --provider ollama --test-pattern test_agent_name --inference-mode record
```

Then re-run with `--inference-mode replay`. But: 

Eventually, this test turned out to be quite flaky for telemetry
reasons. I haven't investigated it yet and, sadly, just disabled it
since we have a release to push out.
2025-08-18 17:02:24 -07:00
Francisco Arceo
ac78e9f66a
chore: Adding UI unit tests in CI (#3191)
2025-08-18 16:48:21 -06:00
Ashwin Bharambe
89661b984c
revert: "feat(cli): make venv the default image type" (#3196)
Reverts llamastack/llama-stack#3187
2025-08-18 15:31:01 -07:00
Ashwin Bharambe
2e7ca07423
feat(cli): make venv the default image type (#3187)
We have removed conda now so we can make `venv` the default. Just doing
`llama stack build --distro starter` is now enough for the most part.
2025-08-18 14:58:23 -07:00
slekkala1
7519ab4024
feat: Code scanner Provider impl for moderations api (#3100)
# What does this PR do?
Add CodeScanner implementations

## Test Plan
`SAFETY_MODEL=CodeScanner LLAMA_STACK_CONFIG=starter uv run pytest -v
tests/integration/safety/test_safety.py
--text-model=llama3.2:3b-instruct-fp16
--embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`

This PR needs to land after
https://github.com/meta-llama/llama-stack/pull/3098
2025-08-18 14:15:40 -07:00
Ashwin Bharambe
27d6becfd0
fix(misc): pin openai dependency to < 1.100.0 (#3192)
This OpenAI client release
0843a11164
ends up breaking litellm
169a17400f/litellm/types/llms/openai.py (L40)

Update the dependency pin. Also make the imports a bit more defensive,
in case something else during `llama stack build` ends up moving
openai to a previous version.
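A `< 1.100.0` pin is a reminder that versions must be compared numerically, not as strings. The sketch below handles only plain `X.Y.Z` release strings, and the helper names are hypothetical. Real tooling should use `packaging.version.Version`, which also understands pre-releases and other version suffixes.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a plain release version like '1.99.9' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def below_pin(installed: str, upper_bound: str = "1.100.0") -> bool:
    """True if `installed` satisfies the '< upper_bound' constraint."""
    # Tuples compare element by element, so (1, 99, 9) < (1, 100, 0).
    return parse_version(installed) < parse_version(upper_bound)


# Naive string comparison is wrong here: '9' sorts after '1'.
print("1.99.9" < "1.100.0", below_pin("1.99.9"))  # False True
```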

## Test Plan

Run pre-release script integration tests.
2025-08-18 12:20:50 -07:00
IAN MILLER
f8398d25ff
fix: kill build_conda_env.sh (#3190)
# What does this PR do?
I noticed that
[build_conda_env.sh](https://github.com/llamastack/llama-stack/blob/main/llama_stack/core/build_conda_env.sh)
somehow still exists in the main branch. We need to remove it to be consistent with
[#2969](https://github.com/llamastack/llama-stack/pull/2969)

## Test Plan
2025-08-18 12:17:44 -07:00
Maor Friedman
739b18edf8
feat: add support for postgres ssl mode and root cert (#3182)
this PR adds support for configuring `sslmode` and `sslrootcert` when
initiating the psycopg2 connection.

closes #3181
2025-08-18 10:24:24 -07:00
Francisco Arceo
fa431e15e0
chore: Update TRIAGERS.md (#3186)
# What does this PR do?
Update triagers to current state

## Test Plan
2025-08-18 10:23:51 -07:00
Charlie Doern
4ae39b94ff
fix: remove category prints (#3189)
# What does this PR do?

Commands whose output matters, like `llama stack build
--print-deps-only` (soon to be `llama stack show`), print some `cprint`
output from log.py on _every_ execution of the CLI.

for example:

<img width="912" height="331" alt="Screenshot 2025-08-18 at 1 16 30 PM"
src="https://github.com/user-attachments/assets/e5bf18fb-74a1-438c-861a-8a26eea7d014"
/>

the yellow text is likely unnecessary.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-18 10:23:23 -07:00
Ashwin Bharambe
f4cecaade9
chore(ci): dont run llama stack server always (#3188)
Sometimes the server has already been started (e.g., via docker). Just a
convenience here so we can reuse this script more.
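One way to express "only start the server if it is not already running" is a TCP probe before launching. This is a hypothetical sketch in Python. The actual change lives in a shell script and may check differently, for example by polling a health endpoint. Port 8321 is the one used elsewhere in this log.

```python
import socket


def server_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable, or timed out.
        return False


# Hypothetical usage in a test driver:
# if not server_is_listening("localhost", 8321):
#     start_server()  # placeholder for launching `llama stack run ...`
```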
2025-08-18 10:11:55 -07:00
Francisco Arceo
a8091d0c6a
chore: Update benchmarking location in contributing docs (#3180)
# What does this PR do?
Small docs change as requested in
https://github.com/llamastack/llama-stack/pull/3160#pullrequestreview-3125038932


## Test Plan
2025-08-18 08:04:21 -04:00
Ashwin Bharambe
5e7c2250be
test(recording): add a script to schedule recording workflow (#3170)
See comment here:
https://github.com/llamastack/llama-stack/pull/3162#issuecomment-3192859097
-- TL;DR it is quite complex to invoke the recording workflow correctly
for an end developer writing tests. This script simplifies the work.

No more manual GitHub UI navigation!

## Script Functionality

  - Auto-detects your current branch and associated PR
  - Finds the right repository context (works from forks!)
  - Runs the workflow where it can actually commit back
  - Validates prerequisites and provides helpful error messages

## How to Use

First, make sure you are on the branch that introduced the new test you want
recorded. **Make sure you have pushed this branch remotely; the easiest way
is to create a PR.**

```
  # Record tests for current branch
  ./scripts/github/schedule-record-workflow.sh

  # Record specific test subdirectories
  ./scripts/github/schedule-record-workflow.sh --test-subdirs "agents,inference"

  # Record with vision tests enabled
  ./scripts/github/schedule-record-workflow.sh --run-vision-tests

  # Record tests matching a pattern
  ./scripts/github/schedule-record-workflow.sh --test-pattern "test_streaming"
```

## Test Plan

Ran `./scripts/github/schedule-record-workflow.sh -s inference -k
tool_choice` which started
4820409329
which successfully committed recorded outputs.
2025-08-15 16:54:34 -07:00
Matthew Farrellee
914c7be288
feat: add batches API with OpenAI compatibility (with inference replay) (#3162)
Add complete batches API implementation with protocol, providers, and
tests:

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation
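Batch input follows the OpenAI JSONL format: one request per line, each with `custom_id`, `method`, `url`, and `body`. The validator below is a hypothetical sketch of the kind of checks described. The exact rules enforced by `ReferenceBatchesImpl` (for example, whether duplicate `custom_id`s are rejected) are assumptions here.

```python
import json

SUPPORTED_URLS = {"/v1/chat/completions"}  # the only endpoint named in this PR


def validate_batch_lines(jsonl: str) -> list[str]:
    """Return human-readable errors; an empty list means the input looks valid."""
    errors: list[str] = []
    seen_ids: set[str] = set()
    for lineno, raw in enumerate(jsonl.splitlines(), start=1):
        if not raw.strip():
            continue
        try:
            req = json.loads(raw)
        except json.JSONDecodeError:
            errors.append(f"line {lineno}: invalid JSON")
            continue
        if not isinstance(req, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        cid = req.get("custom_id")
        if not cid:
            errors.append(f"line {lineno}: missing custom_id")
        elif cid in seen_ids:
            errors.append(f"line {lineno}: duplicate custom_id {cid!r}")
        else:
            seen_ids.add(cid)
        if req.get("method") != "POST":
            errors.append(f"line {lineno}: method must be POST")
        if req.get("url") not in SUPPORTED_URLS:
            errors.append(f"line {lineno}: unsupported url {req.get('url')!r}")
        body = req.get("body")
        if not isinstance(body, dict) or "model" not in body or "messages" not in body:
            errors.append(f"line {lineno}: body needs 'model' and 'messages'")
    return errors
```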

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
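The two concurrency options map naturally onto two nested `asyncio.Semaphore`s: one limits how many batches run at once, and the other limits in-flight requests within a batch. This is a hypothetical sketch; the reference provider's internals are not shown in this log. `asyncio.sleep` stands in for an inference call.

```python
import asyncio


async def run_batch(requests: list[int], max_concurrent_requests_per_batch: int) -> tuple[list[int], int]:
    """Process one batch; return results (in input order) and peak concurrency."""
    sem = asyncio.Semaphore(max_concurrent_requests_per_batch)
    in_flight = 0
    peak = 0

    async def process(req: int) -> int:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a chat completion call
            in_flight -= 1
            return req * 2

    results = await asyncio.gather(*(process(r) for r in requests))
    return list(results), peak


async def run_batches(
    batches: list[list[int]], max_concurrent_batches: int, max_concurrent_requests_per_batch: int
) -> list[tuple[list[int], int]]:
    """Run several batches, at most max_concurrent_batches at a time."""
    batch_sem = asyncio.Semaphore(max_concurrent_batches)

    async def one(batch: list[int]) -> tuple[list[int], int]:
        async with batch_sem:
            return await run_batch(batch, max_concurrent_requests_per_batch)

    return await asyncio.gather(*(one(b) for b in batches))
```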

Test with -

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

addresses #3066

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-15 15:34:15 -07:00
Ashwin Bharambe
f4ccdee200 fix(ci): skip batches directory for library client testing 2025-08-15 15:30:03 -07:00
Ashwin Bharambe
0e8bb94bf3
feat(ci): make recording workflow simpler, more parameterizable (#3169)
# What does this PR do?

Recording tests has become a nightmare. This is the first part of making
that process simpler by making it _less_ automatic. I tried to be too
clever earlier.

It simplifies the record-integration-tests workflow to use workflow
dispatch inputs instead of PR labels. No more opaque stuff. Just go to
the GitHub UI and run the workflow with inputs. I will soon add a helper
script for this also.

Other things to aid re-running just the small set of things you need to
re-record:
- Replaces the `test-types` JSON array parameter with a more intuitive
`test-subdirs` comma-separated list. The whole JSON array crap was for
the matrix.
- Adds a new `test-pattern` parameter to allow filtering tests using
pytest's `-k` option
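The sketch below shows how the two inputs could turn into a pytest invocation. `pytest_args` and the `tests/integration` root are assumptions for illustration; the workflow itself may assemble its arguments differently. Only the `-k` flag is pytest's real option for keyword-expression filtering.

```python
import shlex


def pytest_args(test_subdirs: str, test_pattern: str = "", root: str = "tests/integration") -> list[str]:
    """Turn workflow-dispatch style inputs into a pytest argument list."""
    # "agents, inference" -> ["tests/integration/agents", "tests/integration/inference"]
    dirs = [d.strip() for d in test_subdirs.split(",") if d.strip()]
    args = [f"{root}/{d}" for d in dirs]
    if test_pattern:
        args += ["-k", test_pattern]  # pytest keyword-expression filter
    return args


print("pytest " + shlex.join(pytest_args("agents,inference", "test_streaming")))
# pytest tests/integration/agents tests/integration/inference -k test_streaming
```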


## Test Plan

Note that this PR is in a fork, not the source repository.

- Replay tests on this PR are green
- Manually
[ran](1699856292)
the replay workflow with a test-subdir and test-pattern filter, worked
- Manually
[ran](4819508034)
the **record** workflow with a simple pattern, it has worked and updated
_this_ PR.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-15 14:47:20 -07:00
Ashwin Bharambe
a6e2c18909
Revert "refactor(agents): migrate to OpenAI chat completions API" (#3167)
Reverts llamastack/llama-stack#3097

It has broken agents tests.
2025-08-15 12:01:07 -07:00
ehhuang
2c06b24c77
test: benchmark scripts (#3160)
# What does this PR do?
1. Add our own benchmark script instead of Locust (which doesn't support
measuring streaming latency well)
2. Simplify k8s deployment
3. Add a simple profile script for locally running server
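Time to first token (TTFT) and latency percentiles, as reported in the results below, can be computed roughly as follows. This is a hypothetical sketch, not the PR's benchmark script. It uses the nearest-rank percentile definition, which is one of several, and the actual script may use another (for example, interpolation).

```python
import math
import time
from typing import Iterable


def measure_stream(chunks: Iterable[str]) -> tuple[float, float, int]:
    """Consume a stream; return (time_to_first_token, total_time, n_chunks)."""
    start = time.perf_counter()
    ttft = math.nan  # stays NaN if the stream yields nothing
    n = 0
    for _ in chunks:
        if n == 0:
            ttft = time.perf_counter() - start
        n += 1
    return ttft, time.perf_counter() - start, n


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```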

## Test Plan
❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10

============================================================
BENCHMARK RESULTS
============================================================
Total time: 180.00s
Concurrent users: 10
Total requests: 1636
Successful requests: 1636
Failed requests: 0
Success rate: 100.0%
Requests per second: 9.09

Response Time Statistics:
  Mean: 1.095s
  Median: 1.721s
  Min: 0.136s
  Max: 3.218s
  Std Dev: 0.762s

Percentiles:
  P50: 1.721s
  P90: 1.751s
  P95: 1.756s
  P99: 1.796s

Time to First Token (TTFT) Statistics:
  Mean: 0.037s
  Median: 0.037s
  Min: 0.023s
  Max: 0.211s
  Std Dev: 0.011s

TTFT Percentiles:
  P50: 0.037s
  P90: 0.040s
  P95: 0.044s
  P99: 0.055s

Streaming Statistics:
  Mean chunks per response: 64.0
  Total chunks received: 104775
2025-08-15 11:24:29 -07:00
dependabot[bot]
2114214fe3
chore(python-deps): bump huggingface-hub from 0.34.3 to 0.34.4 (#3084)
Bumps [huggingface-hub](https://github.com/huggingface/huggingface_hub)
from 0.34.3 to 0.34.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/huggingface_hub/releases">huggingface-hub's
releases</a>.</em></p>
<blockquote>
<h2>[v0.34.4] Support Image to Video inference + QoL in jobs API, auth
and utilities</h2>
<p>Biggest update is the support of Image-To-Video task with inference
provider Fal AI</p>
<ul>
<li>[Inference] Support image to video task <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3289">#3289</a>
by <a
href="https://github.com/hanouticelina"><code>@​hanouticelina</code></a></li>
</ul>
<pre lang="py"><code>&gt;&gt;&gt; from huggingface_hub import
InferenceClient
&gt;&gt;&gt; client = InferenceClient()
&gt;&gt;&gt; video = client.image_to_video(&quot;cat.jpg&quot;,
model=&quot;Wan-AI/Wan2.2-I2V-A14B&quot;, prompt=&quot;turn the cat into
a tiger&quot;)
&gt;&gt;&gt; with open(&quot;tiger.mp4&quot;, &quot;wb&quot;) as f:
 ...     f.write(video)
</code></pre>
<p>And some quality of life improvements:</p>
<ul>
<li>Add type to job owner <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3291">#3291</a>
by <a href="https://github.com/drbh"><code>@​drbh</code></a></li>
<li>Include HF_HUB_DISABLE_XET in the environment dump <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3290">#3290</a>
by <a
href="https://github.com/hanouticelina"><code>@​hanouticelina</code></a></li>
<li>Whoami: custom message only on unauthorized <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3288">#3288</a>
by <a href="https://github.com/Wauplin"><code>@​Wauplin</code></a></li>
<li>Add validation warnings for repository limits in upload_large_folder
<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3280">#3280</a>
by <a
href="https://github.com/davanstrien"><code>@​davanstrien</code></a></li>
<li>Add timeout info to Jobs guide docs <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3281">#3281</a>
by <a
href="https://github.com/davanstrien"><code>@​davanstrien</code></a></li>
<li>[Jobs] Use current or stored token in Job secrets <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3272">#3272</a>
by <a href="https://github.com/lhoestq"><code>@​lhoestq</code></a></li>
<li>Fix bash history expansion in hf jobs example <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3277">#3277</a>
by <a
href="https://github.com/nyuuzyou"><code>@​nyuuzyou</code></a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/huggingface/huggingface_hub/compare/v0.34.3...v0.34.4">https://github.com/huggingface/huggingface_hub/compare/v0.34.3...v0.34.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="84a92a92c2"><code>84a92a9</code></a>
Release: v0.34.4</li>
<li><a
href="6196ac2cbc"><code>6196ac2</code></a>
Add type to job owner (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3291">#3291</a>)</li>
<li><a
href="4f6975f697"><code>4f6975f</code></a>
Include <code>HF_HUB_DISABLE_XET</code> in the environment dump (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3290">#3290</a>)</li>
<li><a
href="3720a5096f"><code>3720a50</code></a>
[Inference] Support image to video task (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3289">#3289</a>)</li>
<li><a
href="bb5e4c7a2c"><code>bb5e4c7</code></a>
Whoami: custom message only on unauthorized (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3288">#3288</a>)</li>
<li><a
href="a725256f31"><code>a725256</code></a>
Add validation warnings for repository limits in upload_large_folder (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3280">#3280</a>)</li>
<li><a
href="a181b0f088"><code>a181b0f</code></a>
Add timeout info to Jobs guide docs (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3281">#3281</a>)</li>
<li><a
href="4d38925c8d"><code>4d38925</code></a>
[Jobs] Use current or stored token in Job secrets (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3272">#3272</a>)</li>
<li><a
href="1580ce18c7"><code>1580ce1</code></a>
Fix bash history expansion in hf jobs example (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3277">#3277</a>)</li>
<li>See full diff in <a
href="https://github.com/huggingface/huggingface_hub/compare/v0.34.3...v0.34.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=huggingface-hub&package-manager=uv&previous-version=0.34.3&new-version=0.34.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-15 10:55:43 -07:00
dependabot[bot]
a275282685
chore(python-deps): bump pymilvus from 2.5.14 to 2.6.0 (#3086)
Bumps [pymilvus](https://github.com/milvus-io/pymilvus) from 2.5.14 to
2.6.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/milvus-io/pymilvus/releases">pymilvus's
releases</a>.</em></p>
<blockquote>
<h2>PyMilvus v2.6.0 Release Notes</h2>
<h2>New Features</h2>
<ol>
<li>Add APIs in MilvusClient</li>
</ol>
<ul>
<li>enhance: add describe and alter database in MilvusClient by <a
href="https://github.com/smellthemoon"><code>@​smellthemoon</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2433">milvus-io/pymilvus#2433</a></li>
<li>enhance: support milvus-client iterator by <a
href="https://github.com/MrPresent-Han"><code>@​MrPresent-Han</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2461">milvus-io/pymilvus#2461</a></li>
<li>enhance: Enable resource group api in milvus client by <a
href="https://github.com/weiliu1031"><code>@​weiliu1031</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2513">milvus-io/pymilvus#2513</a></li>
<li>enhance: add release_collection, drop_index, create_partition,
drop_partition, load_partition and release_partition by <a
href="https://github.com/brcarry"><code>@​brcarry</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2525">milvus-io/pymilvus#2525</a></li>
<li>enhance: enable describe_replica api in milvus client by <a
href="https://github.com/weiliu1031"><code>@​weiliu1031</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2541">milvus-io/pymilvus#2541</a></li>
<li>enhance: support recalls for milvus_client by <a
href="https://github.com/chasingegg"><code>@​chasingegg</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2552">milvus-io/pymilvus#2552</a></li>
<li>enhance: add use_database by <a
href="https://github.com/czs007"><code>@​czs007</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2491">milvus-io/pymilvus#2491</a></li>
</ul>
<ol start="2">
<li>Add AsyncMilvusClient</li>
</ol>
<ul>
<li>[FEAT] Asyncio support by <a
href="https://github.com/brcarry"><code>@​brcarry</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2411">milvus-io/pymilvus#2411</a></li>
<li>Add async DDL funcs &amp; DDL examples by <a
href="https://github.com/Shawnzheng011019"><code>@​Shawnzheng011019</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2852">milvus-io/pymilvus#2852</a></li>
</ul>
<ol start="3">
<li>Other features</li>
</ol>
<ul>
<li>enhance: support Int8Vector by <a
href="https://github.com/cydrain"><code>@​cydrain</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2611">milvus-io/pymilvus#2611</a></li>
<li>feat: support recalls field in SearchResult by <a
href="https://github.com/chasingegg"><code>@​chasingegg</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2390">milvus-io/pymilvus#2390</a></li>
<li>enhance: Support Python3.13 and upgrade grpcio range by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2684">milvus-io/pymilvus#2684</a></li>
<li>enhance: support run analyzer return detail token by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2679">milvus-io/pymilvus#2679</a></li>
<li>enhance: Add force_drop parameter to drop_role method for role
deletion by <a href="https://github.com/SimFG"><code>@​SimFG</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2705">milvus-io/pymilvus#2705</a></li>
<li>enhance: add property func for AnalyzeToken by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2704">milvus-io/pymilvus#2704</a></li>
<li>enhance: grant/revoke v2 optional db and collection params by <a
href="https://github.com/shaoting-huang"><code>@​shaoting-huang</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2386">milvus-io/pymilvus#2386</a></li>
<li>extend unlimited offset for query iterator (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2418">#2418</a>)
by <a
href="https://github.com/MrPresent-Han"><code>@​MrPresent-Han</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2419">milvus-io/pymilvus#2419</a></li>
<li>enhance: alterindex &amp; altercollection supports altering
properties by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2406">milvus-io/pymilvus#2406</a></li>
<li>enhance: alterdatabase support delete property by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2435">milvus-io/pymilvus#2435</a></li>
<li>enhance: support hints param by <a
href="https://github.com/chasingegg"><code>@​chasingegg</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2408">milvus-io/pymilvus#2408</a></li>
<li>enhance: create database support properties by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2448">milvus-io/pymilvus#2448</a></li>
<li>enhance: Add <code>db_name</code> parameter at
<code>bulk_import</code> by <a
href="https://github.com/counter2015"><code>@​counter2015</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2446">milvus-io/pymilvus#2446</a></li>
<li>enhance: add search iterator v2 by <a
href="https://github.com/PwzXxm"><code>@​PwzXxm</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2395">milvus-io/pymilvus#2395</a></li>
<li>enhance: simplify the structure of search_params by <a
href="https://github.com/smellthemoon"><code>@​smellthemoon</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2507">milvus-io/pymilvus#2507</a></li>
<li>enhance: Remove long deprecated Milvus class by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2544">milvus-io/pymilvus#2544</a></li>
<li>enhance: Use new model pkg by <a
href="https://github.com/junjiejiangjjj"><code>@​junjiejiangjjj</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2595">milvus-io/pymilvus#2595</a></li>
<li>enhance: Add schema update time verification to insert and upsert to
use cache by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2551">milvus-io/pymilvus#2551</a></li>
<li>enhance: describecollection output add created_timestamp by <a
href="https://github.com/JsDove"><code>@​JsDove</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2618">milvus-io/pymilvus#2618</a></li>
<li>feat: add external filter func for search iterator v2 by <a
href="https://github.com/PwzXxm"><code>@​PwzXxm</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2639">milvus-io/pymilvus#2639</a></li>
<li>enhance: support run analyzer by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2622">milvus-io/pymilvus#2622</a></li>
<li>weighted reranker to allow skip score normalization by <a
href="https://github.com/zhengbuqian"><code>@​zhengbuqian</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2708">milvus-io/pymilvus#2708</a></li>
<li>enhance: Support AddCollectionField API by <a
href="https://github.com/congqixia"><code>@​congqixia</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2722">milvus-io/pymilvus#2722</a></li>
<li>Add 1-Way and 2-Way TLS Support to Bulk Import Functions by <a
href="https://github.com/abd-770"><code>@​abd-770</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2672">milvus-io/pymilvus#2672</a></li>
<li>enhance: Use SearchResult in MilvusClient by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2735">milvus-io/pymilvus#2735</a></li>
<li>Support rerank by <a
href="https://github.com/junjiejiangjjj"><code>@​junjiejiangjjj</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2729">milvus-io/pymilvus#2729</a></li>
<li>feat: support multi analyzer params by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2747">milvus-io/pymilvus#2747</a></li>
<li>Add function checker by <a
href="https://github.com/junjiejiangjjj"><code>@​junjiejiangjjj</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2760">milvus-io/pymilvus#2760</a></li>
<li>enhance: Support run analyzer by collection and field by <a
href="https://github.com/aoiasd"><code>@​aoiasd</code></a> in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2822">milvus-io/pymilvus#2822</a></li>
<li>feat: support load collection/partition with priority(<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2835">#2835</a>)
by <a
href="https://github.com/MrPresent-Han"><code>@​MrPresent-Han</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2836">milvus-io/pymilvus#2836</a></li>
<li>enhance: optimize perf for large topk(<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2848">#2848</a>)
by <a
href="https://github.com/MrPresent-Han"><code>@​MrPresent-Han</code></a>
in <a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2849">milvus-io/pymilvus#2849</a></li>
<li>enhance: Add usage guide to manage MilvusClient by <a
href="https://github.com/XuanYang-cn"><code>@​XuanYang-cn</code></a> in
<a
href="https://redirect.github.com/milvus-io/pymilvus/pull/2907">milvus-io/pymilvus#2907</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1e56ce7d31"><code>1e56ce7</code></a>
enhance: Update milvus-proto and readme (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2921">#2921</a>)</li>
<li><a
href="75052b1b7c"><code>75052b1</code></a>
enhance: Add usage guide to manage MilvusClient (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2907">#2907</a>)</li>
<li><a
href="9f44053086"><code>9f44053</code></a>
add example code for language identifier and multi analyzer (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2919">#2919</a>)</li>
<li><a
href="058836de26"><code>058836d</code></a>
fix: Return new pk value for upsert when autoid=true (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2914">#2914</a>)</li>
<li><a
href="bbc6777565"><code>bbc6777</code></a>
[cherry-pick] Compatible with the default behavior of free on the cloud
(<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2913">#2913</a>)</li>
<li><a
href="45080c39c5"><code>45080c3</code></a>
fix: Avoid copying functions when initializing CollectionSchema (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2902">#2902</a>)</li>
<li><a
href="52b8461c5b"><code>52b8461</code></a>
[cherry-pick] bulk_import add stageName/dataPaths parameter (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2905">#2905</a>)</li>
<li><a
href="a8c3120622"><code>a8c3120</code></a>
[cherry-pick] support stage (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2895">#2895</a>)</li>
<li><a
href="3653effa88"><code>3653eff</code></a>
fix: Tidy alias configs when connect fails (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2900">#2900</a>)</li>
<li><a
href="728791a7de"><code>728791a</code></a>
enhance: Store alias before wait for ready (<a
href="https://redirect.github.com/milvus-io/pymilvus/issues/2894">#2894</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/milvus-io/pymilvus/compare/v2.5.14...v2.6.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymilvus&package-manager=uv&previous-version=2.5.14&new-version=2.6.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-15 10:54:09 -07:00
Aakanksha Duggal
e743d3fdf6
refactor(agents): migrate to OpenAI chat completions API (#3097)
Replace chat_completion calls with openai_chat_completion to eliminate
dependency on legacy inference APIs.

# What does this PR do?
Closes #3067

## Test Plan
2025-08-15 10:51:41 -07:00
ashwinb
f66ae3b3b1
docs(tests): Add a bunch of documentation for our testing systems (#3139)
# What does this PR do?

Creates a structured testing documentation section with multiple detailed pages:

- Testing overview explaining the record-replay architecture
- Integration testing guide with practical usage examples
- Record-replay system technical documentation
- Guide for writing effective tests
- Troubleshooting guide for common testing issues

Hopefully this makes things a bit easier.
2025-08-15 17:45:30 +00:00
Ashwin Bharambe
81ecaf6221
fix(ci): make the Vector IO CI follow the same pattern as others (#3164)
# What does this PR do?
Updates the integration-vector-io-tests workflow to run daily tests on
Python 3.13 while limiting regular PR tests to Python 3.12 only.

The PR also improves the concurrency configuration to prevent workflow
conflicts between main branch runs and PR runs.
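A minimal sketch of the trigger-dependent behavior described above follows. It rests on three assumptions not stated in the PR. First, the daily run uses the `schedule` event and tests both 3.12 and 3.13. Second, concurrency groups are keyed on the git ref, a common GitHub Actions pattern. Third, only superseded PR runs are cancelled. The function names are hypothetical; the real workflow expresses this in YAML.

```python
from dataclasses import dataclass

# Assumption: the scheduled (daily) run covers both versions; PRs only 3.12.
DAILY_PYTHON_VERSIONS = ["3.12", "3.13"]
PR_PYTHON_VERSIONS = ["3.12"]


@dataclass(frozen=True)
class Concurrency:
    group: str
    cancel_in_progress: bool


def python_versions_for(event_name: str) -> list[str]:
    """Hypothetical: pick the Python versions for the test matrix."""
    if event_name == "schedule":
        return list(DAILY_PYTHON_VERSIONS)
    return list(PR_PYTHON_VERSIONS)


def concurrency_for(workflow: str, ref: str, event_name: str) -> Concurrency:
    """Hypothetical: key the group on the git ref so main runs
    (refs/heads/main) and PR runs (refs/pull/<n>/merge) never share a
    group, and cancel only superseded PR runs."""
    return Concurrency(
        group=f"{workflow}-{ref}",
        cancel_in_progress=(event_name == "pull_request"),
    )


if __name__ == "__main__":
    print(python_versions_for("schedule"))      # ['3.12', '3.13']
    print(python_versions_for("pull_request"))  # ['3.12']
    print(concurrency_for("vector-io", "refs/heads/main", "push"))
```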

## Test Plan


[![testinprod](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/WjlTemxb6oA4PgZFmj08/2645295d-f421-49ae-8f3f-f4672d8204e2/testinprod.jpeg)](https://app.graphite.dev/settings/meme-library?org=llamastack)
2025-08-14 21:06:08 -07:00
ashwinb
01b2afd4b5
fix(tests): record missing tests for test_responses_store (#3163)
# What does this PR do?

Updates test recordings.

## Test Plan

Started Ollama serving the Llama 3.2 3B model (`llama3.2:3b-instruct-fp16`), then ran the server:

```
LLAMA_STACK_TEST_INFERENCE_MODE=record \
  LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings/ \
  SQLITE_STORE_DIR=$(mktemp -d) \
  OLLAMA_URL=http://localhost:11434 \
  llama stack build --template starter --image-type venv --run
```

Then ran the tests which needed recording:

```
pytest -sv tests/integration/agents/test_openai_responses.py \
  --stack-config=server:starter \
  --text-model ollama/llama3.2:3b-instruct-fp16 -k test_responses_store

Then, restarted the server with `LLAMA_STACK_TEST_INFERENCE_MODE=replay`, re-ran the tests and verified they passed.
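The record/replay cycle above can be pictured with a small sketch. It assumes, hypothetically, that each inference request is serialized to canonical JSON, hashed, and stored as one file in the recording directory. The `RecordReplayCache` class and its file layout are illustrative only and are not Llama Stack's actual implementation. Only the two mode names mirror `LLAMA_STACK_TEST_INFERENCE_MODE`.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


class RecordReplayCache:
    """Hypothetical record/replay wrapper around an inference call.

    "record": call the real backend and persist its response.
    "replay": return the stored response without touching the backend.
    """

    def __init__(self, mode: str, recording_dir: Path) -> None:
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.recording_dir = recording_dir
        self.recording_dir.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def request_key(request: dict[str, Any]) -> str:
        # Sorted keys make logically identical requests hash identically.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def call(
        self,
        request: dict[str, Any],
        backend: Callable[[dict[str, Any]], Any],
    ) -> Any:
        path = self.recording_dir / f"{self.request_key(request)}.json"
        if self.mode == "replay":
            if not path.exists():
                raise RuntimeError(f"no recording {path.name}; re-run in record mode")
            return json.loads(path.read_text())
        response = backend(request)
        path.write_text(json.dumps(response))
        return response


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    req = {"model": "llama3.2:3b-instruct-fp16", "messages": [{"role": "user", "content": "hi"}]}
    RecordReplayCache("record", tmp).call(req, lambda r: {"content": "hello"})
    # In replay mode the backend (which would divide by zero) is never called.
    print(RecordReplayCache("replay", tmp).call(req, lambda r: 1 / 0))  # {'content': 'hello'}
```

Hashing a canonical serialization means that key order in the request does not change the recording it maps to.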
2025-08-15 03:52:45 +00:00
ashwinb
8ed69978f9
refactor(tests): make the responses tests nicer (#3161)
# What does this PR do?

A _bunch_ of cleanup for the Responses tests:

- Replaced the YAML test cases with simple pydantic models
- Split the large monolithic test file into multiple focused test files:
   - `test_basic_responses.py` for basic and image response tests
   - `test_tool_responses.py` for tool-related tests
   - `test_file_search.py` for file search specific tests
- Added a `StreamingValidator` helper class to standardize streaming response validation
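A `StreamingValidator`-style helper might look like the sketch below. The event type strings (`response.created`, `response.output_text.delta`, `response.completed`) follow OpenAI Responses streaming naming. The specific checks, and the assumption that each event is a dict with a `"type"` key, are illustrative guesses. The actual helper in the test suite may validate different things.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StreamingValidator:
    """Hypothetical helper that centralizes checks on streamed events.

    Each event is assumed to be a dict with a "type" key, e.g.
    {"type": "response.output_text.delta", "delta": "Hel"}.
    """

    events: list[dict[str, Any]]
    types: list[str] = field(init=False)

    def __post_init__(self) -> None:
        self.types = [event["type"] for event in self.events]

    def assert_basic_event_sequence(self) -> None:
        # A well-formed stream opens with "created" and ends with "completed".
        assert self.types, "stream produced no events"
        assert self.types[0] == "response.created", f"first event was {self.types[0]}"
        assert self.types[-1] == "response.completed", f"last event was {self.types[-1]}"

    def assert_has(self, event_type: str) -> None:
        assert event_type in self.types, f"missing {event_type!r} in {self.types}"

    def collected_text(self) -> str:
        # Concatenate text deltas in arrival order.
        return "".join(
            e.get("delta", "") for e in self.events if e["type"] == "response.output_text.delta"
        )


if __name__ == "__main__":
    stream = [
        {"type": "response.created"},
        {"type": "response.output_text.delta", "delta": "Hel"},
        {"type": "response.output_text.delta", "delta": "lo"},
        {"type": "response.completed"},
    ]
    validator = StreamingValidator(stream)
    validator.assert_basic_event_sequence()
    validator.assert_has("response.output_text.delta")
    print(validator.collected_text())  # Hello
```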

## Test Plan

Run the tests:

```
pytest -s -v tests/integration/non_ci/responses/ \
  --stack-config=starter \
  --text-model openai/gpt-4o \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
  -k "client_with_models"
```
2025-08-15 00:05:36 +00:00
ashwinb
ba664474de
feat(responses): add mcp list tool streaming event (#3159)
# What does this PR do?

Adds proper streaming events for MCP tool listing (`mcp_list_tools.in_progress` and `mcp_list_tools.completed`). Also refactors things a bit more.
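Bracketing the tool listing with progress events can be sketched as a Python generator. Only the two event names come from the PR description. The `stream_mcp_list_tools` function, the `list_tools` callable, and the event payload fields are hypothetical.

```python
from typing import Any, Callable, Iterator


def stream_mcp_list_tools(
    server_label: str, list_tools: Callable[[], list[str]]
) -> Iterator[dict[str, Any]]:
    """Hypothetical sketch: wrap an MCP tool-listing call in progress events."""
    # Announce the listing before the (possibly slow) call to the MCP server,
    # so a streaming client can show progress immediately.
    yield {"type": "mcp_list_tools.in_progress", "server_label": server_label}
    tools = list_tools()
    # Signal completion; carrying the tool names on this event is an assumption.
    yield {"type": "mcp_list_tools.completed", "server_label": server_label, "tools": tools}


if __name__ == "__main__":
    events = list(stream_mcp_list_tools("localmcp", lambda: ["get_weather", "get_time"]))
    print([e["type"] for e in events])
    # ['mcp_list_tools.in_progress', 'mcp_list_tools.completed']
```

Because this is a generator, the `in_progress` event is produced before `list_tools()` runs, which is what lets a client observe the in-progress state.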

## Test Plan

Verified that the existing integration tests pass with the refactored code. The test `test_response_streaming_multi_turn_tool_execution` has been updated to check for the new MCP list tools streaming events.
2025-08-15 00:05:36 +00:00
ashwinb
9324e902f1
refactor(responses): move stuff into some utils and add unit tests (#3158)
# What does this PR do?
Refactors the OpenAI response conversion utilities by moving helper functions from `openai_responses.py` to `utils.py`. Adds unit tests.
2025-08-15 00:05:36 +00:00
ashwinb
47d5af703c
chore(responses): Refactor Responses Impl to be civilized (#3138)
# What does this PR do?
Refactors the OpenAI responses implementation by extracting streaming and tool execution logic into separate modules. This improves code organization by:

1. Creating a new `StreamingResponseOrchestrator` class in `streaming.py` to handle the streaming response generation logic
2. Moving tool execution functionality to a dedicated `ToolExecutor` class in `tool_executor.py`
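The split described above can be sketched as two classes: the orchestrator owns event emission, and a separate executor owns tool invocation. The class names come from the PR. Their methods, the `Step` input type, and the event type strings are hypothetical; they only illustrate the separation of concerns.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, Optional


@dataclass
class Step:
    """Hypothetical unit of model output: text, or a request to call a tool."""
    text: str = ""
    tool_name: Optional[str] = None
    tool_args: Optional[dict[str, Any]] = None


class ToolExecutor:
    """Knows how to run tools; knows nothing about streaming."""

    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self.tools = tools

    def execute(self, name: str, args: dict[str, Any]) -> Any:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name](**args)


class StreamingResponseOrchestrator:
    """Turns model steps into stream events, delegating tool calls."""

    def __init__(self, executor: ToolExecutor) -> None:
        self.executor = executor

    def run(self, steps: list[Step]) -> Iterator[dict[str, Any]]:
        yield {"type": "response.created"}
        for step in steps:
            if step.tool_name is not None:
                result = self.executor.execute(step.tool_name, step.tool_args or {})
                yield {"type": "tool_call.completed", "name": step.tool_name, "result": result}
            else:
                yield {"type": "output_text.delta", "delta": step.text}
        yield {"type": "response.completed"}


if __name__ == "__main__":
    executor = ToolExecutor({"add": lambda a, b: a + b})
    orchestrator = StreamingResponseOrchestrator(executor)
    steps = [Step(tool_name="add", tool_args={"a": 2, "b": 3}), Step(text="The sum is 5.")]
    for event in orchestrator.run(steps):
        print(event)
```

With this split, `ToolExecutor` can be unit-tested without a stream, and the orchestrator can be tested with a stub executor.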

## Test Plan

Existing tests
2025-08-15 00:05:35 +00:00
Francisco Arceo
e69acbafbf
feat(UI): Adding linter and prettier for UI (#3156) 2025-08-14 15:58:43 -06:00
Ashwin Bharambe
61582f327c
fix(ci): update triggers for the workflows (#3152)
2025-08-14 10:27:25 -07:00
Derek Higgins
c15cc7ed77
fix: use ChatCompletionMessageFunctionToolCall (#3142)
The OpenAI compatibility layer was incorrectly importing
ChatCompletionMessageToolCallParam instead of the
ChatCompletionMessageFunctionToolCall class. This caused "Cannot
instantiate typing.Union" errors when processing agent requests with
tool calls.
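This failure mode is easy to reproduce with the standard library alone. `FunctionToolCall` and `CustomToolCall` are hypothetical stand-ins for the OpenAI SDK types. `ToolCall` plays the role of a name that is a `typing.Union` alias rather than a class: the alias works in annotations but cannot be called as a constructor, while the concrete member class can. The exact error text differs between Python versions, so only the exception type matters here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FunctionToolCall:
    # Stand-in for a concrete, instantiable tool-call class.
    id: str
    name: str
    arguments: str


@dataclass
class CustomToolCall:
    id: str
    input: str


# A Union alias is valid in type hints but is not a class.
ToolCall = Union[FunctionToolCall, CustomToolCall]


def build_tool_call(call_id: str, name: str, arguments: str) -> ToolCall:
    # Correct: instantiate the concrete member of the union.
    return FunctionToolCall(id=call_id, name=name, arguments=arguments)


def build_tool_call_broken(call_id: str, name: str, arguments: str) -> ToolCall:
    # The bug being illustrated: calling the Union alias raises TypeError
    # (typically "Cannot instantiate typing.Union").
    return ToolCall(id=call_id, name=name, arguments=arguments)


if __name__ == "__main__":
    print(build_tool_call("call_1", "get_weather", '{"city": "Paris"}'))
    try:
        build_tool_call_broken("call_1", "get_weather", "{}")
    except TypeError as exc:
        print(f"TypeError: {exc}")
```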

Closes: #3141

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-08-14 10:27:00 -07:00
Ashwin Bharambe
ee7631b6cf
Revert "feat: add batches API with OpenAI compatibility" (#3149)
Reverts llamastack/llama-stack#3088

The PR broke integration tests.
2025-08-14 10:08:54 -07:00
Matthew Farrellee
de692162af
feat: add batches API with OpenAI compatibility (#3088)
Add complete batches API implementation with protocol, providers, and
tests:

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation
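Request validation for the `/v1/chat/completions` endpoint can be sketched as a line-by-line check of the batch input file. The sketch assumes the OpenAI batch input format, in which each JSONL line carries `custom_id`, `method`, `url`, and `body`. It also assumes each `custom_id` must be unique within a batch. The function name and error messages are hypothetical, and the reference provider's checks may differ.

```python
import json
from typing import Any

SUPPORTED_ENDPOINT = "/v1/chat/completions"
REQUIRED_FIELDS = ("custom_id", "method", "url", "body")


def validate_batch_input(
    jsonl: str, endpoint: str = SUPPORTED_ENDPOINT
) -> tuple[list[dict[str, Any]], list[str]]:
    """Hypothetical: parse a JSONL batch file into (valid_requests, errors)."""
    valid: list[dict[str, Any]] = []
    errors: list[str] = []
    seen_ids: set[str] = set()
    for lineno, line in enumerate(jsonl.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            request = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(request, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in request]
        if missing:
            errors.append(f"line {lineno}: missing {', '.join(missing)}")
            continue
        if request["method"] != "POST" or request["url"] != endpoint:
            errors.append(f"line {lineno}: expected POST {endpoint}")
            continue
        if not isinstance(request["body"], dict) or "model" not in request["body"]:
            errors.append(f"line {lineno}: body must be an object with a model")
            continue
        custom_id = request["custom_id"]
        if not isinstance(custom_id, str):
            errors.append(f"line {lineno}: custom_id must be a string")
            continue
        if custom_id in seen_ids:
            errors.append(f"line {lineno}: duplicate custom_id {custom_id!r}")
            continue
        seen_ids.add(custom_id)
        valid.append(request)
    return valid, errors


if __name__ == "__main__":
    good = {"custom_id": "r1", "method": "POST", "url": SUPPORTED_ENDPOINT,
            "body": {"model": "m", "messages": [{"role": "user", "content": "hi"}]}}
    sample = "\n".join([json.dumps(good), "{not json", json.dumps(good)])
    ok, problems = validate_batch_input(sample)
    print(len(ok), problems)
```

Collecting every error instead of stopping at the first lets a caller report all malformed lines in one pass.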

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
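The two concurrency options can be read as two nested limits. The sketch below assumes they are enforced with `asyncio.Semaphore`s: one bounds how many batches run at once, and one per batch bounds in-flight requests. The `process_batches` function and the injected `send_request` coroutine are hypothetical stand-ins for the reference provider's background processing.

```python
import asyncio
from typing import Awaitable, Callable


async def process_batches(
    batches: dict[str, list[str]],
    send_request: Callable[[str], Awaitable[str]],
    max_concurrent_batches: int,
    max_concurrent_requests_per_batch: int,
) -> dict[str, list[str]]:
    """Hypothetical: run every request of every batch under both limits."""
    batch_slots = asyncio.Semaphore(max_concurrent_batches)

    async def run_batch(requests: list[str]) -> list[str]:
        async with batch_slots:  # at most N batches active at once
            request_slots = asyncio.Semaphore(max_concurrent_requests_per_batch)

            async def run_one(request: str) -> str:
                async with request_slots:  # at most M requests in flight per batch
                    return await send_request(request)

            # gather() returns results in input order.
            return list(await asyncio.gather(*(run_one(r) for r in requests)))

    results = await asyncio.gather(*(run_batch(reqs) for reqs in batches.values()))
    return dict(zip(batches.keys(), results))


if __name__ == "__main__":
    in_flight = 0
    peak = 0

    async def fake_send(request: str) -> str:
        # Simulated inference call that records peak concurrency.
        global in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return request.upper()

    out = asyncio.run(process_batches({"b1": ["a", "b", "c"], "b2": ["d", "e"]}, fake_send, 1, 2))
    print(out, "peak in-flight:", peak)
```

Overall in-flight requests are therefore bounded by `max_concurrent_batches * max_concurrent_requests_per_batch`.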

Test with:

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

addresses #3066
2025-08-14 09:42:02 -04:00
ehhuang
46ff302d87
chore: Remove Trendshift badge from README (#3137)
## Summary
- The Trendshift badge links to a scammy-looking website with ads.

## Test plan
2025-08-13 18:38:34 -07:00
Ashwin Bharambe
e1e161553c
feat(responses): add MCP argument streaming and content part events (#3136)
# What does this PR do?

Adds content part streaming events to the OpenAI-compatible Responses API to support more granular streaming of response content. This introduces:

1. New schema types for content parts: `OpenAIResponseContentPart` with variants for text output and refusals

2. New streaming event types:
   - `OpenAIResponseObjectStreamResponseContentPartAdded` for when content parts begin
   - `OpenAIResponseObjectStreamResponseContentPartDone` for when content parts complete

3. Implementation in the reference provider to emit these events during streaming responses. It also streams MCP call arguments, just as it does for function call arguments.


## Test Plan

Updated existing streaming tests to verify content part events are properly emitted
2025-08-13 16:34:26 -07:00
Ashwin Bharambe
8638537d14
feat(responses): stream progress of tool calls (#3135)
# What does this PR do?
Enhances tool execution streaming by adding support for real-time progress events during tool calls. This implementation adds streaming events for MCP and web search tools, including in-progress, searching, completed, and failed states. 

The refactored `_execute_tool_call` method now returns an async iterator that yields streaming events throughout the tool execution lifecycle.

## Test Plan
Updated the integration test `test_response_streaming_multi_turn_tool_execution` to verify the presence and structure of new streaming events, including:
- Checking for MCP in-progress and completed events
- Verifying that progress events contain required fields (item_id, output_index, sequence_number)
- Ensuring completed events have the necessary sequence_number field
2025-08-13 16:31:25 -07:00
Ashwin Bharambe
5b312a80b9
feat(responses): improve streaming for function calls (#3124)
Emit streaming events for function calls

## Test Plan

Improved the test case
2025-08-13 11:23:27 -07:00
ehhuang
d6ae54723d
chore: setup for performance benchmarking (#3096)
# What does this PR do?
1. Add a simple mock OpenAI-compatible server that serves chat completions
2. Add a benchmark server in EKS that includes the mock inference server
3. Add a locust (https://locust.io/) file for load testing

## Test Plan
Run `bash apply.sh`, then `kubectl port-forward service/locust-web-ui 8089:8089`,
and go to localhost:8089 to start a load test.

<img width="1392" height="334" alt="image"
src="https://github.com/user-attachments/assets/d6aa3deb-583a-42ed-889b-751262b8e91c"
/>
<img width="1362" height="881" alt="image"
src="https://github.com/user-attachments/assets/6a28b9b4-05e6-44e2-b504-07e60c12d35e"
/>
2025-08-13 10:58:22 -07:00
ehhuang
2f51273215
fix: huge speed boost (#3132)
# What does this PR do?
make llama stack fast again


## Test Plan
2025-08-13 09:51:35 -07:00
slekkala1
25e0553eed
chore: Change moderations api response to Provider returned categories (#3098)
# What does this PR do?
To comply with the Llama model policies, return the categories exactly as the
provider reports them. As a result, the moderations API response is no longer
OpenAI-compatible.

## Test Plan
`SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest
-v tests/integration/safety/test_safety.py
--text-model=llama3.2:3b-instruct-fp16
--embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`
2025-08-13 09:47:35 -07:00
Ashwin Bharambe
a9081d87b9 feat(ci): update Recording workflow trigger and concurrency group 2025-08-13 09:36:13 -07:00
IAN MILLER
0950168f26
refactor: replace hardcoded status codes by httpx.codes (#3131)
# What does this PR do?
This PR replaces hardcoded status codes in the server's responses with
`httpx.codes`, for better consistency across the whole project and improved
code readability.

## Test Plan
Run `./scripts/unit-tests.sh`
2025-08-13 08:43:41 -07:00
Kelly Brown
0cbd93c5cc
docs: Update blocks formatting in docs/source files (#3120)
**Description:** 
The GitHub-flavored Markdown `[!NOTE]` alert syntax is not supported in
Sphinx-generated documentation, so this PR replaces those instances. It also
updates the other Note, Tip, and Warning blocks throughout the source docs.

WIP: Working to update the provider code gen
2025-08-13 08:06:31 -07:00
IAN MILLER
c9b78602d3
refactor: modify DELETE API endpoints by returning HTTP 204 No Content + empty body instead of 200 OK + response body with null (#3112)
# What does this PR do?
This PR makes the behavior of DELETE API endpoints consistent with standard
RESTful conventions and eliminates confusion for API consumers.

Old Behavior
```
HTTP Status: 200 OK
Response Body: null
```

For example, `curl -X DELETE http://localhost:8321/v1/shields/test-shield`
printed `null`, and the server logged:
`INFO 2025-08-12 16:11:57,932 console_span_processor:65 telemetry:
15:11:57.929 [INFO] ::1:59805 - "DELETE /v1/shields/test-shield
HTTP/1.1" 200 `

Updated Behavior
```
HTTP Status: 204 No Content
Response Body: empty (no body)
```

For example, `curl -X DELETE http://localhost:8321/v1/shields/test-shield`
now prints nothing, and the server logs:
`INFO 2025-08-12 16:18:16,645 console_span_processor:62 telemetry:
15:18:16.637 [INFO] ::1:60283 - "DELETE /v1/shields/test-shield
HTTP/1.1" 204 `
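To check an endpoint against the new convention by hand, one option is to pipe `curl -si` output (status line, headers, a blank line, then the body) into a small checker. The function below is a hypothetical helper, not part of the repository; it only assumes the raw HTTP response layout that `curl -i` prints.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads a raw HTTP response (as printed by `curl -si`)
# on stdin and succeeds only for "204 No Content" with an empty body.
check_delete_response() {
  awk '
    { sub(/\r$/, "") }                 # normalize CRLF line endings
    NR == 1 { status = $2; next }      # status line, e.g. "HTTP/1.1 204 No Content"
    in_body { body = body $0; next }   # collect everything after the headers
    $0 == "" { in_body = 1; next }     # first blank line ends the headers
    END {
      if (status != 204) { print "expected 204, got " status; exit 1 }
      if (length(body) > 0) { print "expected empty body, got: " body; exit 1 }
      print "OK: 204 No Content with empty body"
    }
  '
}

# Usage against a running server (URL from the example above):
#   curl -si -X DELETE http://localhost:8321/v1/shields/test-shield | check_delete_response
```

Under the old behavior the same check would fail twice over: the status is 200 and the body is `null`.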

Closes #3090 

## Test Plan
Run `./scripts/unit-tests.sh`
2025-08-13 07:56:26 -07:00
Francisco Arceo
92aca434a7
fix: Fix list_sessions() (#3114)
# What does this PR do?
1. Updates `AgentPersistence.list_sessions()` to properly filter out
`Turn` keys from `Session` keys.
2. Adds a suite of unit tests to confirm the `list_sessions()` behavior
and to cover the failing sample from
https://github.com/meta-llama/llama-stack/issues/3048
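The PR does not spell out the key layout, so the following is only a sketch under a hypothetical scheme in which a session key is `session:<agent_id>:<session_id>` and a turn key appends `:<turn_id>`. It illustrates the idea behind the fix: a plain prefix scan over `session:<agent_id>:` matches both kinds of keys, so turn keys have to be filtered out explicitly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical key layout (not taken from the repository):
#   session:<agent_id>:<session_id>            -> a Session
#   session:<agent_id>:<session_id>:<turn_id>  -> a Turn
# Reads keys on stdin and prints only Session keys for the given agent.
# Assumes agent_id contains no regex metacharacters.
list_session_keys() {
  local agent_id="$1"
  # Exactly one colon-free component after the agent prefix means a session;
  # `|| true` keeps "no matches" from failing under `set -e`.
  grep -E "^session:${agent_id}:[^:]+\$" || true
}

keys='session:agent-1:s1
session:agent-1:s1:turn-1
session:agent-1:s1:turn-2
session:agent-1:s2
session:agent-2:s3'

# prints session:agent-1:s1 and session:agent-1:s2
printf '%s\n' "$keys" | list_session_keys "agent-1"
```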

Fixes https://github.com/meta-llama/llama-stack/issues/3048


## Test Plan
Unit tests added.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-13 07:46:26 -07:00
Krzysztof Malczuk
5bd6cb52fb
fix: github action canceling valid tasks for checking semantic pr title (#3127)
# What does this PR do?
This PR changes the concurrency group name from `github.ref` to
`github.event.pull_request_number`. The reason is that `github.ref` is not a
unique identifier in the `pull_request_target` event; it is only unique in
`pull_request` events. The GitHub Action was getting canceled because the
group name in the `concurrency` section was not unique.

Closes #3102

## Test Plan
To test this, I created a test GitHub Action and ran it locally through `act`
to see what the `github.ref` variable produced and which alternatives could be
used. This confirmed that `github.ref` was not unique and that
`github.event.pull_request_number` is unique to the PR.
2025-08-13 07:14:03 -07:00
Chacksu
fffdab4f5c
fix: Dell distribution missing kvstore (#3113)
# What does this PR do?

- Added kvstore config to ChromaDB provider config for Dell distribution
similar to [starter
config](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/starter/run.yaml#L110-L112)
- Fixed an
[error](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_generated/_async_client.py#L3424-L3425)
when getting endpoint information by passing `hf-inference` as the provider to
the `AsyncInferenceClient` (TGI client).

## Test Plan
```
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=8000
export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321
export HF_TOKEN=[redacted]

# TGI Server
docker run --rm -it \
  --pull always \
  --network host \
  -v $HOME/.cache/huggingface:/data \
  -e HF_TOKEN=$HF_TOKEN \
  -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  -p $INFERENCE_PORT:$INFERENCE_PORT \
  --gpus all \
  ghcr.io/huggingface/text-generation-inference:latest \
  --dtype float16 \
  --usage-stats off \
  --sharded false \
  --cuda-memory-fraction 0.8 \
  --model-id meta-llama/Llama-3.2-3B-Instruct \
  --port $INFERENCE_PORT \
  --hostname 0.0.0.0

# ChromaDB
docker run --rm -it \
  --name chromadb \
  --net=host  -p 8000:8000 \
  -v ~/chroma:/chroma/chroma \
  -e IS_PERSISTENT=TRUE \
  -e ANONYMIZED_TELEMETRY=FALSE \
  chromadb/chroma:latest

# Llama Stack
llama stack run dell \
 --port $LLAMA_STACK_PORT \
 --env INFERENCE_MODEL=$INFERENCE_MODEL \
 --env DEH_URL=$DEH_URL \
 --env CHROMA_URL=$CHROMA_URL
```

---------

Co-authored-by: Connor Hack <connorhack@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-13 06:18:25 -07:00
Kelly Brown
6358d0a478
docs: reorganize contributor guide (#3110)
**Description:** 
Restructures the contributor guide and moves some sections into categories.

<img width="1399" height="527" alt="Screenshot 2025-08-12 at 9 28 44 AM"
src="https://github.com/user-attachments/assets/404e23b4-0001-4174-b662-593e0173ef7d"
/>
2025-08-12 16:17:03 -07:00
Ashwin Bharambe
3d90117891
chore(tests): fix responses and vector_io tests (#3119)
Some fixes to MCP tests, and a bunch of fixes for vector providers.

I also enabled a bunch of Vector IO tests to run with
`LlamaStackLibraryClient`.

## Test Plan

Run Responses tests with llama stack library client:
```
pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \
  --text-model openai/gpt-4o \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
  -k "client_with_models"
```

Do the same with `-k openai_client`

The rest should be taken care of by CI.
2025-08-12 16:15:53 -07:00
Ashwin Bharambe
1721aafc1f
feat(responses): type file results properly (#3117)
Another thing our tests implicitly depended on.
2025-08-12 10:39:09 -07:00
Ashwin Bharambe
4fec49dfdb
feat(responses): add include parameter (#3115)
Well, our Responses tests use it, so we had better include it in the API, no?

I discovered it because I want to make sure `llama-stack-client` can always be
used as the client instead of `openai-python` (we do want to be
_truly_ compatible).
2025-08-12 10:24:01 -07:00
Nathan Weinberg
6812aa1e1e
chore: bump min python version in docs and tests (#3103)
# What does this PR do?
The minimum Python version for the project was bumped to 3.12 a couple of
months ago, but some artifacts remain in the repo suggesting we
support >=3.10.
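Scripts that still assume an older interpreter can fail fast instead. The following is a hypothetical pre-flight check (not part of the repository) that compares the interpreter's major.minor version against the 3.12 minimum; it relies on `sort -V`, which GNU coreutils and recent BSD `sort` provide.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds if dotted version $1 >= $2: after a version sort, the smaller
# of the two comes first, so $1 >= $2 exactly when $2 sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check: fails unless the given interpreter is Python >= 3.12.
check_python() {
  local py="${1:-python3}"
  local ver
  ver="$("$py" -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if version_ge "$ver" "3.12"; then
    echo "OK: $py is Python $ver"
  else
    echo "ERROR: $py is Python $ver; Llama Stack requires >= 3.12" >&2
    return 1
  fi
}

# Usage:
#   check_python python3
```

A plain string comparison would get this wrong (`"3.9" > "3.12"` lexically), which is why the sketch uses a version-aware sort.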

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-12 08:52:57 -07:00
dependabot[bot]
88c4fdc5d7
chore(python-deps): bump chromadb from 1.0.15 to 1.0.16 (#3083)
Bumps [chromadb](https://github.com/chroma-core/chroma) from 1.0.15 to
1.0.16.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/chroma-core/chroma/releases">chromadb's
releases</a>.</em></p>
<blockquote>
<h2>1.0.16</h2>
<p>Version: <code>1.0.16</code>
Git ref: <code>refs/tags/1.0.16</code>
Build Date: <code>2025-08-08T00:26</code>
PIP Package: <code>chroma-1.0.16.tar.gz</code>
Github Container Registry Image: <code>:1.0.16</code>
DockerHub Image: <code>:1.0.16</code></p>
<h2>What's Changed</h2>
<ul>
<li>[ENH]: add cache mount &amp; tolerations to garbage collector
template in Helm chart by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5016">chroma-core/chroma#5016</a></li>
<li>[DOC] Fix docs typo by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5018">chroma-core/chroma#5018</a></li>
<li>[CLN] Change GenericQuotaError from 429 to 422 by <a
href="https://github.com/drewkim"><code>@​drewkim</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5022">chroma-core/chroma#5022</a></li>
<li>[CHORE] Fix type error in batch_utils by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5024">chroma-core/chroma#5024</a></li>
<li>[ENH] Add block-level metrics by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/4801">chroma-core/chroma#4801</a></li>
<li>[ENH]: return error on /add if embeddings are not provided by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5033">chroma-core/chroma#5033</a></li>
<li>[DOC] Docs Polish 07/2025 by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5032">chroma-core/chroma#5032</a></li>
<li>[DOC] Flatten public txt files by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5040">chroma-core/chroma#5040</a></li>
<li>[ENH]: require embeddings &amp; require min embedding dimension on
/add by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5037">chroma-core/chroma#5037</a></li>
<li>[ENH] - Adds in dark mode support for hero image by <a
href="https://github.com/tjkrusinskichroma"><code>@​tjkrusinskichroma</code></a>
in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5042">chroma-core/chroma#5042</a></li>
<li>[BLD] Use 8core runners for all our windows jobs by <a
href="https://github.com/eculver"><code>@​eculver</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5027">chroma-core/chroma#5027</a></li>
<li>[TST] More benchmark queries for regex by <a
href="https://github.com/Sicheng-Pan"><code>@​Sicheng-Pan</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/4910">chroma-core/chroma#4910</a></li>
<li>[BUG]: refactor otel/tracing initialization in the frontend to be
independent of hosted entry point by <a
href="https://github.com/c-gamble"><code>@​c-gamble</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5028">chroma-core/chroma#5028</a></li>
<li>[BUG] js client: handle 422 billing errors as QuotaExceeded instead
of ChromaConnectionError by <a
href="https://github.com/philipithomas"><code>@​philipithomas</code></a>
in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5049">chroma-core/chroma#5049</a></li>
<li>[BUG] RLS should use 32MB GRPC payload size limit by <a
href="https://github.com/Sicheng-Pan"><code>@​Sicheng-Pan</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5044">chroma-core/chroma#5044</a></li>
<li>[BUG] Sync protoc arch and version in dockerfile by <a
href="https://github.com/Sicheng-Pan"><code>@​Sicheng-Pan</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5045">chroma-core/chroma#5045</a></li>
<li>[BLD] Fix windows runner label by <a
href="https://github.com/eculver"><code>@​eculver</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5052">chroma-core/chroma#5052</a></li>
<li>[PERF]: Prefetch segments in get and query by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5053">chroma-core/chroma#5053</a></li>
<li>[PERF]: Parallelize fetching blocks for brute force regex by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5051">chroma-core/chroma#5051</a></li>
<li>[RELEASE] JS 3.0.7 by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5059">chroma-core/chroma#5059</a></li>
<li>[ENH] Add a delete_many call to the storage API. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5020">chroma-core/chroma#5020</a></li>
<li>[ENH] Consume delete_many from the wal3 garbage collector. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5021">chroma-core/chroma#5021</a></li>
<li>[ENH]: limit number of concurrent get_all_block_ids() when using
buffer_unordered() by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5062">chroma-core/chroma#5062</a></li>
<li>[ENH]: use new <code>delete_many()</code> storage method in
DeleteUnusedFiles operator by <a
href="https://github.com/codetheweb"><code>@​codetheweb</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5061">chroma-core/chroma#5061</a></li>
<li>[BUG]: Disable aws stalled stream protection by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5063">chroma-core/chroma#5063</a></li>
<li>[DOC] Update manage collections docs with correct delete collection
info by <a
href="https://github.com/jairad26"><code>@​jairad26</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5066">chroma-core/chroma#5066</a></li>
<li>[BUG] Improve wal3 robustness with better shutdown handling and
error recovery by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5046">chroma-core/chroma#5046</a></li>
<li>[ENH] Do not do any mutations of the manifest from within GC. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5050">chroma-core/chroma#5050</a></li>
<li>[CHORE]: enable change notifier otel/tracing by <a
href="https://github.com/c-gamble"><code>@​c-gamble</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5073">chroma-core/chroma#5073</a></li>
<li>[CHORE] Add pprof server to query service by <a
href="https://github.com/eculver"><code>@​eculver</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5072">chroma-core/chroma#5072</a></li>
<li>[ENH]: Dedup inserts to the same key in foyer by <a
href="https://github.com/sanketkedia"><code>@​sanketkedia</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5074">chroma-core/chroma#5074</a></li>
<li>[ENH] &quot;Failed to fetch: status: NotFound&quot; be gone. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5064">chroma-core/chroma#5064</a></li>
<li>[CLN] Remove the the top most spammy log lines from rls/wal3. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5071">chroma-core/chroma#5071</a></li>
<li>[DOC] Fix badge in readme by <a
href="https://github.com/kylediaz"><code>@​kylediaz</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5025">chroma-core/chroma#5025</a></li>
<li>[ENH] A tool for patching logs that were deleted before a new
manifest was installed. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5083">chroma-core/chroma#5083</a></li>
<li>[BUG] Add billing errors to JS client by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5084">chroma-core/chroma#5084</a></li>
<li>[CHORE]: Add s3 get metrics and pod name to tracing spans by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5086">chroma-core/chroma#5086</a></li>
<li>[RELEASE] JS 3.0.8 by <a
href="https://github.com/itaismith"><code>@​itaismith</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5087">chroma-core/chroma#5087</a></li>
<li>[ENH] A tool to purge the cache. by <a
href="https://github.com/rescrv"><code>@​rescrv</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5085">chroma-core/chroma#5085</a></li>
<li>[DOC] Update PR template for migration and observability by <a
href="https://github.com/HammadB"><code>@​HammadB</code></a> in <a
href="https://redirect.github.com/chroma-core/chroma/pull/5089">chroma-core/chroma#5089</a></li>
<li>[CHORE]: Fix s3 get metric name by <a
href="https://github.com/tanujnay112"><code>@​tanujnay112</code></a> in
<a
href="https://redirect.github.com/chroma-core/chroma/pull/5091">chroma-core/chroma#5091</a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="dff3a786db"><code>dff3a78</code></a>
[RELEASE] CLI 1.1.5, Python 1.0.16, JS 3.0.11 (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5227">#5227</a>)</li>
<li><a
href="f60f932b8d"><code>f60f932</code></a>
[ENH]: Increase nprobe for smaller collections (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5226">#5226</a>)</li>
<li><a
href="f593a43b5d"><code>f593a43</code></a>
[ENH] Add <code>InsertRecordSet</code> to JS client (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5225">#5225</a>)</li>
<li><a
href="76a14c226a"><code>76a14c2</code></a>
[DOC] Made light/dark mode for Chroma logo (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5215">#5215</a>)</li>
<li><a
href="d80817ede4"><code>d80817e</code></a>
[ENH]: Add more tracing in the filter path (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5219">#5219</a>)</li>
<li><a
href="73abfdc51a"><code>73abfdc</code></a>
[ENH] Handle when the garbage doesn't overlap the manifest. (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5207">#5207</a>)</li>
<li><a
href="fa392226ba"><code>fa39222</code></a>
[BUG] Revert accidentally commited code (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5205">#5205</a>)</li>
<li><a
href="815c3ac561"><code>815c3ac</code></a>
[ENH]: Fix CI flake with adaptive nsearch (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5203">#5203</a>)</li>
<li><a
href="ea66d6929c"><code>ea66d69</code></a>
[BUG] Switch to rust-tls (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5204">#5204</a>)</li>
<li><a
href="04aeb22139"><code>04aeb22</code></a>
[ENH]: Calculate cache weight of block size instead of hardcoding (<a
href="https://redirect.github.com/chroma-core/chroma/issues/5201">#5201</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/chroma-core/chroma/compare/1.0.15...1.0.16">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=chromadb&package-manager=uv&previous-version=1.0.15&new-version=1.0.16)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-12 08:44:39 -07:00
dependabot[bot]
393f3714b0
chore(python-deps): bump torch from 2.7.1 to 2.8.0 (#3082)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.7.1 to 2.8.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pytorch/pytorch/releases">torch's
releases</a>.</em></p>
<blockquote>
<h1>PyTorch 2.8.0 Release Notes</h1>
<ul>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#highlights">Highlights</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#backwards-incompatible-changes">Backwards
Incompatible Changes</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#deprecations">Deprecations</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#new-features">New
Features</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#improvements">Improvements</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#bug-fixes">Bug
fixes</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#performance">Performance</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#documentation">Documentation</a></li>
<li><a
href="https://github.com/pytorch/pytorch/blob/HEAD/#developers">Developers</a></li>
</ul>
<h1>Highlights</h1>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ba56102387"><code>ba56102</code></a>
Cherrypick: Add the RunLLM widget to the website (<a
href="https://redirect.github.com/pytorch/pytorch/issues/159592">#159592</a>)</li>
<li><a
href="c525a02c89"><code>c525a02</code></a>
[dynamo, docs] cherry pick torch.compile programming model docs into 2.8
(<a
href="https://redirect.github.com/pytorch/pytorch/issues/15">#15</a>...</li>
<li><a
href="a1cb3cc05d"><code>a1cb3cc</code></a>
[Release Only] Remove nvshmem from list of preload libraries (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158925">#158925</a>)</li>
<li><a
href="c76b2356bc"><code>c76b235</code></a>
Move out super large one off foreach_copy test (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158880">#158880</a>)</li>
<li><a
href="20a0e225a0"><code>20a0e22</code></a>
Revert &quot;[Dynamo] Allow inlining into AO quantization modules (<a
href="https://redirect.github.com/pytorch/pytorch/issues/152934">#152934</a>)&quot;
(<a
href="https://redirect.github.com/pytorch/pytorch/issues/158">#158</a>...</li>
<li><a
href="9167ac8c75"><code>9167ac8</code></a>
[MPS] Switch Cholesky decomp to column wise (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158237">#158237</a>)</li>
<li><a
href="5534685c62"><code>5534685</code></a>
[MPS] Reimplement <code>tri[ul]</code> as Metal shaders (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158867">#158867</a>)</li>
<li><a
href="d19e08d74b"><code>d19e08d</code></a>
Cherry pick PR 158746 (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158801">#158801</a>)</li>
<li><a
href="a6c044ab9a"><code>a6c044a</code></a>
[cherry-pick] Unify torch.tensor and torch.ops.aten.scalar_tensor
behavior (#...</li>
<li><a
href="620ebd0646"><code>620ebd0</code></a>
[Dynamo] Use proper sources for constructing dataclass defaults (<a
href="https://redirect.github.com/pytorch/pytorch/issues/158689">#158689</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/pytorch/pytorch/compare/v2.7.1...v2.8.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=torch&package-manager=uv&previous-version=2.7.1&new-version=2.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-12 08:44:24 -07:00
Matthew Farrellee
b70e2f1f09
fix(dep): update to openai >= 1.99.6 and use new Function location (#3087)
# What does this PR do?

closes #3072 

## Test Plan

ci
2025-08-12 08:40:32 -07:00
Mustafa Elbehery
4a13ef45e9
fix: Implement missing run_moderation method in PromptGuardSafetyImpl (#3101)
# What does this PR do?
This PR addresses an issue where `PromptGuardSafetyImpl` was an
incomplete implementation of an abstract class. The class was missing
the required `run_moderation` method from its parent interface.


Currently, running `pre-commit` locally fails with the error below.

```
llama_stack/providers/inline/safety/prompt_guard/__init__.py:15: error: Cannot instantiate abstract class "PromptGuardSafetyImpl" with abstract attribute "run_moderation"  [abstract]
Found 1 error in 1 file (checked 410 source files)
```

This PR fixes the issue as follows:

- Added the missing `run_moderation` method to `PromptGuardSafetyImpl`
- The method raises `NotImplementedError` with a message indicating that
this functionality is not implemented for PromptGuard
- This allows the class to be instantiated while clearly indicating the
limitation
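The mypy error above reflects standard Python `abc` behavior: a subclass that leaves any `@abstractmethod` unimplemented cannot be instantiated. The sketch below reproduces the bug and the fix with hypothetical stand-ins (`SafetyBase`, `IncompleteImpl`, `PromptGuardLikeImpl`); the real parent interface and the actual `run_moderation` signature in llama-stack may differ.

```python
import asyncio
from abc import ABC, abstractmethod


class SafetyBase(ABC):
    """Hypothetical stand-in for the parent safety interface."""

    @abstractmethod
    async def run_moderation(self, input: str, model: str) -> dict:
        """Classify `input` with `model` (signature assumed for illustration)."""


class IncompleteImpl(SafetyBase):
    """Reproduces the bug: run_moderation is missing, so instantiation fails."""


class PromptGuardLikeImpl(SafetyBase):
    """Mirrors the fix: implement the method, but make the limitation explicit."""

    async def run_moderation(self, input: str, model: str) -> dict:
        raise NotImplementedError("run_moderation is not implemented for PromptGuard")


def can_instantiate(cls: type) -> bool:
    """Return True if `cls()` succeeds, False if the ABC machinery rejects it."""
    try:
        cls()
    except TypeError:
        return False
    return True


def moderation_is_unimplemented(impl: SafetyBase) -> bool:
    """Return True if calling run_moderation raises NotImplementedError."""
    try:
        asyncio.run(impl.run_moderation("hello", "example-model"))
    except NotImplementedError:
        return True
    return False


print(can_instantiate(IncompleteImpl))
print(can_instantiate(PromptGuardLikeImpl))
print(moderation_is_unimplemented(PromptGuardLikeImpl()))
```

Raising `NotImplementedError` instead of returning a placeholder result makes unsupported calls fail loudly rather than silently passing content through.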


Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-12 08:32:52 -07:00
Nathan Weinberg
19123ca957
refactor: standardize InferenceRouter model handling (#2965)
2025-08-12 04:20:39 -06:00
Ashwin Bharambe
803114180b
chore(logging)!: use comma as a delimiter (#3095)
Using commas is much more shell-friendly: a semicolon is a statement
delimiter and must be escaped or quoted.

This change is backwards-incompatible, but I imagine few people are
using this. I could be wrong. Looking for feedback.
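To illustrate why the semicolon is awkward, the sketch below uses Python's standard `shlex` module as a rough approximation of POSIX shell tokenization. It assumes the setting is passed through an environment variable named `LLAMA_STACK_LOGGING` in a comma-delimited `category=level` format; treat the variable name, the format, and the `parse_logging_spec` helper as illustrative assumptions, not the project's actual parser.

```python
import shlex


def shell_tokens(command: str) -> list[str]:
    """Split a command line roughly the way a POSIX shell would.

    punctuation_chars=True turns ';' (and '&', '|', '(', ')', '<', '>') into
    separate tokens; whitespace_split=True keeps other characters, such as
    ',', inside the surrounding word.
    """
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


def parse_logging_spec(spec: str) -> dict[str, str]:
    """Parse a comma-delimited 'category=level' string (hypothetical format)."""
    levels: dict[str, str] = {}
    for item in spec.split(","):
        category, sep, level = item.strip().partition("=")
        if sep:  # skip empty or malformed entries
            levels[category.strip()] = level.strip()
    return levels


# Unquoted ';' ends the assignment: the shell sees a second statement, 'core=info'.
print(shell_tokens("LLAMA_STACK_LOGGING=server=debug;core=info"))
# Unquoted ',' is an ordinary character, so the whole assignment stays one word.
print(shell_tokens("LLAMA_STACK_LOGGING=server=debug,core=info"))
print(parse_logging_spec("server=debug,core=info"))
```

`shlex` is only an approximation of a real shell, but it shows the key difference: the comma form survives unquoted as a single word, while the semicolon form is split into two statements.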
2025-08-11 11:51:43 -07:00
Francisco Arceo
f7adf58b1b
docs: Add documentation on how to contribute a Vector DB provider and update testing documentation (#3093)
# What does this PR do?

- Adds documentation on how to contribute a Vector DB provider.
- Updates the testing section to be a little friendlier to navigate.
- Adds a new search shortcut so that `/`, `⌘ K`, or `Ctrl+K` triggers
search.


<img width="1903" height="1346" alt="Screenshot 2025-08-11 at 10 10
12 AM"
src="https://github.com/user-attachments/assets/6995b3b8-a2ab-4200-be72-c5b03a784a29"
/>

<img width="1915" height="1438" alt="Screenshot 2025-08-11 at 10 10
25 AM"
src="https://github.com/user-attachments/assets/1f54d30e-5be1-4f27-b1e9-3c3537dcb8e9"
/>


## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-11 11:11:09 -07:00
Mustafa Elbehery
b5b5f5b9ae
chore: add mypy prompt guard (#2678)
# What does this PR do?
This PR adds static type coverage to `llama-stack`

Part of https://github.com/meta-llama/llama-stack/issues/2647


## Test Plan

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-11 08:40:40 -07:00
Francisco Arceo
7448a4a88c
chore: Updating UI Sidebar (#3081)
# What does this PR do?
This updates the sidebar to look a little more like other popular ones.

<img width="1913" height="1352" alt="Screenshot 2025-08-08 at 11 25
31 PM"
src="https://github.com/user-attachments/assets/00738412-1101-48ec-8864-cde4a8733ec1"
/>

## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-11 07:39:52 -07:00
Matthew Farrellee
8faff92591
chore: remove redundant code in unregister_toolgroup (#3092)
# What does this PR do?

removes redundant code

## Test Plan

ci
2025-08-11 07:38:54 -07:00
Eran Cohen
a4bad6c0b4
feat: Add Google Vertex AI inference provider support (#2841)
# What does this PR do?
- Add new Vertex AI remote inference provider with litellm integration
- Support for Gemini models through Google Cloud Vertex AI platform
- Uses Google Cloud Application Default Credentials (ADC) for
authentication
- Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro,
gemini-2.0-flash.
- Updated provider registry to include vertexai provider
- Updated starter template to support Vertex AI configuration
- Added comprehensive documentation and sample configuration

relates to https://github.com/meta-llama/llama-stack/issues/2747

## Test Plan

Signed-off-by: Eran Cohen <eranco@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-11 08:22:04 -04:00
Francisco Arceo
78a59a4dbe
chore: Adding GitHub Stars, trends, and contributor shout out to README (#3079)
# What does this PR do?

Updates the README to add:
1. GitHub badge highlighting Llama Stack as #1 Repo of the Day
2. GitHub Star History (cumulative stars chart)
3. Contributor shout out


## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-10 21:11:14 -04:00
Varsha
69dc789e15
docs: Add unsupported search mode info about FAISS (#3089) 2025-08-10 17:34:34 -06:00
Varsha
ce72a28525
docs: Update doc on search modes for Milvus (#3078)
# What does this PR do?
Update Milvus doc on using search modes. 


## Test Plan

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-10 18:48:36 -04:00
Vlastimil Eliáš
1677d6bffd
feat: Flash-Lite 2.0 and 2.5 models added to Gemini inference provider (#3058)
This PR adds the Flash-Lite 2.0 and 2.5 models to the Gemini inference provider.

Closes #3046 

## Test Plan
I could not find any existing tests for this provider, so I tested
manually. The change itself is small and straightforward.
2025-08-08 13:48:15 -07:00
ehhuang
0b5a794c27
fix: telemetry logger spams when queue is full (#3070)
# What does this PR do?


## Test Plan
Ran a stress test on chat completion endpoint locally:

For 10 concurrent users over 3 minutes:
Before:
<img width="1440" height="201" alt="image"
src="https://github.com/user-attachments/assets/24e0d580-186e-4e24-931e-2b936c5859b6"
/>

After:
<img width="1434" height="204" alt="image"
src="https://github.com/user-attachments/assets/4b806d88-f822-41e9-b25a-018cc4bec866"
/>

(Will send scripts in a future PR.)
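The PR text does not describe the fix itself. One common way to keep a full telemetry queue from flooding the logs is to drop events with a non-blocking `put_nowait` and warn only once per overflow episode. The sketch below shows that pattern with a hypothetical `DroppingEventQueue` class; it is not necessarily what this PR implements.

```python
import logging
import queue

logger = logging.getLogger("telemetry_sketch")  # hypothetical logger name


class DroppingEventQueue:
    """Bounded queue that drops events when full and warns once per overflow episode."""

    def __init__(self, maxsize: int) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0           # total events dropped because the queue was full
        self.warnings_emitted = 0  # how many warnings were actually logged
        self._in_overflow = False  # True during an unbroken run of dropped events

    def put(self, event: object) -> bool:
        """Enqueue without blocking; return False if the event was dropped."""
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self.dropped += 1
            if not self._in_overflow:
                # Log on the first drop of an overflow episode, not on every event.
                logger.warning("telemetry queue is full; dropping events")
                self.warnings_emitted += 1
                self._in_overflow = True
            return False
        # There was room again, so the next overflow will warn once more.
        self._in_overflow = False
        return True

    def get_nowait(self) -> object:
        return self._queue.get_nowait()


q = DroppingEventQueue(maxsize=2)
print([q.put(i) for i in range(5)], q.dropped, q.warnings_emitted)
```

Counting drops while logging only at the start of each overflow keeps the information available (for example, as a metric) without emitting one log line per lost event.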
2025-08-08 13:47:36 -07:00
Francisco Arceo
9b70bb9d4b
feat(ui): Adding Vector Store Files to Admin UI (#3041)
# What does this PR do?
This PR updates the UI to add three new pages:
1. `/files/{file_id}`
2. `files/{file_id}/contents`
3. `files/{file_id}/contents/{content_id}`

The files in the list are clickable and take the user to the File Details page.
The File Details page shows all of the file's contents.
The Content Details page shows an individual parsed chunk of content.

These pages use only our existing OpenAI-compatible APIs. I have a separate
branch that exposes the embeddings, and with it the portal is correctly
populated. I included the frontend rendering code for that in this PR.

1. `vector-stores/{vector_store_id}/files/{file_id}` 
<img width="1913" height="1351" alt="Screenshot 2025-08-06 at 10 20
12 PM"
src="https://github.com/user-attachments/assets/08010d5e-60c8-4bd9-9f3e-a2731ed1ad55"
/>

2. `vector-stores/{vector_store_id}/files/{file_id}/contents`
<img width="1920" height="1272" alt="Screenshot 2025-08-06 at 10 21
23 PM"
src="https://github.com/user-attachments/assets/3b91e67b-5d64-4fe6-91b6-18f14587e850"
/>

3. `vector-stores/{vector_store_id}/files/{file_id}/contents/{content_id}`
<img width="1916" height="1273" alt="Screenshot 2025-08-06 at 10 21
45 PM"
src="https://github.com/user-attachments/assets/d38ca996-e8d9-460c-9e39-7ff0cb5ec0dd"
/>

## Test Plan
I tested this locally and reviewed the code. I generated a significant
share of the code with Claude, with some manual intervention. After this,
I'll begin adding tests to the UI.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-08 07:44:06 -07:00
Jiayi Ni
9e78f2da96
docs: fix the docs for NVIDIA Inference Provider (#3055)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Test Llama Stack Build / generate-matrix (push) Successful in 14s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 16s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 20s
Python Package Build Test / build (3.12) (push) Failing after 23s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 21s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 17s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 58s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 56s
Pre-commit / pre-commit (push) Successful in 1m40s
Test Llama Stack Build / build (push) Failing after 14s
# What does this PR do?
Fix the NVIDIA inference docs by updating the API methods, model IDs, and
the embedding example.

## Test Plan
N/A
2025-08-08 11:27:55 +02:00
Ashwin Bharambe
e90fe25890
fix(tests): move llama stack client init back to fixture (#3071)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / discover-tests (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 13s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 16s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 50s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 54s
Unit Tests / unit-tests (3.13) (push) Failing after 47s
Pre-commit / pre-commit (push) Successful in 1m44s
See inline comments
2025-08-07 15:29:53 -07:00
Ashwin Bharambe
5f1ddd35e4
chore(tests): refactor and move responses tests away from verifications (#3068)
This PR kills the verifications infrastructure which is no longer used.
It was relocated to the `llama-stack-evals`
(https://github.com/meta-llama/llama-stack-evals) repository previously.

Responses tests used this infrastructure, but that wasn't really
necessary; it was only somewhat useful back when @bbrownin introduced the
tests. On Discord, we agreed that the tests can be moved to our regular
integration test infrastructure.

## Test Plan

Some tests currently fail (although they run!). I will send a
follow-up PR that makes them all pass.
2025-08-07 13:48:16 -07:00
Dean Wampler
342550c1e2
docs: Added comment about a known limitation of AgentEventLogger (#2930)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / discover-tests (push) Successful in 7s
Python Package Build Test / build (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 10s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 14s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s
Test External API and Providers / test-external (venv) (push) Failing after 16s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 28s
Pre-commit / pre-commit (push) Successful in 1m11s
# What does this PR do?
`AgentEventLogger` only supports streaming responses, so I suggest
adding a comment near the bottom of `demo_script.py` telling the user
that if they change the `stream` value to `False` in the call to
`create_turn`, they need to comment out the logging lines.

See https://github.com/llamastack/llama-stack-client-python/issues/15 

## Test Plan

---------

Signed-off-by: Dean Wampler <dean.wampler@ibm.com>
2025-08-07 10:09:57 -07:00
Varsha
e3928e6a29
feat: Implement hybrid search in Milvus (#2644)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 5s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 11s
Test External API and Providers / test-external (venv) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Pre-commit / pre-commit (push) Successful in 57s
# What does this PR do?
This PR implements hybrid search for Milvus, based on Milvus's built-in
hybrid search support.
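Milvus's built-in hybrid search typically runs a dense vector search and a keyword (BM25/sparse) search, then merges the two ranked result lists with a reranker; reciprocal rank fusion (RRF) is one common choice. The sketch below is a minimal pure-Python illustration of RRF under that assumption, not the code from this PR: the chunk IDs are made up, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list adds 1 / (k + rank) to a chunk's score, with ranks starting at 1,
    so chunks ranked highly by several searches float to the top.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID so the output is deterministic
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


vector_hits = ["c3", "c1", "c7"]   # e.g. from dense vector similarity
keyword_hits = ["c1", "c9", "c3"]  # e.g. from BM25 keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF only uses ranks, it sidesteps the problem that vector distances and BM25 scores live on different scales; a weighted-score reranker is the usual alternative when those scores are normalized.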
   
To test:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=long --disable-warnings --asyncio-mode=auto
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-07 09:42:03 +02:00
Nathan Weinberg
5a2d323eca
docs: add use of custom exceptions to code style guide (#3049)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 15s
Python Package Build Test / build (3.12) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 16s
Integration Tests (Replay) / discover-tests (push) Successful in 18s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Python Package Build Test / build (3.13) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 24s
Test External API and Providers / test-external (venv) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 26s
Unit Tests / unit-tests (3.12) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 1m3s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m5s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 48s
Unit Tests / unit-tests (3.13) (push) Failing after 1m0s
Pre-commit / pre-commit (push) Successful in 1m55s
# What does this PR do?
Adds a blurb to `CONTRIBUTING.md` encouraging the use of the
standardized custom exception classes for resources, where applicable.

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-06 14:12:08 -07:00
slekkala1
26d3d25c87
feat: Add moderations create api (#3020)
# What does this PR do?
This PR adds an OpenAI-compatible moderations API. It is currently
implemented only for the Llama Guard safety provider.
Next steps are image support, expanding to other safety providers, and
deprecating `run_shield`.
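For context, an OpenAI-style moderations call sends an `input` and a `model` and gets back a `results` list in which each result carries a `flagged` boolean and per-category booleans. The helpers below are a hypothetical client-side sketch that builds such a request body and reads the flagged categories out of a response; they do not reflect the exact Llama Stack response schema, and the sample category names and response values are made up for illustration.

```python
import json
from typing import Any


def build_moderation_request(text: str, model: str) -> str:
    """Serialize an OpenAI-style moderations request body (sketch)."""
    return json.dumps({"input": text, "model": model})


def flagged_categories(response: dict[str, Any]) -> list[str]:
    """Collect the names of categories flagged in any result of a response."""
    flagged: set[str] = set()
    for result in response.get("results", []):
        if result.get("flagged"):
            # Keep only the categories whose boolean value is True
            flagged.update(name for name, hit in result.get("categories", {}).items() if hit)
    return sorted(flagged)


# Illustrative response shape; the IDs and category names are invented for the example
sample = {
    "id": "modr-example",
    "model": "llama-guard3:8b",
    "results": [{"flagged": True, "categories": {"violence": False, "illicit": True}}],
}
print(build_moderation_request("How do I bake bread?", "llama-guard3:8b"))
print(flagged_categories(sample))  # ['illicit']
```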


## Test Plan
Added two new tests, with safe and unsafe text prompt examples, for the new
OpenAI-compatible moderations API:
`SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest -v tests/integration/safety/test_safety.py --text-model=llama3.2:3b-instruct-fp16 --embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`
(I had an issue with the previous PR,
https://github.com/meta-llama/llama-stack/pull/2994, while updating it and
accidentally closed it, so I reopened this new one.)
2025-08-06 13:51:23 -07:00
Charlie Doern
0caef40e0d
fix: telemetry fixes (inference and core telemetry) (#2733)
# What does this PR do?

I found a few issues while adding new metrics for various APIs.

Currently, metrics are only propagated in `chat_completion` and
`completion`.

Since most providers use the `openai_..` routes as the default in
`llama-stack-client inference chat-completion`, metrics are currently
not working as expected.

To get them working, the following had to be done:

1. Get the completion as usual.
2. Use new `openai_` versions of the metric-gathering functions, which
use `.usage` from the `OpenAI..` response types to gather the metrics
that are already populated.
3. Define a `stream_generator` that counts the tokens and computes the
metrics (only for `stream=True`).
4. Add the metrics to the response.
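Step 3 can be pictured as an async generator that wraps the provider's stream, passes every chunk through, and records metrics once the stream is exhausted. The sketch below is a minimal illustration, not the PR's `stream_generator`: `Usage`, `Chunk`, `stream_with_metrics`, and the `record` callback are hypothetical names, and it assumes a chunk carries an OpenAI-style `usage` object rather than counting tokens itself.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Optional


@dataclass
class Usage:
    """Hypothetical stand-in for the `.usage` field of an OpenAI-style chunk."""
    prompt_tokens: int
    completion_tokens: int


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed chat-completion chunk."""
    text: str
    usage: Optional[Usage] = None


async def stream_with_metrics(
    stream: AsyncIterator[Chunk],
    record: Callable[[dict[str, int]], None],
) -> AsyncIterator[Chunk]:
    """Pass chunks through unchanged, then record token metrics once the stream ends."""
    usage: Optional[Usage] = None
    async for chunk in stream:
        if chunk.usage is not None:
            usage = chunk.usage  # the latest reported usage wins
        yield chunk
    if usage is not None:
        record({
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.prompt_tokens + usage.completion_tokens,
        })


async def fake_stream() -> AsyncIterator[Chunk]:
    # Simulated provider stream: usage arrives only on the final chunk
    yield Chunk("Hel")
    yield Chunk("lo", usage=Usage(prompt_tokens=3, completion_tokens=2))


async def main() -> None:
    metrics: list[dict[str, int]] = []
    text = "".join([c.text async for c in stream_with_metrics(fake_stream(), metrics.append)])
    print(text, metrics)


asyncio.run(main())
```

Recording after the loop rather than per chunk keeps the caller's view of the stream unchanged; it also explains the NOTE below, since this wrapping only works when the provider returns an async generator.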


NOTE: I could not add metrics to `openai_completion` with `stream=True`,
because that ONLY returns an `OpenAICompletion`, not an `AsyncGenerator`
that we can manipulate.


Acquire the lock and add the event to the span, as the other `_log_...`
methods do.

Some new output:

`llama-stack-client inference chat-completion --message hi`

<img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM"
src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326"
/>


And in the client:

<img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM"
src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007"
/>

These were not previously being recorded, nor were they being printed to
the server, due to the improper console sink handling.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-06 13:37:40 -07:00
Ashwin Bharambe
c252dfa3ef
fix(ci): allow tests to skip llama stack client instantiation (#3052)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 9s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Python Package Build Test / build (3.13) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 20s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 16s
Test External API and Providers / test-external (venv) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 15s
Pre-commit / pre-commit (push) Successful in 1m16s
Test Llama Stack Build / build (push) Failing after 8s
2025-08-06 11:15:41 -07:00
IAN MILLER
8ba04205ac
docs: remove pure venv references (#3047)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Remove pure venv (without uv) references in docs

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-08-06 10:42:34 -07:00
Nathan Weinberg
e9fced773a
refactor: introduce common 'ResourceNotFoundError' exception (#3032)
# What does this PR do?
1. Introduce new base custom exception class `ResourceNotFoundError`
2. All other "not found" exception classes now inherit from
`ResourceNotFoundError`
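A minimal sketch of the resulting hierarchy follows. It assumes a simple constructor taking the resource's name and type; the real signatures and message format in Llama Stack may differ, and subclassing `ValueError` is an assumption. The subclass names are borrowed from the related commits #3031 and #2986, and `"builtin::websearch"` is only an example ID.

```python
class ResourceNotFoundError(ValueError):
    """Base class for every "<resource> not found" error (illustrative sketch)."""

    def __init__(self, resource_name: str, resource_type: str) -> None:
        self.resource_name = resource_name
        self.resource_type = resource_type
        super().__init__(f"{resource_type} '{resource_name}' not found.")


class SessionNotFoundError(ResourceNotFoundError):
    def __init__(self, session_name: str) -> None:
        super().__init__(session_name, "Session")


class ToolGroupNotFoundError(ResourceNotFoundError):
    def __init__(self, toolgroup_name: str) -> None:
        super().__init__(toolgroup_name, "Tool Group")


# Callers can handle every "not found" case through the common base class
try:
    raise ToolGroupNotFoundError("builtin::websearch")
except ResourceNotFoundError as err:
    print(type(err).__name__, "-", err)
```

A shared base class lets a server map all "not found" errors to a single HTTP status in one place instead of listing each subclass.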

Closes #3030

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-06 10:22:55 -07:00
Ashwin Bharambe
dfce05d0c5
fix(docs): update llama stack build CLI doc (#3050) 2025-08-06 09:32:09 -07:00
ehhuang
3e695cf320
chore: update postgres_demo with new config (#3045)
# What does this PR do?

closes https://github.com/meta-llama/llama-stack/issues/3044

## Test Plan
matches starter's template
2025-08-06 07:48:40 -07:00
Mohamed Rebai
7eff1bb3ec
ci(pre-commit): enforce presence of 'upload-time' field in uv.lock (#2920)
# What does this PR do?
This PR adds a minimum version `0.7.0` to the project. The diff issue
happens because an `upload-time` field in the `uv.lock` file did not
exist in older uv versions (pre `0.6.15`). This effectively prevents
large diffs in PRs from devs that use older versions of uv.

Closes #2887

---------

Co-authored-by: Charlie Doern <charlie@doern.me>
2025-08-06 07:46:59 -07:00
Ashwin Bharambe
7f834339ba
chore(misc): make tests and starter faster (#3042)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Test Llama Stack Build / generate-matrix (push) Successful in 11s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Test External API and Providers / test-external (venv) (push) Failing after 14s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Python Package Build Test / build (3.13) (push) Failing after 53s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s
Pre-commit / pre-commit (push) Successful in 1m53s
A bunch of miscellaneous cleanup focused on tests, which ended up
speeding up the starter distro substantially.

- Pulled the llama stack client init for tests into `pytest_sessionstart` so
it does not clobber output
- Profiling that showed where we were doing lots of heavy imports
for starter, so I made them lazy
- starter now starts 20+ seconds faster on my Mac
- A few other smallish refactors for `compat_client`
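The PR does not say how the heavy imports were made lazy; moving `import` statements into the functions that need them is the simplest approach. Another stdlib option is `importlib.util.LazyLoader`, sketched below following the recipe in the Python `importlib` documentation: the module object is created immediately, but its code only runs on first attribute access. `lazy_import` is an illustrative helper name, and `colorsys` is just a stand-in for a heavy dependency.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return a module whose body executes on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers the real execution until first use
    return module


colorsys = lazy_import("colorsys")  # cheap: the module body has not run yet
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # first attribute access triggers the real import
```

Lazy loading moves the import cost to first use, so a missing dependency surfaces later than usual; that trade-off is usually acceptable for optional providers that a given distro may never touch.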
2025-08-05 14:55:05 -07:00
IAN MILLER
e12524af85
feat: create unregister shield API endpoint in Llama Stack (#2853)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Integration Tests (Replay) / discover-tests (push) Successful in 13s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 24s
Test External API and Providers / test-external (venv) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 27s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 21s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 35s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 39s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 35s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 35s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 1m2s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 1m4s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 1m2s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Pre-commit / pre-commit (push) Successful in 2m21s
# What does this PR do?
Extends the Shields protocol with the ability to unregister previously
registered shields, and adds CLI support for shield management.

Closes #2581 

## Test Plan

First, test the shields API:
1. Install and start Ollama:

`ollama serve`


2. Pull Llama Guard Model in Ollama:

`ollama pull llama-guard3:8b`

3. Configure env variables:

```
export ENABLE_OLLAMA=ollama
export OLLAMA_URL=http://localhost:11434
```

4. Build Llama Stack distro:

`llama stack build --template starter --image-type venv`

5. Start Llama Stack server:

`llama stack run starter --port 8321`

6. Check if Ollama model is available:

`curl -X GET http://localhost:8321/v1/models | jq '.data[] | select(.provider_id=="ollama")'`

7. Register a new Shield using Ollama provider:

```
curl -X POST http://localhost:8321/v1/shields \
 -H "Content-Type: application/json" \
 -d '{
   "shield_id": "test-shield",
   "provider_id": "llama-guard",
   "provider_shield_id": "ollama/llama-guard3:8b",
   "params": {}
 }'
```

`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}`

8. Check if shield was registered:

`curl -X GET http://localhost:8321/v1/shields/test-shield`


`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}`

9. Run shield:

```
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "test-shield",
    "messages": [
      {
        "role": "user",
        "content": "How can I hack into someone computer?"
      }
    ],
    "params": {}
  }'
```

`{"violation":{"violation_level":"error","user_message":"I can't answer that. Can I help with something else?","metadata":{"violation_type":"S2"}}}`

10. Unregister shield:

`curl -X DELETE http://localhost:8321/v1/shields/test-shield`

`null`

11. Verify shield was deleted:

`curl -X GET http://localhost:8321/v1/shields/test-shield`

`{"detail":"Invalid value: Shield 'test-shield' not found"}`

All tests passed:

```
========================================================================== 430 passed, 194 warnings in 19.54s ==========================================================================
/Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited
  loop.close()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Wrote HTML report to htmlcov-3.12/index.html

```
2025-08-05 07:33:46 -07:00
github-actions[bot]
e565b91182 build: Bump version to 0.2.17
2025-08-05 01:43:30 +00:00
Ashwin Bharambe
ea46f74092 fix: rectify typo in MANIFEST.in due to #2975 2025-08-04 18:22:49 -07:00
ehhuang
bb6b6041d6
chore: fix: integration tests failures marked as successful (#3039) 2025-08-04 17:06:28 -07:00
Francisco Arceo
eac1e0c7d4
chore: Fixing Markdown renderer (#3038) 2025-08-04 14:16:09 -07:00
Nathan Weinberg
68b0071861
chore: standardize session not found error (#3031)
# What does this PR do?
1. Creates a new `SessionNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379
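The pattern is one exception type per missing resource. Each type has a fixed message and is raised wherever a lookup fails. This is a minimal sketch under stated assumptions, not the code from #3031. Subclassing `ValueError` would be consistent with the `Invalid value:` prefix in the shield error earlier in this log, but it is an assumption here. The message wording, the `_SESSIONS` store and the `get_session` helper are also hypothetical.

```python
class SessionNotFoundError(ValueError):
    """Raised when a session ID cannot be resolved (hypothetical sketch)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        super().__init__(f"Session '{session_id}' not found or access denied.")


# Hypothetical in-memory store standing in for real session storage.
_SESSIONS: dict = {"s-1": {"turns": []}}


def get_session(session_id: str) -> dict:
    session = _SESSIONS.get(session_id)
    if session is None:
        raise SessionNotFoundError(session_id)
    return session


try:
    get_session("s-404")
except ValueError as err:  # existing `except ValueError` callers still work
    print(err)  # → Session 's-404' not found or access denied.
```

Because the class subclasses `ValueError`, call sites that already catch `ValueError` do not need to change. Only the message becomes uniform.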

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-04 13:12:02 -07:00
Nathan Weinberg
05cfa213b6
chore: standardize tool group not found error (#2986)
# What does this PR do?
1. Creates a new `ToolGroupNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379
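"Implements the new class where appropriate" amounts to routing every lookup path through a single error type. This is a minimal sketch, assuming a dict-backed registry. `ToolGroupRegistry`, its methods and the message text are hypothetical, and `builtin::websearch` is only an illustrative ID.

```python
class ToolGroupNotFoundError(ValueError):
    """Raised when a tool group ID is not registered (hypothetical sketch)."""

    def __init__(self, toolgroup_id: str) -> None:
        self.toolgroup_id = toolgroup_id
        super().__init__(f"Tool Group '{toolgroup_id}' not found.")


class ToolGroupRegistry:
    """Hypothetical registry: every lookup path raises the same error type."""

    def __init__(self) -> None:
        self._groups: dict = {}

    def register(self, toolgroup_id: str, tools: list) -> None:
        self._groups[toolgroup_id] = list(tools)

    def get(self, toolgroup_id: str) -> list:
        if toolgroup_id not in self._groups:
            raise ToolGroupNotFoundError(toolgroup_id)
        return self._groups[toolgroup_id]

    def unregister(self, toolgroup_id: str) -> None:
        if toolgroup_id not in self._groups:
            raise ToolGroupNotFoundError(toolgroup_id)
        del self._groups[toolgroup_id]


registry = ToolGroupRegistry()
registry.register("builtin::websearch", ["web_search"])
print(registry.get("builtin::websearch"))  # → ['web_search']
```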

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-08-04 11:41:33 -07:00
dependabot[bot]
55a2694c80
chore(python-deps): bump openai from 1.97.1 to 1.98.0 (#3025)
Bumps [openai](https://github.com/openai/openai-python) from 1.97.1 to
1.98.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/releases">openai's
releases</a>.</em></p>
<blockquote>
<h2>v1.98.0</h2>
<h2>1.98.0 (2025-07-30)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.97.2...v1.98.0">v1.97.2...v1.98.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> manual updates (<a
href="88a8036c5e">88a8036</a>)</li>
</ul>
<h2>v1.97.2</h2>
<h2>1.97.2 (2025-07-30)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.97.1...v1.97.2">v1.97.1...v1.97.2</a></p>
<h3>Chores</h3>
<ul>
<li><strong>client:</strong> refactor streaming slightly to better
future proof it (<a
href="71c0c74713">71c0c74</a>)</li>
<li><strong>project:</strong> add settings file for vscode (<a
href="29c22c90fd">29c22c9</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/blob/main/CHANGELOG.md">openai's
changelog</a>.</em></p>
<blockquote>
<h2>1.98.0 (2025-07-30)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.97.2...v1.98.0">v1.97.2...v1.98.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> manual updates (<a
href="88a8036c5e">88a8036</a>)</li>
</ul>
<h2>1.97.2 (2025-07-30)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v1.97.1...v1.97.2">v1.97.1...v1.97.2</a></p>
<h3>Chores</h3>
<ul>
<li><strong>client:</strong> refactor streaming slightly to better
future proof it (<a
href="71c0c74713">71c0c74</a>)</li>
<li><strong>project:</strong> add settings file for vscode (<a
href="29c22c90fd">29c22c9</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a3315d9fcc"><code>a3315d9</code></a>
release: 1.98.0 (<a
href="https://redirect.github.com/openai/openai-python/issues/2503">#2503</a>)</li>
<li><a
href="48188cc8d5"><code>48188cc</code></a>
release: 1.97.2 (<a
href="https://redirect.github.com/openai/openai-python/issues/2494">#2494</a>)</li>
<li>See full diff in <a
href="https://github.com/openai/openai-python/compare/v1.97.1...v1.98.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=openai&package-manager=uv&previous-version=1.97.1&new-version=1.98.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-04 11:40:56 -07:00
Ashwin Bharambe
cc87995e2b
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out.

`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.

Updated `server.py` to remove the `config_or_template` backward compatibility, since a couple of releases have passed since that change.
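A deprecated flag that keeps working but warns can be wired up with `argparse` roughly as follows. This is a sketch of the general technique, not the actual `llama stack build` implementation. The parser layout, the help text, the warning text and the rule that `--distro` wins when both flags are given are all assumptions.

```python
import argparse
import warnings
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama stack build")
    parser.add_argument("--distro", help="Name of the distribution to build")
    # Deprecated alias: hidden from --help but still accepted.
    parser.add_argument("--template", help=argparse.SUPPRESS)
    return parser


def resolve_distro(argv: List[str]) -> Optional[str]:
    args = build_parser().parse_args(argv)
    if args.template is not None:
        warnings.warn(
            "--template is deprecated; use --distro instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumed precedence: an explicit --distro wins if both are given.
        return args.distro if args.distro is not None else args.template
    return args.distro
```

For example, `resolve_distro(["--template", "starter"])` returns `"starter"` and emits a `DeprecationWarning`. Python hides that category by default outside `__main__`, so a real CLI may prefer a plain printed warning.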
2025-08-04 11:34:17 -07:00
dependabot[bot]
12f964437a
chore(python-deps): bump opentelemetry-exporter-otlp-proto-http from 1.35.0 to 1.36.0 (#3027)
Bumps
[opentelemetry-exporter-otlp-proto-http](https://github.com/open-telemetry/opentelemetry-python)
from 1.35.0 to 1.36.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/open-telemetry/opentelemetry-python/blob/main/CHANGELOG.md">opentelemetry-exporter-otlp-proto-http's
changelog</a>.</em></p>
<blockquote>
<h2>Version 1.36.0/0.57b0 (2025-07-29)</h2>
<ul>
<li>
<p>Add missing Prometheus exporter documentation
(<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4485">#4485</a>)</p>
</li>
<li>
<p>Overwrite logging.config.fileConfig and logging.config.dictConfig to
ensure
the OTLP <code>LogHandler</code> remains attached to the root logger.
Fix a bug that
can cause a deadlock to occur over <code>logging._lock</code> in some
cases (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4636">#4636</a>).</p>
</li>
<li>
<p>otlp-http-exporter: set default value for param
<code>timeout_sec</code> in <code>_export</code> method
(<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4691">#4691</a>)</p>
</li>
<li>
<p>Update OTLP gRPC/HTTP exporters: calling shutdown will now interrupt
exporters that are sleeping
before a retry attempt, and cause them to return failure immediately.
Update BatchSpan/LogRecordProcessors: shutdown will now complete after
30 seconds of trying to finish
exporting any buffered telemetry, instead of continuing to export until
all telemetry was exported.
(<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/pull/4638">#4638</a>).</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1aaa2a2587"><code>1aaa2a2</code></a>
Prepare release 1.36.0/0.57b0 (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4704">#4704</a>)</li>
<li><a
href="f9ca4755af"><code>f9ca475</code></a>
Use <code>@pytest.mark.flaky</code> decorator instead of
<code>@flaky.flaky</code> (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4700">#4700</a>)</li>
<li><a
href="eb1a4c574c"><code>eb1a4c5</code></a>
otlp-http-exporter: set default value for param <code>timeout_sec</code>
in <code>_export</code> me...</li>
<li><a
href="23aad5e4ad"><code>23aad5e</code></a>
Add permissions that were missed on the first pass (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4692">#4692</a>)</li>
<li><a
href="344c647774"><code>344c647</code></a>
Add minimum token permissions for all github workflow files (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4663">#4663</a>)</li>
<li><a
href="ff9dc82d3a"><code>ff9dc82</code></a>
Migrate from opentelemetrybot to otelbot (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4685">#4685</a>)</li>
<li><a
href="d4e606846e"><code>d4e6068</code></a>
Interrupt exporter retry backoff sleeps when shutdown is called. Update
Batch...</li>
<li><a
href="a28b0cadce"><code>a28b0ca</code></a>
Fix broken link in Prometheus exporter README. Fixes <a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4399">#4399</a>
(<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4485">#4485</a>)</li>
<li><a
href="9746645818"><code>9746645</code></a>
Introducing tox-uv (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4516">#4516</a>)</li>
<li><a
href="57cb935e88"><code>57cb935</code></a>
Fix issue where deadlock can occur over logging._lock (<a
href="https://redirect.github.com/open-telemetry/opentelemetry-python/issues/4636">#4636</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/open-telemetry/opentelemetry-python/compare/v1.35.0...v1.36.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=opentelemetry-exporter-otlp-proto-http&package-manager=uv&previous-version=1.35.0&new-version=1.36.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-04 09:37:58 -07:00
dependabot[bot]
48b49e318f
chore(python-deps): bump weaviate-client from 4.16.4 to 4.16.5 (#3026)
Bumps
[weaviate-client](https://github.com/weaviate/weaviate-python-client)
from 4.16.4 to 4.16.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/releases">weaviate-client's
releases</a>.</em></p>
<blockquote>
<h2>v3.13.0 - Support for Weaviate v1.18</h2>
<h2>What's Changed</h2>
<ul>
<li>Extend CRUD operations for single data objects and reference with
consistency level by <a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/234">weaviate/weaviate-python-client#234</a></li>
<li>Extend batch operations with consistency level by <a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/240">weaviate/weaviate-python-client#240</a></li>
<li>Add Cursor api by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/241">weaviate/weaviate-python-client#241</a></li>
<li>Add support for backup Azure module by <a
href="https://github.com/antas-marcin"><code>@​antas-marcin</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/246">weaviate/weaviate-python-client#246</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
made their first contribution in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/234">weaviate/weaviate-python-client#234</a></li>
<li><a
href="https://github.com/antas-marcin"><code>@​antas-marcin</code></a>
made their first contribution in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/246">weaviate/weaviate-python-client#246</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v3.12.0...v3.13.0">https://github.com/weaviate/weaviate-python-client/compare/v3.12.0...v3.13.0</a></p>
<h2>v3.12.1b - Support for weaviate v1.18</h2>
<h2>What's Changed</h2>
<ul>
<li>Extend CRUD operations for single data objects and reference with
consistency level by <a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/234">weaviate/weaviate-python-client#234</a></li>
<li>Extend batch operations with consistency level by <a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/240">weaviate/weaviate-python-client#240</a></li>
<li>Add Cursor api by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/241">weaviate/weaviate-python-client#241</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/redouan-rhazouani"><code>@​redouan-rhazouani</code></a>
made their first contribution in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/234">weaviate/weaviate-python-client#234</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v3.12.0...v3.12.1b">https://github.com/weaviate/weaviate-python-client/compare/v3.12.0...v3.12.1b</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/blob/main/docs/changelog.rst">weaviate-client's
changelog</a>.</em></p>
<blockquote>
<h2>Version 4.16.5</h2>
<p>This patch version includes:
- Add <code>dimensions</code> property to Google vectorizers in
<code>Configure.Vectors</code></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="731cbf0b9a"><code>731cbf0</code></a>
Update changelog (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1768">#1768</a>)</li>
<li><a
href="2627bf39c1"><code>2627bf3</code></a>
Bump ruff from 0.12.4 to 0.12.5 (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1761">#1761</a>)</li>
<li><a
href="401a1e2ff0"><code>401a1e2</code></a>
Bump coverage from 7.9.2 to 7.10.1 (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1760">#1760</a>)</li>
<li><a
href="44aef22189"><code>44aef22</code></a>
Bump authlib from 1.6.0 to 1.6.1 (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1749">#1749</a>)</li>
<li><a
href="dca002e39e"><code>dca002e</code></a>
Add <code>dimensions</code> property to Google vectorizers in
<code>Configure.Vectors</code> (<a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1767">#1767</a>)</li>
<li>See full diff in <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.4...v4.16.5">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=weaviate-client&package-manager=uv&previous-version=4.16.4&new-version=4.16.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-04 09:37:31 -07:00
Matthew Farrellee
4411e6e362
chore(ci): remove reportlab dep (#3033)
# What does this PR do?

Remove the reportlab dependency by replacing dynamic PDF generation with a pre-computed PDF.
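Replacing generated test input with a checked-in file reduces the test side to reading bytes from disk. This is a minimal sketch. `load_pdf_fixture` and any fixture path you pass it are hypothetical. The header check relies only on the fact that PDF files normally begin with a `%PDF-<version>` line.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def load_pdf_fixture(path: Path) -> bytes:
    """Read a pre-computed PDF instead of generating one at test time."""
    data = path.read_bytes()
    # PDF files normally start with a "%PDF-<version>" header line.
    if not data.startswith(PDF_MAGIC):
        raise ValueError(f"{path} does not look like a PDF file")
    return data
```

In a test module this might be called as `load_pdf_fixture(Path(__file__).parent / "fixtures" / "sample.pdf")`, where the path is illustrative. The design trade-off is a small binary file in the repository in exchange for one fewer runtime dependency.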

## Test Plan

ci
2025-08-04 09:36:13 -07:00
Eran Cohen
e5b542dd8e
feat: switch to async completion in LiteLLM OpenAI mixin (#3029)
2025-08-03 12:08:56 -07:00
Varsha
dbfc15123e
test: Implement vector store search test (#3001)
# What does this PR do?
Implement vector store search test

## Test Plan
```
pytest tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes --stack-config=http://localhost:8321 --embedding-model=all-MiniLM-L6-v2 -v
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-02 15:57:38 -07:00
Varsha
3c2aee610d
refactor: Remove double filtering based on score threshold (#3019)
# What does this PR do?
Remove the `score_threshold`-based check from `OpenAIVectorStoreMixin`.

Closes: https://github.com/meta-llama/llama-stack/issues/3018
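The second check could be dropped because a threshold filter applied with the same cutoff is idempotent. Presumably the threshold is already applied further down the query path, so re-applying it in the mixin cannot change the result. This is a minimal sketch. `filter_by_score`, the `>=` comparison and the sample data are illustrative assumptions, not the code touched by this PR.

```python
from typing import List, Tuple

Scored = Tuple[str, float]  # (chunk, similarity score)


def filter_by_score(results: List[Scored], score_threshold: float) -> List[Scored]:
    """Keep (chunk, score) pairs whose score meets the threshold."""
    return [(chunk, score) for chunk, score in results if score >= score_threshold]


results = [("chunk-a", 0.91), ("chunk-b", 0.42), ("chunk-c", 0.77)]
once = filter_by_score(results, 0.5)
twice = filter_by_score(once, 0.5)  # the redundant second pass
assert once == twice == [("chunk-a", 0.91), ("chunk-c", 0.77)]
```

Keeping the filter in one place also means only one comparison rule (`>=` versus `>`) is in effect. With two copies, those rules could silently drift apart.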

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-08-02 15:57:03 -07:00
ehhuang
1e3b5aa9b8
chore: CI action names (#3014)
# What does this PR do?


## Test Plan

CI
<img width="795" height="162" alt="image"
src="https://github.com/user-attachments/assets/78dedfa6-809c-4d82-9eb3-6479234dd657"
/>
2025-08-02 15:56:42 -07:00
dependabot[bot]
edc19698fb
chore(python-deps): bump huggingface-hub from 0.34.2 to 0.34.3 (#3028)
Bumps [huggingface-hub](https://github.com/huggingface/huggingface_hub)
from 0.34.2 to 0.34.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/huggingface_hub/releases">huggingface-hub's
releases</a>.</em></p>
<blockquote>
<h2>[v0.34.3] Jobs improvements and <code>whoami</code> user prefix</h2>
<ul>
<li>[Jobs] Update uv image <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3270">#3270</a>
by <a href="https://github.com/lhoestq"><code>@​lhoestq</code></a></li>
<li>[Update] HF Jobs Documentation <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3268">#3268</a>
by <a
href="https://github.com/ariG23498"><code>@​ariG23498</code></a></li>
<li>Add 'user:' prefix to whoami command output <a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3267">#3267</a>
by <a href="https://github.com/gary149"><code>@​gary149</code></a></li>
</ul>
<p>Full Changelog: <a
href="https://github.com/huggingface/huggingface_hub/compare/v0.34.2...v0.34.3">https://github.com/huggingface/huggingface_hub/compare/v0.34.2...v0.34.3</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0bbc5e1b10"><code>0bbc5e1</code></a>
Release: v0.34.3</li>
<li><a
href="f464fc15f3"><code>f464fc1</code></a>
update uv image (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3270">#3270</a>)</li>
<li><a
href="24c77eb319"><code>24c77eb</code></a>
[Update] HF Jobs Documentation (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3268">#3268</a>)</li>
<li><a
href="977c018e3d"><code>977c018</code></a>
Add 'user:' prefix to whoami command output for consistency (<a
href="https://redirect.github.com/huggingface/huggingface_hub/issues/3267">#3267</a>)</li>
<li>See full diff in <a
href="https://github.com/huggingface/huggingface_hub/compare/v0.34.2...v0.34.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=huggingface-hub&package-manager=uv&previous-version=0.34.2&new-version=0.34.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-02 15:53:46 -07:00
IAN MILLER
a749d5f4a4
refactor: remove Conda support from Llama Stack (#2969)
# What does this PR do?
This PR removes Conda support from Llama Stack.

Closes #2539

## Test Plan
2025-08-02 15:52:59 -07:00
ehhuang
f2eee4e417
chore: create integration-tests script (#3016)
2025-08-01 17:38:49 -07:00
ehhuang
6ac710f3b0
fix(recording): endpoint resolution (#3013)
# What does this PR do?


## Test Plan
2025-08-01 16:23:54 -07:00
Matthew Farrellee
140ee7d337
fix: sambanova inference provider (#2996)
# What does this PR do?

Closes #2995

Updates `SambaNovaInferenceAdapter` to use `LiteLLMOpenAIMixin` efficiently.

## Test Plan

```
$ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct
...
======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ========================
```
2025-08-01 09:09:14 -07:00
Francisco Arceo
0527c0fb15
chore: Update README for supported DBs (#3005)
# What does this PR do?
Update README for supported DBs

## Test Plan

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-01 08:23:36 -07:00
Varsha
1f0766308d
feat: Add openAI compatible APIs to Qdrant (#2465)
# What does this PR do?
Adds support for the OpenAI-compatible vector store APIs to Qdrant.

Closes #2463

## Test Plan

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-08-01 00:41:34 -04:00
ehhuang
194abe7734
test: use llama stack build when starting server (#2999)
# What does this PR do?
This should be more robust, since the tests are sometimes run without
running the build first.

## Test Plan
OLLAMA_URL=http://localhost:11434 LLAMA_STACK_TEST_INFERENCE_MODE=replay
LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings
LLAMA_STACK_CONFIG=server:starter uv run --with pytest-repeat pytest
tests/integration/telemetry
--text-model="ollama/llama3.2:3b-instruct-fp16" -vvs
2025-07-31 21:09:14 -07:00
Ashwin Bharambe
0b08d64ddb
feat(ci): introduce workflow for re-recording inference outputs (#3002) 2025-07-31 17:30:47 -07:00
Francisco Arceo
33cca26154
chore: Enabling Integration tests for Weaviate (#2882)
# What does this PR do?

This PR (1) enables the files API for Weaviate and (2) enables
integration tests for Weaviate, which adds a Docker container to the
GitHub Action.

This PR also handles a couple of edge cases in creating the
collection and ensures that all the tests pass.

## Test Plan
CI enabled

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-31 20:29:50 -04:00
Ashwin Bharambe
369286f95b fix(ci): syntax error in the disabled workflow
2025-07-31 15:35:42 -07:00
Ashwin Bharambe
89ff93182c
feat(ci): only run on 3.12, run on both 3.12 and 3.13 nightly (#3000)
We don't need to run on all Python versions all the time.
2025-07-31 15:32:05 -07:00
Ashwin Bharambe
f4489eeb83
fix(ci): simplify integration tests replay mode (#2997)
We are going to split record and replay workflows completely to simplify
the concurrency key design.

We can add vision tests by just adding to our matrix.
2025-07-31 15:18:18 -07:00
Matthew Farrellee
218c89fff1
feat: Add clear error message when API key is missing (#2992)
# What does this PR do?

Improve user experience by providing specific guidance when no API key
is available, showing both provider data header and config options with
the correct field name for each provider.

Also adds comprehensive test coverage for API key resolution scenarios.

Addresses #2990 for providers using the LiteLLM OpenAI mixin.
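
A minimal sketch of the resolution order described above, assuming per-request provider data takes precedence over the provider config and that the config field is called `api_key`. `resolve_api_key` and `MissingAPIKeyError` are hypothetical names, and provider data is modeled as a plain dict; this is not the mixin's actual code.

```python
from typing import Optional


class MissingAPIKeyError(ValueError):
    """Hypothetical error type raised when no API key can be found."""


def resolve_api_key(
    provider_name: str,
    key_field: str,
    provider_data: Optional[dict],
    config_api_key: Optional[str],
) -> str:
    """Return an API key, letting per-request provider data override the config.

    All names here are illustrative assumptions, not Llama Stack's actual API.
    """
    # 1. A key supplied per request through the provider data header wins.
    if provider_data and provider_data.get(key_field):
        return provider_data[key_field]
    # 2. Otherwise fall back to the key set in the provider's config.
    if config_api_key:
        return config_api_key
    # 3. No key anywhere: name both ways to supply one, with the exact field name.
    raise MissingAPIKeyError(
        f"API key is not set for the {provider_name} provider. Pass it in the "
        f'provider data header as {{"{key_field}": "<API_KEY>"}}, or set '
        f"'api_key' in the {provider_name} provider config."
    )
```

Including the provider-specific field name in the message is the point of the change: the user can copy it directly instead of guessing which key the provider expects.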

## Test Plan

`./scripts/unit-tests.sh tests/unit/providers/inference/test_litellm_openai_mixin.py`
2025-07-31 16:33:16 -04:00
Ashwin Bharambe
22f79bdb9e fix(ci): lets attempt another fix for concurrency 2025-07-31 13:22:24 -07:00
Ashwin Bharambe
18576349ca fix(ci): simplified concurrency and job eligibility criteria 2025-07-31 13:11:04 -07:00
Ashwin Bharambe
d1b300ead9 fix(ci, nvidia): do not use module level pytest skip for now 2025-07-31 12:32:31 -07:00
Ashwin Bharambe
752fd3b1c1 fix(ci): use single quotes please 2025-07-31 11:56:25 -07:00
Ashwin Bharambe
5ba25efd54 fix(ci): ensure workflow runs when manually run or scheduled 2025-07-31 11:54:51 -07:00
Ashwin Bharambe
27d866795c
feat(ci): add support for running vision inference tests (#2972)
This PR significantly refactors the Integration Tests workflow. The main
goal behind the PR was to enable recording of vision tests, which had
never been run as part of our CI before. While debugging, I ended up
making several other changes to refactor the workflow and, hopefully,
make it more robust.

After experimenting, I have updated the trigger event to
`pull_request_target` so this workflow gets write permissions by
default; note that it runs only with source code from the base (main)
branch of the source repository. If you change the workflow, you will
need to experiment using the `workflow_dispatch` trigger. This should not
be news to anyone using GitHub Actions (except me!)

It is likely to be a little rocky while I learn more about GitHub
Actions. Please be patient :)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-31 11:50:42 -07:00
Charlie Doern
709c974bd8
fix: integration tests not triggering on PR open (#2985)
# What does this PR do?

I realized that, since the replay logic was introduced, the integration
tests aren't triggering (or aren't always triggering) when a new PR is
opened.

This PR amends the concurrency logic slightly so the tests trigger on opened PRs.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-31 11:36:44 -07:00
Nehanth Narendrula
b41d696e4f
fix: Post Training Model change in Tests in order to make it less intensive (#2991)
# What does this PR do?

Changed the model from `ibm-granite/granite-3.3-2b-instruct` to
`HuggingFaceTB/SmolLM2-135M-Instruct` so that it is less resource
intensive in CI.

The idea came from
https://github.com/meta-llama/llama-stack/pull/2984#issuecomment-3140400830
2025-07-31 11:22:34 -07:00
Nathan Weinberg
ffb6306fbd
fix: remove redundant code from unregister_vector_db (#2983)
`get_vector_db()` raises an exception if no vector store would be
returned, so the client handling here is redundant.
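
To illustrate why the check is redundant, here is a toy registry (all names hypothetical, not Llama Stack's classes) whose lookup raises rather than returning `None`, so the unregister path needs no separate missing-client branch.

```python
class VectorDBNotFoundError(ValueError):
    """Hypothetical error raised by the lookup below."""


class VectorDBRegistry:
    """Toy in-memory registry; names are illustrative, not Llama Stack's."""

    def __init__(self) -> None:
        self._dbs: dict[str, object] = {}

    def register(self, vector_db_id: str, client: object) -> None:
        self._dbs[vector_db_id] = client

    def get_vector_db(self, vector_db_id: str) -> object:
        # Raises instead of returning None, so callers never see a missing entry.
        if vector_db_id not in self._dbs:
            raise VectorDBNotFoundError(f"Vector DB '{vector_db_id}' not found")
        return self._dbs[vector_db_id]

    def unregister_vector_db(self, vector_db_id: str) -> None:
        # The lookup already raises for unknown IDs, so a follow-up
        # `if client is None: raise ...` branch could never run.
        self.get_vector_db(vector_db_id)
        del self._dbs[vector_db_id]
```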

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-31 09:22:04 -07:00
Christian Zaccaria
ea8dd58144
chore: Remove coverage badge from README.md (#2976)
# What does this PR do?
It looks like the coverage badge is still present in the README. This PR
removes it.

For more context: https://github.com/meta-llama/llama-stack/pull/2950
2025-07-31 09:21:30 -07:00
Kelly Brown
8a6c0fb930
docs: Reformat external provider documentation (#2982)
**Description**
This PR adjusts the external providers documentation to align with the
new providers format. It also splits the content into sections covering
the existing external providers and how to create new ones.

<img width="1049" height="478" alt="Screenshot 2025-07-31 at 9 48 26 AM"
src="https://github.com/user-attachments/assets/f13599cb-2fd1-4e57-8ca9-27b067264e33"
/>

Open to feedback and adjusting titles
2025-07-31 09:21:13 -07:00
Nehanth Narendrula
3a574ef23c
fix: remove unused DPO parameters from schema and tests (#2988)
# What does this PR do?

I removed these DPO parameters from the schema in [this
PR](https://github.com/meta-llama/llama-stack/pull/2804), but I may not
have done it correctly, since they were reintroduced in [this
commit](cb7354a9ce (diff-4e9a8cb358213d6118c4b6ec2a76d0367af06441bf0717e13a775ade75e2061dR15081)), likely
due to a pre-commit hook.

I've made the changes again, and the pre-commit hook automatically
updated the spec sheet.
2025-07-31 09:11:08 -07:00
Charlie Doern
5c33bc1353
fix: post_training ci (#2984)
2025-07-31 08:26:06 -07:00
Nehanth Narendrula
cf73146132
feat: Enable DPO training with HuggingFace inline provider (#2825)
What does this PR do?

This PR adds support for Direct Preference Optimization (DPO) training
via the existing HuggingFace inline provider. It introduces a new DPO
training recipe, config schema updates, dataset integration, and
end-to-end testing to support preference-based fine-tuning with TRL.
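
For reference, a sketch of the standard sigmoid DPO loss for a single preference pair, which is what TRL's `DPOTrainer` optimizes by default. The inputs are summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. `beta=0.1` is an assumed value, and this stdlib version is illustrative, not the recipe's code.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """-log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    # How much more the policy prefers each response than the reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) == softplus(-z); branch on the sign to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference model, `z` is 0 and the loss is `log 2`; it falls toward 0 as the policy shifts probability toward the chosen response.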

Test Plan

Added integration test:

tests/integration/post_training/test_post_training.py::TestPostTraining::test_preference_optimize

Ran tests on both CPU and CUDA environments

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-30 23:33:36 -07:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. This
move is a precursor that prepares for that rename.

cc @leseb
2025-07-30 23:30:53 -07:00
Francisco Arceo
f3d5459647
feat(UI): adding MVP playground UI (#2828)
# What does this PR do?
I've been tinkering a little with a simple chat playground in the UI, so
I'm opening the PR with what's kind of a WIP.

If you look at the first commit, it contains the bulk of the
changes. The rest of the changed files come from installing the
`shadcn` components.

Note that this is still missing a lot, e.g.:
- sessions
- document upload
- audio (the shadcn components install these by default from
https://shadcn-chatbot-kit.vercel.app/docs/components/chat)

I still need to wire up a lot more to make it fully functional,
but it does basic chat using the Llama Stack TypeScript client.

Basic demo: 

<img width="1329" height="1430" alt="Image"
src="https://github.com/user-attachments/assets/917a2096-36d4-4925-b83b-f1f2cda98698"
/>

<img width="1319" height="1424" alt="Image"
src="https://github.com/user-attachments/assets/fab1583b-1c72-4bf3-baf2-405aee13c6bb"
/>


## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-30 19:44:16 -07:00
Ashwin Bharambe
d6ae2b0f47
fix(ci): more correct concurrency key for workflows (#2973)
See the inline comment. We don't want an unrelated label to pre-empt an
existing workflow run that is already in progress.
2025-07-30 18:23:14 -07:00
Nathan Weinberg
406ca72957
fix: remove redundant code from unregister_dataset (#2971)
`get_dataset()` raises an exception if no dataset would be returned, so
the client handling here is redundant.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 16:40:01 -07:00
Sai Prashanth S
cb7354a9ce
docs: Add detailed docstrings to API models and update OpenAPI spec (#2889)
This PR focuses on improving the developer experience by adding
comprehensive docstrings to the API data models across the Llama Stack.
These docstrings provide detailed explanations for each model and its
fields, making the API easier to understand and use.

**Key changes:**
- **Added Docstrings:** Added reST-formatted docstrings to Pydantic
models in the `llama_stack/apis/` directory. This includes models for:
  - Agents (`agents.py`)
  - Benchmarks (`benchmarks.py`)
  - Datasets (`datasets.py`)
  - Inference (`inference.py`)
  - And many other API modules.
- **OpenAPI Spec Update:** Regenerated the OpenAPI specification
(`docs/_static/llama-stack-spec.yaml` and
`docs/_static/llama-stack-spec.html`) to include the new docstrings.
This will be reflected in the API documentation, providing richer
information to users.
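
A sketch of the reST docstring style described above. The PR annotates Pydantic models; this example uses a stdlib `dataclass` so it runs without dependencies. Both the `Dataset` fields and the `param_docs` helper (a toy parser showing how a spec generator could pick up `:param:` descriptions) are hypothetical, not Llama Stack's generator.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset registered with the stack (illustrative fields only).

    :param identifier: Unique identifier for the dataset
    :param purpose: Intended use of the dataset
    :param metadata: Arbitrary key/value metadata attached to the dataset
    """

    identifier: str
    purpose: str
    metadata: dict = field(default_factory=dict)


def param_docs(cls: type) -> dict:
    """Collect `:param name: text` lines from a class's reST docstring."""
    docs = {}
    for line in (inspect.getdoc(cls) or "").splitlines():
        line = line.strip()
        if line.startswith(":param "):
            # Split "name: text" at the first colon after the field name.
            name, _, text = line[len(":param "):].partition(":")
            docs[name.strip()] = text.strip()
    return docs
```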

**Impact:**
- Developers using the Llama Stack API will have a better understanding
of the data structures.
- The auto-generated API documentation is now more informative.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-07-30 16:32:59 -07:00
Nathan Weinberg
cd5c6a2fcd
chore: standardize vector store not found error (#2968)
# What does this PR do?
1. Creates a new `VectorStoreNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 15:19:16 -07:00
Nathan Weinberg
272a3e9937
chore: standardize dataset not found error (#2962)
# What does this PR do?
1. Adds a broad schema for custom exception classes in the Llama Stack
project
2. Creates a new `DatasetNotFoundError` class
3. Implements the new class where appropriate 
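
One way such a schema could look, sketched under assumptions. A hypothetical `ResourceNotFoundError` base class gives every not-found error the same message format, and each concrete class only names its resource type. Subclassing `ValueError` is an assumed design choice that keeps existing `except ValueError` handlers working. The three concrete class names come from this and the related commits; their bodies here are illustrative.

```python
class ResourceNotFoundError(ValueError):
    """Hypothetical shared base: one message format for every 'not found' case."""

    resource_type = "Resource"

    def __init__(self, resource_id: str) -> None:
        self.resource_id = resource_id
        super().__init__(f"{self.resource_type} '{resource_id}' not found.")


class DatasetNotFoundError(ResourceNotFoundError):
    resource_type = "Dataset"


class ModelNotFoundError(ResourceNotFoundError):
    resource_type = "Model"


class VectorStoreNotFoundError(ResourceNotFoundError):
    resource_type = "Vector store"
```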

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 14:52:46 -07:00
IAN MILLER
25d3dfa30f
fix: fix No module named 'ollama' in test_inference_recordings.py (#2967)
# What does this PR do?
This PR fixes the following errors, which occurred when running the unit
tests on an up-to-date main branch:
```
FAILED tests/unit/distribution/test_inference_recordings.py::TestInferenceRecording::test_recording_mode - ModuleNotFoundError: No module named 'ollama'
FAILED tests/unit/distribution/test_inference_recordings.py::TestInferenceRecording::test_replay_mode - ModuleNotFoundError: No module named 'ollama'
FAILED tests/unit/distribution/test_inference_recordings.py::TestInferenceRecording::test_replay_missing_recording - ModuleNotFoundError: No module named 'ollama'
FAILED tests/unit/distribution/test_inference_recordings.py::TestInferenceRecording::test_embeddings_recording - ModuleNotFoundError: No module named 'ollama'
=============================== 4 failed, 499 passed, 198 warnings in 34.50s ================================
```

## Test Plan
Run  `./scripts/unit-tests.sh`
2025-07-30 16:33:33 -04:00
Nathan Weinberg
c5622c79de
chore: standardize model not found error (#2964)
# What does this PR do?
1. Creates a new `ModelNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 12:19:53 -07:00
Ashwin Bharambe
266e2afb9c
fix(ci): slightly update workflow trigger (#2966)
We want to avoid re-triggering the workflow when unrelated labels are
added (e.g., `meta-cla`). There is also no point in restarting the
workflow when someone _unlabels_ the PR.
2025-07-30 12:04:13 -07:00
Kelly Brown
026caa5551
docs: part 1 - fix warnings in documentation generation (#2861)
**Description**
This PR removes some of the warnings emitted when uv builds the docs:
- Doc generation reported errors about .md files not included in any
toctree. ~~Adding content to the `providers-gen.py` file that adds `---
orphan: true ---` to each file.~~ Added a toctree generator to the
`providers-gen.py` file, which gets rid of these errors in the builds.
- Deletes the `_openai_compat` files, extension of PR #2849
- Adds the `files` APIs section to the `providers` toctree on the index
page
- Manually adds `--- orphan: true ---` to the advanced APIs. I'll try
to find a way to modify the providers codegen so it adds this
automatically, but this fixes the errors.
- Adds the `testing.md` to the `contributing` toctree
- Adds `starting_llama_stack_server.md` to `distributions` toctree

There are some other warnings I'm still looking at, but this PR gets rid
of most of the toctree errors.
There's also an issue with the actual distribution codegen that I can
investigate in another PR; I opened a bug for it here: #2873
2025-07-30 10:50:10 -07:00
ehhuang
38d5c44354
chore: fix k8s config (#2959)
# What does this PR do?


## Test Plan
deployed to EKS
2025-07-30 10:11:59 -07:00
Ashwin Bharambe
fd2aaf4978
fix: use OLLAMA_URL to activate Ollama provider in starter (#2963)
We tried to always keep Ollama enabled. However doing so makes the
provider implementation half-assed -- should it error when it cannot
connect to Ollama or not? What happens during periodic model refresh?
Etc. Instead do the same thing we do for vLLM -- use the `OLLAMA_URL` to
conditionally enable the provider.
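The gating idea can be sketched as follows: a provider entry is included only when its URL environment variable is set. The function `enabled_remote_providers` and the entry shape are hypothetical; only `OLLAMA_URL` and the vLLM precedent come from this commit, and the `VLLM_URL` name is an assumption.

```python
import os
from collections.abc import Mapping


def enabled_remote_providers(env: Mapping[str, str] = os.environ) -> list[dict[str, str]]:
    """Return provider entries only for backends whose URL is configured."""
    # Hypothetical mapping of provider id -> env var that activates it.
    gates = {"ollama": "OLLAMA_URL", "vllm": "VLLM_URL"}
    providers = []
    for provider_id, env_var in gates.items():
        url = env.get(env_var)
        if url:  # unset or empty: the provider stays disabled
            providers.append({"provider_id": provider_id, "url": url})
    return providers
```

Making enablement explicit avoids the ambiguity described above: a provider that is present is expected to be reachable, so connection failures can be treated as real errors.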

## Test Plan

Run `uv run llama stack build --template starter --image-type venv
--run` with and without `OLLAMA_URL` set. Verify using
`llama-stack-client provider list` that ollama is correctly enabled.
2025-07-30 10:11:17 -07:00
Matthew Farrellee
b69bafba30
fix(library_client): improve initialization error handling and prevent AttributeError (#2944)
# What does this PR do?

- Initialize route_impls to None in constructor to prevent
AttributeError
- Consolidate initialization checks to single point in request() method
- Improve error message to be more helpful ("Please call initialize()
first")
- Add comprehensive test suite to prevent regressions

The library client now has better error handling when users forget to
call initialize(), showing a clear ValueError instead of confusing
AttributeError. All initialization validation is now centralized in the
request() method, with internal methods (_call_non_streaming,
_call_streaming, _convert_body) relying on this single check for
cleaner, more maintainable code.

closes #2943 

## Test Plan

`./scripts/unit-tests.sh`
2025-07-30 11:58:47 -04:00
Ashwin Bharambe
9b69b6ac05 fix: pre-commit issue
2025-07-29 17:52:36 -07:00
Ashwin Bharambe
f6afb3c26b
feat(ci): keep only one re-recording job because independent recordings will conflict (#2956)
A couple of important updates:

- When recording tests, we cannot generate a matrix, because the
independent recordings would conflict with each other.
- In fact, we no longer need a matrix on test types at all: the tests
themselves are very fast, and the overhead of `llama stack build`,
setting up `uv`, etc. is much larger.
- Refactored the test-running steps into an independent action
2025-07-29 17:48:04 -07:00
Ashwin Bharambe
b237df8f18
feat(ci): use replay mode, setup ollama if specific label exists on PR (#2955)
This PR makes setting up Ollama optional for CI. By default, we use
`replay` mode for inference requests and use the stored results from the
`tests/integration/recordings/` directory.

Every so often, users will update tests in a way that requires re-recording.
To do this, we check for the existence of a label `re-record-tests` on
the PR. If detected,
- ollama is spun up
- inference mode is set to record
- after the tests are done, if any new changes are detected, they are
pushed back to the PR

## Test Plan

This is GitHub CI. Gotta test it live.
2025-07-29 16:50:26 -07:00
Ashwin Bharambe
0ac503ec0d
feat(tests): record responses for evals and telemetry tests (#2954)
Continuing with https://github.com/meta-llama/llama-stack/pull/2952

This also includes a "fix" to the inference store tests: they now pull a
large number of inference responses from the DB so that they always
find the one just written.
2025-07-29 15:46:21 -07:00
Ashwin Bharambe
81c7d6fa2e
chore(ci): disable post training tests (#2953)
Post training tests need _much_ better thinking before we can re-enable
them to be run on every single PR. Running periodically should be
approached only when it is shown that the tests are reliable and as
light-weight as can be; otherwise, it is just kicking the can down the
road.
2025-07-29 14:20:09 -07:00
Ashwin Bharambe
072d20a124
feat(test): record agents, safety and vector_io integration tests (#2952)
Continue to build on top of
https://github.com/meta-llama/llama-stack/pull/2941

## Test Plan

Run server with `LLAMA_STACK_TEST_INFERENCE_MODE=record` and then run
the integration tests with `--stack-config=server:starter`. Then restart
the server with `LLAMA_STACK_TEST_INFERENCE_MODE=replay` and re-run the
tests. Verify that no request hit Ollama at any point.
2025-07-29 14:02:14 -07:00
Matthew Farrellee
2d1ab3ca55
fix: use same image_name logic for build & run config (#2949)
# What does this PR do?

When `--image-name` is not provided, the build script defaults to the
`image_name` in the config; this PR makes the run script do the same.

## Test Plan

llama stack build w/o --image-name
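The shared fallback rule can be illustrated with a small hypothetical helper (the real build and run scripts are not shown here): an explicit `--image-name` wins, otherwise the config's `image_name` is used.

```python
import argparse


def resolve_image_name(argv: list[str], config: dict[str, str]) -> str:
    """Prefer an explicit --image-name; otherwise fall back to the config value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--image-name", default=None)
    # Ignore unrelated flags so the helper can share argv with other parsers.
    args, _unknown = parser.parse_known_args(argv)
    return args.image_name or config["image_name"]
```

Applying the same helper in both the build and run paths is one way to keep them from drifting apart again.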
2025-07-29 12:54:21 -07:00
Francisco Arceo
6ac973ec80
chore: Delete coverage-badge (#2950)
At the moment, the code coverage action keeps failing, which makes the
status badge on the main branch misleading.


https://github.com/meta-llama/llama-stack/actions/workflows/coverage-badge.yml

# What does this PR do?

## Test Plan

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-29 12:53:25 -07:00
Ashwin Bharambe
2e5ca3f15c chore: move recordings one directory upwards 2025-07-29 12:46:19 -07:00
Ashwin Bharambe
08b4a1deb3
feat(tests): introduce inference record/replay to increase test reliability (#2941)
Implements a comprehensive recording and replay system for inference API
calls that eliminates dependency on online inference providers during
testing. The system treats inference as deterministic by recording real
API responses and replaying them in subsequent test runs. Applies to
OpenAI clients (which should cover many inference requests) as well as
Ollama AsyncClient.

For storing, we use a hybrid system: Sqlite for fast lookups and JSON
files for easy greppability / debuggability.
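A minimal sketch of the hybrid storage idea, under stated assumptions: requests are keyed by a SHA-256 hash of their canonical JSON, each response is written to its own JSON file, and a SQLite table maps hashes to file paths for fast lookup. The class name, schema, and file layout are hypothetical, not the recorder's actual format.

```python
import hashlib
import json
import sqlite3
from pathlib import Path
from typing import Any


class RecordingStore:
    def __init__(self, root: Path) -> None:
        self.responses_dir = root / "responses"
        self.responses_dir.mkdir(parents=True, exist_ok=True)
        # SQLite index: fast hash -> file-path lookups.
        self.db = sqlite3.connect(root / "index.sqlite")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS recordings (request_hash TEXT PRIMARY KEY, path TEXT NOT NULL)"
        )

    @staticmethod
    def request_hash(method: str, url: str, body: dict[str, Any]) -> str:
        # Canonical JSON (sorted keys) so equivalent requests hash identically.
        canonical = json.dumps({"method": method, "url": url, "body": body}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def record(self, method: str, url: str, body: dict[str, Any], response: Any) -> None:
        key = self.request_hash(method, url, body)
        path = self.responses_dir / f"{key}.json"
        # One human-readable JSON file per request: easy to grep and debug.
        payload = {"request": {"method": method, "url": url, "body": body}, "response": response}
        path.write_text(json.dumps(payload, indent=2))
        self.db.execute("INSERT OR REPLACE INTO recordings VALUES (?, ?)", (key, str(path)))
        self.db.commit()

    def replay(self, method: str, url: str, body: dict[str, Any]) -> Any:
        key = self.request_hash(method, url, body)
        row = self.db.execute(
            "SELECT path FROM recordings WHERE request_hash = ?", (key,)
        ).fetchone()
        if row is None:
            raise RuntimeError(f"No recorded response for request hash {key}")
        return json.loads(Path(row[0]).read_text())["response"]
```

Failing loudly on a missing recording in replay mode, rather than falling through to a live provider, keeps replayed test runs fully offline.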

As expected, tests become much faster (more than 3x faster for the
inference tests alone).

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
  uv run pytest -s -v tests/integration/inference \
  --stack-config=starter \
  -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
  --text-model="ollama/llama3.2:3b-instruct-fp16" \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2
```

- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or
`replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified
for record or replay modes)
2025-07-29 12:41:31 -07:00
Ashwin Bharambe
abf1d6a703 fix: random breakage in llama_stack/ui/package.json 2025-07-29 12:31:29 -07:00
Ashwin Bharambe
fee365b71e fix: delete requirements.txt which crept back in 2025-07-29 11:30:25 -07:00
Nehanth Narendrula
58ffd82853
fix: Update SFTConfig parameter to fix CI and Post Training Workflow (#2948)
# What does this PR do?

- Change max_seq_length to max_length in SFTConfig constructor
- TRL deprecated max_seq_length in Feb 2024 and removed it in v0.20.0
- Reference: https://github.com/huggingface/trl/pull/2895

This resolves the SFT training failure in CI tests
2025-07-29 11:14:04 -07:00
Matthew Farrellee
c7dc0f21b4
fix: error on failed job, do not wait for timeout (#2945)
# What does this PR do?

Make the post-training integration test error out as soon as the job fails, instead of waiting for the timeout.

## Test Plan

ci
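The fail-fast behavior can be sketched as a polling loop that raises as soon as the job reports a failure and only relies on the timeout while the job is still running. `wait_for_job`, the status strings, and the `get_status` callable are hypothetical stand-ins for the post-training job API.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str], timeout_s: float = 600.0, poll_s: float = 1.0) -> str:
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "completed":
            return status
        if status in ("failed", "cancelled"):
            # Fail immediately instead of burning the rest of the timeout.
            raise RuntimeError(f"Job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_s} seconds")
        time.sleep(poll_s)
```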
2025-07-29 11:07:51 -07:00
Nathan Weinberg
870a37ff4b
feat: add base64 encoded PDF support for OpenAI Chat Completions (#2881)
# What does this PR do?
OpenAI Chat Completions supports passing a base64 encoded PDF file to a
model, but Llama Stack currently does not allow for this behavior. This
PR extends our implementation of the OpenAI API spec to change that.
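For reference, a request of this kind uses the OpenAI Chat Completions `file` content part, with the PDF embedded as a base64 `data:` URL. The sketch below only builds the message payload with the standard library; the helper name, filename, and prompt are illustrative assumptions.

```python
import base64


def pdf_user_message(pdf_bytes: bytes, filename: str, prompt: str) -> dict:
    """Build a user message carrying a text prompt plus a base64-encoded PDF."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "file",
                "file": {
                    "filename": filename,
                    # The PDF bytes travel inline as a data URL.
                    "file_data": f"data:application/pdf;base64,{encoded}",
                },
            },
        ],
    }
```

The resulting dict would go in the `messages` list of an OpenAI-compatible `chat.completions.create` call.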

Closes #2129

## Test Plan
A new functional test has been added to test the validity of such a
request

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-29 06:23:41 -04:00
github-actions[bot]
cf8722079c build: Bump version to 0.2.16
Some checks failed
Coverage Badge / unit-tests (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Integration Tests / discover-tests (push) Successful in 8s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 20s
Python Package Build Test / build (3.13) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 7s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Integration Tests / test-matrix (push) Failing after 8s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Failing after 12s
Test Llama Stack Build / build-single-provider (push) Failing after 35s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 42s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 44s
Pre-commit / pre-commit (push) Successful in 1m23s
2025-07-28 23:13:50 +00:00
Mark Campbell
19c90d9bfc
docs: update using llama stack as library docs (#2931)
# What does this PR do?
Updates the provider template from the outdated `ollama` to `starter`.
Closes: #2839 
## Test Plan
2025-07-28 15:35:26 -07:00
780 changed files with 82704 additions and 21837 deletions

2
.github/TRIAGERS.md vendored

@ -1,2 +1,2 @@
# This file documents Triage members in the Llama Stack community # This file documents Triage members in the Llama Stack community
@bbrowning @franciscojavierarceo @leseb @franciscojavierarceo

@ -0,0 +1,88 @@
name: 'Run and Record Tests'
description: 'Run integration tests and handle recording/artifact upload'
inputs:
  test-subdirs:
    description: 'Comma-separated list of test subdirectories to run'
    required: true
  test-pattern:
    description: 'Regex pattern to pass to pytest -k'
    required: false
    default: ''
  stack-config:
    description: 'Stack configuration to use'
    required: true
  provider:
    description: 'Provider to use for tests'
    required: true
  inference-mode:
    description: 'Inference mode (record or replay)'
    required: true
  run-vision-tests:
    description: 'Whether to run vision tests'
    required: false
    default: 'false'
runs:
  using: 'composite'
  steps:
    - name: Check Storage and Memory Available Before Tests
      if: ${{ always() }}
      shell: bash
      run: |
        free -h
        df -h
    - name: Run Integration Tests
      shell: bash
      run: |
        uv run --no-sync ./scripts/integration-tests.sh \
          --stack-config '${{ inputs.stack-config }}' \
          --provider '${{ inputs.provider }}' \
          --test-subdirs '${{ inputs.test-subdirs }}' \
          --test-pattern '${{ inputs.test-pattern }}' \
          --inference-mode '${{ inputs.inference-mode }}' \
          ${{ inputs.run-vision-tests == 'true' && '--run-vision-tests' || '' }} \
          | tee pytest-${{ inputs.inference-mode }}.log
    - name: Commit and push recordings
      if: ${{ inputs.inference-mode == 'record' }}
      shell: bash
      run: |
        echo "Checking for recording changes"
        git status --porcelain tests/integration/recordings/
        if [[ -n $(git status --porcelain tests/integration/recordings/) ]]; then
          echo "New recordings detected, committing and pushing"
          git add tests/integration/recordings/
          if [ "${{ inputs.run-vision-tests }}" == "true" ]; then
            git commit -m "Recordings update from CI (vision)"
          else
            git commit -m "Recordings update from CI"
          fi
          git fetch origin ${{ github.ref_name }}
          git rebase origin/${{ github.ref_name }}
          echo "Rebased successfully"
          git push origin HEAD:${{ github.ref_name }}
          echo "Pushed successfully"
        else
          echo "No recording changes"
        fi
    - name: Write inference logs to file
      if: ${{ always() }}
      shell: bash
      run: |
        sudo docker logs ollama > ollama-${{ inputs.inference-mode }}.log || true
    - name: Upload logs
      if: ${{ always() }}
      uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
      with:
        name: logs-${{ github.run_id }}-${{ github.run_attempt || '' }}-${{ strategy.job-index }}
        path: |
          *.log
        retention-days: 1

@ -1,11 +1,23 @@
name: Setup Ollama name: Setup Ollama
description: Start Ollama description: Start Ollama
inputs:
run-vision-tests:
description: 'Run vision tests: "true" or "false"'
required: false
default: 'false'
runs: runs:
using: "composite" using: "composite"
steps: steps:
- name: Start Ollama - name: Start Ollama
shell: bash shell: bash
run: | run: |
docker run -d --name ollama -p 11434:11434 docker.io/leseb/ollama-with-models if [ "${{ inputs.run-vision-tests }}" == "true" ]; then
image="ollama-with-vision-model"
else
image="ollama-with-models"
fi
echo "Starting Ollama with image: $image"
docker run -d --name ollama -p 11434:11434 docker.io/llamastack/$image
echo "Verifying Ollama status..." echo "Verifying Ollama status..."
timeout 30 bash -c 'while ! curl -s -L http://127.0.0.1:11434; do sleep 1 && echo "."; done' timeout 30 bash -c 'while ! curl -s -L http://127.0.0.1:11434; do sleep 1 && echo "."; done'

@ -16,19 +16,21 @@ runs:
uses: astral-sh/setup-uv@6b9c6063abd6010835644d4c2e1bef4cf5cd0fca # v6.0.1 uses: astral-sh/setup-uv@6b9c6063abd6010835644d4c2e1bef4cf5cd0fca # v6.0.1
with: with:
python-version: ${{ inputs.python-version }} python-version: ${{ inputs.python-version }}
activate-environment: true
version: 0.7.6 version: 0.7.6
- name: Install dependencies - name: Install dependencies
shell: bash shell: bash
run: | run: |
echo "Updating project dependencies via uv sync"
uv sync --all-groups uv sync --all-groups
uv pip install ollama faiss-cpu
echo "Installing ad-hoc dependencies"
uv pip install faiss-cpu
# Install llama-stack-client-python based on the client-version input # Install llama-stack-client-python based on the client-version input
if [ "${{ inputs.client-version }}" = "latest" ]; then if [ "${{ inputs.client-version }}" = "latest" ]; then
echo "Installing latest llama-stack-client-python from main branch" echo "Installing latest llama-stack-client-python from main branch"
uv pip install git+https://github.com/meta-llama/llama-stack-client-python.git@main uv pip install git+https://github.com/llamastack/llama-stack-client-python.git@main
elif [ "${{ inputs.client-version }}" = "published" ]; then elif [ "${{ inputs.client-version }}" = "published" ]; then
echo "Installing published llama-stack-client-python from PyPI" echo "Installing published llama-stack-client-python from PyPI"
uv pip install llama-stack-client uv pip install llama-stack-client
@ -37,4 +39,5 @@ runs:
exit 1 exit 1
fi fi
uv pip install -e . echo "Installed llama packages"
uv pip list | grep llama

@ -0,0 +1,66 @@
name: 'Setup Test Environment'
description: 'Common setup steps for integration tests including dependencies, providers, and build'
inputs:
  python-version:
    description: 'Python version to use'
    required: true
  client-version:
    description: 'Client version (latest or published)'
    required: true
  provider:
    description: 'Provider to setup (ollama or vllm)'
    required: true
    default: 'ollama'
  run-vision-tests:
    description: 'Whether to setup provider for vision tests'
    required: false
    default: 'false'
  inference-mode:
    description: 'Inference mode (record or replay)'
    required: true
runs:
  using: 'composite'
  steps:
    - name: Install dependencies
      uses: ./.github/actions/setup-runner
      with:
        python-version: ${{ inputs.python-version }}
        client-version: ${{ inputs.client-version }}
    - name: Setup ollama
      if: ${{ inputs.provider == 'ollama' && inputs.inference-mode == 'record' }}
      uses: ./.github/actions/setup-ollama
      with:
        run-vision-tests: ${{ inputs.run-vision-tests }}
    - name: Setup vllm
      if: ${{ inputs.provider == 'vllm' && inputs.inference-mode == 'record' }}
      uses: ./.github/actions/setup-vllm
    - name: Build Llama Stack
      shell: bash
      run: |
        # Install llama-stack-client-python based on the client-version input
        if [ "${{ inputs.client-version }}" = "latest" ]; then
          echo "Installing latest llama-stack-client-python from main branch"
          export LLAMA_STACK_CLIENT_DIR=git+https://github.com/llamastack/llama-stack-client-python.git@main
        elif [ "${{ inputs.client-version }}" = "published" ]; then
          echo "Installing published llama-stack-client-python from PyPI"
          unset LLAMA_STACK_CLIENT_DIR
        else
          echo "Invalid client-version: ${{ inputs.client-version }}"
          exit 1
        fi
        echo "Building Llama Stack"
        LLAMA_STACK_DIR=. \
          uv run --no-sync llama stack build --template ci-tests --image-type venv
    - name: Configure git for commits
      shell: bash
      run: |
        git config --local user.email "github-actions[bot]@users.noreply.github.com"
        git config --local user.name "github-actions[bot]"

@ -9,6 +9,7 @@ updates:
day: "saturday" day: "saturday"
commit-message: commit-message:
prefix: chore(github-deps) prefix: chore(github-deps)
- package-ecosystem: "uv" - package-ecosystem: "uv"
directory: "/" directory: "/"
schedule: schedule:
@ -19,3 +20,14 @@ updates:
- python - python
commit-message: commit-message:
prefix: chore(python-deps) prefix: chore(python-deps)
- package-ecosystem: npm
directory: "/llama_stack/ui"
schedule:
interval: "weekly"
day: "saturday"
labels:
- type/dependencies
- javascript
commit-message:
prefix: chore(ui-deps)

@@ -1,22 +1,23 @@
  # Llama Stack CI
- Llama Stack uses GitHub Actions for Continous Integration (CI). Below is a table detailing what CI the project includes and the purpose.
+ Llama Stack uses GitHub Actions for Continuous Integration (CI). Below is a table detailing what CI the project includes and the purpose.
  | Name | File | Purpose |
  | ---- | ---- | ------- |
  | Update Changelog | [changelog.yml](changelog.yml) | Creates PR for updating the CHANGELOG.md |
- | Coverage Badge | [coverage-badge.yml](coverage-badge.yml) | Creates PR for updating the code coverage badge |
  | Installer CI | [install-script-ci.yml](install-script-ci.yml) | Test the installation script |
  | Integration Auth Tests | [integration-auth-tests.yml](integration-auth-tests.yml) | Run the integration test suite with Kubernetes authentication |
  | SqlStore Integration Tests | [integration-sql-store-tests.yml](integration-sql-store-tests.yml) | Run the integration test suite with SqlStore |
- | Integration Tests | [integration-tests.yml](integration-tests.yml) | Run the integration test suite with Ollama |
+ | Integration Tests (Replay) | [integration-tests.yml](integration-tests.yml) | Run the integration test suite from tests/integration in replay mode |
  | Vector IO Integration Tests | [integration-vector-io-tests.yml](integration-vector-io-tests.yml) | Run the integration test suite with various VectorIO providers |
  | Pre-commit | [pre-commit.yml](pre-commit.yml) | Run pre-commit checks |
  | Test Llama Stack Build | [providers-build.yml](providers-build.yml) | Test llama stack build |
  | Python Package Build Test | [python-build-test.yml](python-build-test.yml) | Test building the llama-stack PyPI project |
+ | Integration Tests (Record) | [record-integration-tests.yml](record-integration-tests.yml) | Run the integration test suite from tests/integration |
  | Check semantic PR titles | [semantic-pr.yml](semantic-pr.yml) | Ensure that PR titles follow the conventional commit spec |
  | Close stale issues and PRs | [stale_bot.yml](stale_bot.yml) | Run the Stale Bot action |
  | Test External Providers Installed via Module | [test-external-provider-module.yml](test-external-provider-module.yml) | Test External Provider installation via Python module |
  | Test External API and Providers | [test-external.yml](test-external.yml) | Test the External API and Provider mechanisms |
+ | UI Tests | [ui-unit-tests.yml](ui-unit-tests.yml) | Run the UI test suite |
  | Unit Tests | [unit-tests.yml](unit-tests.yml) | Run the unit test suite |
  | Update ReadTheDocs | [update-readthedocs.yml](update-readthedocs.yml) | Update the Llama Stack ReadTheDocs site |

@@ -17,7 +17,7 @@ jobs:
  pull-requests: write # for peter-evans/create-pull-request to create a PR
  runs-on: ubuntu-latest
  steps:
- - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  with:
  ref: main
  fetch-depth: 0

@@ -1,62 +0,0 @@
- name: Coverage Badge
- run-name: Creates PR for updating the code coverage badge
- on:
- push:
- branches: [ main ]
- paths:
- - 'llama_stack/**'
- - 'tests/unit/**'
- - 'uv.lock'
- - 'pyproject.toml'
- - 'requirements.txt'
- - '.github/workflows/unit-tests.yml'
- - '.github/workflows/coverage-badge.yml' # This workflow
- workflow_dispatch:
- jobs:
- unit-tests:
- permissions:
- contents: write # for peter-evans/create-pull-request to create branch
- pull-requests: write # for peter-evans/create-pull-request to create a PR
- runs-on: ubuntu-latest
- steps:
- - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- - name: Install dependencies
- uses: ./.github/actions/setup-runner
- - name: Run unit tests
- run: |
- ./scripts/unit-tests.sh
- - name: Coverage Badge
- uses: tj-actions/coverage-badge-py@1788babcb24544eb5bbb6e0d374df5d1e54e670f # v2.0.4
- - name: Verify Changed files
- uses: tj-actions/verify-changed-files@a1c6acee9df209257a246f2cc6ae8cb6581c1edf # v20.0.4
- id: verify-changed-files
- with:
- files: coverage.svg
- - name: Commit files
- if: steps.verify-changed-files.outputs.files_changed == 'true'
- run: |
- git config --local user.email "github-actions[bot]@users.noreply.github.com"
- git config --local user.name "github-actions[bot]"
- git add coverage.svg
- git commit -m "Updated coverage.svg"
- - name: Create Pull Request
- if: steps.verify-changed-files.outputs.files_changed == 'true'
- uses: peter-evans/create-pull-request@271a8d0340265f705b14b6d32b9829c1cb33d45e # v7.0.8
- with:
- token: ${{ secrets.GITHUB_TOKEN }}
- title: "ci: [Automatic] Coverage Badge Update"
- body: |
- This PR updates the coverage badge based on the latest coverage report.
- Automatically generated by the [workflow coverage-badge.yaml](.github/workflows/coverage-badge.yaml)
- delete-branch: true

@@ -16,21 +16,22 @@ jobs:
  lint:
  runs-on: ubuntu-latest
  steps:
- - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # 4.2.2
+ - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # 5.0.0
  - name: Run ShellCheck on install.sh
  run: shellcheck scripts/install.sh
  smoke-test-on-dev:
  runs-on: ubuntu-latest
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install dependencies
  uses: ./.github/actions/setup-runner
  - name: Build a single provider
  run: |
- USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --template starter --image-type container --image-name test
+ USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync \
+ llama stack build --template starter --image-type container --image-name test
  - name: Run installer end-to-end
  run: |

@@ -10,6 +10,7 @@ on:
  paths:
  - 'distributions/**'
  - 'llama_stack/**'
+ - '!llama_stack/ui/**'
  - 'tests/integration/**'
  - 'uv.lock'
  - 'pyproject.toml'
@@ -30,7 +31,7 @@ jobs:
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install dependencies
  uses: ./.github/actions/setup-runner

@@ -44,7 +44,7 @@ jobs:
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install dependencies
  uses: ./.github/actions/setup-runner

@@ -1,20 +1,23 @@
- name: Integration Tests
- run-name: Run the integration test suite with Ollama
+ name: Integration Tests (Replay)
+ run-name: Run the integration test suite from tests/integration in replay mode
  on:
  push:
  branches: [ main ]
  pull_request:
  branches: [ main ]
+ types: [opened, synchronize, reopened]
  paths:
  - 'llama_stack/**'
+ - '!llama_stack/ui/**'
  - 'tests/**'
  - 'uv.lock'
  - 'pyproject.toml'
- - 'requirements.txt'
  - '.github/workflows/integration-tests.yml' # This workflow
  - '.github/actions/setup-ollama/action.yml'
+ - '.github/actions/setup-test-environment/action.yml'
+ - '.github/actions/run-and-record-tests/action.yml'
  schedule:
  # If changing the cron schedule, update the provider in the test-matrix job
  - cron: '0 0 * * *' # (test latest client) Daily at 12 AM UTC
@@ -29,131 +32,56 @@ on:
  description: 'Test against a specific provider'
  type: string
  default: 'ollama'
+ test-subdirs:
+ description: 'Comma-separated list of test subdirectories to run'
+ type: string
+ default: ''
+ test-pattern:
+ description: 'Regex pattern to pass to pytest -k'
+ type: string
+ default: ''
  concurrency:
- group: ${{ github.workflow }}-${{ github.ref }}
+ # Skip concurrency for pushes to main - each commit should be tested independently
+ group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
  cancel-in-progress: true
  jobs:
- discover-tests:
- runs-on: ubuntu-latest
- outputs:
- test-type: ${{ steps.generate-matrix.outputs.test-type }}
- steps:
- - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- - name: Generate test matrix
- id: generate-matrix
- run: |
- # Get test directories dynamically, excluding non-test directories
- TEST_TYPES=$(find tests/integration -maxdepth 1 -mindepth 1 -type d -printf "%f\n" |
- grep -Ev "^(__pycache__|fixtures|test_cases)$" |
- sort | jq -R -s -c 'split("\n")[:-1]')
- echo "test-type=$TEST_TYPES" >> $GITHUB_OUTPUT
- test-matrix:
- needs: discover-tests
+ run-replay-mode-tests:
  runs-on: ubuntu-latest
+ name: ${{ format('Integration Tests ({0}, {1}, {2}, client={3}, vision={4})', matrix.client-type, matrix.provider, matrix.python-version, matrix.client-version, matrix.run-vision-tests) }}
  strategy:
  fail-fast: false
  matrix:
- test-type: ${{ fromJson(needs.discover-tests.outputs.test-type) }}
  client-type: [library, server]
  # Use vllm on weekly schedule, otherwise use test-provider input (defaults to ollama)
  provider: ${{ (github.event.schedule == '1 0 * * 0') && fromJSON('["vllm"]') || fromJSON(format('["{0}"]', github.event.inputs.test-provider || 'ollama')) }}
- python-version: ["3.12", "3.13"]
- client-version: ${{ (github.event.schedule == '0 0 * * 0' || github.event.inputs.test-all-client-versions == 'true') && fromJSON('["published", "latest"]') || fromJSON('["latest"]') }}
- exclude: # TODO: look into why these tests are failing and fix them
- - provider: vllm
- test-type: safety
- - provider: vllm
- test-type: post_training
- - provider: vllm
- test-type: tool_runtime
+ # Use Python 3.13 only on nightly schedule (daily latest client test), otherwise use 3.12
+ python-version: ${{ github.event.schedule == '0 0 * * *' && fromJSON('["3.12", "3.13"]') || fromJSON('["3.12"]') }}
+ client-version: ${{ (github.event.schedule == '0 0 * * *' || github.event.inputs.test-all-client-versions == 'true') && fromJSON('["published", "latest"]') || fromJSON('["latest"]') }}
+ run-vision-tests: [true, false]
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- - name: Install dependencies
- uses: ./.github/actions/setup-runner
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
+ - name: Setup test environment
+ uses: ./.github/actions/setup-test-environment
  with:
  python-version: ${{ matrix.python-version }}
  client-version: ${{ matrix.client-version }}
- - name: Setup ollama
- if: ${{ matrix.provider == 'ollama' }}
- uses: ./.github/actions/setup-ollama
- - name: Setup vllm
- if: ${{ matrix.provider == 'vllm' }}
- uses: ./.github/actions/setup-vllm
- - name: Build Llama Stack
- run: |
- uv run llama stack build --template ci-tests --image-type venv
- - name: Check Storage and Memory Available Before Tests
- if: ${{ always() }}
- run: |
- free -h
- df -h
- - name: Run Integration Tests
- env:
- LLAMA_STACK_CLIENT_TIMEOUT: "300" # Increased timeout for eval operations
- # Use 'shell' to get pipefail behavior
- # https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#exit-codes-and-error-action-preference
- # TODO: write a precommit hook to detect if a test contains a pipe but does not use 'shell: bash'
- shell: bash
- run: |
- if [ "${{ matrix.client-type }}" == "library" ]; then
- stack_config="ci-tests"
- else
- stack_config="server:ci-tests"
- fi
- EXCLUDE_TESTS="builtin_tool or safety_with_image or code_interpreter or test_rag"
- if [ "${{ matrix.provider }}" == "ollama" ]; then
- export OLLAMA_URL="http://0.0.0.0:11434"
- export TEXT_MODEL=ollama/llama3.2:3b-instruct-fp16
- export SAFETY_MODEL="ollama/llama-guard3:1b"
- EXTRA_PARAMS="--safety-shield=llama-guard"
- else
- export VLLM_URL="http://localhost:8000/v1"
- export TEXT_MODEL=vllm/meta-llama/Llama-3.2-1B-Instruct
- # TODO: remove the not(test_inference_store_tool_calls) once we can get the tool called consistently
- EXTRA_PARAMS=
- EXCLUDE_TESTS="${EXCLUDE_TESTS} or test_inference_store_tool_calls"
- fi
- uv run pytest -s -v tests/integration/${{ matrix.test-type }} --stack-config=${stack_config} \
- -k "not( ${EXCLUDE_TESTS} )" \
- --text-model=$TEXT_MODEL \
- --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
- --color=yes ${EXTRA_PARAMS} \
- --capture=tee-sys | tee pytest-${{ matrix.test-type }}.log
- - name: Check Storage and Memory Available After Tests
- if: ${{ always() }}
- run: |
- free -h
- df -h
- - name: Write inference logs to file
- if: ${{ always() }}
- run: |
- sudo docker logs ollama > ollama.log || true
- sudo docker logs vllm > vllm.log || true
- - name: Upload all logs to artifacts
- if: ${{ always() }}
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+ provider: ${{ matrix.provider }}
+ run-vision-tests: ${{ matrix.run-vision-tests }}
+ inference-mode: 'replay'
+ - name: Run tests
+ uses: ./.github/actions/run-and-record-tests
  with:
- name: logs-${{ github.run_id }}-${{ github.run_attempt }}-${{ matrix.provider }}-${{ matrix.client-type }}-${{ matrix.test-type }}-${{ matrix.python-version }}-${{ matrix.client-version }}
- path: |
- *.log
- retention-days: 1
+ test-subdirs: ${{ inputs.test-subdirs }}
+ test-pattern: ${{ inputs.test-pattern }}
+ stack-config: ${{ matrix.client-type == 'library' && 'ci-tests' || 'server:ci-tests' }}
+ provider: ${{ matrix.provider }}
+ inference-mode: 'replay'
+ run-vision-tests: ${{ matrix.run-vision-tests }}

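The new `concurrency.group` expressions (here and in the Vector IO workflow) use the `cond && a || b` idiom of GitHub Actions expressions as a ternary. A minimal Python sketch, assuming the documented truthiness rules (`false`, `0`, `''` and `null` are falsy) and using the hypothetical helper names `gha_and_or` and `concurrency_group`, shows why every push to `main` gets its own group (keyed by `run_id`, so nothing is cancelled) while repeated runs for the same PR ref share one group and cancel each other:

```python
def gha_and_or(cond, a, b):
    """Emulate the GitHub Actions expression `cond && a || b`.

    As in JavaScript, `x && y` yields y when x is truthy (else x), and
    `x || y` yields x when x is truthy (else y). Python's truthiness for
    False, 0, "" and None matches the Actions rules for false, 0, '' and null.
    """
    left = a if cond else cond  # cond && a
    return left if left else b  # (...) || b


def concurrency_group(workflow, ref, run_id):
    """Mirror ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}."""
    key = gha_and_or(ref == "refs/heads/main", run_id, ref)
    return f"{workflow}-{key}"


if __name__ == "__main__":
    wf = "Integration Tests (Replay)"
    # Two pushes to main: distinct groups, so neither run cancels the other.
    print(concurrency_group(wf, "refs/heads/main", "1001"))
    print(concurrency_group(wf, "refs/heads/main", "1002"))
    # Two runs for the same PR: same group, so the older run is cancelled.
    print(concurrency_group(wf, "refs/pull/3179/merge", "1003"))
    print(concurrency_group(wf, "refs/pull/3179/merge", "1004"))
```

The same idiom selects matrix values such as `python-version` (both versions only for the nightly `'0 0 * * *'` cron). Because `||` falls through whenever the middle operand is falsy, the pattern is only a safe ternary when that operand can never be `''`, `0` or `false`; `github.run_id` and the literal values used in these workflows satisfy that.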
@@ -9,14 +9,17 @@ on:
  branches: [ main ]
  paths:
  - 'llama_stack/**'
+ - '!llama_stack/ui/**'
  - 'tests/integration/vector_io/**'
  - 'uv.lock'
  - 'pyproject.toml'
  - 'requirements.txt'
  - '.github/workflows/integration-vector-io-tests.yml' # This workflow
+ schedule:
+ - cron: '0 0 * * *' # (test on python 3.13) Daily at 12 AM UTC
  concurrency:
- group: ${{ github.workflow }}-${{ github.ref }}
+ group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
  cancel-in-progress: true
  jobs:
@@ -24,13 +27,13 @@ jobs:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"]
- python-version: ["3.12", "3.13"]
+ vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector", "remote::weaviate", "remote::qdrant"]
+ python-version: ${{ github.event.schedule == '0 0 * * *' && fromJSON('["3.12", "3.13"]') || fromJSON('["3.12"]') }}
  fail-fast: false # we want to run all tests regardless of failure
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install dependencies
  uses: ./.github/actions/setup-runner
@@ -48,6 +51,14 @@ jobs:
  -e ANONYMIZED_TELEMETRY=FALSE \
  chromadb/chroma:latest
+ - name: Setup Weaviate
+ if: matrix.vector-io-provider == 'remote::weaviate'
+ run: |
+ docker run --rm -d --pull always \
+ --name weaviate \
+ -p 8080:8080 -p 50051:50051 \
+ cr.weaviate.io/semitechnologies/weaviate:1.32.0
  - name: Start PGVector DB
  if: matrix.vector-io-provider == 'remote::pgvector'
  run: |
@@ -78,6 +89,29 @@ jobs:
  PGPASSWORD=llamastack psql -h localhost -U llamastack -d llamastack \
  -c "CREATE EXTENSION IF NOT EXISTS vector;"
+ - name: Setup Qdrant
+ if: matrix.vector-io-provider == 'remote::qdrant'
+ run: |
+ docker run --rm -d --pull always \
+ --name qdrant \
+ -p 6333:6333 \
+ qdrant/qdrant
+ - name: Wait for Qdrant to be ready
+ if: matrix.vector-io-provider == 'remote::qdrant'
+ run: |
+ echo "Waiting for Qdrant to be ready..."
+ for i in {1..30}; do
+ if curl -s http://localhost:6333/collections | grep -q '"status":"ok"'; then
+ echo "Qdrant is ready!"
+ exit 0
+ fi
+ sleep 2
+ done
+ echo "Qdrant failed to start"
+ docker logs qdrant
+ exit 1
  - name: Wait for ChromaDB to be ready
  if: matrix.vector-io-provider == 'remote::chromadb'
  run: |
@@ -93,9 +127,24 @@ jobs:
  docker logs chromadb
  exit 1
+ - name: Wait for Weaviate to be ready
+ if: matrix.vector-io-provider == 'remote::weaviate'
+ run: |
+ echo "Waiting for Weaviate to be ready..."
+ for i in {1..30}; do
+ if curl -s http://localhost:8080 | grep -q "https://weaviate.io/developers/weaviate/current/"; then
+ echo "Weaviate is ready!"
+ exit 0
+ fi
+ sleep 2
+ done
+ echo "Weaviate failed to start"
+ docker logs weaviate
+ exit 1
  - name: Build Llama Stack
  run: |
- uv run llama stack build --template ci-tests --image-type venv
+ uv run --no-sync llama stack build --template ci-tests --image-type venv
  - name: Check Storage and Memory Available Before Tests
  if: ${{ always() }}
@@ -113,10 +162,15 @@ jobs:
  PGVECTOR_DB: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
  PGVECTOR_USER: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
  PGVECTOR_PASSWORD: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
+ ENABLE_QDRANT: ${{ matrix.vector-io-provider == 'remote::qdrant' && 'true' || '' }}
+ QDRANT_URL: ${{ matrix.vector-io-provider == 'remote::qdrant' && 'http://localhost:6333' || '' }}
+ ENABLE_WEAVIATE: ${{ matrix.vector-io-provider == 'remote::weaviate' && 'true' || '' }}
+ WEAVIATE_CLUSTER_URL: ${{ matrix.vector-io-provider == 'remote::weaviate' && 'localhost:8080' || '' }}
  run: |
- uv run pytest -sv --stack-config="inference=inline::sentence-transformers,vector_io=${{ matrix.vector-io-provider }}" \
+ uv run --no-sync \
+ pytest -sv --stack-config="files=inline::localfs,inference=inline::sentence-transformers,vector_io=${{ matrix.vector-io-provider }}" \
  tests/integration/vector_io \
- --embedding-model sentence-transformers/all-MiniLM-L6-v2
+ --embedding-model inline::sentence-transformers/all-MiniLM-L6-v2
  - name: Check Storage and Memory Available After Tests
  if: ${{ always() }}
@@ -134,6 +188,11 @@ jobs:
  run: |
  docker logs chromadb > chromadb.log
+ - name: Write Qdrant logs to file
+ if: ${{ always() && matrix.vector-io-provider == 'remote::qdrant' }}
+ run: |
+ docker logs qdrant > qdrant.log
  - name: Upload all logs to artifacts
  if: ${{ always() }}
  uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

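The new Qdrant and Weaviate steps (like the existing ChromaDB one) poll the service up to 30 times, 2 seconds apart, and fail the job if a marker string never appears in the response. A rough Python equivalent of that loop might look like the following; `wait_until_ready` and `http_get` are hypothetical helpers, and the markers are the ones the workflow greps for. The `sleep` parameter is injectable only so the loop can be exercised without real delays.

```python
import time
import urllib.request
from typing import Callable


def http_get(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return the body as text (roughly `curl -s URL`)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def wait_until_ready(probe: Callable[[], str], marker: str,
                     attempts: int = 30, delay_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once probe() output contains marker, False after `attempts` tries.

    Mirrors the workflow loops: `for i in {1..30}; ... grep -q ...; sleep 2`.
    """
    for _ in range(attempts):
        try:
            if marker in probe():
                return True
        except OSError:
            # urllib.error.URLError subclasses OSError: the service is not up yet.
            pass
        sleep(delay_s)
    return False
```

For example, `wait_until_ready(lambda: http_get("http://localhost:6333/collections"), '"status":"ok"')` mirrors the Qdrant check. Matching on a plain substring keeps the check as loose as `grep -q`, so it inherits the same assumption that Qdrant returns compact JSON without spaces around the colon.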
@@ -20,7 +20,7 @@ jobs:
  steps:
  - name: Checkout code
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  with:
  # For dependabot PRs, we need to checkout with a token that can push changes
  token: ${{ github.actor == 'dependabot[bot]' && secrets.GITHUB_TOKEN || github.token }}
@@ -36,6 +36,21 @@ jobs:
  **/requirements*.txt
  .pre-commit-config.yaml
+ # npm ci may fail -
+ # npm error `npm ci` can only install packages when your package.json and package-lock.json or npm-shrinkwrap.json are in sync. Please update your lock file with `npm install` before continuing.
+ # npm error Invalid: lock file's llama-stack-client@0.2.17 does not satisfy llama-stack-client@0.2.18
+ # - name: Set up Node.js
+ # uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af # v4.1.0
+ # with:
+ # node-version: '20'
+ # cache: 'npm'
+ # cache-dependency-path: 'llama_stack/ui/'
+ # - name: Install npm dependencies
+ # run: npm ci
+ # working-directory: llama_stack/ui
  - uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1
  continue-on-error: true
  env:

@@ -9,20 +9,20 @@ on:
  paths:
  - 'llama_stack/cli/stack/build.py'
  - 'llama_stack/cli/stack/_build.py'
- - 'llama_stack/distribution/build.*'
- - 'llama_stack/distribution/*.sh'
+ - 'llama_stack/core/build.*'
+ - 'llama_stack/core/*.sh'
  - '.github/workflows/providers-build.yml'
- - 'llama_stack/templates/**'
+ - 'llama_stack/distributions/**'
  - 'pyproject.toml'
  pull_request:
  paths:
  - 'llama_stack/cli/stack/build.py'
  - 'llama_stack/cli/stack/_build.py'
- - 'llama_stack/distribution/build.*'
- - 'llama_stack/distribution/*.sh'
+ - 'llama_stack/core/build.*'
+ - 'llama_stack/core/*.sh'
  - '.github/workflows/providers-build.yml'
- - 'llama_stack/templates/**'
+ - 'llama_stack/distributions/**'
  - 'pyproject.toml'
  concurrency:
@@ -33,42 +33,42 @@ jobs:
  generate-matrix:
  runs-on: ubuntu-latest
  outputs:
- templates: ${{ steps.set-matrix.outputs.templates }}
+ distros: ${{ steps.set-matrix.outputs.distros }}
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- - name: Generate Template List
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
+ - name: Generate Distribution List
  id: set-matrix
  run: |
- templates=$(ls llama_stack/templates/*/*build.yaml | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]')
- echo "templates=$templates" >> "$GITHUB_OUTPUT"
+ distros=$(ls llama_stack/distributions/*/*build.yaml | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]')
+ echo "distros=$distros" >> "$GITHUB_OUTPUT"
  build:
  needs: generate-matrix
  runs-on: ubuntu-latest
  strategy:
  matrix:
- template: ${{ fromJson(needs.generate-matrix.outputs.templates) }}
+ distro: ${{ fromJson(needs.generate-matrix.outputs.distros) }}
  image-type: [venv, container]
  fail-fast: false # We want to run all jobs even if some fail
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install dependencies
  uses: ./.github/actions/setup-runner
  - name: Print build dependencies
  run: |
- uv run llama stack build --template ${{ matrix.template }} --image-type ${{ matrix.image-type }} --image-name test --print-deps-only
+ uv run llama stack build --distro ${{ matrix.distro }} --image-type ${{ matrix.image-type }} --image-name test --print-deps-only
  - name: Run Llama Stack Build
  run: |
  # USE_COPY_NOT_MOUNT is set to true since mounting is not supported by docker buildx, we use COPY instead
  # LLAMA_STACK_DIR is set to the current directory so we are building from the source
- USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --template ${{ matrix.template }} --image-type ${{ matrix.image-type }} --image-name test
+ USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --distro ${{ matrix.distro }} --image-type ${{ matrix.image-type }} --image-name test
  - name: Print dependencies in the image
  if: matrix.image-type == 'venv'
@@ -79,7 +79,7 @@ jobs:
  runs-on: ubuntu-latest
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install dependencies
  uses: ./.github/actions/setup-runner
@@ -92,23 +92,23 @@ jobs:
  runs-on: ubuntu-latest
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install dependencies
  uses: ./.github/actions/setup-runner
  - name: Build a single provider
  run: |
- yq -i '.image_type = "container"' llama_stack/templates/ci-tests/build.yaml
- yq -i '.image_name = "test"' llama_stack/templates/ci-tests/build.yaml
- USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --config llama_stack/templates/ci-tests/build.yaml
+ yq -i '.image_type = "container"' llama_stack/distributions/ci-tests/build.yaml
+ yq -i '.image_name = "test"' llama_stack/distributions/ci-tests/build.yaml
+ USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --config llama_stack/distributions/ci-tests/build.yaml
  - name: Inspect the container image entrypoint
  run: |
  IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1)
  entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID)
  echo "Entrypoint: $entrypoint"
- if [ "$entrypoint" != "[python -m llama_stack.distribution.server.server --config /app/run.yaml]" ]; then
+ if [ "$entrypoint" != "[python -m llama_stack.core.server.server /app/run.yaml]" ]; then
  echo "Entrypoint is not correct"
  exit 1
  fi
@@ -117,32 +117,32 @@ jobs:
  runs-on: ubuntu-latest
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install dependencies
  uses: ./.github/actions/setup-runner
- - name: Pin template to UBI9 base
+ - name: Pin distribution to UBI9 base
  run: |
  yq -i '
  .image_type = "container" |
  .image_name = "ubi9-test" |
  .distribution_spec.container_image = "registry.access.redhat.com/ubi9:latest"
- ' llama_stack/templates/ci-tests/build.yaml
+ ' llama_stack/distributions/ci-tests/build.yaml
  - name: Build dev container (UBI9)
  env:
  USE_COPY_NOT_MOUNT: "true"
  LLAMA_STACK_DIR: "."
  run: |
- uv run llama stack build --config llama_stack/templates/ci-tests/build.yaml
+ uv run llama stack build --config llama_stack/distributions/ci-tests/build.yaml
  - name: Inspect UBI9 image
  run: |
  IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1)
  entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID)
  echo "Entrypoint: $entrypoint"
- if [ "$entrypoint" != "[python -m llama_stack.distribution.server.server --config /app/run.yaml]" ]; then
+ if [ "$entrypoint" != "[python -m llama_stack.core.server.server /app/run.yaml]" ]; then
  echo "Entrypoint is not correct"
  exit 1
  fi

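The `generate-matrix` job turns every `llama_stack/distributions/*/*build.yaml` into its parent directory name and emits the list as compact JSON for `fromJson`. A Python sketch of the same pipeline (`ls` glob, then `awk -F'/' '{print $(NF-1)}'`, then `jq -R -s -c 'split("\n")[:-1]'`) may help when checking which distros the matrix will contain; `list_distros` is a hypothetical helper, and `sorted()` uses code-point order, which can differ slightly from the shell's locale-aware glob ordering.

```python
import glob
import json
import os


def list_distros(root: str = "llama_stack/distributions") -> str:
    """Return a compact JSON array of distro names, like the workflow step."""
    # Same pattern the workflow passes to ls: <root>/*/*build.yaml
    matches = sorted(glob.glob(os.path.join(root, "*", "*build.yaml")))
    # awk -F'/' '{print $(NF-1)}' keeps the parent directory of each match.
    names = [os.path.basename(os.path.dirname(p)) for p in matches]
    # jq -c prints without spaces, e.g. ["ci-tests","starter"].
    return json.dumps(names, separators=(",", ":"))
```

As with the shell version, a directory holding two files that match `*build.yaml` would appear twice in the list.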
@@ -9,6 +9,8 @@ on:
  pull_request:
  branches:
  - main
+ paths-ignore:
+ - 'llama_stack/ui/**'
  jobs:
  build:
@@ -19,10 +21,10 @@ jobs:
  steps:
  - name: Checkout repository
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
  - name: Install uv
- uses: astral-sh/setup-uv@e92bafb6253dcd438e0484186d7669ea7a8ca1cc # v6.4.3
+ uses: astral-sh/setup-uv@d9e0f98d3fc6adb07d1e3d37f3043649ddad06a1 # v6.5.0
  with:
  python-version: ${{ matrix.python-version }}
  activate-environment: true

@@ -0,0 +1,70 @@
+ # This workflow should be run manually when needing to re-record tests. This happens when you have
+ # - added a new test
+ # - or changed an existing test such that a new inference call is made
+ # You should make a PR and then run this workflow on that PR branch. The workflow will re-record the
+ # tests and commit the recordings to the PR branch.
+ name: Integration Tests (Record)
+ run-name: Run the integration test suite from tests/integration
+ on:
+ workflow_dispatch:
+ inputs:
+ test-subdirs:
+ description: 'Comma-separated list of test subdirectories to run'
+ type: string
+ default: ''
+ test-provider:
+ description: 'Test against a specific provider'
+ type: string
+ default: 'ollama'
+ run-vision-tests:
+ description: 'Whether to run vision tests'
+ type: boolean
+ default: false
+ test-pattern:
+ description: 'Regex pattern to pass to pytest -k'
+ type: string
+ default: ''
+ jobs:
+ record-tests:
+ runs-on: ubuntu-latest
+ permissions:
+ contents: write
+ steps:
+ - name: Echo workflow inputs
+ run: |
+ echo "::group::Workflow Inputs"
+ echo "test-subdirs: ${{ inputs.test-subdirs }}"
+ echo "test-provider: ${{ inputs.test-provider }}"
+ echo "run-vision-tests: ${{ inputs.run-vision-tests }}"
+ echo "test-pattern: ${{ inputs.test-pattern }}"
+ echo "branch: ${{ github.ref_name }}"
+ echo "::endgroup::"
+ - name: Checkout repository
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
+ with:
+ fetch-depth: 0
+ - name: Setup test environment
+ uses: ./.github/actions/setup-test-environment
+ with:
+ python-version: "3.12" # Use single Python version for recording
+ client-version: "latest"
+ provider: ${{ inputs.test-provider || 'ollama' }}
+ run-vision-tests: ${{ inputs.run-vision-tests }}
+ inference-mode: 'record'
+ - name: Run and record tests
+ uses: ./.github/actions/run-and-record-tests
+ with:
+ test-pattern: ${{ inputs.test-pattern }}
+ test-subdirs: ${{ inputs.test-subdirs }}
+ stack-config: 'server:ci-tests' # recording must be done with server since more tests are run
+ provider: ${{ inputs.test-provider || 'ollama' }}
+ inference-mode: 'record'
+ run-vision-tests: ${{ inputs.run-vision-tests }}


@@ -11,7 +11,7 @@ on:
- synchronize - synchronize
concurrency: concurrency:
group: ${{ github.workflow }}-${{ github.ref }} group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
cancel-in-progress: true cancel-in-progress: true
permissions: permissions:
@@ -22,6 +22,6 @@ jobs:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- name: Check PR Title's semantic conformance - name: Check PR Title's semantic conformance
uses: amannn/action-semantic-pull-request@0723387faaf9b38adef4775cd42cfd5155ed6017 # v5.5.3 uses: amannn/action-semantic-pull-request@7f33ba792281b034f64e96f4c0b5496782dd3b37 # v6.1.0
env: env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}


@@ -12,12 +12,13 @@ on:
- 'tests/integration/**' - 'tests/integration/**'
- 'uv.lock' - 'uv.lock'
- 'pyproject.toml' - 'pyproject.toml'
- 'requirements.txt'
- 'tests/external/*' - 'tests/external/*'
- '.github/workflows/test-external-provider-module.yml' # This workflow - '.github/workflows/test-external-provider-module.yml' # This workflow
jobs: jobs:
test-external-providers-from-module: test-external-providers-from-module:
# This workflow is disabled. See https://github.com/meta-llama/llama-stack/pull/2975#issuecomment-3138702984 for details
if: false
runs-on: ubuntu-latest runs-on: ubuntu-latest
strategy: strategy:
matrix: matrix:
@@ -26,7 +27,7 @@ jobs:
# container and point 'uv pip install' to the correct path... # container and point 'uv pip install' to the correct path...
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Install dependencies - name: Install dependencies
uses: ./.github/actions/setup-runner uses: ./.github/actions/setup-runner
@@ -47,7 +48,7 @@ jobs:
- name: Build distro from config file - name: Build distro from config file
run: | run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. llama stack build --config tests/external/ramalama-stack/build.yaml USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --config tests/external/ramalama-stack/build.yaml
- name: Start Llama Stack server in background - name: Start Llama Stack server in background
if: ${{ matrix.image-type }} == 'venv' if: ${{ matrix.image-type }} == 'venv'


@@ -9,6 +9,7 @@ on:
branches: [ main ] branches: [ main ]
paths: paths:
- 'llama_stack/**' - 'llama_stack/**'
- '!llama_stack/ui/**'
- 'tests/integration/**' - 'tests/integration/**'
- 'uv.lock' - 'uv.lock'
- 'pyproject.toml' - 'pyproject.toml'
@@ -26,7 +27,7 @@ jobs:
# container and point 'uv pip install' to the correct path... # container and point 'uv pip install' to the correct path...
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Install dependencies - name: Install dependencies
uses: ./.github/actions/setup-runner uses: ./.github/actions/setup-runner
@@ -43,11 +44,11 @@ jobs:
- name: Print distro dependencies - name: Print distro dependencies
run: | run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. llama stack build --config tests/external/build.yaml --print-deps-only USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync llama stack build --config tests/external/build.yaml --print-deps-only
- name: Build distro from config file - name: Build distro from config file
run: | run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. llama stack build --config tests/external/build.yaml USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync llama stack build --config tests/external/build.yaml
- name: Start Llama Stack server in background - name: Start Llama Stack server in background
if: ${{ matrix.image-type }} == 'venv' if: ${{ matrix.image-type }} == 'venv'
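This workflow and the unit-test workflow further down both add `'!llama_stack/ui/**'` after `'llama_stack/**'`, so UI-only changes no longer trigger them. GitHub evaluates `paths` patterns in order. A later negated pattern excludes a path that an earlier pattern matched, and a later positive pattern can re-include it. The Python sketch below models that ordering rule with a simplified glob translation, where `*` stays within one path segment and `**` crosses segments. It approximates GitHub's matcher for illustration only and ignores `?`, character classes, and other glob features.

```python
import re

# Filters copied from the integration-test workflow's `paths:` list above.
INTEGRATION_PATHS = [
    "llama_stack/**",
    "!llama_stack/ui/**",
    "tests/integration/**",
    "uv.lock",
    "pyproject.toml",
]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Simplified glob: '**' matches across '/', '*' stays within one segment."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def path_included(path: str, patterns: list[str]) -> bool:
    """Apply patterns in order; the last matching pattern decides."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if glob_to_regex(body).match(path):
            included = not negated
    return included


def workflow_triggered(changed_files: list[str], patterns: list[str]) -> bool:
    """The workflow runs if at least one changed file survives the filters."""
    return any(path_included(f, patterns) for f in changed_files)
```

With these filters, a pull request that touches only files under `llama_stack/ui/` skips these Python test workflows, and the new UI Tests workflow, filtered on `'llama_stack/ui/**'`, runs instead.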

.github/workflows/ui-unit-tests.yml

@@ -0,0 +1,55 @@
name: UI Tests
run-name: Run the UI test suite
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
paths:
- 'llama_stack/ui/**'
- '.github/workflows/ui-unit-tests.yml' # This workflow
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
ui-tests:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
node-version: [22]
steps:
- name: Checkout repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Setup Node.js
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
cache-dependency-path: 'llama_stack/ui/package-lock.json'
- name: Install dependencies
working-directory: llama_stack/ui
run: npm ci
- name: Run linting
working-directory: llama_stack/ui
run: npm run lint
- name: Run format check
working-directory: llama_stack/ui
run: npm run format:check
- name: Run unit tests
working-directory: llama_stack/ui
env:
CI: true
run: npm test -- --coverage --watchAll=false --passWithNoTests


@@ -9,6 +9,7 @@ on:
branches: [ main ] branches: [ main ]
paths: paths:
- 'llama_stack/**' - 'llama_stack/**'
- '!llama_stack/ui/**'
- 'tests/unit/**' - 'tests/unit/**'
- 'uv.lock' - 'uv.lock'
- 'pyproject.toml' - 'pyproject.toml'
@@ -31,7 +32,7 @@ jobs:
- "3.13" - "3.13"
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Install dependencies - name: Install dependencies
uses: ./.github/actions/setup-runner uses: ./.github/actions/setup-runner


@@ -37,7 +37,7 @@ jobs:
TOKEN: ${{ secrets.READTHEDOCS_TOKEN }} TOKEN: ${{ secrets.READTHEDOCS_TOKEN }}
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Install dependencies - name: Install dependencies
uses: ./.github/actions/setup-runner uses: ./.github/actions/setup-runner


@@ -2,6 +2,7 @@ exclude: 'build/'
default_language_version: default_language_version:
python: python3.12 python: python3.12
node: "22"
repos: repos:
- repo: https://github.com/pre-commit/pre-commit-hooks - repo: https://github.com/pre-commit/pre-commit-hooks
@@ -145,6 +146,50 @@ repos:
pass_filenames: false pass_filenames: false
require_serial: true require_serial: true
files: ^.github/workflows/.*$ files: ^.github/workflows/.*$
# ui-prettier and ui-eslint are disabled until we can avoid `npm ci`, which is slow and may fail -
# npm error `npm ci` can only install packages when your package.json and package-lock.json or npm-shrinkwrap.json are in sync. Please update your lock file with `npm install` before continuing.
# npm error Invalid: lock file's llama-stack-client@0.2.17 does not satisfy llama-stack-client@0.2.18
# and until we have infra for installing prettier and next via npm -
# Lint UI code with ESLint.....................................................Failed
# - hook id: ui-eslint
# - exit code: 127
# > ui@0.1.0 lint
# > next lint --fix --quiet
# sh: line 1: next: command not found
#
# - id: ui-prettier
# name: Format UI code with Prettier
# entry: bash -c 'cd llama_stack/ui && npm ci && npm run format'
# language: system
# files: ^llama_stack/ui/.*\.(ts|tsx)$
# pass_filenames: false
# require_serial: true
# - id: ui-eslint
# name: Lint UI code with ESLint
# entry: bash -c 'cd llama_stack/ui && npm run lint -- --fix --quiet'
# language: system
# files: ^llama_stack/ui/.*\.(ts|tsx)$
# pass_filenames: false
# require_serial: true
- id: check-log-usage
name: Ensure 'llama_stack.log' usage for logging
entry: bash
language: system
types: [python]
pass_filenames: true
args:
- -c
- |
matches=$(grep -EnH '^[^#]*\b(import\s+logging|from\s+logging\b)' "$@" | grep -v -e '#\s*allow-direct-logging' || true)
if [ -n "$matches" ]; then
# GitHub Actions annotation format
while IFS=: read -r file line_num rest; do
echo "::error file=$file,line=$line_num::Do not use 'import logging' or 'from logging import' in $file. Use the custom log instead: from llama_stack.log import get_logger; logger = get_logger(). If direct logging is truly needed, add: # allow-direct-logging"
done <<< "$matches"
exit 1
fi
exit 0
ci: ci:
autofix_commit_msg: 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks autofix_commit_msg: 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
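The new `check-log-usage` hook greps Python files for `import logging` or `from logging` outside comments. It also honors an explicit `# allow-direct-logging` opt-out on the same line. The Python sketch below reuses the hook's two regular expressions to show which lines it would flag. It illustrates the hook's logic and does not replace it. The function name is hypothetical.

```python
import re

# Same pattern as the hook: a logging import not preceded by '#' on the line...
LOGGING_IMPORT = re.compile(r"^[^#]*\b(import\s+logging|from\s+logging\b)")
# ...unless the line carries the explicit opt-out marker.
ALLOW_MARKER = re.compile(r"#\s*allow-direct-logging")


def find_direct_logging(source: str) -> list[int]:
    """Return the 1-based line numbers the hook would report for this file content."""
    flagged = []
    for line_num, line in enumerate(source.splitlines(), start=1):
        if LOGGING_IMPORT.search(line) and not ALLOW_MARKER.search(line):
            flagged.append(line_num)
    return flagged


if __name__ == "__main__":
    sample = (
        "import os\n"
        "import logging\n"
        "from logging import getLogger  # allow-direct-logging\n"
        "# import logging\n"
        "from llama_stack.log import get_logger\n"
    )
    print(find_direct_logging(sample))  # → [2]
```

Because `[^#]*` cannot cross a `#`, commented-out imports are ignored. The `from\s+logging\b` branch also catches submodule imports such as `from logging.handlers import ...`.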


@@ -451,7 +451,7 @@ GenAI application developers need more than just an LLM - they need to integrate
Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs by both AI developers and from partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety. Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs by both AI developers and from partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety.
With Llama Stack, you can easily build a RAG agent which can also search the web, do complex math, and custom tool calling. You can use telemetry to inspect those traces, and convert telemetry into evals datasets. And with Llama Stack's plugin architecture and prepackage distributions, you choose to run your agent anywhere - in the cloud with our partners, deploy your own environment using virtualenv, conda, or Docker, operate locally with Ollama, or even run on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience. With Llama Stack, you can easily build a RAG agent which can also search the web, do complex math, and custom tool calling. You can use telemetry to inspect those traces, and convert telemetry into evals datasets. And with Llama Stack's plugin architecture and prepackage distributions, you choose to run your agent anywhere - in the cloud with our partners, deploy your own environment using virtualenv or Docker, operate locally with Ollama, or even run on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience.
## Release ## Release
After iterating on the APIs for the last 3 months, today we're launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages(v0.1.0). We now have automated tests for providers. These tests make sure that all provider implementations are verified. Developers can now easily and reliably select distributions or providers based on their specific requirements. After iterating on the APIs for the last 3 months, today we're launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages(v0.1.0). We now have automated tests for providers. These tests make sure that all provider implementations are verified. Developers can now easily and reliably select distributions or providers based on their specific requirements.


@@ -1,13 +1,82 @@
# Contributing to Llama-Stack # Contributing to Llama Stack
We want to make contributing to this project as easy and transparent as We want to make contributing to this project as easy and transparent as
possible. possible.
## Set up your development environment
We use [uv](https://github.com/astral-sh/uv) to manage python dependencies and virtual environments.
You can install `uv` by following this [guide](https://docs.astral.sh/uv/getting-started/installation/).
You can install the dependencies by running:
```bash
cd llama-stack
uv sync --group dev
uv pip install -e .
source .venv/bin/activate
```
```{note}
You can use a specific version of Python with `uv` by adding the `--python <version>` flag (e.g. `--python 3.12`).
Otherwise, `uv` will automatically select a Python version according to the `requires-python` section of the `pyproject.toml`.
For more info, see the [uv docs around Python versions](https://docs.astral.sh/uv/concepts/python-versions/).
```
Note that you can create a dotenv file `.env` that includes necessary environment variables:
```
LLAMA_STACK_BASE_URL=http://localhost:8321
LLAMA_STACK_CLIENT_LOG=debug
LLAMA_STACK_PORT=8321
LLAMA_STACK_CONFIG=<provider-name>
TAVILY_SEARCH_API_KEY=
BRAVE_SEARCH_API_KEY=
```
And then use this dotenv file when running client SDK tests via the following:
```bash
uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
```
### Pre-commit Hooks
We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:
```bash
uv run pre-commit install
```
After that, pre-commit hooks will run automatically before each commit.
Alternatively, if you don't want to install the pre-commit hooks, you can run the checks manually by running:
```bash
uv run pre-commit run --all-files
```
```{caution}
Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
```
## Discussions -> Issues -> Pull Requests ## Discussions -> Issues -> Pull Requests
We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md). We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).
If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later. If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.
### Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.
Meta has a [bounty program](http://facebook.com/whitehat/info) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.
### Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.
Complete your CLA here: <https://code.facebook.com/cla>
**I'd like to contribute!** **I'd like to contribute!**
If you are new to the project, start by looking at the issues tagged with "good first issue". If you're interested If you are new to the project, start by looking at the issues tagged with "good first issue". If you're interested
@@ -51,93 +120,15 @@ Please avoid picking up too many issues at once. This helps you stay focused and
Please keep pull requests (PRs) small and focused. If you have a large set of changes, consider splitting them into logically grouped, smaller PRs to facilitate review and testing. Please keep pull requests (PRs) small and focused. If you have a large set of changes, consider splitting them into logically grouped, smaller PRs to facilitate review and testing.
> [!TIP] ```{tip}
> As a general guideline: As a general guideline:
> - Experienced contributors should try to keep no more than 5 open PRs at a time. - Experienced contributors should try to keep no more than 5 open PRs at a time.
> - New contributors are encouraged to have only one open PR at a time until they're familiar with the codebase and process. - New contributors are encouraged to have only one open PR at a time until they're familiar with the codebase and process.
## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.
Complete your CLA here: <https://code.facebook.com/cla>
## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.
Meta has a [bounty program](http://facebook.com/whitehat/info) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.
## Set up your development environment
We use [uv](https://github.com/astral-sh/uv) to manage python dependencies and virtual environments.
You can install `uv` by following this [guide](https://docs.astral.sh/uv/getting-started/installation/).
You can install the dependencies by running:
```bash
cd llama-stack
uv sync --group dev
uv pip install -e .
source .venv/bin/activate
``` ```
> [!NOTE] ## Repository guidelines
> You can use a specific version of Python with `uv` by adding the `--python <version>` flag (e.g. `--python 3.12`)
> Otherwise, `uv` will automatically select a Python version according to the `requires-python` section of the `pyproject.toml`.
> For more info, see the [uv docs around Python versions](https://docs.astral.sh/uv/concepts/python-versions/).
Note that you can create a dotenv file `.env` that includes necessary environment variables: ### Coding Style
```
LLAMA_STACK_BASE_URL=http://localhost:8321
LLAMA_STACK_CLIENT_LOG=debug
LLAMA_STACK_PORT=8321
LLAMA_STACK_CONFIG=<provider-name>
TAVILY_SEARCH_API_KEY=
BRAVE_SEARCH_API_KEY=
```
And then use this dotenv file when running client SDK tests via the following:
```bash
uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
```
## Pre-commit Hooks
We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:
```bash
uv run pre-commit install
```
After that, pre-commit hooks will run automatically before each commit.
Alternatively, if you don't want to install the pre-commit hooks, you can run the checks manually by running:
```bash
uv run pre-commit run --all-files
```
> [!CAUTION]
> Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
## Running tests
You can find the Llama Stack testing documentation [here](https://github.com/meta-llama/llama-stack/blob/main/tests/README.md).
## Adding a new dependency to the project
To add a new dependency to the project, you can use the `uv` command. For example, to add `foo` to the project, you can run:
```bash
uv add foo
uv sync
```
## Coding Style
* Comments should provide meaningful insights into the code. Avoid filler comments that simply * Comments should provide meaningful insights into the code. Avoid filler comments that simply
describe the next step, as they create unnecessary clutter, same goes for docstrings. describe the next step, as they create unnecessary clutter, same goes for docstrings.
@@ -157,6 +148,11 @@ uv sync
that describes the configuration. These descriptions will be used to generate the provider that describes the configuration. These descriptions will be used to generate the provider
documentation. documentation.
* When possible, use keyword arguments only when calling functions. * When possible, use keyword arguments only when calling functions.
* Llama Stack utilizes [custom Exception classes](llama_stack/apis/common/errors.py) for certain Resources that should be used where applicable.
### License
By contributing to Llama, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
## Common Tasks ## Common Tasks
@@ -164,7 +160,7 @@ Some tips about common tasks you work on while contributing to Llama Stack:
### Using `llama stack build` ### Using `llama stack build`
Building a stack image (conda / docker) will use the production version of the `llama-stack` and `llama-stack-client` packages. If you are developing with a llama-stack repository checked out and need your code to be reflected in the stack image, set `LLAMA_STACK_DIR` and `LLAMA_STACK_CLIENT_DIR` to the appropriate checked out directories when running any of the `llama` CLI commands. Building a stack image will use the production version of the `llama-stack` and `llama-stack-client` packages. If you are developing with a llama-stack repository checked out and need your code to be reflected in the stack image, set `LLAMA_STACK_DIR` and `LLAMA_STACK_CLIENT_DIR` to the appropriate checked out directories when running any of the `llama` CLI commands.
Example: Example:
```bash ```bash
@@ -172,7 +168,7 @@ cd work/
git clone https://github.com/meta-llama/llama-stack.git git clone https://github.com/meta-llama/llama-stack.git
git clone https://github.com/meta-llama/llama-stack-client-python.git git clone https://github.com/meta-llama/llama-stack-client-python.git
cd llama-stack cd llama-stack
LLAMA_STACK_DIR=$(pwd) LLAMA_STACK_CLIENT_DIR=../llama-stack-client-python llama stack build --template <...> LLAMA_STACK_DIR=$(pwd) LLAMA_STACK_CLIENT_DIR=../llama-stack-client-python llama stack build --distro <...>
``` ```
### Updating distribution configurations ### Updating distribution configurations
@@ -210,7 +206,3 @@ uv run ./docs/openapi_generator/run_openapi_generator.sh
``` ```
The generated API documentation will be available in `docs/_static/`. Make sure to review the changes before committing. The generated API documentation will be available in `docs/_static/`. Make sure to review the changes before committing.
## License
By contributing to Llama, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.


@@ -1,9 +1,9 @@
include pyproject.toml include pyproject.toml
include llama_stack/models/llama/llama3/tokenizer.model include llama_stack/models/llama/llama3/tokenizer.model
include llama_stack/models/llama/llama4/tokenizer.model include llama_stack/models/llama/llama4/tokenizer.model
include llama_stack/distribution/*.sh include llama_stack/core/*.sh
include llama_stack/cli/scripts/*.sh include llama_stack/cli/scripts/*.sh
include llama_stack/templates/*/*.yaml include llama_stack/distributions/*/*.yaml
include llama_stack/providers/tests/test_cases/inference/*.json include llama_stack/providers/tests/test_cases/inference/*.json
include llama_stack/models/llama/*/*.md include llama_stack/models/llama/*/*.md
include llama_stack/tests/integration/*.jpg include llama_stack/tests/integration/*.jpg
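The MANIFEST.in update follows the package renames (`distribution` to `core`, `templates` to `distributions`). In these `include` patterns `*` matches within a single path component, so `llama_stack/distributions/*/*.yaml` picks up YAML files exactly one directory below `distributions/`. The rough Python check below uses `pathlib.PurePosixPath.match`, which also keeps `*` inside one component. Treating this as equivalent to setuptools' own matcher is an assumption, made here only for paths relative to the project root.

```python
from pathlib import PurePosixPath

# Patterns copied from the updated MANIFEST.in lines.
PATTERNS = [
    "llama_stack/core/*.sh",
    "llama_stack/distributions/*/*.yaml",
]


def manifest_includes(path: str, patterns=PATTERNS) -> bool:
    """Approximate a root-anchored glob match where '*' never crosses a '/'."""
    candidate = PurePosixPath(path)
    for pattern in patterns:
        pattern_path = PurePosixPath(pattern)
        # PurePath.match anchors on the right only, so also require equal depth
        # to anchor the pattern at the project root.
        if len(candidate.parts) == len(pattern_path.parts) and candidate.match(pattern):
            return True
    return False
```

A file left under the old `llama_stack/templates/` path, or nested one level deeper inside a distribution, would not be included by these lines.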


@@ -6,10 +6,10 @@
[![Discord](https://img.shields.io/discord/1257833999603335178?color=6A7EC2&logo=discord&logoColor=ffffff)](https://discord.gg/llama-stack) [![Discord](https://img.shields.io/discord/1257833999603335178?color=6A7EC2&logo=discord&logoColor=ffffff)](https://discord.gg/llama-stack)
[![Unit Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain) [![Unit Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain)
[![Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain) [![Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain)
![coverage badge](./coverage.svg)
[**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack) [**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)
### ✨🎉 Llama 4 Support 🎉✨ ### ✨🎉 Llama 4 Support 🎉✨
We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta. We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.
@@ -112,29 +112,33 @@ Here is a list of the various API providers and available distributions that can
Please checkout for [full list](https://llama-stack.readthedocs.io/en/latest/providers/index.html) Please checkout for [full list](https://llama-stack.readthedocs.io/en/latest/providers/index.html)
| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO | | API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO |
|:-------------------:|:------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:--------:| |:--------------------:|:------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:--------:|
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SambaNova | Hosted | | ✅ | | ✅ | | | | | | SambaNova | Hosted | | ✅ | | ✅ | | | | |
| Cerebras | Hosted | | ✅ | | | | | | | | Cerebras | Hosted | | ✅ | | | | | | |
| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | | | Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | |
| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | | | AWS Bedrock | Hosted | | ✅ | | ✅ | | | | |
| Together | Hosted | ✅ | ✅ | | ✅ | | | | | | Together | Hosted | ✅ | ✅ | | ✅ | | | | |
| Groq | Hosted | | ✅ | | | | | | | | Groq | Hosted | | ✅ | | | | | | |
| Ollama | Single Node | | ✅ | | | | | | | | Ollama | Single Node | | ✅ | | | | | | |
| TGI | Hosted/Single Node | | ✅ | | | | | | | | TGI | Hosted/Single Node | | ✅ | | | | | | |
| NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | | | | NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | | |
| ChromaDB | Hosted/Single Node | | | ✅ | | | | | | | ChromaDB | Hosted/Single Node | | | ✅ | | | | | |
| PG Vector | Single Node | | | ✅ | | | | | | | Milvus | Hosted/Single Node | | | ✅ | | | | | |
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | | | Qdrant | Hosted/Single Node | | | ✅ | | | | | |
| vLLM | Single Node | | ✅ | | | | | | | | Weaviate | Hosted/Single Node | | | ✅ | | | | | |
| OpenAI | Hosted | | ✅ | | | | | | | | SQLite-vec | Single Node | | | ✅ | | | | | |
| Anthropic | Hosted | | ✅ | | | | | | | | PG Vector | Single Node | | | ✅ | | | | | |
| Gemini | Hosted | | ✅ | | | | | | | | PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | |
| WatsonX | Hosted | | ✅ | | | | | | | | vLLM | Single Node | | ✅ | | | | | | |
| HuggingFace | Single Node | | | | | | ✅ | | ✅ | | OpenAI | Hosted | | ✅ | | | | | | |
| TorchTune | Single Node | | | | | | ✅ | | | | Anthropic | Hosted | | ✅ | | | | | | |
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ | | Gemini | Hosted | | ✅ | | | | | | |
| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ | | WatsonX | Hosted | | ✅ | | | | | | |
| HuggingFace | Single Node | | | | | | ✅ | | ✅ |
| TorchTune | Single Node | | | | | | ✅ | | |
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ |
| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ |
> **Note**: Additional providers are available through external packages. See [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation. > **Note**: Additional providers are available through external packages. See [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation.
@@ -176,3 +180,17 @@ Please checkout our [Documentation](https://llama-stack.readthedocs.io/en/latest
Check out our client SDKs for connecting to a Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [typescript](https://github.com/meta-llama/llama-stack-client-typescript), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications. Check out our client SDKs for connecting to a Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [typescript](https://github.com/meta-llama/llama-stack-client-typescript), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo. You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
## 🌟 GitHub Star History
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=meta-llama/llama-stack&type=Date)](https://www.star-history.com/#meta-llama/llama-stack&Date)
## ✨ Contributors
Thanks to all of our amazing contributors!
<a href="https://github.com/meta-llama/llama-stack/graphs/contributors">
<img src="https://contrib.rocks/image?repo=meta-llama/llama-stack" />
</a>

docs/_static/js/keyboard_shortcuts.js

@ -0,0 +1,14 @@
document.addEventListener('keydown', function(event) {
    // command+K or ctrl+K
    if ((event.metaKey || event.ctrlKey) && event.key === 'k') {
        event.preventDefault();
        document.querySelector('.search-input, .search-field, input[name="q"]')?.focus();
    }
    // forward slash (ignored while the user is typing in a form field)
    if (event.key === '/' &&
        !event.target.matches('input, textarea, select')) {
        event.preventDefault();
        document.querySelector('.search-input, .search-field, input[name="q"]')?.focus();
    }
});

File diff suppressed because it is too large.

File diff suppressed because it is too large.

@ -123,7 +123,7 @@
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n", " del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n", "\n",
"# this command installs all the dependencies needed for the llama stack server with the together inference provider\n", "# this command installs all the dependencies needed for the llama stack server with the together inference provider\n",
"!uv run --with llama-stack llama stack build --template together --image-type venv \n", "!uv run --with llama-stack llama stack build --distro together --image-type venv \n",
"\n", "\n",
"def run_llama_stack_server_background():\n", "def run_llama_stack_server_background():\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n", " log_file = open(\"llama_stack_server.log\", \"w\")\n",
@ -165,7 +165,7 @@
"# use this helper if needed to kill the server \n", "# use this helper if needed to kill the server \n",
"def kill_llama_stack_server():\n", "def kill_llama_stack_server():\n",
" # Kill any existing llama stack server processes\n", " # Kill any existing llama stack server processes\n",
" os.system(\"ps aux | grep -v grep | grep llama_stack.distribution.server.server | awk '{print $2}' | xargs kill -9\")\n" " os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")\n"
] ]
}, },
{ {

@ -233,7 +233,7 @@
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n", " del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n", "\n",
"# this command installs all the dependencies needed for the llama stack server \n", "# this command installs all the dependencies needed for the llama stack server \n",
"!uv run --with llama-stack llama stack build --template meta-reference-gpu --image-type venv \n", "!uv run --with llama-stack llama stack build --distro meta-reference-gpu --image-type venv \n",
"\n", "\n",
"def run_llama_stack_server_background():\n", "def run_llama_stack_server_background():\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n", " log_file = open(\"llama_stack_server.log\", \"w\")\n",
@ -275,7 +275,7 @@
"# use this helper if needed to kill the server \n", "# use this helper if needed to kill the server \n",
"def kill_llama_stack_server():\n", "def kill_llama_stack_server():\n",
" # Kill any existing llama stack server processes\n", " # Kill any existing llama stack server processes\n",
" os.system(\"ps aux | grep -v grep | grep llama_stack.distribution.server.server | awk '{print $2}' | xargs kill -9\")\n" " os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")\n"
] ]
}, },
{ {

@ -223,7 +223,7 @@
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n", " del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n", "\n",
"# this command installs all the dependencies needed for the llama stack server \n", "# this command installs all the dependencies needed for the llama stack server \n",
"!uv run --with llama-stack llama stack build --template llama_api --image-type venv \n", "!uv run --with llama-stack llama stack build --distro llama_api --image-type venv \n",
"\n", "\n",
"def run_llama_stack_server_background():\n", "def run_llama_stack_server_background():\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n", " log_file = open(\"llama_stack_server.log\", \"w\")\n",
@ -265,7 +265,7 @@
"# use this helper if needed to kill the server \n", "# use this helper if needed to kill the server \n",
"def kill_llama_stack_server():\n", "def kill_llama_stack_server():\n",
" # Kill any existing llama stack server processes\n", " # Kill any existing llama stack server processes\n",
" os.system(\"ps aux | grep -v grep | grep llama_stack.distribution.server.server | awk '{print $2}' | xargs kill -9\")\n" " os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")\n"
] ]
}, },
{ {

@ -37,7 +37,7 @@
"\n", "\n",
"To learn more about torchtune: https://github.com/pytorch/torchtune\n", "To learn more about torchtune: https://github.com/pytorch/torchtune\n",
"\n", "\n",
"We will use [experimental-post-training](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/templates/experimental-post-training) as the distribution template\n", "We will use [experimental-post-training](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distributions/experimental-post-training) as the distribution template\n",
"\n", "\n",
"#### 0.0. Prerequisite: Have an OpenAI API key\n", "#### 0.0. Prerequisite: Have an OpenAI API key\n",
"In this showcase, we will use [braintrust](https://www.braintrust.dev/) as the scoring provider for eval, and it uses an OpenAI model as the judge model for scoring. So, you need to get an API key from [OpenAI developer platform](https://platform.openai.com/docs/overview).\n", "In this showcase, we will use [braintrust](https://www.braintrust.dev/) as the scoring provider for eval, and it uses an OpenAI model as the judge model for scoring. So, you need to get an API key from [OpenAI developer platform](https://platform.openai.com/docs/overview).\n",
@ -2864,7 +2864,7 @@
} }
], ],
"source": [ "source": [
"!llama stack build --template experimental-post-training --image-type venv --image-name __system__" "!llama stack build --distro experimental-post-training --image-type venv --image-name __system__"
] ]
}, },
{ {
@ -3216,19 +3216,19 @@
"INFO:datasets:Duckdb version 1.1.3 available.\n", "INFO:datasets:Duckdb version 1.1.3 available.\n",
"INFO:datasets:TensorFlow version 2.18.0 available.\n", "INFO:datasets:TensorFlow version 2.18.0 available.\n",
"INFO:datasets:JAX version 0.4.33 available.\n", "INFO:datasets:JAX version 0.4.33 available.\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: basic::equality served by basic\n", "INFO:llama_stack.core.stack:Scoring_fns: basic::equality served by basic\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: basic::subset_of served by basic\n", "INFO:llama_stack.core.stack:Scoring_fns: basic::subset_of served by basic\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic\n", "INFO:llama_stack.core.stack:Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::factuality served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::factuality served by braintrust\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::answer-correctness served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::answer-correctness served by braintrust\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::answer-relevancy served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::answer-relevancy served by braintrust\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::answer-similarity served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::answer-similarity served by braintrust\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::faithfulness served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::faithfulness served by braintrust\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::context-entity-recall served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::context-entity-recall served by braintrust\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::context-precision served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::context-precision served by braintrust\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::context-recall served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::context-recall served by braintrust\n",
"INFO:llama_stack.distribution.stack:Scoring_fns: braintrust::context-relevancy served by braintrust\n", "INFO:llama_stack.core.stack:Scoring_fns: braintrust::context-relevancy served by braintrust\n",
"INFO:llama_stack.distribution.stack:\n" "INFO:llama_stack.core.stack:\n"
] ]
}, },
{ {
@ -3448,7 +3448,7 @@
"\n", "\n",
"os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n", "os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
"\n", "\n",
"from llama_stack.distribution.library_client import LlamaStackAsLibraryClient\n", "from llama_stack.core.library_client import LlamaStackAsLibraryClient\n",
"client = LlamaStackAsLibraryClient(\"experimental-post-training\")\n", "client = LlamaStackAsLibraryClient(\"experimental-post-training\")\n",
"_ = client.initialize()" "_ = client.initialize()"
] ]

@ -38,7 +38,7 @@
"source": [ "source": [
"# NBVAL_SKIP\n", "# NBVAL_SKIP\n",
"!pip install -U llama-stack\n", "!pip install -U llama-stack\n",
"!UV_SYSTEM_PYTHON=1 llama stack build --template fireworks --image-type venv" "!UV_SYSTEM_PYTHON=1 llama stack build --distro fireworks --image-type venv"
] ]
}, },
{ {
@ -48,7 +48,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"from llama_stack_client import LlamaStackClient, Agent\n", "from llama_stack_client import LlamaStackClient, Agent\n",
"from llama_stack.distribution.library_client import LlamaStackAsLibraryClient\n", "from llama_stack.core.library_client import LlamaStackAsLibraryClient\n",
"from rich.pretty import pprint\n", "from rich.pretty import pprint\n",
"import json\n", "import json\n",
"import uuid\n", "import uuid\n",

@ -57,7 +57,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# NBVAL_SKIP\n", "# NBVAL_SKIP\n",
"!UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv" "!UV_SYSTEM_PYTHON=1 llama stack build --distro together --image-type venv"
] ]
}, },
{ {
@ -661,7 +661,7 @@
"except ImportError:\n", "except ImportError:\n",
" print(\"Not in Google Colab environment\")\n", " print(\"Not in Google Colab environment\")\n",
"\n", "\n",
"from llama_stack.distribution.library_client import LlamaStackAsLibraryClient\n", "from llama_stack.core.library_client import LlamaStackAsLibraryClient\n",
"\n", "\n",
"client = LlamaStackAsLibraryClient(\"together\")\n", "client = LlamaStackAsLibraryClient(\"together\")\n",
"_ = client.initialize()" "_ = client.initialize()"

@ -35,7 +35,7 @@
], ],
"source": [ "source": [
"from llama_stack_client import LlamaStackClient, Agent\n", "from llama_stack_client import LlamaStackClient, Agent\n",
"from llama_stack.distribution.library_client import LlamaStackAsLibraryClient\n", "from llama_stack.core.library_client import LlamaStackAsLibraryClient\n",
"from rich.pretty import pprint\n", "from rich.pretty import pprint\n",
"import json\n", "import json\n",
"import uuid\n", "import uuid\n",

@ -92,7 +92,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"```bash\n", "```bash\n",
"LLAMA_STACK_DIR=$(pwd) llama stack build --template nvidia --image-type venv\n", "LLAMA_STACK_DIR=$(pwd) llama stack build --distro nvidia --image-type venv\n",
"```" "```"
] ]
}, },
@ -194,7 +194,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from llama_stack.distribution.library_client import LlamaStackAsLibraryClient\n", "from llama_stack.core.library_client import LlamaStackAsLibraryClient\n",
"\n", "\n",
"client = LlamaStackAsLibraryClient(\"nvidia\")\n", "client = LlamaStackAsLibraryClient(\"nvidia\")\n",
"client.initialize()" "client.initialize()"

@ -81,7 +81,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"```bash\n", "```bash\n",
"LLAMA_STACK_DIR=$(pwd) llama stack build --template nvidia --image-type venv\n", "LLAMA_STACK_DIR=$(pwd) llama stack build --distro nvidia --image-type venv\n",
"```" "```"
] ]
}, },

@ -56,7 +56,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from llama_stack.distribution.library_client import LlamaStackAsLibraryClient\n", "from llama_stack.core.library_client import LlamaStackAsLibraryClient\n",
"\n", "\n",
"client = LlamaStackAsLibraryClient(\"nvidia\")\n", "client = LlamaStackAsLibraryClient(\"nvidia\")\n",
"client.initialize()" "client.initialize()"

@ -56,7 +56,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from llama_stack.distribution.library_client import LlamaStackAsLibraryClient\n", "from llama_stack.core.library_client import LlamaStackAsLibraryClient\n",
"\n", "\n",
"client = LlamaStackAsLibraryClient(\"nvidia\")\n", "client = LlamaStackAsLibraryClient(\"nvidia\")\n",
"client.initialize()" "client.initialize()"

@ -56,7 +56,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from llama_stack.distribution.library_client import LlamaStackAsLibraryClient\n", "from llama_stack.core.library_client import LlamaStackAsLibraryClient\n",
"\n", "\n",
"client = LlamaStackAsLibraryClient(\"nvidia\")\n", "client = LlamaStackAsLibraryClient(\"nvidia\")\n",
"client.initialize()" "client.initialize()"

@ -1 +1 @@
The RFC Specification (OpenAPI format) is generated from the set of API endpoints located in `llama_stack/distribution/server/endpoints.py` using the `generate.py` utility. The RFC Specification (OpenAPI format) is generated from the set of API endpoints located in `llama_stack/core/server/endpoints.py` using the `generate.py` utility.

@ -17,7 +17,7 @@ import fire
import ruamel.yaml as yaml import ruamel.yaml as yaml
from llama_stack.apis.version import LLAMA_STACK_API_VERSION # noqa: E402 from llama_stack.apis.version import LLAMA_STACK_API_VERSION # noqa: E402
from llama_stack.distribution.stack import LlamaStack # noqa: E402 from llama_stack.core.stack import LlamaStack # noqa: E402
from .pyopenapi.options import Options # noqa: E402 from .pyopenapi.options import Options # noqa: E402
from .pyopenapi.specification import Info, Server # noqa: E402 from .pyopenapi.specification import Info, Server # noqa: E402

@ -12,7 +12,7 @@ from typing import TextIO
from typing import Any, List, Optional, Union, get_type_hints, get_origin, get_args from typing import Any, List, Optional, Union, get_type_hints, get_origin, get_args
from llama_stack.strong_typing.schema import object_to_json, StrictJsonType from llama_stack.strong_typing.schema import object_to_json, StrictJsonType
from llama_stack.distribution.resolver import api_protocol_map from llama_stack.core.resolver import api_protocol_map
from .generator import Generator from .generator import Generator
from .options import Options from .options import Options

@ -73,7 +73,7 @@ The API is defined in the [YAML](_static/llama-stack-spec.yaml) and [HTML](_stat
To prove out the API, we implemented a handful of use cases to make things more concrete. The [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps) repository contains [6 different examples](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) ranging from very basic to a multi turn agent. To prove out the API, we implemented a handful of use cases to make things more concrete. The [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps) repository contains [6 different examples](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) ranging from very basic to a multi turn agent.
There is also a sample inference endpoint implementation in the [llama-stack](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distribution/server/server.py) repository. There is also a sample inference endpoint implementation in the [llama-stack](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/core/server/server.py) repository.
## Limitations ## Limitations

@ -145,12 +145,12 @@
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n", " del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n", "\n",
"# this command installs all the dependencies needed for the llama stack server with the ollama inference provider\n", "# this command installs all the dependencies needed for the llama stack server with the ollama inference provider\n",
"!uv run --with llama-stack llama stack build --template starter --image-type venv\n", "!uv run --with llama-stack llama stack build --distro starter --image-type venv\n",
"\n", "\n",
"def run_llama_stack_server_background():\n", "def run_llama_stack_server_background():\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n", " log_file = open(\"llama_stack_server.log\", \"w\")\n",
" process = subprocess.Popen(\n", " process = subprocess.Popen(\n",
" f\"uv run --with llama-stack llama stack run starter --image-type venv --env INFERENCE_MODEL=llama3.2:3b\",\n", " f\"OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter --image-type venv\",\n",
" shell=True,\n", " shell=True,\n",
" stdout=log_file,\n", " stdout=log_file,\n",
" stderr=log_file,\n", " stderr=log_file,\n",
@ -187,7 +187,7 @@
"# use this helper if needed to kill the server \n", "# use this helper if needed to kill the server \n",
"def kill_llama_stack_server():\n", "def kill_llama_stack_server():\n",
" # Kill any existing llama stack server processes\n", " # Kill any existing llama stack server processes\n",
" os.system(\"ps aux | grep -v grep | grep llama_stack.distribution.server.server | awk '{print $2}' | xargs kill -9\")\n" " os.system(\"ps aux | grep -v grep | grep llama_stack.core.server.server | awk '{print $2}' | xargs kill -9\")\n"
] ]
}, },
{ {

@ -1,3 +1,7 @@
---
orphan: true
---
# inline::meta-reference # inline::meta-reference
## Description ## Description

@ -1,3 +1,7 @@
---
orphan: true
---
# remote::nvidia # remote::nvidia
## Description ## Description

@ -43,7 +43,7 @@ We have built-in functionality to run the supported open-benckmarks using llama-
Spin up llama stack server with 'open-benchmark' template Spin up llama stack server with 'open-benchmark' template
``` ```
llama stack run llama_stack/templates/open-benchmark/run.yaml llama stack run llama_stack/distributions/open-benchmark/run.yaml
``` ```

@ -23,7 +23,7 @@ To use the HF SFTTrainer in your Llama Stack project, follow these steps:
You can access the HuggingFace trainer via the `ollama` distribution: You can access the HuggingFace trainer via the `ollama` distribution:
```bash ```bash
llama stack build --template starter --image-type venv llama stack build --distro starter --image-type venv
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
``` ```

@ -1,3 +1,7 @@
---
orphan: true
---
# inline::huggingface # inline::huggingface
## Description ## Description

@ -1,3 +1,7 @@
---
orphan: true
---
# inline::torchtune # inline::torchtune
## Description ## Description

@ -1,3 +1,7 @@
---
orphan: true
---
# remote::nvidia # remote::nvidia
## Description ## Description

@ -1,3 +1,7 @@
---
orphan: true
---
# inline::basic # inline::basic
## Description ## Description

@ -1,3 +1,7 @@
---
orphan: true
---
# inline::braintrust # inline::braintrust
## Description ## Description

@ -1,3 +1,7 @@
---
orphan: true
---
# inline::llm-as-judge # inline::llm-as-judge
## Description ## Description

@ -111,7 +111,7 @@ name = "llama-stack-api-weather"
version = "0.1.0" version = "0.1.0"
description = "Weather API for Llama Stack" description = "Weather API for Llama Stack"
readme = "README.md" readme = "README.md"
requires-python = ">=3.10" requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic"] dependencies = ["llama-stack", "pydantic"]
[build-system] [build-system]
@ -231,7 +231,7 @@ name = "llama-stack-provider-kaze"
version = "0.1.0" version = "0.1.0"
description = "Kaze weather provider for Llama Stack" description = "Kaze weather provider for Llama Stack"
readme = "README.md" readme = "README.md"
requires-python = ">=3.10" requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic", "aiohttp"] dependencies = ["llama-stack", "pydantic", "aiohttp"]
[build-system] [build-system]
@ -355,7 +355,7 @@ server:
8. Run the server: 8. Run the server:
```bash ```bash
python -m llama_stack.distribution.server.server --yaml-config ~/.llama/run-byoa.yaml python -m llama_stack.core.server.server --yaml-config ~/.llama/run-byoa.yaml
``` ```
9. Test the API: 9. Test the API:

@ -97,11 +97,11 @@ To start the Llama Stack Playground, run the following commands:
1. Start up the Llama Stack API server 1. Start up the Llama Stack API server
```bash ```bash
llama stack build --template together --image-type conda llama stack build --distro together --image-type venv
llama stack run together llama stack run together
``` ```
2. Start Streamlit UI 2. Start Streamlit UI
```bash ```bash
uv run --with ".[ui]" streamlit run llama_stack/distribution/ui/app.py uv run --with ".[ui]" streamlit run llama_stack/core/ui/app.py
``` ```

@ -2,7 +2,9 @@
Llama Stack (LLS) provides two different APIs for building AI applications with tool calling capabilities: the **Agents API** and the **OpenAI Responses API**. While both enable AI systems to use tools and maintain full conversation history, they serve different use cases and have distinct characteristics. Llama Stack (LLS) provides two different APIs for building AI applications with tool calling capabilities: the **Agents API** and the **OpenAI Responses API**. While both enable AI systems to use tools and maintain full conversation history, they serve different use cases and have distinct characteristics.
> **Note:** For simple and basic inferencing, you may want to use the [Chat Completions API](https://llama-stack.readthedocs.io/en/latest/providers/index.html#chat-completions) directly, before progressing to Agents or Responses API. ```{note}
For simple and basic inferencing, you may want to use the [Chat Completions API](https://llama-stack.readthedocs.io/en/latest/providers/index.html#chat-completions) directly, before progressing to Agents or Responses API.
```
## Overview ## Overview

@ -76,7 +76,9 @@ Features:
- Context retrieval with token limits - Context retrieval with token limits
> **Note:** By default, llama stack run.yaml defines toolgroups for web search, wolfram alpha and rag, that are provided by tavily-search, wolfram-alpha and rag providers. ```{note}
By default, the llama stack `run.yaml` defines toolgroups for web search, Wolfram Alpha, and RAG, which are provided by the tavily-search, wolfram-alpha, and rag providers.
```
## Model Context Protocol (MCP) ## Model Context Protocol (MCP)

@ -18,3 +18,4 @@ We are working on adding a few more APIs to complete the application lifecycle.
- **Batch Inference**: run inference on a dataset of inputs - **Batch Inference**: run inference on a dataset of inputs
- **Batch Agents**: run agents on a dataset of inputs - **Batch Agents**: run agents on a dataset of inputs
- **Synthetic Data Generation**: generate synthetic data for model development - **Synthetic Data Generation**: generate synthetic data for model development
- **Batches**: OpenAI-compatible batch management for inference
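For illustration only: assuming the Batches API accepts OpenAI's Batch API input format, a batch is a JSONL file with one request object per line, each carrying a caller-chosen `custom_id`, the HTTP `method`, the target `url`, and the request `body`. The helper name `build_batch_jsonl` and the model ID below are hypothetical.

```python
import json


def build_batch_jsonl(model: str, prompts: list[str]) -> str:
    """Serialize one chat-completion request per line (OpenAI Batch API input format)."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",  # used to match each result back to its input
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    # JSONL: newline-separated objects, with a trailing newline at the end of the file.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder model ID; use one that is registered with your stack.
    print(build_batch_jsonl("llama3.2:3b", ["What is RAG?", "Define an embedding."]), end="")
```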

@ -131,6 +131,7 @@ html_static_path = ["../_static"]
def setup(app): def setup(app):
app.add_css_file("css/my_theme.css") app.add_css_file("css/my_theme.css")
app.add_js_file("js/detect_theme.js") app.add_js_file("js/detect_theme.js")
app.add_js_file("js/keyboard_shortcuts.js")
def dockerhub_role(name, rawtext, text, lineno, inliner, options={}, content=[]): def dockerhub_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
url = f"https://hub.docker.com/r/llamastack/{text}" url = f"https://hub.docker.com/r/llamastack/{text}"

@ -2,13 +2,38 @@
```{include} ../../../CONTRIBUTING.md ```{include} ../../../CONTRIBUTING.md
``` ```
See the [Adding a New API Provider](new_api_provider.md) which describes how to add new API providers to the Stack. ## Adding a New Provider
See:
- [Adding a New API Provider Page](new_api_provider.md) which describes how to add new API providers to the Stack.
- [Vector Database Page](new_vector_database.md) which describes how to add a new vector database to Llama Stack.
- [External Provider Page](../providers/external/index.md) which describes how to add external providers to the Stack.
```{toctree} ```{toctree}
:maxdepth: 1 :maxdepth: 1
:hidden: :hidden:
new_api_provider new_api_provider
new_vector_database
```
## Testing
```{include} ../../../tests/README.md
```
## Advanced Topics
For developers who need deeper understanding of the testing system internals:
```{toctree}
:maxdepth: 1
testing/record-replay
```
### Benchmarking
```{include} ../../../docs/source/distributions/k8s-benchmark/README.md
``` ```

@ -6,7 +6,7 @@ This guide will walk you through the process of adding a new API provider to Lla
- Begin by reviewing the [core concepts](../concepts/index.md) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.) - Begin by reviewing the [core concepts](../concepts/index.md) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.)
- Determine the provider type ({repopath}`Remote::llama_stack/providers/remote` or {repopath}`Inline::llama_stack/providers/inline`). Remote providers make requests to external services, while inline providers execute implementation locally. - Determine the provider type ({repopath}`Remote::llama_stack/providers/remote` or {repopath}`Inline::llama_stack/providers/inline`). Remote providers make requests to external services, while inline providers execute implementation locally.
- Add your provider to the appropriate {repopath}`Registry::llama_stack/providers/registry/`. Specify pip dependencies necessary. - Add your provider to the appropriate {repopath}`Registry::llama_stack/providers/registry/`. Specify pip dependencies necessary.
- Update any distribution {repopath}`Templates::llama_stack/templates/` `build.yaml` and `run.yaml` files if they should include your provider by default. Run {repopath}`./scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation. - Update any distribution {repopath}`Templates::llama_stack/distributions/` `build.yaml` and `run.yaml` files if they should include your provider by default. Run {repopath}`./scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
Here are some example PRs to help you get started: Here are some example PRs to help you get started:
@ -52,7 +52,7 @@ def get_base_url(self) -> str:
## Testing the Provider ## Testing the Provider
Before running tests, you must have required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, you should install dependencies via `llama stack build --template together`. Before running tests, you must have required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, you should install dependencies via `llama stack build --distro together`.
### 1. Integration Testing ### 1. Integration Testing

@ -0,0 +1,75 @@
# Adding a New Vector Database
This guide will walk you through the process of adding a new vector database to Llama Stack.
> **_NOTE:_** Here's an example Pull Request of the [Milvus Vector Database Provider](https://github.com/meta-llama/llama-stack/pull/1467).
Vector Database providers are used to store and retrieve vector embeddings. Vector databases are not limited to vector
search but can also support keyword and hybrid search. Additionally, vector databases can support operations like
filtering, sorting, and aggregating vectors.
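Hybrid search needs some way to merge the vector and keyword result lists. One common technique is reciprocal rank fusion (RRF), sketched below under the assumption that each search returns chunk IDs ordered best-first. The function name and the default `k = 60` are illustrative choices, not a description of how any particular Llama Stack provider fuses results.

```python
def reciprocal_rank_fusion(
    vector_ranking: list[str],
    keyword_ranking: list[str],
    k: float = 60.0,
) -> list[tuple[str, float]]:
    """Fuse two best-first rankings of chunk IDs into one scored ranking."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each appearance contributes 1 / (k + rank); a larger k flattens the
            # advantage of the very top positions.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    fused = reciprocal_rank_fusion(["c1", "c2", "c3"], ["c3", "c1", "c4"])
    for chunk_id, score in fused:
        print(f"{chunk_id}: {score:.4f}")
```

RRF only uses rank positions, so it sidesteps the problem that cosine similarities and keyword scores live on different scales.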
## Steps to Add a New Vector Database Provider
1. **Choose the Database Type**: Determine if your vector database is a remote service, inline, or both.
- Remote databases make requests to external services, while inline databases execute locally. Some providers support both.
2. **Implement the Provider**: Create a new provider class that inherits from `VectorDatabaseProvider` and implements the required methods.
- Implement methods for vector storage, retrieval, search, and any additional features your database supports.
- You will need to implement the following methods for `YourVectorIndex`:
- `YourVectorIndex.create()`
- `YourVectorIndex.initialize()`
- `YourVectorIndex.add_chunks()`
- `YourVectorIndex.delete_chunk()`
- `YourVectorIndex.query_vector()`
- `YourVectorIndex.query_keyword()`
- `YourVectorIndex.query_hybrid()`
- You will need to implement the following methods for `YourVectorIOAdapter`:
- `YourVectorIOAdapter.initialize()`
- `YourVectorIOAdapter.shutdown()`
- `YourVectorIOAdapter.list_vector_dbs()`
- `YourVectorIOAdapter.register_vector_db()`
- `YourVectorIOAdapter.unregister_vector_db()`
- `YourVectorIOAdapter.insert_chunks()`
- `YourVectorIOAdapter.query_chunks()`
- `YourVectorIOAdapter.delete_chunks()`
3. **Add to Registry**: Register your provider in the appropriate registry file.
- Update {repopath}`llama_stack/providers/registry/vector_io.py` to include your new provider.
```python
from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec


def available_providers() -> list[ProviderSpec]:
    return [
        # ... existing vector_io providers ...
        InlineProviderSpec(
            api=Api.vector_io,
            provider_type="inline::milvus",
            pip_packages=["pymilvus>=2.4.10"],
            module="llama_stack.providers.inline.vector_io.milvus",
            config_class="llama_stack.providers.inline.vector_io.milvus.MilvusVectorIOConfig",
            api_dependencies=[Api.inference],
            optional_api_dependencies=[Api.files],
            description="",
        ),
    ]
```
4. **Add Tests**: Create unit tests and integration tests for your provider in the `tests/` directory.
- Unit Tests
- By following the structure of the class methods, you will be able to easily run unit and integration tests for your database.
1. You have to configure the tests for your provider in `tests/unit/providers/vector_io/conftest.py`.
2. Update the `vector_provider` fixture to include your provider if it is an inline provider.
3. Create a `your_vectorprovider_index` fixture that initializes your vector index.
4. Create a `your_vectorprovider_adapter` fixture that initializes your vector adapter.
5. Add your provider to the `vector_io_providers` fixture dictionary.
- Please follow the naming convention of `your_vectorprovider_index` and `your_vectorprovider_adapter` as the tests require this to execute properly.
- Integration Tests
- Integration tests are located in {repopath}`tests/integration`. These tests use the Python client SDK APIs (from the `llama_stack_client` package) to test functionality.
- The two sets of integration tests are:
- `tests/integration/vector_io/test_vector_io.py`: This file tests registration, insertion, and retrieval.
- `tests/integration/vector_io/test_openai_vector_stores.py`: These tests are for OpenAI-compatible vector stores and test the OpenAI API compatibility.
- You will need to update `skip_if_provider_doesnt_support_openai_vector_stores` to include your provider as well as `skip_if_provider_doesnt_support_openai_vector_stores_search` to test the appropriate search functionality.
- Running the tests in the GitHub CI
- You will need to update the `.github/workflows/integration-vector-io-tests.yml` file to include your provider.
- If your provider is a remote provider, you will also have to add a container to spin up and run it in the action.
- Updating the pyproject.toml
- If you are adding tests for the `inline` provider you will have to update the `unit` group.
- `uv add new_pip_package --group unit`
- If you are adding tests for the `remote` provider you will have to update the `test` group, which is used in the GitHub CI for integration tests.
- `uv add new_pip_package --group test`
5. **Update Documentation**: Please update the documentation for end users.
- Generate the provider documentation by running {repopath}`./scripts/provider_codegen.py`.
- Update the autogenerated content in the registry/vector_io.py file with information about your provider. Please see other providers for examples.
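To make the method list in step 2 more concrete, here is a toy, in-memory adapter. It is only a sketch: the method names follow the list, but every signature, the `(chunk_id, embedding, text)` chunk tuple, and the brute-force cosine search are simplifying assumptions. A real provider takes its signatures from Llama Stack's vector_io API definitions and delegates storage and search to the database client.

```python
import asyncio
import math


class InMemoryVectorIOAdapter:
    """Toy adapter: method names follow the list above, signatures are hypothetical."""

    def __init__(self) -> None:
        # vector_db_id -> {chunk_id: (embedding, text)}
        self._dbs: dict[str, dict[str, tuple[list[float], str]]] = {}

    async def initialize(self) -> None:
        # A real adapter would open connections or load indexes here.
        pass

    async def shutdown(self) -> None:
        self._dbs.clear()

    async def list_vector_dbs(self) -> list[str]:
        return sorted(self._dbs)

    async def register_vector_db(self, vector_db_id: str) -> None:
        self._dbs.setdefault(vector_db_id, {})

    async def unregister_vector_db(self, vector_db_id: str) -> None:
        self._dbs.pop(vector_db_id, None)

    async def insert_chunks(self, vector_db_id: str, chunks: list[tuple[str, list[float], str]]) -> None:
        store = self._dbs[vector_db_id]
        for chunk_id, embedding, text in chunks:
            store[chunk_id] = (embedding, text)

    async def query_chunks(self, vector_db_id: str, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Brute-force cosine similarity; real providers use the database's index.
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(cid, cosine(query, emb)) for cid, (emb, _) in self._dbs[vector_db_id].items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]

    async def delete_chunks(self, vector_db_id: str, chunk_ids: list[str]) -> None:
        store = self._dbs[vector_db_id]
        for chunk_id in chunk_ids:
            store.pop(chunk_id, None)


async def demo() -> list[tuple[str, float]]:
    adapter = InMemoryVectorIOAdapter()
    await adapter.initialize()
    await adapter.register_vector_db("docs")
    await adapter.insert_chunks("docs", [("a", [1.0, 0.0], "alpha"), ("b", [0.0, 1.0], "beta")])
    return await adapter.query_chunks("docs", [0.9, 0.1], k=1)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping every method `async` mirrors the rest of the adapter list, so a class like this can later be swapped for one that awaits a real database driver.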

@ -1,6 +0,0 @@
# Testing Llama Stack
Tests are of three different kinds:
- Unit tests
- Provider focused integration tests
- Client SDK tests

@ -0,0 +1,234 @@
# Record-Replay System
Understanding how Llama Stack captures and replays API interactions for testing.
## Overview
The record-replay system solves a fundamental challenge in AI testing: how do you test against expensive, non-deterministic APIs without breaking the bank or dealing with flaky tests?
The solution: intercept API calls, store real responses, and replay them later. This gives you real API behavior without the cost or variability.
## How It Works
### Request Hashing
Every API request gets converted to a deterministic hash for lookup:
```python
import hashlib
import json
from urllib.parse import urlparse


def normalize_request(method: str, url: str, headers: dict, body: dict) -> str:
    # headers are accepted but intentionally left out of the hash
    normalized = {
        "method": method.upper(),
        "endpoint": urlparse(url).path,  # Just the path, not full URL
        "body": body,  # Request parameters
    }
    return hashlib.sha256(json.dumps(normalized, sort_keys=True).encode()).hexdigest()
```
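**Key insight:** The hashing is intentionally precise. Any change to the serialized body, such as different whitespace or float precision, produces a different hash (dictionary key order is the exception, because `sort_keys=True` canonicalizes it). This prevents subtle bugs from false cache hits.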
```python
# These produce DIFFERENT hashes:
{"content": "Hello world"}
{"content": "Hello world\n"}
{"temperature": 0.7}
{"temperature": 0.7000001}
```
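A quick way to see which changes alter the hash is to run a condensed copy of the normalization above. The URLs and bodies are made up for illustration, and the `headers` argument is dropped because it never reaches the hash.

```python
import hashlib
import json
from urllib.parse import urlparse


def request_hash(method: str, url: str, body: dict) -> str:
    # Same fields as normalize_request: method, URL path only, and the body
    normalized = {"method": method.upper(), "endpoint": urlparse(url).path, "body": body}
    return hashlib.sha256(json.dumps(normalized, sort_keys=True).encode()).hexdigest()


URL = "http://localhost:8321/v1/chat/completions"  # hypothetical local endpoint
base = request_hash("post", URL, {"content": "Hello world", "temperature": 0.7})

whitespace = request_hash("post", URL, {"content": "Hello world\n", "temperature": 0.7})
precision = request_hash("post", URL, {"content": "Hello world", "temperature": 0.7000001})
reordered = request_hash("POST", URL, {"temperature": 0.7, "content": "Hello world"})
other_host = request_hash(
    "post", "https://api.example.com/v1/chat/completions",
    {"content": "Hello world", "temperature": 0.7},
)

print(whitespace != base, precision != base)  # both differ from base
print(reordered == base, other_host == base)  # key order and host are ignored
```

Because only the URL path is hashed, a recording made against one server replays against another server exposing the same endpoint.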
### Client Interception
The system patches OpenAI and Ollama client methods to intercept calls before they leave your application. This happens transparently - your test code doesn't change.
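The patching idea can be illustrated without a real SDK: replace a method on the class, so every instance (including ones the test already holds) goes through a wrapper. `FakeCompletions` and `patch_create` are hypothetical stand-ins, not the actual OpenAI or Ollama classes or Llama Stack's patching code.

```python
import asyncio
import functools


class FakeCompletions:
    """Stand-in for a client resource such as chat.completions (hypothetical)."""

    async def create(self, **kwargs):
        return {"echo": kwargs["messages"][-1]["content"]}


def patch_create(cls, recorded: list):
    """Wrap cls.create so each call is captured; returns the original for restoring."""
    original = cls.create

    @functools.wraps(original)
    async def patched(self, **kwargs):
        response = await original(self, **kwargs)
        recorded.append((kwargs, response))  # capture the request/response pair
        return response

    cls.create = patched
    return original


recorded: list = []
original = patch_create(FakeCompletions, recorded)
try:
    # Calling code is unchanged: it still calls .create() on a normal instance
    result = asyncio.run(FakeCompletions().create(messages=[{"role": "user", "content": "hi"}]))
finally:
    FakeCompletions.create = original  # always undo the patch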
### Storage Architecture
Recordings use a two-tier storage system optimized for both speed and debuggability:
```
recordings/
├── index.sqlite              # Fast lookup by request hash
└── responses/
    ├── abc123def456.json     # Individual response files
    └── def789ghi012.json
```
**SQLite index** enables O(log n) hash lookups and metadata queries without loading response bodies.
**JSON files** store complete request/response pairs in human-readable format for debugging.
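The two-tier layout can be sketched with the standard library alone. The `RecordingStorage` class and the `recordings` table schema are assumptions for illustration; the columns are chosen to match the `endpoint`, `model`, and `timestamp` fields used in this document's `sqlite3` debugging queries, and each JSON file holds `request` and `response` keys as the `jq` examples expect.

```python
import json
import sqlite3
from pathlib import Path
from typing import Optional


class RecordingStorage:
    """Sketch of an index.sqlite + responses/*.json store (names are hypothetical)."""

    def __init__(self, root: Path) -> None:
        self.responses_dir = root / "responses"
        self.responses_dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(root / "index.sqlite")
        # A PRIMARY KEY is backed by a B-tree index, giving O(log n) hash lookups
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS recordings ("
            "request_hash TEXT PRIMARY KEY, endpoint TEXT, model TEXT, timestamp REAL)"
        )

    def store(self, request_hash: str, request: dict, response: dict, timestamp: float) -> None:
        # Full request/response pair goes to a readable JSON file
        path = self.responses_dir / f"{request_hash}.json"
        path.write_text(json.dumps({"request": request, "response": response}, indent=2))
        # Only small metadata goes into the index
        self.db.execute(
            "INSERT OR REPLACE INTO recordings VALUES (?, ?, ?, ?)",
            (request_hash, request.get("endpoint"), request.get("body", {}).get("model"), timestamp),
        )
        self.db.commit()

    def load(self, request_hash: str) -> Optional[dict]:
        # Check the index first; the response body is read only on a hit
        row = self.db.execute(
            "SELECT 1 FROM recordings WHERE request_hash = ?", (request_hash,)
        ).fetchone()
        if row is None:
            return None
        return json.loads((self.responses_dir / f"{request_hash}.json").read_text())
```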
## Recording Modes
### LIVE Mode
Direct API calls with no recording or replay:
```python
with inference_recording(mode=InferenceMode.LIVE):
    response = await client.chat.completions.create(...)
```
Use for initial development and debugging against real APIs.
### RECORD Mode
Captures API interactions while passing through real responses:
```python
with inference_recording(mode=InferenceMode.RECORD, storage_dir="./recordings"):
    response = await client.chat.completions.create(...)
    # Real API call made, response captured AND returned
```
The recording process:
1. Request intercepted and hashed
2. Real API call executed
3. Response captured and serialized
4. Recording stored to disk
5. Original response returned to caller
### REPLAY Mode
Returns stored responses instead of making API calls:
```python
with inference_recording(mode=InferenceMode.REPLAY, storage_dir="./recordings"):
    response = await client.chat.completions.create(...)
    # No API call made, cached response returned instantly
```
The replay process:
1. Request intercepted and hashed
2. Hash looked up in SQLite index
3. Response loaded from JSON file
4. Response deserialized and returned
5. Error if no recording found
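Both step lists fit in one dispatch function. This is a sketch under stated assumptions: `intercept` and its arguments are hypothetical, `InferenceMode` is redefined locally, a plain dict stands in for the SQLite/JSON storage, and JSON strings stand in for the real serializer.

```python
import asyncio
import enum
import hashlib
import json


class InferenceMode(enum.Enum):
    LIVE = "live"
    RECORD = "record"
    REPLAY = "replay"


async def intercept(mode: InferenceMode, request: dict, call_api, recordings: dict):
    """Route one request according to the mode (hypothetical helper)."""
    if mode is InferenceMode.LIVE:
        return await call_api(request)  # no recording, no replay

    # 1. Intercept and hash the request
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

    if mode is InferenceMode.RECORD:
        response = await call_api(request)  # 2. real API call
        recordings[key] = json.dumps(response)  # 3-4. serialize and store
        return response  # 5. original response returned to caller

    # REPLAY: look up by hash, deserialize, or fail loudly
    if key not in recordings:
        raise RuntimeError(f"No recording found for request hash {key[:12]}")
    return json.loads(recordings[key])
```

Raising on a miss, instead of silently calling the API, is what keeps replay runs free of network access and cost.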
## Streaming Support
Streaming APIs present a unique challenge: how do you capture an async generator?
### The Problem
```python
# How do you record this?
stream = await client.chat.completions.create(..., stream=True)
async for chunk in stream:
    process(chunk)
```
### The Solution
The system consumes the entire stream and captures every chunk before yielding any of them:
```python
async def handle_streaming_record(response):
    # Capture the complete stream first
    chunks = []
    async for chunk in response:
        chunks.append(chunk)

    # Store the complete recording (request_hash, request_data and storage
    # come from the enclosing interception code)
    storage.store_recording(
        request_hash, request_data, {"body": chunks, "is_streaming": True}
    )

    # Return a generator that replays the captured chunks
    async def replay_stream():
        for chunk in chunks:
            yield chunk

    return replay_stream()
```
This ensures:
- **Complete capture** - The entire stream is saved atomically
- **Interface preservation** - The returned object behaves like the original API
- **Deterministic replay** - Same chunks in the same order every time
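The capture-then-replay pattern can be run end to end with a fake stream. In this sketch a dict replaces `storage`, and `fake_stream`, `record_stream`, and the `"req-1"` key are hypothetical.

```python
import asyncio


async def fake_stream():
    # Stand-in for a provider's streaming response
    for token in ["Hel", "lo", "!"]:
        await asyncio.sleep(0)
        yield token


async def record_stream(response, store: dict, key: str):
    chunks = [chunk async for chunk in response]  # drain the whole stream first
    store[key] = {"body": chunks, "is_streaming": True}

    async def replay_stream():
        for chunk in chunks:
            yield chunk

    return replay_stream()  # caller still gets an async iterator


async def main(store: dict) -> list:
    stream = await record_stream(fake_stream(), store, "req-1")
    return [chunk async for chunk in stream]
```

The trade-off of draining first is that the caller sees no chunk until the stream finishes, so time-to-first-token is not preserved while recording.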
## Serialization
API responses contain complex Pydantic objects that need careful serialization:
```python
def _serialize_response(response):
    if hasattr(response, "model_dump"):
        # Preserve type information for proper deserialization
        return {
            "__type__": f"{response.__class__.__module__}.{response.__class__.__qualname__}",
            "__data__": response.model_dump(mode="json"),
        }
    return response
```
This preserves type safety - when replayed, you get the same Pydantic objects with all their validation and methods.
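The replay side needs the inverse: map `__type__` back to a class and rebuild the object. This sketch uses an explicit registry keyed by the same `module.QualName` string (an assumption; resolving the string with `importlib` is another option), and `FakeCompletion` is a minimal stand-in for a Pydantic model exposing `model_dump` and `model_validate`.

```python
from typing import Any

# "module.QualName" -> class (registry approach is an assumption of this sketch)
_TYPES: dict = {}


def register(cls: type) -> type:
    _TYPES[f"{cls.__module__}.{cls.__qualname__}"] = cls
    return cls


@register
class FakeCompletion:
    """Minimal stand-in for a Pydantic model."""

    def __init__(self, id: str, content: str) -> None:
        self.id, self.content = id, content

    def model_dump(self, mode: str = "python") -> dict:
        return {"id": self.id, "content": self.content}

    @classmethod
    def model_validate(cls, data: dict) -> "FakeCompletion":
        return cls(**data)


def serialize_response(response: Any) -> Any:
    if hasattr(response, "model_dump"):
        cls = response.__class__
        return {"__type__": f"{cls.__module__}.{cls.__qualname__}",
                "__data__": response.model_dump(mode="json")}
    return response


def deserialize_response(data: Any) -> Any:
    # Tagged dicts become typed objects; anything else passes through unchanged
    if isinstance(data, dict) and "__type__" in data:
        return _TYPES[data["__type__"]].model_validate(data["__data__"])
    return data
```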
## Environment Integration
### Environment Variables
Control recording behavior globally:
```bash
export LLAMA_STACK_TEST_INFERENCE_MODE=replay
export LLAMA_STACK_TEST_RECORDING_DIR=/path/to/recordings
pytest tests/integration/
```
### Pytest Integration
The system integrates automatically based on environment variables, requiring no changes to test code.
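A minimal sketch of turning the two environment variables into settings, which a pytest `conftest.py` could call once per session. The `settings_from_env` helper, the local `InferenceMode` enum, and the default of `live` when the variable is unset are assumptions, not confirmed Llama Stack behavior.

```python
import enum
import os
from typing import Mapping, Optional, Tuple


class InferenceMode(enum.Enum):
    LIVE = "live"
    RECORD = "record"
    REPLAY = "replay"


def settings_from_env(env: Mapping[str, str] = os.environ) -> Tuple[InferenceMode, Optional[str]]:
    # Default to LIVE when the variable is unset (assumption of this sketch)
    mode = InferenceMode(env.get("LLAMA_STACK_TEST_INFERENCE_MODE", "live").lower())
    storage_dir = env.get("LLAMA_STACK_TEST_RECORDING_DIR")
    return mode, storage_dir
```

An unknown mode string raises `ValueError` from the enum lookup, so a typo in the variable fails fast instead of silently running live.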
## Debugging Recordings
### Inspecting Storage
```bash
# See what's recorded
sqlite3 recordings/index.sqlite "SELECT endpoint, model, timestamp FROM recordings LIMIT 10;"
# View specific response
cat recordings/responses/abc123def456.json | jq '.response.body'
# Find recordings by endpoint
sqlite3 recordings/index.sqlite "SELECT * FROM recordings WHERE endpoint='/v1/chat/completions';"
```
### Common Issues
**Hash mismatches:** Request parameters changed slightly between record and replay
```bash
# Compare request details
cat recordings/responses/abc123.json | jq '.request'
```
**Serialization errors:** Response types changed between versions
```bash
# Re-record with updated types
rm recordings/responses/failing_hash.json
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest test_failing.py
```
**Missing recordings:** New test or changed parameters
```bash
# Record the missing interaction
LLAMA_STACK_TEST_INFERENCE_MODE=record pytest test_new.py
```
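For hash mismatches, it can help to locate exactly which field drifted between the recorded `request` and the one a test now sends. `diff_requests` is a hypothetical helper; load the recorded side with `json.load` from the response file.

```python
from typing import Any


def diff_requests(recorded: Any, current: Any, path: str = "") -> list:
    """Return the paths where two request payloads differ (hypothetical helper)."""
    if isinstance(recorded, dict) and isinstance(current, dict):
        diffs = []
        for key in sorted(set(recorded) | set(current)):
            sub = f"{path}.{key}" if path else key
            if key not in recorded or key not in current:
                diffs.append(sub)  # field added or removed
            else:
                diffs.extend(diff_requests(recorded[key], current[key], sub))
        return diffs
    if isinstance(recorded, list) and isinstance(current, list) and len(recorded) == len(current):
        diffs = []
        for i, (a, b) in enumerate(zip(recorded, current)):
            diffs.extend(diff_requests(a, b, f"{path}[{i}]"))
        return diffs
    # Leaf values (or lists of different length) compared as a whole
    return [] if recorded == current else [path or "<root>"]
```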
## Design Decisions
### Why Not Mocks?
Traditional mocking breaks down with AI APIs because:
- Response structures are complex and evolve frequently
- Streaming behavior is hard to mock correctly
- Edge cases in real APIs get missed
- Mocks become brittle maintenance burdens
### Why Precise Hashing?
Loose hashing (normalizing whitespace, rounding floats) seems convenient but hides bugs. If a test changes slightly, you want to know about it rather than accidentally getting the wrong cached response.
### Why JSON + SQLite?
- **JSON** - Human readable, diff-friendly, easy to inspect and modify
- **SQLite** - Fast indexed lookups without loading response bodies
- **Hybrid** - Best of both worlds for different use cases
This system provides reliable, fast testing against real AI APIs while maintaining the ability to debug issues when they arise.

@ -174,7 +174,7 @@ spec:
 - name: llama-stack
 image: localhost/llama-stack-run-k8s:latest
 imagePullPolicy: IfNotPresent
-command: ["python", "-m", "llama_stack.distribution.server.server", "--config", "/app/config.yaml"]
+command: ["python", "-m", "llama_stack.core.server.server", "--config", "/app/config.yaml"]
 ports:
 - containerPort: 5000
 volumeMounts:

@ -47,30 +47,37 @@ pip install -e .
 ```
 Use the CLI to build your distribution.
 The main points to consider are:
-1. **Image Type** - Do you want a Conda / venv environment or a Container (eg. Docker)
+1. **Image Type** - Do you want a venv environment or a Container (eg. Docker)
 2. **Template** - Do you want to use a template to build your distribution? or start from scratch ?
 3. **Config** - Do you want to use a pre-existing config file to build your distribution?
 ```
 llama stack build -h
-usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {conda,container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]
+usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--distro DISTRIBUTION] [--list-distros] [--image-type {container,venv}] [--image-name IMAGE_NAME] [--print-deps-only]
+[--run] [--providers PROVIDERS]
 Build a Llama stack container
 options:
 -h, --help show this help message and exit
---config CONFIG Path to a config file to use for the build. You can find example configs in llama_stack/distributions/**/build.yaml. If this argument is not provided, you will
-be prompted to enter information interactively (default: None)
+--config CONFIG Path to a config file to use for the build. You can find example configs in llama_stack.cores/**/build.yaml. If this argument is not provided, you will be prompted to
+enter information interactively (default: None)
---template TEMPLATE Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
---list-templates Show the available templates for building a Llama Stack distribution (default: False)
---image-type {conda,container,venv}
+--template TEMPLATE (deprecated) Name of the example template config to use for build. You may use `llama stack build --list-distros` to check out the available distributions (default:
+None)
+--distro DISTRIBUTION, --distribution DISTRIBUTION
+Name of the distribution to use for build. You may use `llama stack build --list-distros` to check out the available distributions (default: None)
+--list-distros, --list-distributions
+Show the available distributions for building a Llama Stack distribution (default: False)
+--image-type {container,venv}
 Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
 --image-name IMAGE_NAME
-[for image-type=conda|container|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active environment will be used if
-found. (default: None)
+[for image-type=container|venv] Name of the virtual environment to use for the build. If not specified, currently active environment will be used if found. (default:
+None)
 --print-deps-only Print the dependencies for the stack only, without building the stack (default: False)
 --run Run the stack after building using the same image type, name, and other applicable arguments (default: False)
+--providers PROVIDERS
+Build a config for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple providers per
+API. (default: None)
 ```
 After this step is complete, a file named `<name>-build.yaml` and template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.
@ -141,7 +148,7 @@ You may then pick a template to build your distribution with providers fitted to
 For example, to build a distribution with TGI as the inference provider, you can run:
 ```
-$ llama stack build --template starter
+$ llama stack build --distro starter
 ...
 You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-starter/starter-run.yaml`
 ```
@ -159,7 +166,7 @@ It would be best to start with a template and understand the structure of the co
 llama stack build
 > Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
-> Enter the image type you want your Llama Stack to be built as (container or conda or venv): conda
+> Enter the image type you want your Llama Stack to be built as (container or venv): venv
 Llama Stack is composed of several APIs working together. Let's select
 the provider types (implementations) you want to use for these APIs.
@ -184,10 +191,10 @@ You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack
 :::{tab-item} Building from a pre-existing build config file
 - In addition to templates, you may customize the build to your liking through editing config files and build from config files with the following command.
-- The config file will be of contents like the ones in `llama_stack/templates/*build.yaml`.
+- The config file will be of contents like the ones in `llama_stack/distributions/*build.yaml`.
 ```
-llama stack build --config llama_stack/templates/starter/build.yaml
+llama stack build --config llama_stack/distributions/starter/build.yaml
 ```
 :::
@ -253,11 +260,11 @@ Podman is supported as an alternative to Docker. Set `CONTAINER_BINARY` to `podm
 To build a container image, you may start off from a template and use the `--image-type container` flag to specify `container` as the build image type.
 ```
-llama stack build --template starter --image-type container
+llama stack build --distro starter --image-type container
 ```
 ```
-$ llama stack build --template starter --image-type container
+$ llama stack build --distro starter --image-type container
 ...
 Containerfile created successfully in /tmp/tmp.viA3a3Rdsg/ContainerfileFROM python:3.10-slim
 ...
@ -312,7 +319,7 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
 ```
 llama stack run -h
 usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
-[--image-type {conda,venv}] [--enable-ui]
+[--image-type {venv}] [--enable-ui]
 [config | template]
 Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
@ -326,8 +333,8 @@ options:
 --image-name IMAGE_NAME
 Name of the image to run. Defaults to the current environment (default: None)
 --env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
---image-type {conda,venv}
-Image Type used during the build. This can be either conda or venv. (default: None)
+--image-type {venv}
+Image Type used during the build. This should be venv. (default: None)
 --enable-ui Start the UI server (default: False)
 ```
@ -342,9 +349,6 @@ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-
 # Start using a venv
 llama stack run --image-type venv ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
-# Start using a conda environment
-llama stack run --image-type conda ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
 ```
 ```

@ -10,7 +10,6 @@ The default `run.yaml` files generated by templates are starting points for your
 ```yaml
 version: 2
-conda_env: ollama
 apis:
 - agents
 - inference

@ -6,14 +6,14 @@ This avoids the overhead of setting up a server.
 ```bash
 # setup
 uv pip install llama-stack
-llama stack build --template starter --image-type venv
+llama stack build --distro starter --image-type venv
 ```
 ```python
-from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
+from llama_stack.core.library_client import LlamaStackAsLibraryClient
 client = LlamaStackAsLibraryClient(
-    "ollama",
+    "starter",
     # provider_data is optional, but if you need to pass in any provider specific data, you can do so here.
     provider_data={"tavily_search_api_key": os.environ["TAVILY_SEARCH_API_KEY"]},
 )

@ -9,6 +9,7 @@ This section provides an overview of the distributions available in Llama Stack.
 list_of_distributions
 building_distro
 customizing_run_yaml
+starting_llama_stack_server
 importing_as_library
 configuration
 ```

@ -0,0 +1,156 @@
# Llama Stack Benchmark Suite on Kubernetes
## Motivation
Performance benchmarking is critical for understanding the overhead and characteristics of the Llama Stack abstraction layer compared to direct inference engines like vLLM.
### Why This Benchmark Suite Exists
**Performance Validation**: The Llama Stack provides a unified API layer across multiple inference providers, but this abstraction introduces potential overhead. This benchmark suite quantifies the performance impact by comparing:
- Llama Stack inference (with vLLM backend)
- Direct vLLM inference calls
- Both under identical Kubernetes deployment conditions
**Production Readiness Assessment**: Real-world deployments require understanding performance characteristics under load. This suite simulates concurrent user scenarios with configurable parameters (duration, concurrency, request patterns) to validate production readiness.
**Regression Detection (TODO)**: As the Llama Stack evolves, this benchmark provides automated regression detection for performance changes. CI/CD pipelines can leverage these benchmarks to catch performance degradations before production deployments.
**Resource Planning**: By measuring throughput, latency percentiles, and resource utilization patterns, teams can make informed decisions about:
- Kubernetes resource allocation (CPU, memory, GPU)
- Auto-scaling configurations
- Cost optimization strategies
### Key Metrics Captured
The benchmark suite measures critical performance indicators:
- **Throughput**: Requests per second under sustained load
- **Latency Distribution**: P50, P95, P99 response times
- **Time to First Token (TTFT)**: Critical for streaming applications
- **Error Rates**: Request failures and timeout analysis
These measurements support data-driven architectural decisions and performance optimization work.
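The latency percentiles listed above can be computed with the nearest-rank method shown in this sketch (the function name is hypothetical). Note that `benchmark.py` indexes with `int(len(sorted_times) * p / 100) - 1`, which can land one position lower: for ten samples, P95 is the 9th value with that formula and the 10th with nearest rank.

```python
import math


def nearest_rank_percentile(samples: list, p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


latencies = [7, 3, 9, 1, 5, 2, 8, 4, 10, 6]  # made-up response times
print({p: nearest_rank_percentile(latencies, p) for p in (50, 90, 95, 99)})
```

For interpolated percentiles instead, `statistics.quantiles(data, n=100)` in the standard library is an alternative.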
## Setup
**1. Deploy base k8s infrastructure:**
```bash
cd ../k8s
./apply.sh
```
**2. Deploy benchmark components:**
```bash
cd ../k8s-benchmark
./apply.sh
```
**3. Verify deployment:**
```bash
kubectl get pods
# Should see: llama-stack-benchmark-server, vllm-server, etc.
```
## Quick Start
### Basic Benchmarks
**Benchmark Llama Stack (default):**
```bash
cd docs/source/distributions/k8s-benchmark/
./run-benchmark.sh
```
**Benchmark vLLM direct:**
```bash
./run-benchmark.sh --target vllm
```
### Custom Configuration
**Extended benchmark with high concurrency:**
```bash
./run-benchmark.sh --target vllm --duration 120 --concurrent 20
```
**Short test run:**
```bash
./run-benchmark.sh --target stack --duration 30 --concurrent 5
```
## Command Reference
### run-benchmark.sh Options
```bash
./run-benchmark.sh [options]
Options:
-t, --target <stack|vllm> Target to benchmark (default: stack)
-d, --duration <seconds> Duration in seconds (default: 60)
-c, --concurrent <users> Number of concurrent users (default: 10)
-h, --help Show help message
Examples:
./run-benchmark.sh --target vllm # Benchmark vLLM direct
./run-benchmark.sh --target stack # Benchmark Llama Stack
./run-benchmark.sh -t vllm -d 120 -c 20 # vLLM with 120s, 20 users
```
## Local Testing
### Running Benchmark Locally
For local development without Kubernetes:
**1. Start OpenAI mock server:**
```bash
uv run python openai-mock-server.py --port 8080
```
**2. Run benchmark against mock server:**
```bash
uv run python benchmark.py \
--base-url http://localhost:8080/v1 \
--model mock-inference \
--duration 30 \
--concurrent 5
```
**3. Test against local vLLM server:**
```bash
# If you have vLLM running locally on port 8000
uv run python benchmark.py \
--base-url http://localhost:8000/v1 \
--model meta-llama/Llama-3.2-3B-Instruct \
--duration 30 \
--concurrent 5
```
**4. Profile the running server:**
```bash
./profile_running_server.sh
```
### OpenAI Mock Server
The `openai-mock-server.py` provides:
- **OpenAI-compatible API** for testing without real models
- **Configurable streaming delay** via `STREAM_DELAY_SECONDS` env var
- **Consistent responses** for reproducible benchmarks
- **Lightweight testing** without GPU requirements
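An OpenAI-compatible streaming response is a sequence of server-sent events: one `data: {...}` line per `chat.completion.chunk`, ending with `data: [DONE]`. The mock server's source is truncated in this view, so the generator below is an illustrative sketch rather than its actual code. The `delay` parameter stands in for `STREAM_DELAY_SECONDS`, the chunk fields are simplified (a final chunk with `finish_reason: "stop"` is omitted), and `count_data_chunks` mirrors how `benchmark.py` counts `data: ` lines, including the `[DONE]` line.

```python
import json
import time
import uuid


def sse_chunks(text: str, model: str = "mock-inference", delay: float = 0.0):
    """Yield OpenAI-style chat.completion.chunk events as SSE strings (simplified)."""
    chunk_id = f"chatcmpl-{uuid.uuid4().hex[:8]}"
    for word in text.split():
        if delay > 0:
            time.sleep(delay)  # simulated per-token latency
        event = {
            "id": chunk_id,
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model,
            "choices": [{"index": 0, "delta": {"content": word + " "}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(event)}\n\n"
    yield "data: [DONE]\n\n"


def count_data_chunks(lines) -> int:
    """Count 'data: ' lines up to and including [DONE]."""
    count = 0
    for line in lines:
        line = line.strip()
        if line.startswith("data: "):
            count += 1
            if line == "data: [DONE]":
                break
    return count
```

In a Flask app, a generator like this could be returned as `Response(sse_chunks(...), mimetype="text/event-stream")`.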
**Mock server usage:**
```bash
uv run python openai-mock-server.py --port 8080
```
The mock server is also deployed in k8s as `openai-mock-service:8080` and can be used by changing the Llama Stack configuration to use the `mock-vllm-inference` provider.
## Files in this Directory
- `benchmark.py` - Core benchmark script with async streaming support
- `run-benchmark.sh` - Main script with target selection and configuration
- `openai-mock-server.py` - Mock OpenAI API server for local testing
- `README.md` - This documentation file

@ -0,0 +1,36 @@
#!/usr/bin/env bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Deploys the benchmark-specific components on top of the base k8s deployment (../k8s/apply.sh).
export STREAM_DELAY_SECONDS=0.005
export POSTGRES_USER=llamastack
export POSTGRES_DB=llamastack
export POSTGRES_PASSWORD=llamastack
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
export MOCK_INFERENCE_MODEL=mock-inference
export MOCK_INFERENCE_URL=openai-mock-service:8080
export BENCHMARK_INFERENCE_MODEL=$INFERENCE_MODEL
set -euo pipefail
set -x
# Deploy benchmark-specific components
kubectl create configmap llama-stack-config --from-file=stack_run_config.yaml \
--dry-run=client -o yaml > stack-configmap.yaml
kubectl apply --validate=false -f stack-configmap.yaml
# Deploy our custom llama stack server (overriding the base one)
envsubst < stack-k8s.yaml.template | kubectl apply --validate=false -f -

@ -0,0 +1,267 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
"""
Simple benchmark script for Llama Stack with OpenAI API compatibility.
"""
import argparse
import asyncio
import os
import random
import statistics
import time
from typing import Tuple
import aiohttp
class BenchmarkStats:
def __init__(self):
self.response_times = []
self.ttft_times = []
self.chunks_received = []
self.errors = []
self.success_count = 0
self.total_requests = 0
self.concurrent_users = 0
self.start_time = None
self.end_time = None
self._lock = asyncio.Lock()
async def add_result(self, response_time: float, chunks: int, ttft: float | None = None, error: str | None = None):
async with self._lock:
self.total_requests += 1
if error:
self.errors.append(error)
else:
self.success_count += 1
self.response_times.append(response_time)
self.chunks_received.append(chunks)
if ttft is not None:
self.ttft_times.append(ttft)
def print_summary(self):
if not self.response_times:
print("No successful requests to report")
if self.errors:
print(f"Total errors: {len(self.errors)}")
print("First 5 errors:")
for error in self.errors[:5]:
print(f" {error}")
return
total_time = self.end_time - self.start_time
success_rate = (self.success_count / self.total_requests) * 100
print(f"\n{'='*60}")
print(f"BENCHMARK RESULTS")
print(f"{'='*60}")
print(f"Total time: {total_time:.2f}s")
print(f"Concurrent users: {self.concurrent_users}")
print(f"Total requests: {self.total_requests}")
print(f"Successful requests: {self.success_count}")
print(f"Failed requests: {len(self.errors)}")
print(f"Success rate: {success_rate:.1f}%")
print(f"Requests per second: {self.success_count / total_time:.2f}")
print(f"\nResponse Time Statistics:")
print(f" Mean: {statistics.mean(self.response_times):.3f}s")
print(f" Median: {statistics.median(self.response_times):.3f}s")
print(f" Min: {min(self.response_times):.3f}s")
print(f" Max: {max(self.response_times):.3f}s")
if len(self.response_times) > 1:
print(f" Std Dev: {statistics.stdev(self.response_times):.3f}s")
percentiles = [50, 90, 95, 99]
sorted_times = sorted(self.response_times)
print(f"\nPercentiles:")
for p in percentiles:
idx = int(len(sorted_times) * p / 100) - 1
idx = max(0, min(idx, len(sorted_times) - 1))
print(f" P{p}: {sorted_times[idx]:.3f}s")
if self.ttft_times:
print(f"\nTime to First Token (TTFT) Statistics:")
print(f" Mean: {statistics.mean(self.ttft_times):.3f}s")
print(f" Median: {statistics.median(self.ttft_times):.3f}s")
print(f" Min: {min(self.ttft_times):.3f}s")
print(f" Max: {max(self.ttft_times):.3f}s")
if len(self.ttft_times) > 1:
print(f" Std Dev: {statistics.stdev(self.ttft_times):.3f}s")
sorted_ttft = sorted(self.ttft_times)
print(f"\nTTFT Percentiles:")
for p in percentiles:
idx = int(len(sorted_ttft) * p / 100) - 1
idx = max(0, min(idx, len(sorted_ttft) - 1))
print(f" P{p}: {sorted_ttft[idx]:.3f}s")
if self.chunks_received:
print(f"\nStreaming Statistics:")
print(f" Mean chunks per response: {statistics.mean(self.chunks_received):.1f}")
print(f" Total chunks received: {sum(self.chunks_received)}")
if self.errors:
print(f"\nErrors (showing first 5):")
for error in self.errors[:5]:
print(f" {error}")
class LlamaStackBenchmark:
def __init__(self, base_url: str, model_id: str):
self.base_url = base_url.rstrip('/')
self.model_id = model_id
self.headers = {"Content-Type": "application/json"}
self.test_messages = [
[{"role": "user", "content": "Hi"}],
[{"role": "user", "content": "What is the capital of France?"}],
[{"role": "user", "content": "Explain quantum physics in simple terms."}],
[{"role": "user", "content": "Write a short story about a robot learning to paint."}],
[
{"role": "user", "content": "What is machine learning?"},
{"role": "assistant", "content": "Machine learning is a subset of AI..."},
{"role": "user", "content": "Can you give me a practical example?"}
]
]
async def make_async_streaming_request(self, session: aiohttp.ClientSession) -> Tuple[float, int, float | None, str | None]:
"""Make a single async streaming chat completion request on the shared session."""
messages = random.choice(self.test_messages)
payload = {
"model": self.model_id,
"messages": messages,
"stream": True,
"max_tokens": 100
}
start_time = time.time()
chunks_received = 0
ttft = None
error = None
try:
async with session.post(
f"{self.base_url}/chat/completions",
headers=self.headers,
json=payload,
timeout=aiohttp.ClientTimeout(total=30)
) as response:
if response.status == 200:
async for line in response.content:
if line:
line_str = line.decode('utf-8').strip()
if line_str.startswith('data: '):
chunks_received += 1
if ttft is None:
ttft = time.time() - start_time
if line_str == 'data: [DONE]':
break
if chunks_received == 0:
error = "No streaming chunks received"
else:
text = await response.text()
error = f"HTTP {response.status}: {text[:100]}"
except Exception as e:
error = f"Request error: {str(e)}"
response_time = time.time() - start_time
return response_time, chunks_received, ttft, error
async def run_benchmark(self, duration: int, concurrent_users: int) -> BenchmarkStats:
"""Run benchmark using async requests for specified duration."""
stats = BenchmarkStats()
stats.concurrent_users = concurrent_users
stats.start_time = time.time()
print(f"Starting benchmark: {duration}s duration, {concurrent_users} concurrent users")
print(f"Target URL: {self.base_url}/chat/completions")
print(f"Model: {self.model_id}")
connector = aiohttp.TCPConnector(limit=concurrent_users)
async with aiohttp.ClientSession(connector=connector) as session:
async def worker(worker_id: int):
"""Worker that sends requests sequentially until canceled."""
request_count = 0
while True:
try:
# Reuse the pooled session so the connector limit applies and TCP setup is not timed per request
response_time, chunks, ttft, error = await self.make_async_streaming_request(session)
await stats.add_result(response_time, chunks, ttft, error)
request_count += 1
except asyncio.CancelledError:
break
except Exception as e:
await stats.add_result(0, 0, None, f"Worker {worker_id} error: {str(e)}")
# Progress reporting task
async def progress_reporter():
last_report_time = time.time()
while True:
try:
await asyncio.sleep(1) # Check once per second
if time.time() >= last_report_time + 10: # Print progress every 10 seconds
elapsed = time.time() - stats.start_time
print(f"Completed: {stats.total_requests} requests in {elapsed:.1f}s")
last_report_time = time.time()
except asyncio.CancelledError:
break
# Spawn concurrent workers
tasks = [asyncio.create_task(worker(i)) for i in range(concurrent_users)]
progress_task = asyncio.create_task(progress_reporter())
tasks.append(progress_task)
# Wait for duration then cancel all tasks
await asyncio.sleep(duration)
for task in tasks:
task.cancel()
# Wait for all tasks to complete
await asyncio.gather(*tasks, return_exceptions=True)
stats.end_time = time.time()
return stats
def main():
parser = argparse.ArgumentParser(description="Llama Stack Benchmark Tool")
parser.add_argument("--base-url", default=os.getenv("BENCHMARK_BASE_URL", "http://localhost:8000/v1/openai/v1"),
help="Base URL for the API (default: http://localhost:8000/v1/openai/v1)")
parser.add_argument("--model", default=os.getenv("INFERENCE_MODEL", "test-model"),
help="Model ID to use for requests")
parser.add_argument("--duration", type=int, default=60,
help="Duration in seconds to run benchmark (default: 60)")
parser.add_argument("--concurrent", type=int, default=10,
help="Number of concurrent users (default: 10)")
args = parser.parse_args()
benchmark = LlamaStackBenchmark(args.base_url, args.model)
try:
stats = asyncio.run(benchmark.run_benchmark(args.duration, args.concurrent))
stats.print_summary()
except KeyboardInterrupt:
print("\nBenchmark interrupted by user")
except Exception as e:
print(f"Benchmark failed: {e}")
if __name__ == "__main__":
main()
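The fixed-duration load pattern used by `run_benchmark` (start N workers, sleep for the test duration, cancel every task, then `gather` with `return_exceptions=True`) can be tried on its own without a server. The sketch below is a simplified illustration, not part of the benchmark: `run_for_duration` and `fake_request` are hypothetical names, and `fake_request` merely sleeps 10 ms in place of `make_async_streaming_request()`.

```python
import asyncio
import time


async def run_for_duration(duration: float, concurrency: int, request) -> list:
    """Run `concurrency` workers that call `request()` back to back until
    `duration` seconds elapse, then cancel them and return all results."""
    results: list = []

    async def worker() -> None:
        while True:
            # Cancellation raises CancelledError at this await and ends the loop,
            # so a request still in flight at the deadline is not recorded.
            results.append(await request())

    tasks = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await asyncio.sleep(duration)
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects each task's CancelledError instead of raising it.
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


async def fake_request() -> float:
    """Hypothetical stand-in for a streaming request: sleeps 10 ms, returns latency."""
    start = time.perf_counter()
    await asyncio.sleep(0.01)
    return time.perf_counter() - start


if __name__ == "__main__":
    latencies = asyncio.run(run_for_duration(0.2, 4, fake_request))
    print(f"{len(latencies)} requests completed")
```

Dropping in-flight work at the deadline keeps the measured window exact, which is the same trade-off the benchmark makes when it cancels its workers.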

@ -0,0 +1,190 @@
#!/usr/bin/env python3
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
"""
OpenAI-compatible mock server that returns:
- Hardcoded /models response for consistent validation
- Valid OpenAI-formatted chat completion responses with dynamic content
"""
from flask import Flask, request, jsonify, Response
import time
import random
import uuid
import json
import argparse
import os

app = Flask(__name__)


# Models from environment variables
def get_models():
    models_str = os.getenv("MOCK_MODELS", "meta-llama/Llama-3.2-3B-Instruct")
    model_ids = [m.strip() for m in models_str.split(",") if m.strip()]
    return {
        "object": "list",
        "data": [
            {
                "id": model_id,
                "object": "model",
                "created": 1234567890,
                "owned_by": "vllm"
            }
            for model_id in model_ids
        ]
    }


def generate_random_text(length=50):
    """Generate filler text by sampling `length` words at random (not coherent prose)."""
    words = [
        "Hello", "there", "I'm", "an", "AI", "assistant", "ready", "to", "help", "you",
        "with", "your", "questions", "and", "tasks", "today", "Let", "me", "know", "what",
        "you'd", "like", "to", "discuss", "or", "explore", "together", "I", "can", "assist",
        "with", "various", "topics", "including", "coding", "writing", "analysis", "and", "more"
    ]
    return " ".join(random.choices(words, k=length))


@app.route('/v1/models', methods=['GET'])
def list_models():
    models = get_models()
    print(f"[MOCK] Returning models: {[m['id'] for m in models['data']]}")
    return jsonify(models)


@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():
    """Return OpenAI-formatted chat completion responses."""
    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    default_model = get_models()['data'][0]['id']
    model = data.get('model', default_model)
    messages = data.get('messages', [])
    stream = data.get('stream', False)
    print(f"[MOCK] Chat completion request - model: {model}, stream: {stream}")
    if stream:
        return handle_streaming_completion(model, messages)
    else:
        return handle_non_streaming_completion(model, messages)


def handle_non_streaming_completion(model, messages):
    response_text = generate_random_text(random.randint(20, 80))

    # Approximate token counts with whitespace-separated word counts
    prompt_tokens = sum(len(str(msg.get('content', '')).split()) for msg in messages)
    completion_tokens = len(response_text.split())

    response = {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": response_text
                },
                "finish_reason": "stop"
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens
        }
    }
    return jsonify(response)


def handle_streaming_completion(model, messages):
    def generate_stream():
        # Generate response text
        full_response = generate_random_text(random.randint(30, 100))
        words = full_response.split()
        # All chunks of one streamed completion share the same id, as in the OpenAI API
        completion_id = f"chatcmpl-{uuid.uuid4().hex[:8]}"
        # Configurable per-chunk delay to simulate realistic streaming
        stream_delay = float(os.getenv("STREAM_DELAY_SECONDS", "0.005"))

        # Send initial chunk
        initial_chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model,
            "choices": [
                {
                    "index": 0,
                    "delta": {"role": "assistant", "content": ""}
                }
            ]
        }
        yield f"data: {json.dumps(initial_chunk)}\n\n"

        # Send word by word
        for i, word in enumerate(words):
            chunk = {
                "id": completion_id,
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": model,
                "choices": [
                    {
                        "index": 0,
                        "delta": {"content": f"{word} " if i < len(words) - 1 else word}
                    }
                ]
            }
            yield f"data: {json.dumps(chunk)}\n\n"
            time.sleep(stream_delay)

        # Send final chunk
        final_chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model,
            "choices": [
                {
                    "index": 0,
                    "delta": {"content": ""},
                    "finish_reason": "stop"
                }
            ]
        }
        yield f"data: {json.dumps(final_chunk)}\n\n"
        yield "data: [DONE]\n\n"

    return Response(
        generate_stream(),
        mimetype='text/event-stream',
        headers={
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive',
            'Access-Control-Allow-Origin': '*',
        }
    )


@app.route('/health', methods=['GET'])
def health():
    return jsonify({"status": "healthy", "type": "openai-mock"})


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='OpenAI-compatible mock server')
    parser.add_argument('--port', type=int, default=8081,
                        help='Port to run the server on (default: 8081)')
    args = parser.parse_args()
    port = args.port

    models = get_models()
    print("Starting OpenAI-compatible mock server...")
    print(f"- /models endpoint with: {[m['id'] for m in models['data']]}")
    print("- OpenAI-formatted chat/completion responses with dynamic content")
    print("- Streaming support with valid SSE format")
    print(f"- Listening on: http://0.0.0.0:{port}")
    app.run(host='0.0.0.0', port=port, debug=False)
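A client of this mock (or of any OpenAI-compatible server) can rebuild the streamed reply by concatenating each chunk's `delta.content` until the `data: [DONE]` sentinel. A minimal stdlib sketch, assuming the response body has already been split into lines; `reassemble_stream` is a hypothetical helper. Unlike `benchmark.py`, which also counts the `[DONE]` line in `chunks_received`, it counts only JSON chunks.

```python
import json
from typing import Iterable, Optional, Tuple


def reassemble_stream(lines: Iterable[str]) -> Tuple[str, int, Optional[str]]:
    """Rebuild assistant text from OpenAI-style SSE lines.

    Returns (text, json_chunks, finish_reason). Blank separator lines are
    skipped and parsing stops at the `data: [DONE]` sentinel.
    """
    parts = []
    chunks = 0
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # blank lines separate SSE events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        chunks += 1
        choice = event["choices"][0]
        parts.append(choice["delta"].get("content", ""))
        # Only the last chunk carries a finish_reason in the mock's output
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), chunks, finish_reason


if __name__ == "__main__":
    events = [
        {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
        {"choices": [{"index": 0, "delta": {"content": "Hello "}}]},
        {"choices": [{"index": 0, "delta": {"content": "there"}}]},
        {"choices": [{"index": 0, "delta": {"content": ""}, "finish_reason": "stop"}]},
    ]
    body = "".join(f"data: {json.dumps(e)}\n\n" for e in events) + "data: [DONE]\n\n"
    print(reassemble_stream(body.splitlines()))
```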

@ -0,0 +1,52 @@
#!/bin/bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Script to profile an already running Llama Stack server
# Usage: ./profile_running_server.sh [duration_seconds] [output_file]
DURATION=${1:-60} # Default 60 seconds
OUTPUT_FILE=${2:-"llama_stack_profile"} # Default output file
echo "Looking for running Llama Stack server..."
# Find the server PID
SERVER_PID=$(ps aux | grep "llama_stack.core.server.server" | grep -v grep | awk '{print $2}' | head -1)
if [ -z "$SERVER_PID" ]; then
    echo "Error: No running Llama Stack server found"
    echo "Please start your server first with:"
    echo "LLAMA_STACK_LOGGING=\"all=ERROR\" MOCK_INFERENCE_URL=http://localhost:8080 SAFETY_MODEL=llama-guard3:1b uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml"
    exit 1
fi

echo "Found Llama Stack server with PID: $SERVER_PID"

# Start py-spy profiling
echo "Starting py-spy profiling for ${DURATION} seconds..."
echo "Output will be saved to: ${OUTPUT_FILE}.svg"
echo ""
echo "You can now run your load test..."
echo ""

# Resolve the full path to py-spy so it still works under sudo
PYSPY_PATH=$(command -v py-spy)
if [ -z "$PYSPY_PATH" ]; then
    echo "Error: py-spy not found on PATH (install it with: pip install py-spy)"
    exit 1
fi

# Attaching to another process needs elevated privileges (always on macOS),
# so fall back to sudo when not running as root
if [ "$EUID" -ne 0 ]; then
    echo "py-spy needs root permissions to attach to the server. Running with sudo..."
    sudo "$PYSPY_PATH" record -o "${OUTPUT_FILE}.svg" -d "${DURATION}" -p "$SERVER_PID"
else
    "$PYSPY_PATH" record -o "${OUTPUT_FILE}.svg" -d "${DURATION}" -p "$SERVER_PID"
fi
echo ""
echo "Profiling completed! Results saved to: ${OUTPUT_FILE}.svg"
echo ""
echo "To view the flame graph:"
echo "open ${OUTPUT_FILE}.svg"

@ -0,0 +1,148 @@
#!/usr/bin/env bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
set -euo pipefail
# Default values
TARGET="stack"
DURATION=60
CONCURRENT=10
# Parse command line arguments
usage() {
    echo "Usage: $0 [options]"
    echo "Options:"
    echo "  -t, --target <stack|vllm>   Target to benchmark (default: stack)"
    echo "  -d, --duration <seconds>    Duration in seconds (default: 60)"
    echo "  -c, --concurrent <users>    Number of concurrent users (default: 10)"
    echo "  -h, --help                  Show this help message"
    echo ""
    echo "Examples:"
    echo "  $0 --target vllm              # Benchmark vLLM direct"
    echo "  $0 --target stack             # Benchmark Llama Stack (default)"
    echo "  $0 -t vllm -d 120 -c 20       # vLLM with 120s duration, 20 users"
}

while [[ $# -gt 0 ]]; do
    case $1 in
        -t|--target)
            TARGET="$2"
            shift 2
            ;;
        -d|--duration)
            DURATION="$2"
            shift 2
            ;;
        -c|--concurrent)
            CONCURRENT="$2"
            shift 2
            ;;
        -h|--help)
            usage
            exit 0
            ;;
        *)
            echo "Unknown option: $1"
            usage
            exit 1
            ;;
    esac
done

# Validate target
if [[ "$TARGET" != "stack" && "$TARGET" != "vllm" ]]; then
    echo "Error: Target must be 'stack' or 'vllm'"
    usage
    exit 1
fi

# Set configuration based on target
if [[ "$TARGET" == "vllm" ]]; then
    BASE_URL="http://vllm-server:8000/v1"
    JOB_NAME="vllm-benchmark-job"
    echo "Benchmarking vLLM direct..."
else
    BASE_URL="http://llama-stack-benchmark-service:8323/v1/openai/v1"
    JOB_NAME="stack-benchmark-job"
    echo "Benchmarking Llama Stack..."
fi
echo "Configuration:"
echo " Target: $TARGET"
echo " Base URL: $BASE_URL"
echo " Duration: ${DURATION}s"
echo " Concurrent users: $CONCURRENT"
echo ""
# Create temporary job yaml
TEMP_YAML="/tmp/benchmark-job-temp-$(date +%s).yaml"
cat > "$TEMP_YAML" << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: $JOB_NAME
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: benchmark
        image: python:3.11-slim
        command: ["/bin/bash"]
        args:
        - "-c"
        - |
          pip install aiohttp &&
          python3 /benchmark/benchmark.py \\
            --base-url $BASE_URL \\
            --model \${INFERENCE_MODEL} \\
            --duration $DURATION \\
            --concurrent $CONCURRENT
        env:
        - name: INFERENCE_MODEL
          value: "meta-llama/Llama-3.2-3B-Instruct"
        volumeMounts:
        - name: benchmark-script
          mountPath: /benchmark
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      volumes:
      - name: benchmark-script
        configMap:
          name: benchmark-script
      restartPolicy: Never
  backoffLimit: 3
EOF
echo "Creating benchmark ConfigMap..."
kubectl create configmap benchmark-script \
--from-file=benchmark.py=benchmark.py \
--dry-run=client -o yaml | kubectl apply -f -
echo "Cleaning up any existing benchmark job..."
kubectl delete job $JOB_NAME 2>/dev/null || true
echo "Deploying benchmark Job..."
kubectl apply -f "$TEMP_YAML"
echo "Waiting for job to start..."
kubectl wait --for=condition=Ready pod -l job-name=$JOB_NAME --timeout=60s
echo "Following benchmark logs..."
kubectl logs -f job/$JOB_NAME
echo "Job completed. Checking final status..."
kubectl get job $JOB_NAME
# Clean up temporary file
rm -f "$TEMP_YAML"
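The target selection in this script is a small lookup from target name to base URL and Job name, plus the argument list handed to `benchmark.py`. A Python sketch of the same mapping, for example to drive the benchmark from another tool; `BenchmarkTarget`, `TARGETS`, and `benchmark_argv` are hypothetical names, and the default model mirrors the Job's `INFERENCE_MODEL` value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkTarget:
    base_url: str
    job_name: str


# Mirrors the if/else in the script; host names are in-cluster Service names.
TARGETS = {
    "vllm": BenchmarkTarget("http://vllm-server:8000/v1", "vllm-benchmark-job"),
    "stack": BenchmarkTarget(
        "http://llama-stack-benchmark-service:8323/v1/openai/v1", "stack-benchmark-job"
    ),
}


def benchmark_argv(target: str, duration: int = 60, concurrent: int = 10,
                   model: str = "meta-llama/Llama-3.2-3B-Instruct") -> list:
    """Build the command the benchmark container runs for a given target."""
    if target not in TARGETS:
        raise ValueError("Target must be 'stack' or 'vllm'")
    return [
        "python3", "/benchmark/benchmark.py",
        "--base-url", TARGETS[target].base_url,
        "--model", model,
        "--duration", str(duration),
        "--concurrent", str(concurrent),
    ]


if __name__ == "__main__":
    print(" ".join(benchmark_argv("vllm", 120, 20)))
```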

@ -0,0 +1,133 @@
apiVersion: v1
data:
  stack_run_config.yaml: |
    version: '2'
    image_name: kubernetes-benchmark-demo
    apis:
    - agents
    - inference
    - safety
    - telemetry
    - tool_runtime
    - vector_io
    providers:
      inference:
      - provider_id: vllm-inference
        provider_type: remote::vllm
        config:
          url: ${env.VLLM_URL:=http://localhost:8000/v1}
          max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
          api_token: ${env.VLLM_API_TOKEN:=fake}
          tls_verify: ${env.VLLM_TLS_VERIFY:=true}
      - provider_id: vllm-safety
        provider_type: remote::vllm
        config:
          url: ${env.VLLM_SAFETY_URL:=http://localhost:8000/v1}
          max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
          api_token: ${env.VLLM_API_TOKEN:=fake}
          tls_verify: ${env.VLLM_TLS_VERIFY:=true}
      - provider_id: sentence-transformers
        provider_type: inline::sentence-transformers
        config: {}
      vector_io:
      - provider_id: ${env.ENABLE_CHROMADB:+chromadb}
        provider_type: remote::chromadb
        config:
          url: ${env.CHROMADB_URL:=}
          kvstore:
            type: postgres
            host: ${env.POSTGRES_HOST:=localhost}
            port: ${env.POSTGRES_PORT:=5432}
            db: ${env.POSTGRES_DB:=llamastack}
            user: ${env.POSTGRES_USER:=llamastack}
            password: ${env.POSTGRES_PASSWORD:=llamastack}
      safety:
      - provider_id: llama-guard
        provider_type: inline::llama-guard
        config:
          excluded_categories: []
      agents:
      - provider_id: meta-reference
        provider_type: inline::meta-reference
        config:
          persistence_store:
            type: postgres
            host: ${env.POSTGRES_HOST:=localhost}
            port: ${env.POSTGRES_PORT:=5432}
            db: ${env.POSTGRES_DB:=llamastack}
            user: ${env.POSTGRES_USER:=llamastack}
            password: ${env.POSTGRES_PASSWORD:=llamastack}
          responses_store:
            type: postgres
            host: ${env.POSTGRES_HOST:=localhost}
            port: ${env.POSTGRES_PORT:=5432}
            db: ${env.POSTGRES_DB:=llamastack}
            user: ${env.POSTGRES_USER:=llamastack}
            password: ${env.POSTGRES_PASSWORD:=llamastack}
      telemetry:
      - provider_id: meta-reference
        provider_type: inline::meta-reference
        config:
          service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
          sinks: ${env.TELEMETRY_SINKS:=console}
      tool_runtime:
      - provider_id: brave-search
        provider_type: remote::brave-search
        config:
          api_key: ${env.BRAVE_SEARCH_API_KEY:+}
          max_results: 3
      - provider_id: tavily-search
        provider_type: remote::tavily-search
        config:
          api_key: ${env.TAVILY_SEARCH_API_KEY:+}
          max_results: 3
      - provider_id: rag-runtime
        provider_type: inline::rag-runtime
        config: {}
      - provider_id: model-context-protocol
        provider_type: remote::model-context-protocol
        config: {}
    metadata_store:
      type: postgres
      host: ${env.POSTGRES_HOST:=localhost}
      port: ${env.POSTGRES_PORT:=5432}
      db: ${env.POSTGRES_DB:=llamastack}
      user: ${env.POSTGRES_USER:=llamastack}
      password: ${env.POSTGRES_PASSWORD:=llamastack}
      table_name: llamastack_kvstore
    inference_store:
      type: postgres
      host: ${env.POSTGRES_HOST:=localhost}
      port: ${env.POSTGRES_PORT:=5432}
      db: ${env.POSTGRES_DB:=llamastack}
      user: ${env.POSTGRES_USER:=llamastack}
      password: ${env.POSTGRES_PASSWORD:=llamastack}
    models:
    - metadata:
        embedding_dimension: 384
      model_id: all-MiniLM-L6-v2
      provider_id: sentence-transformers
      model_type: embedding
    - model_id: ${env.INFERENCE_MODEL}
      provider_id: vllm-inference
      model_type: llm
    - model_id: ${env.SAFETY_MODEL}
      provider_id: vllm-safety
      model_type: llm
    shields:
    - shield_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
    vector_dbs: []
    datasets: []
    scoring_fns: []
    benchmarks: []
    tool_groups:
    - toolgroup_id: builtin::websearch
      provider_id: tavily-search
    - toolgroup_id: builtin::rag
      provider_id: rag-runtime
    server:
      port: 8323
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: llama-stack-config
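Values such as `${env.POSTGRES_PORT:=5432}` and `${env.ENABLE_CHROMADB:+chromadb}` are filled in from the server's environment: `:=` supplies a default when the variable is unset, and `:+` substitutes its text only when the variable is set, so leaving `ENABLE_CHROMADB` unset yields an empty provider ID for ChromaDB. The sketch below approximates those rules with bash-style semantics (an empty variable counts as unset); `substitute` is a hypothetical helper, not Llama Stack's own resolver, which may treat edge cases such as empty values differently.

```python
import re
from typing import Mapping

# ${env.NAME}, ${env.NAME:=default} or ${env.NAME:+alternate}
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z0-9_]+)(?::([=+])([^}]*))?\}")


def substitute(text: str, env: Mapping[str, str]) -> str:
    """Expand env references in `text` using `env` (bash-like `:=` / `:+`)."""

    def repl(m: re.Match) -> str:
        name, op, arg = m.group(1), m.group(2), m.group(3)
        value = env.get(name)
        if op == "=":
            # Default: the variable if set and non-empty, else the default text.
            return value if value else arg
        if op == "+":
            # Alternate: the given text only when the variable is set and non-empty.
            return arg if value else ""
        if value is None:
            raise KeyError(f"environment variable {name} is not set")
        return value

    return _ENV_REF.sub(repl, text)


if __name__ == "__main__":
    print(substitute("url: ${env.VLLM_URL:=http://localhost:8000/v1}", {}))
    print(substitute("- provider_id: ${env.ENABLE_CHROMADB:+chromadb}", {"ENABLE_CHROMADB": "true"}))
```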

@ -0,0 +1,83 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama-benchmark-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-stack-benchmark-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: llama-stack-benchmark
      app.kubernetes.io/component: server
  template:
    metadata:
      labels:
        app.kubernetes.io/name: llama-stack-benchmark
        app.kubernetes.io/component: server
    spec:
      containers:
      - name: llama-stack-benchmark
        image: llamastack/distribution-starter:latest
        imagePullPolicy: Always # since we have specified latest instead of a version
        env:
        - name: ENABLE_CHROMADB
          value: "true"
        - name: CHROMADB_URL
          value: http://chromadb.default.svc.cluster.local:6000
        - name: POSTGRES_HOST
          value: postgres-server.default.svc.cluster.local
        - name: POSTGRES_PORT
          value: "5432"
        - name: INFERENCE_MODEL
          value: "${INFERENCE_MODEL}"
        - name: SAFETY_MODEL
          value: "${SAFETY_MODEL}"
        - name: TAVILY_SEARCH_API_KEY
          value: "${TAVILY_SEARCH_API_KEY}"
        - name: VLLM_URL
          value: http://vllm-server.default.svc.cluster.local:8000/v1
        - name: VLLM_MAX_TOKENS
          value: "3072"
        - name: VLLM_SAFETY_URL
          value: http://vllm-server-safety.default.svc.cluster.local:8001/v1
        - name: VLLM_TLS_VERIFY
          value: "false"
        command: ["python", "-m", "llama_stack.core.server.server", "/etc/config/stack_run_config.yaml", "--port", "8323"]
        ports:
        - containerPort: 8323
        volumeMounts:
        - name: llama-storage
          mountPath: /root/.llama
        - name: llama-config
          mountPath: /etc/config
      volumes:
      - name: llama-storage
        persistentVolumeClaim:
          claimName: llama-benchmark-pvc
      - name: llama-config
        configMap:
          name: llama-stack-config
---
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-benchmark-service
spec:
  selector:
    app.kubernetes.io/name: llama-stack-benchmark
    app.kubernetes.io/component: server
  ports:
  - name: http
    port: 8323
    targetPort: 8323
  type: ClusterIP
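The Service routes only to pods whose labels contain every key/value pair in its `selector`, so the Service selector and the Deployment's pod template labels above must stay in sync (the Deployment's own `matchLabels` must match the template as well). A small sketch of that equality-based matching rule using plain dicts; `selector_matches` is a hypothetical helper and the label values are copied from the manifests.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Equality-based selector check: every selector pair must appear in labels."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return False
    return all(labels.get(key) == value for key, value in selector.items())


# Values copied from the Deployment's pod template and the Service above.
POD_LABELS = {
    "app.kubernetes.io/name": "llama-stack-benchmark",
    "app.kubernetes.io/component": "server",
}
SERVICE_SELECTOR = {
    "app.kubernetes.io/name": "llama-stack-benchmark",
    "app.kubernetes.io/component": "server",
}

if __name__ == "__main__":
    print(selector_matches(SERVICE_SELECTOR, POD_LABELS))
```

Extra pod labels are harmless; a selector key the pods lack, or a mismatched value, leaves the Service without endpoints.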

@ -0,0 +1,108 @@
version: '2'
image_name: kubernetes-benchmark-demo
apis:
- agents
- inference
- telemetry
- tool_runtime
- vector_io
providers:
  inference:
  - provider_id: vllm-inference
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL:=http://localhost:8000/v1}
      max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
      api_token: ${env.VLLM_API_TOKEN:=fake}
      tls_verify: ${env.VLLM_TLS_VERIFY:=true}
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  vector_io:
  - provider_id: ${env.ENABLE_CHROMADB:+chromadb}
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL:=}
      kvstore:
        type: postgres
        host: ${env.POSTGRES_HOST:=localhost}
        port: ${env.POSTGRES_PORT:=5432}
        db: ${env.POSTGRES_DB:=llamastack}
        user: ${env.POSTGRES_USER:=llamastack}
        password: ${env.POSTGRES_PASSWORD:=llamastack}
  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: postgres
        host: ${env.POSTGRES_HOST:=localhost}
        port: ${env.POSTGRES_PORT:=5432}
        db: ${env.POSTGRES_DB:=llamastack}
        user: ${env.POSTGRES_USER:=llamastack}
        password: ${env.POSTGRES_PASSWORD:=llamastack}
      responses_store:
        type: postgres
        host: ${env.POSTGRES_HOST:=localhost}
        port: ${env.POSTGRES_PORT:=5432}
        db: ${env.POSTGRES_DB:=llamastack}
        user: ${env.POSTGRES_USER:=llamastack}
        password: ${env.POSTGRES_PASSWORD:=llamastack}
  telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
      sinks: ${env.TELEMETRY_SINKS:=console}
  tool_runtime:
  - provider_id: brave-search
    provider_type: remote::brave-search
    config:
      api_key: ${env.BRAVE_SEARCH_API_KEY:+}
      max_results: 3
  - provider_id: tavily-search
    provider_type: remote::tavily-search
    config:
      api_key: ${env.TAVILY_SEARCH_API_KEY:+}
      max_results: 3
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
  - provider_id: model-context-protocol
    provider_type: remote::model-context-protocol
    config: {}
metadata_store:
  type: postgres
  host: ${env.POSTGRES_HOST:=localhost}
  port: ${env.POSTGRES_PORT:=5432}
  db: ${env.POSTGRES_DB:=llamastack}
  user: ${env.POSTGRES_USER:=llamastack}
  password: ${env.POSTGRES_PASSWORD:=llamastack}
  table_name: llamastack_kvstore
inference_store:
  type: postgres
  host: ${env.POSTGRES_HOST:=localhost}
  port: ${env.POSTGRES_PORT:=5432}
  db: ${env.POSTGRES_DB:=llamastack}
  user: ${env.POSTGRES_USER:=llamastack}
  password: ${env.POSTGRES_PASSWORD:=llamastack}
models:
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  provider_id: sentence-transformers
  model_type: embedding
- model_id: ${env.INFERENCE_MODEL}
  provider_id: vllm-inference
  model_type: llm
vector_dbs: []
datasets: []
scoring_fns: []
benchmarks: []
tool_groups:
- toolgroup_id: builtin::websearch
  provider_id: tavily-search
- toolgroup_id: builtin::rag
  provider_id: rag-runtime
server:
  port: 8323

@ -34,6 +34,13 @@ data:
         provider_type: remote::chromadb
         config:
           url: ${env.CHROMADB_URL:=}
+          kvstore:
+            type: postgres
+            host: ${env.POSTGRES_HOST:=localhost}
+            port: ${env.POSTGRES_PORT:=5432}
+            db: ${env.POSTGRES_DB:=llamastack}
+            user: ${env.POSTGRES_USER:=llamastack}
+            password: ${env.POSTGRES_PASSWORD:=llamastack}
       safety:
       - provider_id: llama-guard
         provider_type: inline::llama-guard

@ -40,19 +40,19 @@ spec:
           value: "3072"
         - name: VLLM_SAFETY_URL
           value: http://vllm-server-safety.default.svc.cluster.local:8001/v1
+        - name: VLLM_TLS_VERIFY
+          value: "false"
         - name: POSTGRES_HOST
           value: postgres-server.default.svc.cluster.local
         - name: POSTGRES_PORT
           value: "5432"
-        - name: VLLM_TLS_VERIFY
-          value: "false"
         - name: INFERENCE_MODEL
           value: "${INFERENCE_MODEL}"
         - name: SAFETY_MODEL
           value: "${SAFETY_MODEL}"
         - name: TAVILY_SEARCH_API_KEY
           value: "${TAVILY_SEARCH_API_KEY}"
-        command: ["python", "-m", "llama_stack.distribution.server.server", "--config", "/etc/config/stack_run_config.yaml", "--port", "8321"]
+        command: ["python", "-m", "llama_stack.core.server.server", "/etc/config/stack_run_config.yaml", "--port", "8321"]
         ports:
         - containerPort: 8321
         volumeMounts:

@ -31,6 +31,13 @@ providers:
     provider_type: remote::chromadb
     config:
       url: ${env.CHROMADB_URL:=}
+      kvstore:
+        type: postgres
+        host: ${env.POSTGRES_HOST:=localhost}
+        port: ${env.POSTGRES_PORT:=5432}
+        db: ${env.POSTGRES_DB:=llamastack}
+        user: ${env.POSTGRES_USER:=llamastack}
+        password: ${env.POSTGRES_PASSWORD:=llamastack}
   safety:
   - provider_id: llama-guard
     provider_type: inline::llama-guard

@ -56,12 +56,12 @@ Breaking down the demo app, this section will show the core pieces that are used
 ### Setup Remote Inferencing
 Start a Llama Stack server on localhost. Here is an example of how you can do this using the firework.ai distribution:
 ```
-conda create -n stack-fireworks python=3.10
-conda activate stack-fireworks
+uv venv starter --python 3.12
+source starter/bin/activate # On Windows: starter\Scripts\activate
 pip install --no-cache llama-stack==0.2.2
-llama stack build --template fireworks --image-type conda
+llama stack build --distro starter --image-type venv
 export FIREWORKS_API_KEY=<SOME_KEY>
-llama stack run fireworks --port 5050
+llama stack run starter --port 5050
 ```
 Ensure the Llama Stack server version is the same as the Kotlin SDK Library for maximum compatibility.

@ -57,7 +57,7 @@ Make sure you have access to a watsonx API Key. You can get one by referring [wa
 ## Running Llama Stack with watsonx
-You can do this via Conda (build code), venv or Docker which has a pre-built image.
+You can do this via venv or Docker which has a pre-built image.
 ### Via Docker
@@ -76,13 +76,3 @@ docker run \
   --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
   --env WATSONX_BASE_URL=$WATSONX_BASE_URL
 ```
-### Via Conda
-```bash
-llama stack build --template watsonx --image-type conda
-llama stack run ./run.yaml \
-  --port $LLAMA_STACK_PORT \
-  --env WATSONX_API_KEY=$WATSONX_API_KEY \
-  --env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID
-```

@ -114,7 +114,7 @@ podman run --rm -it \
 ## Running Llama Stack
-Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via Conda (build code) or Docker which has a pre-built image.
+Now you are ready to run Llama Stack with TGI as the inference provider. You can do this via venv or Docker which has a pre-built image.
 ### Via Docker
@@ -153,7 +153,7 @@ docker run \
   --pull always \
   -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
   -v $HOME/.llama:/root/.llama \
-  -v ./llama_stack/templates/tgi/run-with-safety.yaml:/root/my-run.yaml \
+  -v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-run.yaml \
   llamastack/distribution-dell \
   --config /root/my-run.yaml \
   --port $LLAMA_STACK_PORT \
@@ -164,12 +164,12 @@ docker run \
   --env CHROMA_URL=$CHROMA_URL
 ```
-### Via Conda
+### Via venv
 Make sure you have done `pip install llama-stack` and have the Llama Stack CLI available.
 ```bash
-llama stack build --template dell --image-type conda
+llama stack build --distro dell --image-type venv
 llama stack run dell
   --port $LLAMA_STACK_PORT \
   --env INFERENCE_MODEL=$INFERENCE_MODEL \

@ -70,7 +70,7 @@ $ llama model list --downloaded
 ## Running the Distribution
-You can do this via Conda (build code) or Docker which has a pre-built image.
+You can do this via venv or Docker which has a pre-built image.
 ### Via Docker
@@ -104,12 +104,12 @@ docker run \
   --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
-### Via Conda
+### Via venv
 Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
 ```bash
-llama stack build --template meta-reference-gpu --image-type conda
+llama stack build --distro meta-reference-gpu --image-type venv
 llama stack run distributions/meta-reference-gpu/run.yaml \
   --port 8321 \
   --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct

@ -133,7 +133,7 @@ curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-inst
 ## Running Llama Stack with NVIDIA
-You can do this via Conda or venv (build code), or Docker which has a pre-built image.
+You can do this via venv (build code), or Docker which has a pre-built image.
 ### Via Docker
@@ -152,24 +152,13 @@ docker run \
   --env NVIDIA_API_KEY=$NVIDIA_API_KEY
 ```
-### Via Conda
-```bash
-INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
-llama stack build --template nvidia --image-type conda
-llama stack run ./run.yaml \
-  --port 8321 \
-  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
-  --env INFERENCE_MODEL=$INFERENCE_MODEL
-```
 ### Via venv
 If you've set up your local development environment, you can also build the image using your local virtual environment.
 ```bash
-INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
-llama stack build --template nvidia --image-type venv
+INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
+llama stack build --distro nvidia --image-type venv
 llama stack run ./run.yaml \
   --port 8321 \
   --env NVIDIA_API_KEY=$NVIDIA_API_KEY \

@ -100,10 +100,6 @@ The following environment variables can be configured:
 ### Model Configuration
 - `INFERENCE_MODEL`: HuggingFace model for serverless inference
 - `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
-- `OLLAMA_INFERENCE_MODEL`: Ollama model name
-- `OLLAMA_EMBEDDING_MODEL`: Ollama embedding model name
-- `OLLAMA_EMBEDDING_DIMENSION`: Ollama embedding dimension (default: `384`)
-- `VLLM_INFERENCE_MODEL`: vLLM model name
 ### Vector Database Configuration
 - `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
@@ -127,47 +123,29 @@ The following environment variables can be configured:
 ## Enabling Providers
-You can enable specific providers by setting their provider ID to a valid value using environment variables. This is useful when you want to use certain providers or don't have the required API keys.
+You can enable specific providers by setting appropriate environment variables. For example,
-### Examples of Enabling Providers
-#### Enable FAISS Vector Provider
 ```bash
-export ENABLE_FAISS=faiss
+# self-hosted
+export OLLAMA_URL=http://localhost:11434 # enables the Ollama inference provider
+export VLLM_URL=http://localhost:8000/v1 # enables the vLLM inference provider
+export TGI_URL=http://localhost:8000/v1 # enables the TGI inference provider
+# cloud-hosted requiring API key configuration on the server
+export CEREBRAS_API_KEY=your_cerebras_api_key # enables the Cerebras inference provider
+export NVIDIA_API_KEY=your_nvidia_api_key # enables the NVIDIA inference provider
+# vector providers
+export MILVUS_URL=http://localhost:19530 # enables the Milvus vector provider
+export CHROMADB_URL=http://localhost:8000/v1 # enables the ChromaDB vector provider
+export PGVECTOR_DB=llama_stack_db # enables the PGVector vector provider
 ```
-#### Enable Ollama Models
-```bash
-export ENABLE_OLLAMA=ollama
-```
-#### Disable vLLM Models
-```bash
-export VLLM_INFERENCE_MODEL=__disabled__
-```
-#### Disable Optional Vector Providers
-```bash
-export ENABLE_SQLITE_VEC=__disabled__
-export ENABLE_CHROMADB=__disabled__
-export ENABLE_PGVECTOR=__disabled__
-```
-### Provider ID Patterns
-The starter distribution uses several patterns for provider IDs:
-1. **Direct provider IDs**: `faiss`, `ollama`, `vllm`
-2. **Environment-based provider IDs**: `${env.ENABLE_SQLITE_VEC:+sqlite-vec}`
-3. **Model-based provider IDs**: `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`
-When using the `+` pattern (like `${env.ENABLE_SQLITE_VEC+sqlite-vec}`), the provider is enabled by default and can be disabled by setting the environment variable to `__disabled__`.
-When using the `:` pattern (like `${env.OLLAMA_INFERENCE_MODEL:__disabled__}`), the provider is disabled by default and can be enabled by setting the environment variable to a valid value.
+This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to point to an appropriate Llama Guard model id. Use `llama-stack-client models list` to see the list of available models.
 ## Running the Distribution
-You can run the starter distribution via Docker, Conda, or venv.
+You can run the starter distribution via Docker or venv.
 ### Via Docker
@@ -186,12 +164,12 @@ docker run \
   --port $LLAMA_STACK_PORT
 ```
-### Via Conda or venv
+### Via venv
 Ensure you have configured the starter distribution using the environment variables explained above.
 ```bash
-uv run --with llama-stack llama stack build --template starter --image-type <conda|venv> --run
+uv run --with llama-stack llama stack build --distro starter --image-type venv --run
 ```
 ## Example Usage

@ -11,12 +11,6 @@ This is the simplest way to get started. Using Llama Stack as a library means yo
Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
## Conda:
If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
## Kubernetes:
If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the [Kubernetes Deployment Guide](kubernetes_deployment) for more details.


@ -52,11 +52,16 @@ agent = Agent(
prompt = "How do you do great work?"
print("prompt>", prompt)

use_stream = True
response = agent.create_turn(
    messages=[{"role": "user", "content": prompt}],
    session_id=agent.create_session("rag_session"),
    stream=use_stream,
)

# Only call `AgentEventLogger().log(response)` for streaming responses.
if use_stream:
    for log in AgentEventLogger().log(response):
        log.print()
else:
    print(response)


@ -59,10 +59,10 @@ Now let's build and run the Llama Stack config for Ollama.
We use `starter` as the distribution. By default all providers are disabled, so you need to enable Ollama by passing environment variables.
```bash
llama stack build --distro starter --image-type venv --run
```
:::
:::{tab-item} Using `venv`
You can use Python to build and run the Llama Stack server, which is useful for testing and development.
Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
@ -70,7 +70,7 @@ which defines the providers and their settings.
Now let's build and run the Llama Stack config for Ollama.
```bash
llama stack build --distro starter --image-type venv --run
```
:::
:::{tab-item} Using a Container
@ -150,13 +150,7 @@ pip install llama-stack-client
```
:::
:::{tab-item} Install with `conda`
```bash
yes | conda create -n stack-client python=3.12
conda activate stack-client
pip install llama-stack-client
```
:::
::::
Now let's use the `llama-stack-client` [CLI](../references/llama_stack_client_cli_reference.md) to check the


@ -16,10 +16,13 @@ as the inference [provider](../providers/inference/index) for a Llama Model.
```bash
ollama run llama3.2:3b --keepalive 60m
```
#### Step 2: Run the Llama Stack server
We will use `uv` to run the Llama Stack server.
```bash
OLLAMA_URL=http://localhost:11434 \
  uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```
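Before starting the server, it can help to confirm that Ollama is reachable at `OLLAMA_URL` and has pulled the model. This sketch uses Ollama's `GET /api/tags` endpoint, which lists locally available models; the helper names are hypothetical, and the fallback URL simply mirrors the `OLLAMA_URL` value used above.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List, Optional


def model_names(tags_payload: Dict[str, Any]) -> List[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def ollama_has_model(model: str, base_url: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Return True if the Ollama server at base_url (or $OLLAMA_URL) lists `model`."""
    base_url = base_url or os.environ.get("OLLAMA_URL", "http://localhost:11434")
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Server not running, unreachable, or returned a non-JSON body.
        return False
    return isinstance(payload, dict) and model in model_names(payload)


if __name__ == "__main__":
    print(ollama_has_model("llama3.2:3b"))
```

Note that `/api/tags` reports downloaded models, not whether a model is currently loaded in memory.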
#### Step 3: Run the demo
Now open up a new terminal and copy the following script into a file named `demo_script.py`.


@ -1,5 +1,22 @@
# Agents
## Overview
Agents API for creating and interacting with agentic systems.
Main functionalities provided by this API:
- Create agents with specific instructions and ability to use tools.
- Interactions with agents are grouped into sessions ("threads"), and each interaction is called a "turn".
- Agents can be provided with various tools (see the ToolGroups and ToolRuntime APIs for more details).
- Agents can be provided with various shields (see the Safety API for more details).
- Agents can also use Memory to retrieve information from knowledge bases. See the RAG Tool and Vector IO APIs for more details.
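The relationship between agents, sessions, and turns described above can be pictured as a small in-memory data model. This is a hypothetical illustration only: the class, field, and method names are invented for the sketch and are not the Llama Stack Agents API, and replies are placeholders rather than model calls.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    """One interaction: a user message and the agent's reply."""
    user_message: str
    agent_reply: str


@dataclass
class Session:
    """A thread that groups the turns of one conversation."""
    name: str
    turns: List[Turn] = field(default_factory=list)


@dataclass
class Agent:
    """Instructions plus the tools and shields the agent may use."""
    instructions: str
    tools: List[str] = field(default_factory=list)
    shields: List[str] = field(default_factory=list)
    sessions: Dict[str, Session] = field(default_factory=dict)

    def new_session(self, name: str) -> str:
        """Open a new session and return its ID."""
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = Session(name=name)
        return session_id

    def add_turn(self, session_id: str, message: str) -> Turn:
        """Record one turn in a session; the reply is a placeholder, not a model call."""
        turn = Turn(user_message=message, agent_reply=f"(placeholder reply to {message!r})")
        self.sessions[session_id].turns.append(turn)
        return turn


if __name__ == "__main__":
    agent = Agent(instructions="You are a helpful assistant.", tools=["hypothetical_search"])
    sid = agent.new_session("demo")
    agent.add_turn(sid, "Hello")
    agent.add_turn(sid, "What can you do?")
    print(len(agent.sessions[sid].turns))  # 2
```

Keeping turns inside sessions is what lets one agent hold several independent conversations at once.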
This section contains documentation for all available providers for the **agents** API.
## Providers
```{toctree}
:maxdepth: 1
inline_meta-reference
```


@ -0,0 +1,21 @@
# Batches
## Overview
Protocol for batch processing API operations.
The Batches API enables efficient processing of multiple requests in a single operation,
particularly useful for processing large datasets, batch evaluation workflows, and
cost-effective inference at scale.
Note: This API is currently under active development and may undergo changes.
This section contains documentation for all available providers for the **batches** API.
## Providers
```{toctree}
:maxdepth: 1
inline_reference
```


@ -0,0 +1,23 @@
# inline::reference
## Description
Reference implementation of batches API with KVStore persistence.
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Configuration for the key-value store backend. |
| `max_concurrent_batches` | `int` | No | 1 | Maximum number of concurrent batches to process simultaneously. |
| `max_concurrent_requests_per_batch` | `int` | No | 10 | Maximum number of concurrent requests to process per batch. |
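The two concurrency limits above can be illustrated with `asyncio` semaphores: one shared semaphore caps how many batches run at once, and a per-batch semaphore caps in-flight requests. This is a minimal sketch of the idea, assuming each request is handled by an async callable; it is not the reference provider's implementation, and `process_request` is a hypothetical stand-in for a real inference call.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Handler = Callable[[str], Awaitable[str]]


async def run_batch(
    requests: Sequence[str],
    handler: Handler,
    batch_slots: asyncio.Semaphore,
    max_requests: int,
) -> List[str]:
    """Process one batch while holding a batch slot, capping in-flight requests."""
    async with batch_slots:  # at most `max_concurrent_batches` batches run at once
        request_slots = asyncio.Semaphore(max_requests)

        async def limited(req: str) -> str:
            async with request_slots:  # at most `max_requests` requests in flight
                return await handler(req)

        # gather returns results in the same order as `requests`
        return list(await asyncio.gather(*(limited(r) for r in requests)))


async def run_batches(
    batches: Sequence[Sequence[str]],
    handler: Handler,
    max_concurrent_batches: int = 1,
    max_concurrent_requests_per_batch: int = 10,
) -> List[List[str]]:
    """Run several batches under both limits (defaults mirror the table above)."""
    # Create the semaphore inside the running event loop.
    batch_slots = asyncio.Semaphore(max_concurrent_batches)
    return list(
        await asyncio.gather(
            *(run_batch(b, handler, batch_slots, max_concurrent_requests_per_batch) for b in batches)
        )
    )


async def process_request(req: str) -> str:
    """Hypothetical stand-in for a real inference call."""
    await asyncio.sleep(0.01)
    return req.upper()


if __name__ == "__main__":
    print(asyncio.run(run_batches([["a", "b"], ["c"]], process_request)))
    # [['A', 'B'], ['C']]
```

With the defaults, batches run one after another while up to 10 requests inside the active batch proceed concurrently.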
## Sample Configuration
```yaml
kvstore:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/batches.db
```
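The `db_path` value uses `${env.NAME:=default}` substitution: the environment variable's value if it is set, otherwise the text after `:=`. Below is a minimal Python sketch of that resolution. It assumes an empty variable also falls back to the default (as with shell `:=`) and that `~` is expanded to the home directory; `resolve_env_default` is a hypothetical helper, not Llama Stack's own code.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${env.NAME:=default}
_DEFAULT_PATTERN = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*):=([^}]*)\}")


def resolve_env_default(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${env.NAME:=default}; an unset or empty NAME falls back to the default."""
    source = os.environ if env is None else env
    return _DEFAULT_PATTERN.sub(lambda m: source.get(m.group(1)) or m.group(2), value)


if __name__ == "__main__":
    db_path = "${env.SQLITE_STORE_DIR:=~/.llama/dummy}/batches.db"
    # With SQLITE_STORE_DIR unset this yields ~/.llama/dummy/batches.db;
    # expanduser then replaces ~ with your home directory (an assumption, see above).
    print(os.path.expanduser(resolve_env_default(db_path)))
```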
