Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-21 17:33:12 +00:00)

Compare commits: v0.1.5.1rc...main (1198 commits)
[Commit list (Author | SHA1 | Date): 1198 commits, 58e164b8bc (newest) through 234408f411 (oldest); author and date columns not captured in this listing.]

1523 changed files with 542153 additions and 50442 deletions

.coveragerc (new file, +12)
@@ -0,0 +1,12 @@
[run]
omit =
    */tests/*
    */llama_stack/providers/*
    */llama_stack/templates/*
    .venv/*
    */llama_stack/cli/scripts/*
    */llama_stack/ui/*
    */llama_stack/distribution/ui/*
    */llama_stack/strong_typing/*
    */llama_stack/env.py
    */__init__.py

.github/CODEOWNERS (vendored, 2 changes)
@@ -2,4 +2,4 @@

 # These owners will be the default owners for everything in
 # the repo. Unless a later match takes precedence,
-* @ashwinb @yanxi0830 @hardikjshah @dltn @raghotham @dineshyv @vladimirivic @sixianyi0721 @ehhuang @terrytangyuan
+* @ashwinb @yanxi0830 @hardikjshah @raghotham @ehhuang @terrytangyuan @leseb @bbrowning @reluctantfuturist @mattf @slekkala1

.github/ISSUE_TEMPLATE/tech-debt.yml (vendored, new file, +30)
@@ -0,0 +1,30 @@
name: 🔧 Tech Debt
description: Something that is functional but should be improved or optimizied
labels: ["tech-debt"]
body:
  - type: textarea
    id: tech-debt-explanation
    attributes:
      label: 🤔 What is the technical debt you think should be addressed?
      description: >
        A clear and concise description of _what_ needs to be addressed - ensure you are describing
        constitutes [technical debt](https://en.wikipedia.org/wiki/Technical_debt) and is not a bug
        or feature request.
    validations:
      required: true

  - type: textarea
    id: tech-debt-motivation
    attributes:
      label: 💡 What is the benefit of addressing this technical debt?
      description: >
        A clear and concise description of _why_ this work is needed.
    validations:
      required: true

  - type: textarea
    id: other-thoughts
    attributes:
      label: Other thoughts
      description: >
        Any thoughts about how this may result in complexity in the codebase, or other trade-offs.

.github/PULL_REQUEST_TEMPLATE.md (vendored, 10 changes)
@@ -1,10 +1,8 @@
 # What does this PR do?
-[Provide a short summary of what this PR does and why. Link to relevant issues if applicable.]
+<!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. -->

-[//]: # (If resolving an issue, uncomment and update the line below)
-[//]: # (Closes #[issue-number])
+<!-- If resolving an issue, uncomment and update the line below -->
+<!-- Closes #[issue-number] -->

 ## Test Plan
-[Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*]
-
-[//]: # (## Documentation)
+<!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* -->

.github/TRIAGERS.md (vendored, new file, +2)
@@ -0,0 +1,2 @@
# This file documents Triage members in the Llama Stack community
@franciscojavierarceo

.github/actions/run-and-record-tests/action.yml (vendored, new file, +88)
@@ -0,0 +1,88 @@
name: 'Run and Record Tests'
description: 'Run integration tests and handle recording/artifact upload'

inputs:
  test-subdirs:
    description: 'Comma-separated list of test subdirectories to run'
    required: true
  test-pattern:
    description: 'Regex pattern to pass to pytest -k'
    required: false
    default: ''
  stack-config:
    description: 'Stack configuration to use'
    required: true
  provider:
    description: 'Provider to use for tests'
    required: true
  inference-mode:
    description: 'Inference mode (record or replay)'
    required: true
  run-vision-tests:
    description: 'Whether to run vision tests'
    required: false
    default: 'false'

runs:
  using: 'composite'
  steps:
    - name: Check Storage and Memory Available Before Tests
      if: ${{ always() }}
      shell: bash
      run: |
        free -h
        df -h

    - name: Run Integration Tests
      shell: bash
      run: |
        uv run --no-sync ./scripts/integration-tests.sh \
          --stack-config '${{ inputs.stack-config }}' \
          --provider '${{ inputs.provider }}' \
          --test-subdirs '${{ inputs.test-subdirs }}' \
          --test-pattern '${{ inputs.test-pattern }}' \
          --inference-mode '${{ inputs.inference-mode }}' \
          ${{ inputs.run-vision-tests == 'true' && '--run-vision-tests' || '' }} \
          | tee pytest-${{ inputs.inference-mode }}.log


    - name: Commit and push recordings
      if: ${{ inputs.inference-mode == 'record' }}
      shell: bash
      run: |
        echo "Checking for recording changes"
        git status --porcelain tests/integration/recordings/

        if [[ -n $(git status --porcelain tests/integration/recordings/) ]]; then
          echo "New recordings detected, committing and pushing"
          git add tests/integration/recordings/

          if [ "${{ inputs.run-vision-tests }}" == "true" ]; then
            git commit -m "Recordings update from CI (vision)"
          else
            git commit -m "Recordings update from CI"
          fi

          git fetch origin ${{ github.ref_name }}
          git rebase origin/${{ github.ref_name }}
          echo "Rebased successfully"
          git push origin HEAD:${{ github.ref_name }}
          echo "Pushed successfully"
        else
          echo "No recording changes"
        fi

    - name: Write inference logs to file
      if: ${{ always() }}
      shell: bash
      run: |
        sudo docker logs ollama > ollama-${{ inputs.inference-mode }}.log || true

    - name: Upload logs
      if: ${{ always() }}
      uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
      with:
        name: logs-${{ github.run_id }}-${{ github.run_attempt || '' }}-${{ strategy.job-index }}
        path: |
          *.log
        retention-days: 1

.github/actions/setup-ollama/action.yml (vendored, new file, +23)
@@ -0,0 +1,23 @@
name: Setup Ollama
description: Start Ollama
inputs:
  run-vision-tests:
    description: 'Run vision tests: "true" or "false"'
    required: false
    default: 'false'
runs:
  using: "composite"
  steps:
    - name: Start Ollama
      shell: bash
      run: |
        if [ "${{ inputs.run-vision-tests }}" == "true" ]; then
          image="ollama-with-vision-model"
        else
          image="ollama-with-models"
        fi

        echo "Starting Ollama with image: $image"
        docker run -d --name ollama -p 11434:11434 docker.io/llamastack/$image
        echo "Verifying Ollama status..."
        timeout 30 bash -c 'while ! curl -s -L http://127.0.0.1:11434; do sleep 1 && echo "."; done'

.github/actions/setup-runner/action.yml (vendored, new file, +43)
@@ -0,0 +1,43 @@
name: Setup runner
description: Prepare a runner for the tests (install uv, python, project dependencies, etc.)
inputs:
  python-version:
    description: The Python version to use
    required: false
    default: "3.12"
  client-version:
    description: The llama-stack-client-python version to test against (latest or published)
    required: false
    default: "latest"
runs:
  using: "composite"
  steps:
    - name: Install uv
      uses: astral-sh/setup-uv@6b9c6063abd6010835644d4c2e1bef4cf5cd0fca # v6.0.1
      with:
        python-version: ${{ inputs.python-version }}
        version: 0.7.6

    - name: Install dependencies
      shell: bash
      run: |
        echo "Updating project dependencies via uv sync"
        uv sync --all-groups

        echo "Installing ad-hoc dependencies"
        uv pip install faiss-cpu

        # Install llama-stack-client-python based on the client-version input
        if [ "${{ inputs.client-version }}" = "latest" ]; then
          echo "Installing latest llama-stack-client-python from main branch"
          uv pip install git+https://github.com/llamastack/llama-stack-client-python.git@main
        elif [ "${{ inputs.client-version }}" = "published" ]; then
          echo "Installing published llama-stack-client-python from PyPI"
          uv pip install llama-stack-client
        else
          echo "Invalid client-version: ${{ inputs.client-version }}"
          exit 1
        fi

        echo "Installed llama packages"
        uv pip list | grep llama

.github/actions/setup-test-environment/action.yml (vendored, new file, +66)
@@ -0,0 +1,66 @@
name: 'Setup Test Environment'
description: 'Common setup steps for integration tests including dependencies, providers, and build'

inputs:
  python-version:
    description: 'Python version to use'
    required: true
  client-version:
    description: 'Client version (latest or published)'
    required: true
  provider:
    description: 'Provider to setup (ollama or vllm)'
    required: true
    default: 'ollama'
  run-vision-tests:
    description: 'Whether to setup provider for vision tests'
    required: false
    default: 'false'
  inference-mode:
    description: 'Inference mode (record or replay)'
    required: true

runs:
  using: 'composite'
  steps:
    - name: Install dependencies
      uses: ./.github/actions/setup-runner
      with:
        python-version: ${{ inputs.python-version }}
        client-version: ${{ inputs.client-version }}

    - name: Setup ollama
      if: ${{ inputs.provider == 'ollama' && inputs.inference-mode == 'record' }}
      uses: ./.github/actions/setup-ollama
      with:
        run-vision-tests: ${{ inputs.run-vision-tests }}

    - name: Setup vllm
      if: ${{ inputs.provider == 'vllm' && inputs.inference-mode == 'record' }}
      uses: ./.github/actions/setup-vllm

    - name: Build Llama Stack
      shell: bash
      run: |
        # Install llama-stack-client-python based on the client-version input
        if [ "${{ inputs.client-version }}" = "latest" ]; then
          echo "Installing latest llama-stack-client-python from main branch"
          export LLAMA_STACK_CLIENT_DIR=git+https://github.com/llamastack/llama-stack-client-python.git@main
        elif [ "${{ inputs.client-version }}" = "published" ]; then
          echo "Installing published llama-stack-client-python from PyPI"
          unset LLAMA_STACK_CLIENT_DIR
        else
          echo "Invalid client-version: ${{ inputs.client-version }}"
          exit 1
        fi

        echo "Building Llama Stack"

        LLAMA_STACK_DIR=. \
          uv run --no-sync llama stack build --template ci-tests --image-type venv

    - name: Configure git for commits
      shell: bash
      run: |
        git config --local user.email "github-actions[bot]@users.noreply.github.com"
        git config --local user.name "github-actions[bot]"

.github/actions/setup-vllm/action.yml (vendored, new file, +27)
@@ -0,0 +1,27 @@
name: Setup VLLM
description: Start VLLM
runs:
  using: "composite"
  steps:
    - name: Start VLLM
      shell: bash
      run: |
        # Start vllm container
        docker run -d \
          --name vllm \
          -p 8000:8000 \
          --privileged=true \
          quay.io/higginsd/vllm-cpu:65393ee064 \
          --host 0.0.0.0 \
          --port 8000 \
          --enable-auto-tool-choice \
          --tool-call-parser llama3_json \
          --model /root/.cache/Llama-3.2-1B-Instruct \
          --served-model-name meta-llama/Llama-3.2-1B-Instruct

        # Wait for vllm to be ready
        echo "Waiting for vllm to be ready..."
        timeout 900 bash -c 'until curl -f http://localhost:8000/health; do
          echo "Waiting for vllm..."
          sleep 5
        done'

.github/dependabot.yml (vendored, new file, +33)
@@ -0,0 +1,33 @@
# GitHub Dependabot configuration
version: 2
updates:
  # Enable version updates for GitHub Actions
  - package-ecosystem: "github-actions"
    directory: "/" # Will use the default workflow location of `.github/workflows`
    schedule:
      interval: "weekly"
      day: "saturday"
    commit-message:
      prefix: chore(github-deps)

  - package-ecosystem: "uv"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "saturday"
    labels:
      - type/dependencies
      - python
    commit-message:
      prefix: chore(python-deps)

  - package-ecosystem: npm
    directory: "/llama_stack/ui"
    schedule:
      interval: "weekly"
      day: "saturday"
    labels:
      - type/dependencies
      - javascript
    commit-message:
      prefix: chore(ui-deps)

.github/workflows/README.md (vendored, new file, +23)
@@ -0,0 +1,23 @@
# Llama Stack CI

Llama Stack uses GitHub Actions for Continuous Integration (CI). Below is a table detailing what CI the project includes and the purpose.

| Name | File | Purpose |
| ---- | ---- | ------- |
| Update Changelog | [changelog.yml](changelog.yml) | Creates PR for updating the CHANGELOG.md |
| Installer CI | [install-script-ci.yml](install-script-ci.yml) | Test the installation script |
| Integration Auth Tests | [integration-auth-tests.yml](integration-auth-tests.yml) | Run the integration test suite with Kubernetes authentication |
| SqlStore Integration Tests | [integration-sql-store-tests.yml](integration-sql-store-tests.yml) | Run the integration test suite with SqlStore |
| Integration Tests (Replay) | [integration-tests.yml](integration-tests.yml) | Run the integration test suite from tests/integration in replay mode |
| Vector IO Integration Tests | [integration-vector-io-tests.yml](integration-vector-io-tests.yml) | Run the integration test suite with various VectorIO providers |
| Pre-commit | [pre-commit.yml](pre-commit.yml) | Run pre-commit checks |
| Test Llama Stack Build | [providers-build.yml](providers-build.yml) | Test llama stack build |
| Python Package Build Test | [python-build-test.yml](python-build-test.yml) | Test building the llama-stack PyPI project |
| Integration Tests (Record) | [record-integration-tests.yml](record-integration-tests.yml) | Run the integration test suite from tests/integration |
| Check semantic PR titles | [semantic-pr.yml](semantic-pr.yml) | Ensure that PR titles follow the conventional commit spec |
| Close stale issues and PRs | [stale_bot.yml](stale_bot.yml) | Run the Stale Bot action |
| Test External Providers Installed via Module | [test-external-provider-module.yml](test-external-provider-module.yml) | Test External Provider installation via Python module |
| Test External API and Providers | [test-external.yml](test-external.yml) | Test the External API and Provider mechanisms |
| UI Tests | [ui-unit-tests.yml](ui-unit-tests.yml) | Run the UI test suite |
| Unit Tests | [unit-tests.yml](unit-tests.yml) | Run the unit test suite |
| Update ReadTheDocs | [update-readthedocs.yml](update-readthedocs.yml) | Update the Llama Stack ReadTheDocs site |

.github/workflows/changelog.yml (vendored, new file, +31)
@@ -0,0 +1,31 @@
name: Update Changelog

run-name: Creates PR for updating the CHANGELOG.md

on:
  release:
    types: [published, unpublished, created, edited, deleted, released]

permissions:
  contents: read

jobs:
  generate_changelog:
    name: Generate changelog
    permissions:
      contents: write # for peter-evans/create-pull-request to create branch
      pull-requests: write # for peter-evans/create-pull-request to create a PR
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
        with:
          ref: main
          fetch-depth: 0
      - run: |
          python ./scripts/gen-changelog.py
      - uses: peter-evans/create-pull-request@271a8d0340265f705b14b6d32b9829c1cb33d45e # v7.0.8
        with:
          title: 'docs: update CHANGELOG.md for ${{ github.ref_name }}'
          commit-message: 'docs: update CHANGELOG.md for ${{ github.ref_name }}'
          branch: create-pull-request/changelog
          signoff: true

.github/workflows/gha_workflow_llama_stack_tests.yml (vendored, deleted, -355)
@@ -1,355 +0,0 @@
name: "Run Llama-stack Tests"

on:
  #### Temporarily disable PR runs until tests run as intended within mainline.
  #TODO Add this back.
  #pull_request_target:
  #  types: ["opened"]
  #  branches:
  #    - 'main'
  #  paths:
  #    - 'llama_stack/**/*.py'
  #    - 'tests/**/*.py'

  workflow_dispatch:
    inputs:
      runner:
        description: 'GHA Runner Scale Set label to run workflow on.'
        required: true
        default: "llama-stack-gha-runner-gpu"

      checkout_reference:
        description: "The branch, tag, or SHA to checkout"
        required: true
        default: "main"

      debug:
        description: 'Run debugging steps?'
        required: false
        default: "true"

      sleep_time:
        description: '[DEBUG] sleep time for debugging'
        required: true
        default: "0"

      provider_id:
        description: 'ID of your provider'
        required: true
        default: "meta_reference"

      model_id:
        description: 'Shorthand name for target model ID (llama_3b or llama_8b)'
        required: true
        default: "llama_3b"

      model_override_3b:
        description: 'Specify shorthand model for <llama_3b> '
        required: false
        default: "Llama3.2-3B-Instruct"

      model_override_8b:
        description: 'Specify shorthand model for <llama_8b> '
        required: false
        default: "Llama3.1-8B-Instruct"

env:
  # ID used for each test's provider config
  PROVIDER_ID: "${{ inputs.provider_id || 'meta_reference' }}"

  # Path to model checkpoints within EFS volume
  MODEL_CHECKPOINT_DIR: "/data/llama"

  # Path to directory to run tests from
  TESTS_PATH: "${{ github.workspace }}/llama_stack/providers/tests"

  # Keep track of a list of model IDs that are valid to use within pytest fixture marks
  AVAILABLE_MODEL_IDs: "llama_3b llama_8b"

  # Shorthand name for model ID, used in pytest fixture marks
  MODEL_ID: "${{ inputs.model_id || 'llama_3b' }}"

  # Override the `llama_3b` / `llama_8b' models, else use the default.
  LLAMA_3B_OVERRIDE: "${{ inputs.model_override_3b || 'Llama3.2-3B-Instruct' }}"
  LLAMA_8B_OVERRIDE: "${{ inputs.model_override_8b || 'Llama3.1-8B-Instruct' }}"

  # Defines which directories in TESTS_PATH to exclude from the test loop
  EXCLUDED_DIRS: "__pycache__"

  # Defines the output xml reports generated after a test is run
  REPORTS_GEN: ""

jobs:
  execute_workflow:
    name: Execute workload on Self-Hosted GPU k8s runner
    permissions:
      pull-requests: write
    defaults:
      run:
        shell: bash
    runs-on: ${{ inputs.runner != '' && inputs.runner || 'llama-stack-gha-runner-gpu' }}
    if: always()
    steps:

      ##############################
      #### INITIAL DEBUG CHECKS ####
      ##############################
      - name: "[DEBUG] Check content of the EFS mount"
        id: debug_efs_volume
        continue-on-error: true
        if: inputs.debug == 'true'
        run: |
          echo "========= Content of the EFS mount ============="
          ls -la ${{ env.MODEL_CHECKPOINT_DIR }}

      - name: "[DEBUG] Get runner container OS information"
        id: debug_os_info
        if: ${{ inputs.debug == 'true' }}
        run: |
          cat /etc/os-release

      - name: "[DEBUG] Print environment variables"
        id: debug_env_vars
        if: ${{ inputs.debug == 'true' }}
        run: |
          echo "PROVIDER_ID = ${PROVIDER_ID}"
          echo "MODEL_CHECKPOINT_DIR = ${MODEL_CHECKPOINT_DIR}"
          echo "AVAILABLE_MODEL_IDs = ${AVAILABLE_MODEL_IDs}"
          echo "MODEL_ID = ${MODEL_ID}"
          echo "LLAMA_3B_OVERRIDE = ${LLAMA_3B_OVERRIDE}"
          echo "LLAMA_8B_OVERRIDE = ${LLAMA_8B_OVERRIDE}"
          echo "EXCLUDED_DIRS = ${EXCLUDED_DIRS}"
          echo "REPORTS_GEN = ${REPORTS_GEN}"

      ############################
      #### MODEL INPUT CHECKS ####
      ############################

      - name: "Check if env.model_id is valid"
        id: check_model_id
        run: |
          if [[ " ${AVAILABLE_MODEL_IDs[@]} " =~ " ${MODEL_ID} " ]]; then
            echo "Model ID '${MODEL_ID}' is valid."
          else
            echo "Model ID '${MODEL_ID}' is invalid. Terminating workflow."
            exit 1
          fi

      #######################
      #### CODE CHECKOUT ####
      #######################
      - name: "Checkout 'meta-llama/llama-stack' repository"
        id: checkout_repo
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.branch }}

      - name: "[DEBUG] Content of the repository after checkout"
        id: debug_content_after_checkout
        if: ${{ inputs.debug == 'true' }}
        run: |
          ls -la ${GITHUB_WORKSPACE}

      ##########################################################
      ####             OPTIONAL SLEEP DEBUG                 ####
      #                                                        #
      # Use to "exec" into the test k8s POD and run tests      #
      # manually to identify what dependencies are being used. #
      #                                                        #
      ##########################################################
      - name: "[DEBUG] sleep"
        id: debug_sleep
        if: ${{ inputs.debug == 'true' && inputs.sleep_time != '' }}
        run: |
          sleep ${{ inputs.sleep_time }}

      ############################
      #### UPDATE SYSTEM PATH ####
      ############################
      - name: "Update path: execute"
        id: path_update_exec
        run: |
          # .local/bin is needed for certain libraries installed below to be recognized
          # when calling their executable to install sub-dependencies
          mkdir -p ${HOME}/.local/bin
          echo "${HOME}/.local/bin" >> "$GITHUB_PATH"

      #####################################
      #### UPDATE CHECKPOINT DIRECTORY ####
      #####################################
      - name: "Update checkpoint directory"
        id: checkpoint_update
        run: |
          echo "Checkpoint directory: ${MODEL_CHECKPOINT_DIR}/$LLAMA_3B_OVERRIDE"
          if [ "${MODEL_ID}" = "llama_3b" ] && [ -d "${MODEL_CHECKPOINT_DIR}/${LLAMA_3B_OVERRIDE}" ]; then
            echo "MODEL_CHECKPOINT_DIR=${MODEL_CHECKPOINT_DIR}/${LLAMA_3B_OVERRIDE}" >> "$GITHUB_ENV"
          elif [ "${MODEL_ID}" = "llama_8b" ] && [ -d "${MODEL_CHECKPOINT_DIR}/${LLAMA_8B_OVERRIDE}" ]; then
            echo "MODEL_CHECKPOINT_DIR=${MODEL_CHECKPOINT_DIR}/${LLAMA_8B_OVERRIDE}" >> "$GITHUB_ENV"
          else
            echo "MODEL_ID & LLAMA_*B_OVERRIDE are not a valid pairing. Terminating workflow."
            exit 1
          fi

      - name: "[DEBUG] Checkpoint update check"
        id: debug_checkpoint_update
        if: ${{ inputs.debug == 'true' }}
        run: |
          echo "MODEL_CHECKPOINT_DIR (after update) = ${MODEL_CHECKPOINT_DIR}"

      ##################################
      #### DEPENDENCY INSTALLATIONS ####
      ##################################
      - name: "Installing 'apt' required packages"
        id: install_apt
        run: |
          echo "[STEP] Installing 'apt' required packages"
          sudo apt update -y
          sudo apt install -y python3 python3-pip npm wget

      - name: "Installing packages with 'curl'"
        id: install_curl
        run: |
          curl -fsSL https://ollama.com/install.sh | sh

      - name: "Installing packages with 'wget'"
        id: install_wget
        run: |
          wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
          chmod +x Miniconda3-latest-Linux-x86_64.sh
          ./Miniconda3-latest-Linux-x86_64.sh -b install -c pytorch -c nvidia faiss-gpu=1.9.0
          # Add miniconda3 bin to system path
          echo "${HOME}/miniconda3/bin" >> "$GITHUB_PATH"

      - name: "Installing packages with 'npm'"
        id: install_npm_generic
        run: |
          sudo npm install -g junit-merge

      - name: "Installing pip dependencies"
        id: install_pip_generic
        run: |
          echo "[STEP] Installing 'llama-stack' models"
          pip install -U pip setuptools
          pip install -r requirements.txt
          pip install -e .
          pip install -U \
            torch torchvision \
            pytest pytest_asyncio \
            fairscale lm-format-enforcer \
            zmq chardet pypdf \
            pandas sentence_transformers together \
            aiosqlite
      - name: "Installing packages with conda"
        id: install_conda_generic
        run: |
          conda install -q -c pytorch -c nvidia faiss-gpu=1.9.0

      #############################################################
      #### TESTING TO BE DONE FOR BOTH PRS AND MANUAL DISPATCH ####
      #############################################################
      - name: "Run Tests: Loop"
        id: run_tests_loop
        working-directory: "${{ github.workspace }}"
        run: |
          pattern=""
          for dir in llama_stack/providers/tests/*; do
            if [ -d "$dir" ]; then
              dir_name=$(basename "$dir")
              if [[ ! " $EXCLUDED_DIRS " =~ " $dir_name " ]]; then
                for file in "$dir"/test_*.py; do
                  test_name=$(basename "$file")
                  new_file="result-${dir_name}-${test_name}.xml"
                  if torchrun $(which pytest) -s -v ${TESTS_PATH}/${dir_name}/${test_name} -m "${PROVIDER_ID} and ${MODEL_ID}" \
                    --junitxml="${{ github.workspace }}/${new_file}"; then
                    echo "Ran test: ${test_name}"
                  else
                    echo "Did NOT run test: ${test_name}"
                  fi
                  pattern+="${new_file} "
                done
              fi
            fi
          done
          echo "REPORTS_GEN=$pattern" >> "$GITHUB_ENV"

      - name: "Test Summary: Merge"
        id: test_summary_merge
        working-directory: "${{ github.workspace }}"
        run: |
          echo "Merging the following test result files: ${REPORTS_GEN}"
          # Defaults to merging them into 'merged-test-results.xml'
          junit-merge ${{ env.REPORTS_GEN }}

      ############################################
      #### AUTOMATIC TESTING ON PULL REQUESTS ####
      ############################################

      #### Run tests ####

      - name: "PR - Run Tests"
        id: pr_run_tests
        working-directory: "${{ github.workspace }}"
        if: github.event_name == 'pull_request_target'
        run: |
          echo "[STEP] Running PyTest tests at 'GITHUB_WORKSPACE' path: ${GITHUB_WORKSPACE} | path: ${{ github.workspace }}"
          # (Optional) Add more tests here.

          # Merge test results with 'merged-test-results.xml' from above.
          # junit-merge <new-test-results> merged-test-results.xml

      #### Create test summary ####

      - name: "PR - Test Summary"
        id: pr_test_summary_create
        if: github.event_name == 'pull_request_target'
        uses: test-summary/action@v2
        with:
          paths: "${{ github.workspace }}/merged-test-results.xml"
          output: test-summary.md

      - name: "PR - Upload Test Summary"
        id: pr_test_summary_upload
        if: github.event_name == 'pull_request_target'
        uses: actions/upload-artifact@v3
        with:
          name: test-summary
          path: test-summary.md

      #### Update PR request ####

      - name: "PR - Update comment"
        id: pr_update_comment
        if: github.event_name == 'pull_request_target'
        uses: thollander/actions-comment-pull-request@v2
        with:
          filePath: test-summary.md

      ########################
      #### MANUAL TESTING ####
      ########################

      #### Run tests ####

      - name: "Manual - Run Tests: Prep"
        id: manual_run_tests
        working-directory: "${{ github.workspace }}"
        if: github.event_name == 'workflow_dispatch'
        run: |
          echo "[STEP] Running PyTest tests at 'GITHUB_WORKSPACE' path: ${{ github.workspace }}"

          #TODO Use this when collection errors are resolved
          # pytest -s -v -m "${PROVIDER_ID} and ${MODEL_ID}" --junitxml="${{ github.workspace }}/merged-test-results.xml"

          # (Optional) Add more tests here.

          # Merge test results with 'merged-test-results.xml' from above.
          # junit-merge <new-test-results> merged-test-results.xml

      #### Create test summary ####

      - name: "Manual - Test Summary"
        id: manual_test_summary
        if: always() && github.event_name == 'workflow_dispatch'
        uses: test-summary/action@v2
        with:
          paths: "${{ github.workspace }}/merged-test-results.xml"

.github/workflows/install-script-ci.yml (vendored, new file, +39)
@@ -0,0 +1,39 @@
name: Installer CI

run-name: Test the installation script

on:
  pull_request:
    paths:
      - 'scripts/install.sh'
  push:
    paths:
      - 'scripts/install.sh'
  schedule:
    - cron: '0 2 * * *' # every day at 02:00 UTC

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # 5.0.0
      - name: Run ShellCheck on install.sh
        run: shellcheck scripts/install.sh
  smoke-test-on-dev:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Install dependencies
        uses: ./.github/actions/setup-runner

      - name: Build a single provider
        run: |
          USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync \
            llama stack build --template starter --image-type container --image-name test

      - name: Run installer end-to-end
        run: |
          IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1)
          ./scripts/install.sh --image $IMAGE_ID

112
.github/workflows/integration-auth-tests.yml
vendored
Normal file
112
.github/workflows/integration-auth-tests.yml
vendored
Normal file
|
@ -0,0 +1,112 @@
|
|||
name: Integration Auth Tests
|
||||
|
||||
run-name: Run the integration test suite with Kubernetes authentication

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    paths:
      - 'distributions/**'
      - 'llama_stack/**'
      - '!llama_stack/ui/**'
      - 'tests/integration/**'
      - 'uv.lock'
      - 'pyproject.toml'
      - 'requirements.txt'
      - '.github/workflows/integration-auth-tests.yml' # This workflow

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test-matrix:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        auth-provider: [oauth2_token]
      fail-fast: false # we want to run all tests regardless of failure

    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Install dependencies
        uses: ./.github/actions/setup-runner

      - name: Install minikube
        if: ${{ matrix.auth-provider == 'kubernetes' }}
        uses: medyagh/setup-minikube@e3c7f79eb1e997eabccc536a6cf318a2b0fe19d9 # v0.0.20

      - name: Start minikube
        if: ${{ matrix.auth-provider == 'oauth2_token' }}
        run: |
          minikube start
          kubectl get pods -A

      - name: Configure Kube Auth
        if: ${{ matrix.auth-provider == 'oauth2_token' }}
        run: |
          kubectl create namespace llama-stack
          kubectl create serviceaccount llama-stack-auth -n llama-stack
          kubectl create token llama-stack-auth -n llama-stack > llama-stack-auth-token

      - name: Set Kubernetes Config
        if: ${{ matrix.auth-provider == 'oauth2_token' }}
        run: |
          echo "KUBERNETES_API_SERVER_URL=$(kubectl get --raw /.well-known/openid-configuration| jq -r .jwks_uri)" >> $GITHUB_ENV
          echo "KUBERNETES_CA_CERT_PATH=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.certificate-authority}')" >> $GITHUB_ENV
          echo "KUBERNETES_ISSUER=$(kubectl get --raw /.well-known/openid-configuration| jq -r .issuer)" >> $GITHUB_ENV
          echo "KUBERNETES_AUDIENCE=$(kubectl create token llama-stack-auth -n llama-stack --duration=1h | cut -d. -f2 | base64 -d | jq -r '.aud[0]')" >> $GITHUB_ENV
          echo "TOKEN=$(cat llama-stack-auth-token)" >> $GITHUB_ENV

      - name: Set Kube Auth Config and run server
        env:
          INFERENCE_MODEL: "meta-llama/Llama-3.2-3B-Instruct"
        if: ${{ matrix.auth-provider == 'oauth2_token' }}
        run: |
          run_dir=$(mktemp -d)
          cat <<'EOF' > $run_dir/run.yaml
          version: '2'
          image_name: kube
          apis: []
          providers: {}
          server:
            port: 8321
          EOF
          yq eval '.server.auth.provider_config.type = "${{ matrix.auth-provider }}"' -i $run_dir/run.yaml
          yq eval '.server.auth.provider_config.tls_cafile = "${{ env.KUBERNETES_CA_CERT_PATH }}"' -i $run_dir/run.yaml
          yq eval '.server.auth.provider_config.issuer = "${{ env.KUBERNETES_ISSUER }}"' -i $run_dir/run.yaml
          yq eval '.server.auth.provider_config.audience = "${{ env.KUBERNETES_AUDIENCE }}"' -i $run_dir/run.yaml
          yq eval '.server.auth.provider_config.jwks.uri = "${{ env.KUBERNETES_API_SERVER_URL }}"' -i $run_dir/run.yaml
          yq eval '.server.auth.provider_config.jwks.token = "${{ env.TOKEN }}"' -i $run_dir/run.yaml
          cat $run_dir/run.yaml

          nohup uv run llama stack run $run_dir/run.yaml --image-type venv > server.log 2>&1 &

      - name: Wait for Llama Stack server to be ready
        run: |
          echo "Waiting for Llama Stack server..."
          for i in {1..30}; do
            if curl -s -L -H "Authorization: Bearer $(cat llama-stack-auth-token)" http://localhost:8321/v1/health | grep -q "OK"; then
              echo "Llama Stack server is up!"
              if grep -q "Enabling authentication with provider: ${{ matrix.auth-provider }}" server.log; then
                echo "Llama Stack server is configured to use ${{ matrix.auth-provider }} auth"
                exit 0
              else
                echo "Llama Stack server is not configured to use ${{ matrix.auth-provider }} auth"
                cat server.log
                exit 1
              fi
            fi
            sleep 1
          done
          echo "Llama Stack server failed to start"
          cat server.log
          exit 1

      - name: Test auth
        run: |
          curl -s -L -H "Authorization: Bearer $(cat llama-stack-auth-token)" http://127.0.0.1:8321/v1/providers|jq
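The audience value above is read straight out of a freshly minted service-account token. A minimal local sketch of the same check, assuming kubectl and jq are available and the llama-stack namespace from the workflow exists (it mirrors the decoding trick the workflow itself uses and is not part of the workflow):

# Hypothetical local check: inspect the issuer and audience claims the
# oauth2_token provider will be validated against.
token=$(kubectl create token llama-stack-auth -n llama-stack --duration=10m)
echo "$token" | cut -d. -f2 | base64 -d 2>/dev/null | jq '{iss, aud}'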
.github/workflows/integration-sql-store-tests.yml (new file)
@@ -0,0 +1,72 @@
name: SqlStore Integration Tests

run-name: Run the integration test suite with SqlStore

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    paths:
      - 'llama_stack/providers/utils/sqlstore/**'
      - 'tests/integration/sqlstore/**'
      - 'uv.lock'
      - 'pyproject.toml'
      - 'requirements.txt'
      - '.github/workflows/integration-sql-store-tests.yml' # This workflow

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test-postgres:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12", "3.13"]
      fail-fast: false

    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_USER: llamastack
          POSTGRES_PASSWORD: llamastack
          POSTGRES_DB: llamastack
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Install dependencies
        uses: ./.github/actions/setup-runner
        with:
          python-version: ${{ matrix.python-version }}

      - name: Run SqlStore Integration Tests
        env:
          ENABLE_POSTGRES_TESTS: "true"
          POSTGRES_HOST: localhost
          POSTGRES_PORT: 5432
          POSTGRES_DB: llamastack
          POSTGRES_USER: llamastack
          POSTGRES_PASSWORD: llamastack
        run: |
          uv run pytest -sv tests/integration/providers/utils/sqlstore/

      - name: Upload test logs
        if: ${{ always() }}
        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
        with:
          name: postgres-test-logs-${{ github.run_id }}-${{ github.run_attempt }}-${{ matrix.python-version }}
          path: |
            *.log
          retention-days: 1
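A rough local equivalent of this job, assuming Docker and uv are installed; the container settings simply mirror the services block above:

# Hypothetical local run of the SqlStore suite against a throwaway Postgres 15 container.
docker run -d --name sqlstore-pg -p 5432:5432 \
  -e POSTGRES_USER=llamastack -e POSTGRES_PASSWORD=llamastack -e POSTGRES_DB=llamastack \
  postgres:15
export ENABLE_POSTGRES_TESTS=true POSTGRES_HOST=localhost POSTGRES_PORT=5432
export POSTGRES_DB=llamastack POSTGRES_USER=llamastack POSTGRES_PASSWORD=llamastack
uv run pytest -sv tests/integration/providers/utils/sqlstore/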
.github/workflows/integration-tests.yml (new file)
@@ -0,0 +1,87 @@
name: Integration Tests (Replay)
|
||||
|
||||
run-name: Run the integration test suite from tests/integration in replay mode
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ main ]
|
||||
pull_request:
|
||||
branches: [ main ]
|
||||
types: [opened, synchronize, reopened]
|
||||
paths:
|
||||
- 'llama_stack/**'
|
||||
- '!llama_stack/ui/**'
|
||||
- 'tests/**'
|
||||
- 'uv.lock'
|
||||
- 'pyproject.toml'
|
||||
- '.github/workflows/integration-tests.yml' # This workflow
|
||||
- '.github/actions/setup-ollama/action.yml'
|
||||
- '.github/actions/setup-test-environment/action.yml'
|
||||
- '.github/actions/run-and-record-tests/action.yml'
|
||||
schedule:
|
||||
# If changing the cron schedule, update the provider in the test-matrix job
|
||||
- cron: '0 0 * * *' # (test latest client) Daily at 12 AM UTC
|
||||
- cron: '1 0 * * 0' # (test vllm) Weekly on Sunday at 1 AM UTC
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
test-all-client-versions:
|
||||
description: 'Test against both the latest and published versions'
|
||||
type: boolean
|
||||
default: false
|
||||
test-provider:
|
||||
description: 'Test against a specific provider'
|
||||
type: string
|
||||
default: 'ollama'
|
||||
test-subdirs:
|
||||
description: 'Comma-separated list of test subdirectories to run'
|
||||
type: string
|
||||
default: ''
|
||||
test-pattern:
|
||||
description: 'Regex pattern to pass to pytest -k'
|
||||
type: string
|
||||
default: ''
|
||||
|
||||
concurrency:
|
||||
# Skip concurrency for pushes to main - each commit should be tested independently
|
||||
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
|
||||
run-replay-mode-tests:
|
||||
runs-on: ubuntu-latest
|
||||
name: ${{ format('Integration Tests ({0}, {1}, {2}, client={3}, vision={4})', matrix.client-type, matrix.provider, matrix.python-version, matrix.client-version, matrix.run-vision-tests) }}
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
client-type: [library, server]
|
||||
# Use vllm on weekly schedule, otherwise use test-provider input (defaults to ollama)
|
||||
provider: ${{ (github.event.schedule == '1 0 * * 0') && fromJSON('["vllm"]') || fromJSON(format('["{0}"]', github.event.inputs.test-provider || 'ollama')) }}
|
||||
# Use Python 3.13 only on nightly schedule (daily latest client test), otherwise use 3.12
|
||||
python-version: ${{ github.event.schedule == '0 0 * * *' && fromJSON('["3.12", "3.13"]') || fromJSON('["3.12"]') }}
|
||||
client-version: ${{ (github.event.schedule == '0 0 * * *' || github.event.inputs.test-all-client-versions == 'true') && fromJSON('["published", "latest"]') || fromJSON('["latest"]') }}
|
||||
run-vision-tests: [true, false]
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Setup test environment
|
||||
uses: ./.github/actions/setup-test-environment
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
client-version: ${{ matrix.client-version }}
|
||||
provider: ${{ matrix.provider }}
|
||||
run-vision-tests: ${{ matrix.run-vision-tests }}
|
||||
inference-mode: 'replay'
|
||||
|
||||
- name: Run tests
|
||||
uses: ./.github/actions/run-and-record-tests
|
||||
with:
|
||||
test-subdirs: ${{ inputs.test-subdirs }}
|
||||
test-pattern: ${{ inputs.test-pattern }}
|
||||
stack-config: ${{ matrix.client-type == 'library' && 'ci-tests' || 'server:ci-tests' }}
|
||||
provider: ${{ matrix.provider }}
|
||||
inference-mode: 'replay'
|
||||
run-vision-tests: ${{ matrix.run-vision-tests }}
|
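Because the replay workflow exposes workflow_dispatch inputs, a one-off run against a specific provider can be kicked off from the command line; a sketch assuming an authenticated GitHub CLI (the input values are placeholders):

# Hypothetical manual dispatch of the replay suite against vllm.
gh workflow run integration-tests.yml \
  --ref main \
  -f test-provider=vllm \
  -f test-subdirs=inference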
.github/workflows/integration-vector-io-tests.yml (new file)
@@ -0,0 +1,203 @@
name: Vector IO Integration Tests
|
||||
|
||||
run-name: Run the integration test suite with various VectorIO providers
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ main ]
|
||||
pull_request:
|
||||
branches: [ main ]
|
||||
paths:
|
||||
- 'llama_stack/**'
|
||||
- '!llama_stack/ui/**'
|
||||
- 'tests/integration/vector_io/**'
|
||||
- 'uv.lock'
|
||||
- 'pyproject.toml'
|
||||
- 'requirements.txt'
|
||||
- '.github/workflows/integration-vector-io-tests.yml' # This workflow
|
||||
schedule:
|
||||
- cron: '0 0 * * *' # (test on python 3.13) Daily at 12 AM UTC
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
test-matrix:
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector", "remote::weaviate", "remote::qdrant"]
|
||||
python-version: ${{ github.event.schedule == '0 0 * * *' && fromJSON('["3.12", "3.13"]') || fromJSON('["3.12"]') }}
|
||||
fail-fast: false # we want to run all tests regardless of failure
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Install dependencies
|
||||
uses: ./.github/actions/setup-runner
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
|
||||
- name: Setup Chroma
|
||||
if: matrix.vector-io-provider == 'remote::chromadb'
|
||||
run: |
|
||||
docker run --rm -d --pull always \
|
||||
--name chromadb \
|
||||
-p 8000:8000 \
|
||||
-v ~/chroma:/chroma/chroma \
|
||||
-e IS_PERSISTENT=TRUE \
|
||||
-e ANONYMIZED_TELEMETRY=FALSE \
|
||||
chromadb/chroma:latest
|
||||
|
||||
- name: Setup Weaviate
|
||||
if: matrix.vector-io-provider == 'remote::weaviate'
|
||||
run: |
|
||||
docker run --rm -d --pull always \
|
||||
--name weaviate \
|
||||
-p 8080:8080 -p 50051:50051 \
|
||||
cr.weaviate.io/semitechnologies/weaviate:1.32.0
|
||||
|
||||
- name: Start PGVector DB
|
||||
if: matrix.vector-io-provider == 'remote::pgvector'
|
||||
run: |
|
||||
docker run -d \
|
||||
--name pgvector \
|
||||
-e POSTGRES_USER=llamastack \
|
||||
-e POSTGRES_PASSWORD=llamastack \
|
||||
-e POSTGRES_DB=llamastack \
|
||||
-p 5432:5432 \
|
||||
pgvector/pgvector:pg17
|
||||
|
||||
- name: Wait for PGVector to be ready
|
||||
if: matrix.vector-io-provider == 'remote::pgvector'
|
||||
run: |
|
||||
echo "Waiting for Postgres to be ready..."
|
||||
for i in {1..30}; do
|
||||
if docker exec pgvector pg_isready -U llamastack > /dev/null 2>&1; then
|
||||
echo "Postgres is ready!"
|
||||
break
|
||||
fi
|
||||
echo "Not ready yet... ($i)"
|
||||
sleep 1
|
||||
done
|
||||
|
||||
- name: Enable pgvector extension
|
||||
if: matrix.vector-io-provider == 'remote::pgvector'
|
||||
run: |
|
||||
PGPASSWORD=llamastack psql -h localhost -U llamastack -d llamastack \
|
||||
-c "CREATE EXTENSION IF NOT EXISTS vector;"
|
||||
|
||||
- name: Setup Qdrant
|
||||
if: matrix.vector-io-provider == 'remote::qdrant'
|
||||
run: |
|
||||
docker run --rm -d --pull always \
|
||||
--name qdrant \
|
||||
-p 6333:6333 \
|
||||
qdrant/qdrant
|
||||
|
||||
- name: Wait for Qdrant to be ready
|
||||
if: matrix.vector-io-provider == 'remote::qdrant'
|
||||
run: |
|
||||
echo "Waiting for Qdrant to be ready..."
|
||||
for i in {1..30}; do
|
||||
if curl -s http://localhost:6333/collections | grep -q '"status":"ok"'; then
|
||||
echo "Qdrant is ready!"
|
||||
exit 0
|
||||
fi
|
||||
sleep 2
|
||||
done
|
||||
echo "Qdrant failed to start"
|
||||
docker logs qdrant
|
||||
exit 1
|
||||
|
||||
- name: Wait for ChromaDB to be ready
|
||||
if: matrix.vector-io-provider == 'remote::chromadb'
|
||||
run: |
|
||||
echo "Waiting for ChromaDB to be ready..."
|
||||
for i in {1..30}; do
|
||||
if curl -s http://localhost:8000/api/v2/heartbeat | grep -q "nanosecond heartbeat"; then
|
||||
echo "ChromaDB is ready!"
|
||||
exit 0
|
||||
fi
|
||||
sleep 2
|
||||
done
|
||||
echo "ChromaDB failed to start"
|
||||
docker logs chromadb
|
||||
exit 1
|
||||
|
||||
- name: Wait for Weaviate to be ready
|
||||
if: matrix.vector-io-provider == 'remote::weaviate'
|
||||
run: |
|
||||
echo "Waiting for Weaviate to be ready..."
|
||||
for i in {1..30}; do
|
||||
if curl -s http://localhost:8080 | grep -q "https://weaviate.io/developers/weaviate/current/"; then
|
||||
echo "Weaviate is ready!"
|
||||
exit 0
|
||||
fi
|
||||
sleep 2
|
||||
done
|
||||
echo "Weaviate failed to start"
|
||||
docker logs weaviate
|
||||
exit 1
|
||||
|
||||
- name: Build Llama Stack
|
||||
run: |
|
||||
uv run --no-sync llama stack build --template ci-tests --image-type venv
|
||||
|
||||
- name: Check Storage and Memory Available Before Tests
|
||||
if: ${{ always() }}
|
||||
run: |
|
||||
free -h
|
||||
df -h
|
||||
|
||||
- name: Run Vector IO Integration Tests
|
||||
env:
|
||||
ENABLE_CHROMADB: ${{ matrix.vector-io-provider == 'remote::chromadb' && 'true' || '' }}
|
||||
CHROMADB_URL: ${{ matrix.vector-io-provider == 'remote::chromadb' && 'http://localhost:8000' || '' }}
|
||||
ENABLE_PGVECTOR: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'true' || '' }}
|
||||
PGVECTOR_HOST: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'localhost' || '' }}
|
||||
PGVECTOR_PORT: ${{ matrix.vector-io-provider == 'remote::pgvector' && '5432' || '' }}
|
||||
PGVECTOR_DB: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
|
||||
PGVECTOR_USER: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
|
||||
PGVECTOR_PASSWORD: ${{ matrix.vector-io-provider == 'remote::pgvector' && 'llamastack' || '' }}
|
||||
ENABLE_QDRANT: ${{ matrix.vector-io-provider == 'remote::qdrant' && 'true' || '' }}
|
||||
QDRANT_URL: ${{ matrix.vector-io-provider == 'remote::qdrant' && 'http://localhost:6333' || '' }}
|
||||
ENABLE_WEAVIATE: ${{ matrix.vector-io-provider == 'remote::weaviate' && 'true' || '' }}
|
||||
WEAVIATE_CLUSTER_URL: ${{ matrix.vector-io-provider == 'remote::weaviate' && 'localhost:8080' || '' }}
|
||||
run: |
|
||||
uv run --no-sync \
|
||||
pytest -sv --stack-config="files=inline::localfs,inference=inline::sentence-transformers,vector_io=${{ matrix.vector-io-provider }}" \
|
||||
tests/integration/vector_io \
|
||||
--embedding-model inline::sentence-transformers/all-MiniLM-L6-v2
|
||||
|
||||
- name: Check Storage and Memory Available After Tests
|
||||
if: ${{ always() }}
|
||||
run: |
|
||||
free -h
|
||||
df -h
|
||||
|
||||
- name: Create sanitized provider name
|
||||
if: ${{ always() }}
|
||||
run: |
|
||||
echo "SANITIZED_PROVIDER=$(echo "${{ matrix.vector-io-provider }}" | tr ':' '_')" >> $GITHUB_ENV
|
||||
|
||||
- name: Write ChromaDB logs to file
|
||||
if: ${{ always() && matrix.vector-io-provider == 'remote::chromadb' }}
|
||||
run: |
|
||||
docker logs chromadb > chromadb.log
|
||||
|
||||
- name: Write Qdrant logs to file
|
||||
if: ${{ always() && matrix.vector-io-provider == 'remote::qdrant' }}
|
||||
run: |
|
||||
docker logs qdrant > qdrant.log
|
||||
|
||||
- name: Upload all logs to artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
|
||||
with:
|
||||
name: vector-io-logs-${{ github.run_id }}-${{ github.run_attempt }}-${{ env.SANITIZED_PROVIDER }}-${{ matrix.python-version }}
|
||||
path: |
|
||||
*.log
|
||||
retention-days: 1
|
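To debug a single cell of this matrix locally, the container setup and pytest invocation above can be replayed directly; a sketch for the remote::pgvector provider, assuming Docker and uv are available and the ci-tests environment has already been built as in the Build Llama Stack step:

# Hypothetical local reproduction of the remote::pgvector matrix cell.
docker run -d --name pgvector -p 5432:5432 \
  -e POSTGRES_USER=llamastack -e POSTGRES_PASSWORD=llamastack -e POSTGRES_DB=llamastack \
  pgvector/pgvector:pg17
PGPASSWORD=llamastack psql -h localhost -U llamastack -d llamastack \
  -c "CREATE EXTENSION IF NOT EXISTS vector;"
export ENABLE_PGVECTOR=true PGVECTOR_HOST=localhost PGVECTOR_PORT=5432
export PGVECTOR_DB=llamastack PGVECTOR_USER=llamastack PGVECTOR_PASSWORD=llamastack
uv run --no-sync pytest -sv \
  --stack-config="files=inline::localfs,inference=inline::sentence-transformers,vector_io=remote::pgvector" \
  tests/integration/vector_io \
  --embedding-model inline::sentence-transformers/all-MiniLM-L6-v2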
.github/workflows/pre-commit.yml
@@ -1,29 +1,100 @@
name: Pre-commit
|
||||
|
||||
run-name: Run pre-commit checks
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
push:
|
||||
branches: [main]
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
pre-commit:
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: write
|
||||
pull-requests: write
|
||||
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
with:
|
||||
# For dependabot PRs, we need to checkout with a token that can push changes
|
||||
token: ${{ github.actor == 'dependabot[bot]' && secrets.GITHUB_TOKEN || github.token }}
|
||||
# Fetch full history for dependabot PRs to allow commits
|
||||
fetch-depth: ${{ github.actor == 'dependabot[bot]' && 0 || 1 }}
|
||||
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v5
|
||||
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
|
||||
with:
|
||||
python-version: '3.11'
|
||||
python-version: '3.12'
|
||||
cache: pip
|
||||
cache-dependency-path: |
|
||||
**/requirements*.txt
|
||||
.pre-commit-config.yaml
|
||||
|
||||
- uses: pre-commit/action@v3.0.1
|
||||
# npm ci may fail -
|
||||
# npm error `npm ci` can only install packages when your package.json and package-lock.json or npm-shrinkwrap.json are in sync. Please update your lock file with `npm install` before continuing.
|
||||
# npm error Invalid: lock file's llama-stack-client@0.2.17 does not satisfy llama-stack-client@0.2.18
|
||||
|
||||
# - name: Set up Node.js
|
||||
# uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af # v4.1.0
|
||||
# with:
|
||||
# node-version: '20'
|
||||
# cache: 'npm'
|
||||
# cache-dependency-path: 'llama_stack/ui/'
|
||||
|
||||
# - name: Install npm dependencies
|
||||
# run: npm ci
|
||||
# working-directory: llama_stack/ui
|
||||
|
||||
- uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1
|
||||
continue-on-error: true
|
||||
env:
|
||||
SKIP: no-commit-to-branch
|
||||
RUFF_OUTPUT_FORMAT: github
|
||||
|
||||
- name: Debug
|
||||
run: |
|
||||
echo "github.ref: ${{ github.ref }}"
|
||||
echo "github.actor: ${{ github.actor }}"
|
||||
|
||||
- name: Commit changes for dependabot PRs
|
||||
if: github.actor == 'dependabot[bot]'
|
||||
run: |
|
||||
if ! git diff --exit-code || [ -n "$(git ls-files --others --exclude-standard)" ]; then
|
||||
git config --local user.email "github-actions[bot]@users.noreply.github.com"
|
||||
git config --local user.name "github-actions[bot]"
|
||||
|
||||
# Ensure we're on the correct branch
|
||||
git checkout -B ${{ github.head_ref }}
|
||||
git add -A
|
||||
git commit -m "Apply pre-commit fixes"
|
||||
|
||||
# Pull latest changes from the PR branch and rebase our commit on top
|
||||
git pull --rebase origin ${{ github.head_ref }}
|
||||
|
||||
# Push to the PR branch
|
||||
git push origin ${{ github.head_ref }}
|
||||
echo "Pre-commit fixes committed and pushed"
|
||||
else
|
||||
echo "No changes to commit"
|
||||
fi
|
||||
|
||||
- name: Verify if there are any diff files after pre-commit
|
||||
if: github.actor != 'dependabot[bot]'
|
||||
run: |
|
||||
git diff --exit-code || (echo "There are uncommitted changes, run pre-commit locally and commit again" && exit 1)
|
||||
|
||||
- name: Verify if there are any new files after pre-commit
|
||||
if: github.actor != 'dependabot[bot]'
|
||||
run: |
|
||||
unstaged_files=$(git ls-files --others --exclude-standard)
|
||||
if [ -n "$unstaged_files" ]; then
|
||||
echo "There are uncommitted new files, run pre-commit locally and commit again"
|
||||
echo "$unstaged_files"
|
||||
exit 1
|
||||
fi
|
||||
|
|
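The same checks can be run before pushing; a minimal local sketch, assuming pre-commit is available through the project environment and skipping the no-commit-to-branch hook exactly as the workflow does:

# Hypothetical local equivalent of the CI pre-commit job.
SKIP=no-commit-to-branch uv run pre-commit run --all-files
git diff --exit-code || echo "pre-commit modified files; review and commit them"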
.github/workflows/providers-build.yml (new file)
@@ -0,0 +1,154 @@
name: Test Llama Stack Build
|
||||
|
||||
run-name: Test llama stack build
|
||||
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- main
|
||||
paths:
|
||||
- 'llama_stack/cli/stack/build.py'
|
||||
- 'llama_stack/cli/stack/_build.py'
|
||||
- 'llama_stack/core/build.*'
|
||||
- 'llama_stack/core/*.sh'
|
||||
- '.github/workflows/providers-build.yml'
|
||||
- 'llama_stack/distributions/**'
|
||||
- 'pyproject.toml'
|
||||
|
||||
pull_request:
|
||||
paths:
|
||||
- 'llama_stack/cli/stack/build.py'
|
||||
- 'llama_stack/cli/stack/_build.py'
|
||||
- 'llama_stack/core/build.*'
|
||||
- 'llama_stack/core/*.sh'
|
||||
- '.github/workflows/providers-build.yml'
|
||||
- 'llama_stack/distributions/**'
|
||||
- 'pyproject.toml'
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
generate-matrix:
|
||||
runs-on: ubuntu-latest
|
||||
outputs:
|
||||
distros: ${{ steps.set-matrix.outputs.distros }}
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Generate Distribution List
|
||||
id: set-matrix
|
||||
run: |
|
||||
distros=$(ls llama_stack/distributions/*/*build.yaml | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]')
|
||||
echo "distros=$distros" >> "$GITHUB_OUTPUT"
|
||||
|
||||
build:
|
||||
needs: generate-matrix
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
distro: ${{ fromJson(needs.generate-matrix.outputs.distros) }}
|
||||
image-type: [venv, container]
|
||||
fail-fast: false # We want to run all jobs even if some fail
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Install dependencies
|
||||
uses: ./.github/actions/setup-runner
|
||||
|
||||
- name: Print build dependencies
|
||||
run: |
|
||||
uv run llama stack build --distro ${{ matrix.distro }} --image-type ${{ matrix.image-type }} --image-name test --print-deps-only
|
||||
|
||||
- name: Run Llama Stack Build
|
||||
run: |
|
||||
# USE_COPY_NOT_MOUNT is set to true since mounting is not supported by docker buildx, we use COPY instead
|
||||
# LLAMA_STACK_DIR is set to the current directory so we are building from the source
|
||||
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --distro ${{ matrix.distro }} --image-type ${{ matrix.image-type }} --image-name test
|
||||
|
||||
- name: Print dependencies in the image
|
||||
if: matrix.image-type == 'venv'
|
||||
run: |
|
||||
uv pip list
|
||||
|
||||
build-single-provider:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Install dependencies
|
||||
uses: ./.github/actions/setup-runner
|
||||
|
||||
- name: Build a single provider
|
||||
run: |
|
||||
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --image-type venv --image-name test --providers inference=remote::ollama
|
||||
|
||||
build-custom-container-distribution:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Install dependencies
|
||||
uses: ./.github/actions/setup-runner
|
||||
|
||||
- name: Build a single provider
|
||||
run: |
|
||||
yq -i '.image_type = "container"' llama_stack/distributions/ci-tests/build.yaml
|
||||
yq -i '.image_name = "test"' llama_stack/distributions/ci-tests/build.yaml
|
||||
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --config llama_stack/distributions/ci-tests/build.yaml
|
||||
|
||||
- name: Inspect the container image entrypoint
|
||||
run: |
|
||||
IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1)
|
||||
entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID)
|
||||
echo "Entrypoint: $entrypoint"
|
||||
if [ "$entrypoint" != "[python -m llama_stack.core.server.server /app/run.yaml]" ]; then
|
||||
echo "Entrypoint is not correct"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
build-ubi9-container-distribution:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Install dependencies
|
||||
uses: ./.github/actions/setup-runner
|
||||
|
||||
- name: Pin distribution to UBI9 base
|
||||
run: |
|
||||
yq -i '
|
||||
.image_type = "container" |
|
||||
.image_name = "ubi9-test" |
|
||||
.distribution_spec.container_image = "registry.access.redhat.com/ubi9:latest"
|
||||
' llama_stack/distributions/ci-tests/build.yaml
|
||||
|
||||
- name: Build dev container (UBI9)
|
||||
env:
|
||||
USE_COPY_NOT_MOUNT: "true"
|
||||
LLAMA_STACK_DIR: "."
|
||||
run: |
|
||||
uv run llama stack build --config llama_stack/distributions/ci-tests/build.yaml
|
||||
|
||||
- name: Inspect UBI9 image
|
||||
run: |
|
||||
IMAGE_ID=$(docker images --format "{{.Repository}}:{{.Tag}}" | head -n 1)
|
||||
entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID)
|
||||
echo "Entrypoint: $entrypoint"
|
||||
if [ "$entrypoint" != "[python -m llama_stack.core.server.server /app/run.yaml]" ]; then
|
||||
echo "Entrypoint is not correct"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Checking /etc/os-release in $IMAGE_ID"
|
||||
docker run --rm --entrypoint sh "$IMAGE_ID" -c \
|
||||
'source /etc/os-release && echo "$ID"' \
|
||||
| grep -qE '^(rhel|ubi)$' \
|
||||
|| { echo "Base image is not UBI 9!"; exit 1; }
|
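Each cell of the build matrix above reduces to a single build command; a sketch of reproducing one cell locally, assuming the current checkout is the build source (the distro and image name are illustrative):

# Hypothetical local equivalent of one build-matrix cell (distro=ci-tests, image-type=venv).
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. \
  uv run llama stack build --distro ci-tests --image-type venv --image-name test
uv pip list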
.github/workflows/python-build-test.yml (new file)
@@ -0,0 +1,49 @@
name: Python Package Build Test

run-name: Test building the llama-stack PyPI project

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
    paths-ignore:
      - 'llama_stack/ui/**'

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.12', '3.13']

    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Install uv
        uses: astral-sh/setup-uv@d9e0f98d3fc6adb07d1e3d37f3043649ddad06a1 # v6.5.0
        with:
          python-version: ${{ matrix.python-version }}
          activate-environment: true
          version: 0.7.6

      - name: Build Llama Stack package
        run: |
          uv build

      - name: Install Llama Stack package
        run: |
          uv pip install dist/*.whl

      - name: Verify Llama Stack package
        run: |
          uv pip list
          uv pip show llama-stack
          command -v llama
          llama model prompt-format -m Llama3.2-90B-Vision-Instruct
          llama model list
          llama stack list-apis
          llama stack list-providers inference
.github/workflows/record-integration-tests.yml (new file)
@@ -0,0 +1,70 @@
# This workflow should be run manually when needing to re-record tests. This happens when you have
# - added a new test
# - or changed an existing test such that a new inference call is made
# You should make a PR and then run this workflow on that PR branch. The workflow will re-record the
# tests and commit the recordings to the PR branch.
name: Integration Tests (Record)

run-name: Run the integration test suite from tests/integration

on:
  workflow_dispatch:
    inputs:
      test-subdirs:
        description: 'Comma-separated list of test subdirectories to run'
        type: string
        default: ''
      test-provider:
        description: 'Test against a specific provider'
        type: string
        default: 'ollama'
      run-vision-tests:
        description: 'Whether to run vision tests'
        type: boolean
        default: false
      test-pattern:
        description: 'Regex pattern to pass to pytest -k'
        type: string
        default: ''

jobs:
  record-tests:
    runs-on: ubuntu-latest

    permissions:
      contents: write

    steps:
      - name: Echo workflow inputs
        run: |
          echo "::group::Workflow Inputs"
          echo "test-subdirs: ${{ inputs.test-subdirs }}"
          echo "test-provider: ${{ inputs.test-provider }}"
          echo "run-vision-tests: ${{ inputs.run-vision-tests }}"
          echo "test-pattern: ${{ inputs.test-pattern }}"
          echo "branch: ${{ github.ref_name }}"
          echo "::endgroup::"

      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
        with:
          fetch-depth: 0

      - name: Setup test environment
        uses: ./.github/actions/setup-test-environment
        with:
          python-version: "3.12" # Use single Python version for recording
          client-version: "latest"
          provider: ${{ inputs.test-provider || 'ollama' }}
          run-vision-tests: ${{ inputs.run-vision-tests }}
          inference-mode: 'record'

      - name: Run and record tests
        uses: ./.github/actions/run-and-record-tests
        with:
          test-pattern: ${{ inputs.test-pattern }}
          test-subdirs: ${{ inputs.test-subdirs }}
          stack-config: 'server:ci-tests' # recording must be done with server since more tests are run
          provider: ${{ inputs.test-provider || 'ollama' }}
          inference-mode: 'record'
          run-vision-tests: ${{ inputs.run-vision-tests }}
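Per the comment at the top of this workflow, re-recording is triggered manually against a PR branch; one way to do that, assuming an authenticated GitHub CLI (branch and subdirectory names are placeholders):

# Hypothetical manual trigger of the recording workflow on a PR branch.
gh workflow run record-integration-tests.yml \
  --ref my-pr-branch \
  -f test-provider=ollama \
  -f test-subdirs=inference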
.github/workflows/semantic-pr.yml
@@ -1,5 +1,7 @@
name: Check semantic PR titles

run-name: Ensure that PR titles follow the conventional commit spec

on:
  pull_request_target:
    types:
@@ -8,6 +10,10 @@ on:
      - reopened
      - synchronize

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
  cancel-in-progress: true

permissions:
  contents: read

@@ -16,6 +22,6 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Check PR Title's semantic conformance
        uses: amannn/action-semantic-pull-request@v5
        uses: amannn/action-semantic-pull-request@7f33ba792281b034f64e96f4c0b5496782dd3b37 # v6.1.0
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
.github/workflows/stale_bot.yml (new file)
@@ -0,0 +1,47 @@
name: Close stale issues and PRs

run-name: Run the Stale Bot action

on:
  schedule:
    - cron: '0 0 * * *' # every day at midnight

env:
  LC_ALL: en_US.UTF-8

defaults:
  run:
    shell: bash

permissions:
  contents: read

jobs:
  stale:
    permissions:
      issues: write
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - name: Stale Action
        uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0
        with:
          stale-issue-label: 'stale'
          stale-issue-message: >
            This issue has been automatically marked as stale because it has not had activity within 60 days.
            It will be automatically closed if no further activity occurs within 30 days.
          close-issue-message: >
            This issue has been automatically closed due to inactivity.
            Please feel free to reopen if you feel it is still relevant!
          days-before-issue-stale: 60
          days-before-issue-close: 30
          stale-pr-label: 'stale'
          stale-pr-message: >
            This pull request has been automatically marked as stale because it has not had activity within 60 days.
            It will be automatically closed if no further activity occurs within 30 days.
          close-pr-message: >
            This pull request has been automatically closed due to inactivity.
            Please feel free to reopen if you intend to continue working on it!
          days-before-pr-stale: 60
          days-before-pr-close: 30
          operations-per-run: 300
.github/workflows/test-external-provider-module.yml (new file)
@@ -0,0 +1,86 @@
name: Test External Providers Installed via Module
|
||||
|
||||
run-name: Test External Provider installation via Python module
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ main ]
|
||||
pull_request:
|
||||
branches: [ main ]
|
||||
paths:
|
||||
- 'llama_stack/**'
|
||||
- 'tests/integration/**'
|
||||
- 'uv.lock'
|
||||
- 'pyproject.toml'
|
||||
- 'tests/external/*'
|
||||
- '.github/workflows/test-external-provider-module.yml' # This workflow
|
||||
|
||||
jobs:
|
||||
test-external-providers-from-module:
|
||||
# This workflow is disabled. See https://github.com/meta-llama/llama-stack/pull/2975#issuecomment-3138702984 for details
|
||||
if: false
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
image-type: [venv]
|
||||
# We don't do container yet, it's tricky to install a package from the host into the
|
||||
# container and point 'uv pip install' to the correct path...
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Install dependencies
|
||||
uses: ./.github/actions/setup-runner
|
||||
|
||||
- name: Install Ramalama
|
||||
shell: bash
|
||||
run: |
|
||||
uv pip install ramalama
|
||||
|
||||
- name: Run Ramalama
|
||||
shell: bash
|
||||
run: |
|
||||
nohup ramalama serve llama3.2:3b-instruct-fp16 > ramalama_server.log 2>&1 &
|
||||
- name: Apply image type to config file
|
||||
run: |
|
||||
yq -i '.image_type = "${{ matrix.image-type }}"' tests/external/ramalama-stack/run.yaml
|
||||
cat tests/external/ramalama-stack/run.yaml
|
||||
|
||||
- name: Build distro from config file
|
||||
run: |
|
||||
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --config tests/external/ramalama-stack/build.yaml
|
||||
|
||||
- name: Start Llama Stack server in background
|
||||
if: ${{ matrix.image-type }} == 'venv'
|
||||
env:
|
||||
INFERENCE_MODEL: "llama3.2:3b-instruct-fp16"
|
||||
LLAMA_STACK_LOG_FILE: "server.log"
|
||||
run: |
|
||||
# Use the virtual environment created by the build step (name comes from build config)
|
||||
source ramalama-stack-test/bin/activate
|
||||
uv pip list
|
||||
nohup llama stack run tests/external/ramalama-stack/run.yaml --image-type ${{ matrix.image-type }} > server.log 2>&1 &
|
||||
|
||||
- name: Wait for Llama Stack server to be ready
|
||||
run: |
|
||||
for i in {1..30}; do
|
||||
if ! grep -q "successfully connected to Ramalama" server.log; then
|
||||
echo "Waiting for Llama Stack server to load the provider..."
|
||||
sleep 1
|
||||
else
|
||||
echo "Provider loaded"
|
||||
exit 0
|
||||
fi
|
||||
done
|
||||
echo "Provider failed to load"
|
||||
cat server.log
|
||||
exit 1
|
||||
|
||||
- name: Upload all logs to artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
|
||||
with:
|
||||
name: logs-${{ github.run_id }}-${{ github.run_attempt }}-external-provider-module-test
|
||||
path: |
|
||||
*.log
|
||||
retention-days: 1
|
.github/workflows/test-external.yml (new file)
@@ -0,0 +1,89 @@
name: Test External API and Providers
|
||||
|
||||
run-name: Test the External API and Provider mechanisms
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ main ]
|
||||
pull_request:
|
||||
branches: [ main ]
|
||||
paths:
|
||||
- 'llama_stack/**'
|
||||
- '!llama_stack/ui/**'
|
||||
- 'tests/integration/**'
|
||||
- 'uv.lock'
|
||||
- 'pyproject.toml'
|
||||
- 'requirements.txt'
|
||||
- 'tests/external/*'
|
||||
- '.github/workflows/test-external.yml' # This workflow
|
||||
|
||||
jobs:
|
||||
test-external:
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
image-type: [venv]
|
||||
# We don't do container yet, it's tricky to install a package from the host into the
|
||||
# container and point 'uv pip install' to the correct path...
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Install dependencies
|
||||
uses: ./.github/actions/setup-runner
|
||||
|
||||
- name: Create API configuration
|
||||
run: |
|
||||
mkdir -p /home/runner/.llama/apis.d
|
||||
cp tests/external/weather.yaml /home/runner/.llama/apis.d/weather.yaml
|
||||
|
||||
- name: Create provider configuration
|
||||
run: |
|
||||
mkdir -p /home/runner/.llama/providers.d/remote/weather
|
||||
cp tests/external/kaze.yaml /home/runner/.llama/providers.d/remote/weather/kaze.yaml
|
||||
|
||||
- name: Print distro dependencies
|
||||
run: |
|
||||
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync llama stack build --config tests/external/build.yaml --print-deps-only
|
||||
|
||||
- name: Build distro from config file
|
||||
run: |
|
||||
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync llama stack build --config tests/external/build.yaml
|
||||
|
||||
- name: Start Llama Stack server in background
|
||||
if: ${{ matrix.image-type }} == 'venv'
|
||||
env:
|
||||
INFERENCE_MODEL: "meta-llama/Llama-3.2-3B-Instruct"
|
||||
LLAMA_STACK_LOG_FILE: "server.log"
|
||||
run: |
|
||||
# Use the virtual environment created by the build step (name comes from build config)
|
||||
source ci-test/bin/activate
|
||||
uv pip list
|
||||
nohup llama stack run tests/external/run-byoa.yaml --image-type ${{ matrix.image-type }} > server.log 2>&1 &
|
||||
|
||||
- name: Wait for Llama Stack server to be ready
|
||||
run: |
|
||||
echo "Waiting for Llama Stack server..."
|
||||
for i in {1..30}; do
|
||||
if curl -sSf http://localhost:8321/v1/health | grep -q "OK"; then
|
||||
echo "Llama Stack server is up!"
|
||||
exit 0
|
||||
fi
|
||||
sleep 1
|
||||
done
|
||||
echo "Llama Stack server failed to start"
|
||||
cat server.log
|
||||
exit 1
|
||||
|
||||
- name: Test external API
|
||||
run: |
|
||||
curl -sSf http://localhost:8321/v1/weather/locations
|
||||
|
||||
- name: Upload all logs to artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
|
||||
with:
|
||||
name: logs-${{ github.run_id }}-${{ github.run_attempt }}-external-test
|
||||
path: |
|
||||
*.log
|
||||
retention-days: 1
|
.github/workflows/tests.yml (deleted)
@@ -1,69 +0,0 @@
name: auto-tests
|
||||
|
||||
on:
|
||||
# pull_request:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
commit_sha:
|
||||
description: 'Specific Commit SHA to trigger on'
|
||||
required: false
|
||||
default: $GITHUB_SHA # default to the last commit of $GITHUB_REF branch
|
||||
|
||||
jobs:
|
||||
test-llama-stack-as-library:
|
||||
runs-on: ubuntu-latest
|
||||
env:
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
TAVILY_SEARCH_API_KEY: ${{ secrets.TAVILY_SEARCH_API_KEY }}
|
||||
strategy:
|
||||
matrix:
|
||||
provider: [fireworks, together]
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
ref: ${{ github.event.inputs.commit_sha }}
|
||||
|
||||
- name: Echo commit SHA
|
||||
run: |
|
||||
echo "Triggered on commit SHA: ${{ github.event.inputs.commit_sha }}"
|
||||
git rev-parse HEAD
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
python -m pip install --upgrade pip
|
||||
pip install -r requirements.txt pytest
|
||||
pip install -e .
|
||||
|
||||
- name: Build providers
|
||||
run: |
|
||||
llama stack build --template ${{ matrix.provider }} --image-type venv
|
||||
|
||||
- name: Install the latest llama-stack-client & llama-models packages
|
||||
run: |
|
||||
pip install -e git+https://github.com/meta-llama/llama-stack-client-python.git#egg=llama-stack-client
|
||||
pip install -e git+https://github.com/meta-llama/llama-models.git#egg=llama-models
|
||||
|
||||
- name: Run client-sdk test
|
||||
working-directory: "${{ github.workspace }}"
|
||||
env:
|
||||
REPORT_OUTPUT: md_report.md
|
||||
shell: bash
|
||||
run: |
|
||||
pip install --upgrade pytest-md-report
|
||||
echo "REPORT_FILE=${REPORT_OUTPUT}" >> "$GITHUB_ENV"
|
||||
|
||||
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
|
||||
LLAMA_STACK_CONFIG=./llama_stack/templates/${{ matrix.provider }}/run.yaml pytest --md-report --md-report-verbose=1 ./tests/client-sdk/inference/ --md-report-output "$REPORT_OUTPUT"
|
||||
|
||||
- name: Output reports to the job summary
|
||||
if: always()
|
||||
shell: bash
|
||||
run: |
|
||||
if [ -f "$REPORT_FILE" ]; then
|
||||
echo "<details><summary> Test Report for ${{ matrix.provider }} </summary>" >> $GITHUB_STEP_SUMMARY
|
||||
echo "" >> $GITHUB_STEP_SUMMARY
|
||||
cat "$REPORT_FILE" >> $GITHUB_STEP_SUMMARY
|
||||
echo "" >> $GITHUB_STEP_SUMMARY
|
||||
echo "</details>" >> $GITHUB_STEP_SUMMARY
|
||||
fi
|
.github/workflows/ui-unit-tests.yml (new file)
@@ -0,0 +1,55 @@
name: UI Tests

run-name: Run the UI test suite

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    paths:
      - 'llama_stack/ui/**'
      - '.github/workflows/ui-unit-tests.yml' # This workflow
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  ui-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        node-version: [22]

    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Setup Node.js
        uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
          cache-dependency-path: 'llama_stack/ui/package-lock.json'

      - name: Install dependencies
        working-directory: llama_stack/ui
        run: npm ci

      - name: Run linting
        working-directory: llama_stack/ui
        run: npm run lint

      - name: Run format check
        working-directory: llama_stack/ui
        run: npm run format:check

      - name: Run unit tests
        working-directory: llama_stack/ui
        env:
          CI: true

        run: npm test -- --coverage --watchAll=false --passWithNoTests
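The same suite can be run locally from the UI package; a minimal sketch assuming Node 22 and a clean checkout:

# Hypothetical local equivalent of the ui-tests job.
cd llama_stack/ui
npm ci
npm run lint && npm run format:check
CI=true npm test -- --coverage --watchAll=false --passWithNoTests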
.github/workflows/unit-tests.yml (new file)
@@ -0,0 +1,55 @@
name: Unit Tests

run-name: Run the unit test suite

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
    paths:
      - 'llama_stack/**'
      - '!llama_stack/ui/**'
      - 'tests/unit/**'
      - 'uv.lock'
      - 'pyproject.toml'
      - 'requirements.txt'
      - '.github/workflows/unit-tests.yml' # This workflow
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python:
          - "3.12"
          - "3.13"
    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Install dependencies
        uses: ./.github/actions/setup-runner
        with:
          python-version: ${{ matrix.python }}

      - name: Run unit tests
        run: |
          PYTHON_VERSION=${{ matrix.python }} ./scripts/unit-tests.sh --junitxml=pytest-report-${{ matrix.python }}.xml

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
        with:
          name: test-results-${{ matrix.python }}
          path: |
            .pytest_cache/
            pytest-report-${{ matrix.python }}.xml
            htmlcov-${{ matrix.python }}/
          retention-days: 7
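Running the suite locally mirrors the single test step above; a sketch assuming the repository's scripts are executable:

# Hypothetical local run of the unit test suite against Python 3.12.
PYTHON_VERSION=3.12 ./scripts/unit-tests.sh --junitxml=pytest-report-3.12.xml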
.github/workflows/update-readthedocs.yml
@@ -1,5 +1,7 @@
name: Update ReadTheDocs
|
||||
|
||||
run-name: Update the Llama Stack ReadTheDocs site
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
|
@ -12,14 +14,22 @@ on:
|
|||
- main
|
||||
paths:
|
||||
- 'docs/**'
|
||||
- 'pyproject.toml'
|
||||
- '.github/workflows/update-readthedocs.yml'
|
||||
tags:
|
||||
- '*'
|
||||
pull_request:
|
||||
branches:
|
||||
- main
|
||||
paths:
|
||||
- 'docs/**'
|
||||
- 'pyproject.toml'
|
||||
- '.github/workflows/update-readthedocs.yml'
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
update-readthedocs:
|
||||
runs-on: ubuntu-latest
|
||||
|
@ -27,18 +37,10 @@ jobs:
|
|||
TOKEN: ${{ secrets.READTHEDOCS_TOKEN }}
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
|
||||
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: '3.11'
|
||||
|
||||
- name: Install the latest version of uv
|
||||
uses: astral-sh/setup-uv@v5
|
||||
|
||||
- name: Sync with uv
|
||||
run: uv sync --extra docs
|
||||
- name: Install dependencies
|
||||
uses: ./.github/actions/setup-runner
|
||||
|
||||
- name: Build HTML
|
||||
run: |
|
||||
|
@ -55,7 +57,10 @@ jobs:
|
|||
|
||||
response=$(curl -X POST \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{\"token\": \"$TOKEN\"}" \
|
||||
-d "{
|
||||
\"token\": \"$TOKEN\",
|
||||
\"version\": \"$GITHUB_REF_NAME\"
|
||||
}" \
|
||||
https://readthedocs.org/api/v2/webhook/llama-stack/289768/)
|
||||
|
||||
echo "Response: $response"
|
||||
|
|
.gitignore
@@ -6,6 +6,7 @@ dev_requirements.txt
build
.DS_Store
llama_stack/configs/*
.cursor/
xcuserdata/
*.hmap
.DS_Store
@@ -18,5 +19,12 @@ Package.resolved
.vscode
_build
docs/src
# Sample tool-calling datasets generated by NVIDIA notebooks
docs/notebooks/nvidia/tool_calling/sample_data/
pyrightconfig.json
venv/
pytest-report.xml
.coverage
.python-version
CLAUDE.md
.claude/
|
0
.gitmodules
vendored
0
.gitmodules
vendored
|
.pre-commit-config.yaml
@@ -1,24 +1,35 @@
|
|||
exclude: 'build/'
|
||||
|
||||
default_language_version:
|
||||
python: python3
|
||||
python: python3.12
|
||||
node: "22"
|
||||
|
||||
repos:
|
||||
- repo: https://github.com/pre-commit/pre-commit-hooks
|
||||
rev: v5.0.0 # Latest stable version
|
||||
hooks:
|
||||
- id: check-merge-conflict
|
||||
args: ['--assume-in-merge']
|
||||
- id: trailing-whitespace
|
||||
exclude: '\.py$' # Exclude Python files as Ruff already handles them
|
||||
- id: check-added-large-files
|
||||
args: ['--maxkb=1000']
|
||||
- id: end-of-file-fixer
|
||||
exclude: '^(.*\.svg)$'
|
||||
|
||||
# Temporarily disabling this
|
||||
# - id: no-commit-to-branch
|
||||
# args: ['--branch=main']
|
||||
exclude: '^(.*\.svg|.*\.md)$'
|
||||
- id: no-commit-to-branch
|
||||
- id: check-yaml
|
||||
args: ["--unsafe"]
|
||||
- id: detect-private-key
|
||||
- id: mixed-line-ending
|
||||
args: [--fix=lf] # Forces to replace line ending by LF (line feed)
|
||||
- id: check-executables-have-shebangs
|
||||
- id: check-json
|
||||
- id: check-shebang-scripts-are-executable
|
||||
- id: check-symlinks
|
||||
- id: check-toml
|
||||
|
||||
- repo: https://github.com/Lucas-C/pre-commit-hooks
|
||||
rev: v1.5.4
|
||||
rev: v1.5.5
|
||||
hooks:
|
||||
- id: insert-license
|
||||
files: \.py$|\.sh$
|
||||
|
@ -27,7 +38,7 @@ repos:
|
|||
- docs/license_header.txt
|
||||
|
||||
- repo: https://github.com/astral-sh/ruff-pre-commit
|
||||
rev: v0.9.4
|
||||
rev: v0.12.2
|
||||
hooks:
|
||||
- id: ruff
|
||||
args: [ --fix ]
|
||||
|
@ -35,26 +46,19 @@ repos:
|
|||
- id: ruff-format
|
||||
|
||||
- repo: https://github.com/adamchainz/blacken-docs
|
||||
rev: 1.19.0
|
||||
rev: 1.19.1
|
||||
hooks:
|
||||
- id: blacken-docs
|
||||
additional_dependencies:
|
||||
- black==24.3.0
|
||||
|
||||
- repo: https://github.com/astral-sh/uv-pre-commit
|
||||
rev: 0.6.3
|
||||
rev: 0.7.20
|
||||
hooks:
|
||||
- id: uv-lock
|
||||
- id: uv-export
|
||||
args: [
|
||||
"--frozen",
|
||||
"--no-hashes",
|
||||
"--no-emit-project",
|
||||
"--output-file=requirements.txt"
|
||||
]
|
||||
|
||||
- repo: https://github.com/pre-commit/mirrors-mypy
|
||||
rev: v1.15.0
|
||||
rev: v1.16.1
|
||||
hooks:
|
||||
- id: mypy
|
||||
additional_dependencies:
|
||||
|
@ -66,12 +70,6 @@ repos:
|
|||
- pydantic
|
||||
pass_filenames: false
|
||||
|
||||
# - repo: https://github.com/jsh9/pydoclint
|
||||
# rev: d88180a8632bb1602a4d81344085cf320f288c5a
|
||||
# hooks:
|
||||
# - id: pydoclint
|
||||
# args: [--config=pyproject.toml]
|
||||
|
||||
# - repo: https://github.com/tcort/markdown-link-check
|
||||
# rev: v3.11.2
|
||||
# hooks:
|
||||
|
@ -83,15 +81,121 @@ repos:
|
|||
- id: distro-codegen
|
||||
name: Distribution Template Codegen
|
||||
additional_dependencies:
|
||||
- rich
|
||||
- pydantic
|
||||
- uv==0.6.0
|
||||
entry: uv run python -m llama_stack.scripts.distro_codegen
|
||||
- uv==0.7.8
|
||||
entry: uv run --group codegen ./scripts/distro_codegen.py
|
||||
language: python
|
||||
pass_filenames: false
|
||||
require_serial: true
|
||||
files: ^llama_stack/templates/.*$|^llama_stack/providers/.*/inference/.*/models\.py$
|
||||
- id: provider-codegen
|
||||
name: Provider Codegen
|
||||
additional_dependencies:
|
||||
- uv==0.7.8
|
||||
entry: uv run --group codegen ./scripts/provider_codegen.py
|
||||
language: python
|
||||
pass_filenames: false
|
||||
require_serial: true
|
||||
files: ^llama_stack/providers/.*$
|
||||
- id: openapi-codegen
|
||||
name: API Spec Codegen
|
||||
additional_dependencies:
|
||||
- uv==0.7.8
|
||||
entry: sh -c 'uv run ./docs/openapi_generator/run_openapi_generator.sh > /dev/null'
|
||||
language: python
|
||||
pass_filenames: false
|
||||
require_serial: true
|
||||
files: ^llama_stack/apis/|^docs/openapi_generator/
|
||||
- id: check-workflows-use-hashes
|
||||
name: Check GitHub Actions use SHA-pinned actions
|
||||
entry: ./scripts/check-workflows-use-hashes.sh
|
||||
language: system
|
||||
pass_filenames: false
|
||||
require_serial: true
|
||||
always_run: true
|
||||
files: ^\.github/workflows/.*\.ya?ml$
|
||||
- id: check-init-py
|
||||
name: Check for missing __init__.py files
|
||||
entry: ./scripts/check-init-py.sh
|
||||
language: system
|
||||
pass_filenames: false
|
||||
require_serial: true
|
||||
always_run: true
|
||||
files: ^llama_stack/.*$
|
||||
- id: forbid-pytest-asyncio
|
||||
name: Block @pytest.mark.asyncio and @pytest_asyncio.fixture
|
||||
entry: bash
|
||||
language: system
|
||||
types: [python]
|
||||
pass_filenames: true
|
||||
args:
|
||||
- -c
|
||||
- |
|
||||
grep -EnH '^[^#]*@pytest\.mark\.asyncio|@pytest_asyncio\.fixture' "$@" && {
|
||||
echo;
|
||||
echo "❌ Do not use @pytest.mark.asyncio or @pytest_asyncio.fixture."
|
||||
echo " pytest is already configured with async-mode=auto."
|
||||
echo;
|
||||
exit 1;
|
||||
} || true
|
||||
- id: generate-ci-docs
|
||||
name: Generate CI documentation
|
||||
additional_dependencies:
|
||||
- uv==0.7.8
|
||||
entry: uv run ./scripts/gen-ci-docs.py
|
||||
language: python
|
||||
pass_filenames: false
|
||||
require_serial: true
|
||||
files: ^.github/workflows/.*$
|
||||
# ui-prettier and ui-eslint are disabled until we can avoid `npm ci`, which is slow and may fail -
|
||||
# npm error `npm ci` can only install packages when your package.json and package-lock.json or npm-shrinkwrap.json are in sync. Please update your lock file with `npm install` before continuing.
|
||||
# npm error Invalid: lock file's llama-stack-client@0.2.17 does not satisfy llama-stack-client@0.2.18
|
||||
# and until we have infra for installing prettier and next via npm -
|
||||
# Lint UI code with ESLint.....................................................Failed
|
||||
# - hook id: ui-eslint
|
||||
# - exit code: 127
|
||||
# > ui@0.1.0 lint
|
||||
# > next lint --fix --quiet
|
||||
# sh: line 1: next: command not found
|
||||
#
|
||||
# - id: ui-prettier
|
||||
# name: Format UI code with Prettier
|
||||
# entry: bash -c 'cd llama_stack/ui && npm ci && npm run format'
|
||||
# language: system
|
||||
# files: ^llama_stack/ui/.*\.(ts|tsx)$
|
||||
# pass_filenames: false
|
||||
# require_serial: true
|
||||
# - id: ui-eslint
|
||||
# name: Lint UI code with ESLint
|
||||
# entry: bash -c 'cd llama_stack/ui && npm run lint -- --fix --quiet'
|
||||
# language: system
|
||||
# files: ^llama_stack/ui/.*\.(ts|tsx)$
|
||||
# pass_filenames: false
|
||||
# require_serial: true
|
||||
|
||||
- id: check-log-usage
|
||||
name: Ensure 'llama_stack.log' usage for logging
|
||||
entry: bash
|
||||
language: system
|
||||
types: [python]
|
||||
pass_filenames: true
|
||||
args:
|
||||
- -c
|
||||
- |
|
||||
matches=$(grep -EnH '^[^#]*\b(import\s+logging|from\s+logging\b)' "$@" | grep -v -e '#\s*allow-direct-logging' || true)
|
||||
if [ -n "$matches" ]; then
|
||||
# GitHub Actions annotation format
|
||||
while IFS=: read -r file line_num rest; do
|
||||
echo "::error file=$file,line=$line_num::Do not use 'import logging' or 'from logging import' in $file. Use the custom log instead: from llama_stack.log import get_logger; logger = get_logger(). If direct logging is truly needed, add: # allow-direct-logging"
|
||||
done <<< "$matches"
|
||||
exit 1
|
||||
fi
|
||||
exit 0
|
||||
|
||||
ci:
|
||||
autofix_commit_msg: 🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
|
||||
autoupdate_commit_msg: ⬆ [pre-commit.ci] pre-commit autoupdate
|
||||
autofix_prs: true
|
||||
autoupdate_branch: ''
|
||||
autoupdate_schedule: weekly
|
||||
skip: []
|
||||
submodules: false
|
||||
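For the custom hooks defined in this config (for example check-log-usage or check-workflows-use-hashes), running a single hook is often quicker than a full pass; a sketch assuming pre-commit is installed in the project environment:

# Hypothetical targeted runs of individual hooks from this config.
uv run pre-commit run check-log-usage --all-files
uv run pre-commit run check-workflows-use-hashes --all-files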
.python-version (deleted)
@@ -1 +0,0 @@
3.10
.readthedocs.yaml
@@ -5,28 +5,21 @@
# Required
version: 2

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
  # You can also specify other tool versions:
  # nodejs: "19"
  # rust: "1.64"
  # golang: "1.19"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
#    - pdf
#    - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  jobs:
    pre_create_environment:
      - asdf plugin add uv
      - asdf install uv latest
      - asdf global uv latest
    create_environment:
      - uv venv "${READTHEDOCS_VIRTUALENV_PATH}"
    install:
      - requirements: docs/requirements.txt
      - UV_PROJECT_ENVIRONMENT="${READTHEDOCS_VIRTUALENV_PATH}" uv sync --frozen --group docs
CHANGELOG.md (new file)
@@ -0,0 +1,516 @@
# Changelog
|
||||
|
||||
# v0.2.15
|
||||
Published on: 2025-07-16T03:30:01Z
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.14
|
||||
Published on: 2025-07-04T16:06:48Z
|
||||
|
||||
## Highlights
|
||||
|
||||
* Support for Llama Guard 4
|
||||
* Added Milvus support to vector-stores API
|
||||
* Documentation and zero-to-hero updates for latest APIs
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.13
|
||||
Published on: 2025-06-28T04:28:11Z
|
||||
|
||||
## Highlights
|
||||
* search_mode support in OpenAI vector store API
|
||||
* Security fixes
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.12
|
||||
Published on: 2025-06-20T22:52:12Z
|
||||
|
||||
## Highlights
|
||||
* Filter support in file search
|
||||
* Support auth attributes in inference and response stores
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.11
|
||||
Published on: 2025-06-17T20:26:26Z
|
||||
|
||||
## Highlights
|
||||
* OpenAI-compatible vector store APIs
|
||||
* Hybrid Search in Sqlite-vec
|
||||
* File search tool in Responses API
|
||||
* Pagination in inference and response stores
|
||||
* Added `suffix` to completions API for fill-in-the-middle tasks
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.10.1
|
||||
Published on: 2025-06-06T20:11:02Z
|
||||
|
||||
## Highlights
|
||||
* ChromaDB provider fix
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.10
|
||||
Published on: 2025-06-05T23:21:45Z
|
||||
|
||||
## Highlights
|
||||
|
||||
* OpenAI-compatible embeddings API
|
||||
* OpenAI-compatible Files API
|
||||
* Postgres support in starter distro
|
||||
* Enable ingestion of precomputed embeddings
|
||||
* Full multi-turn support in Responses API
|
||||
* Fine-grained access control policy
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.9
|
||||
Published on: 2025-05-30T20:01:56Z
|
||||
|
||||
## Highlights
|
||||
* Added initial streaming support in Responses API
|
||||
* UI view for Responses
|
||||
* Postgres inference store support
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.8
|
||||
Published on: 2025-05-27T21:03:47Z
|
||||
|
||||
# Release v0.2.8
|
||||
|
||||
## Highlights
|
||||
|
||||
* Server-side MCP with auth firewalls now works in the Stack - both for Agents and Responses
|
||||
* Get chat completions APIs and UI to show chat completions
|
||||
* Enable keyword search for sqlite-vec
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.7
|
||||
Published on: 2025-05-16T20:38:10Z
|
||||
|
||||
## Highlights
|
||||
|
||||
This is a small update, but here are a couple of highlights:
|
||||
|
||||
* feat: function tools in OpenAI Responses by @bbrowning in https://github.com/meta-llama/llama-stack/pull/2094, getting closer to ready. Streaming is the next missing piece.
|
||||
* feat: Adding support for customizing chunk context in RAG insertion and querying by @franciscojavierarceo in https://github.com/meta-llama/llama-stack/pull/2134
|
||||
* feat: scaffolding for Llama Stack UI by @ehhuang in https://github.com/meta-llama/llama-stack/pull/2149, more to come in the coming releases.
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.6
|
||||
Published on: 2025-05-12T18:06:52Z
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.5
|
||||
Published on: 2025-05-04T20:16:49Z
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.4
|
||||
Published on: 2025-04-29T17:26:01Z
|
||||
|
||||
## Highlights
|
||||
|
||||
* One-liner to install and run Llama Stack yay! by @reluctantfuturist in https://github.com/meta-llama/llama-stack/pull/1383
|
||||
* support for NVIDIA NeMo datastore by @raspawar in https://github.com/meta-llama/llama-stack/pull/1852
|
||||
* (yuge!) Kubernetes authentication by @leseb in https://github.com/meta-llama/llama-stack/pull/1778
|
||||
* (yuge!) OpenAI Responses API by @bbrowning in https://github.com/meta-llama/llama-stack/pull/1989
|
||||
* add api.llama provider, llama-guard-4 model by @ashwinb in https://github.com/meta-llama/llama-stack/pull/2058
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.3
|
||||
Published on: 2025-04-25T22:46:21Z
|
||||
|
||||
## Highlights
|
||||
|
||||
* OpenAI compatible inference endpoints and client-SDK support. `client.chat.completions.create()` now works.
|
||||
* significant improvements and functionality added to the nVIDIA distribution
|
||||
* many improvements to the test verification suite.
|
||||
* new inference providers: Ramalama, IBM WatsonX
|
||||
* many improvements to the Playground UI
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.2
|
||||
Published on: 2025-04-13T01:19:49Z
|
||||
|
||||
## Main changes
|
||||
|
||||
- Bring Your Own Provider (@leseb) - use out-of-tree provider code to execute the distribution server
|
||||
- OpenAI compatible inference API in progress (@bbrowning)
|
||||
- Provider verifications (@ehhuang)
|
||||
- Many updates and fixes to playground
|
||||
- Several llama4 related fixes
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.1
|
||||
Published on: 2025-04-05T23:13:00Z
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.2.0
|
||||
Published on: 2025-04-05T19:04:29Z
|
||||
|
||||
## Llama 4 Support
|
||||
|
||||
Checkout more at https://www.llama.com
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.9
|
||||
Published on: 2025-03-29T00:52:23Z
|
||||
|
||||
### Build and Test Agents
|
||||
* Agents: Entire document context with attachments
|
||||
* RAG: Documentation with sqlite-vec faiss comparison
|
||||
* Getting started: Fixes to getting started notebook.
|
||||
|
||||
### Agent Evals and Model Customization
|
||||
* (**New**) Post-training: Add nemo customizer
|
||||
|
||||
### Better Engineering
|
||||
* Moved sqlite-vec to non-blocking calls
|
||||
* Don't return a payload on file delete
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.8
|
||||
Published on: 2025-03-24T01:28:50Z
|
||||
|
||||
# v0.1.8 Release Notes
|
||||
|
||||
### Build and Test Agents
|
||||
* Safety: Integrated NVIDIA as a safety provider.
|
||||
* VectorDB: Added Qdrant as an inline provider.
|
||||
* Agents: Added support for multiple tool groups in agents.
|
||||
* Agents: Simplified imports for Agents in client package
|
||||
|
||||
|
||||
### Agent Evals and Model Customization
|
||||
* Introduced DocVQA and IfEval benchmarks.
|
||||
|
||||
### Deploying and Monitoring Agents
|
||||
* Introduced a Containerfile and image workflow for the Playground.
|
||||
* Implemented support for Bearer (API Key) authentication.
|
||||
* Added attribute-based access control for resources.
|
||||
* Fixes for Docker deployments: use `--pull always` and standardize the default port to 8321
* Deprecated `/v1/inspect/providers`; use `/v1/providers/` instead
|
||||
|
||||
### Better Engineering
|
||||
* Consolidated scripts under the ./scripts directory.
|
||||
* Addressed mypy violations in various modules.
|
||||
* Added Dependabot scans for Python dependencies.
|
||||
* Implemented a scheduled workflow to update the changelog automatically.
|
||||
* Enforced concurrency to reduce CI loads.
|
||||
|
||||
|
||||
### New Contributors
|
||||
* @cmodi-meta made their first contribution in https://github.com/meta-llama/llama-stack/pull/1650
|
||||
* @jeffmaury made their first contribution in https://github.com/meta-llama/llama-stack/pull/1671
|
||||
* @derekhiggins made their first contribution in https://github.com/meta-llama/llama-stack/pull/1698
|
||||
* @Bobbins228 made their first contribution in https://github.com/meta-llama/llama-stack/pull/1745
|
||||
|
||||
**Full Changelog**: https://github.com/meta-llama/llama-stack/compare/v0.1.7...v0.1.8
|
||||
|
||||
---
|
||||
|
||||
# v0.1.7
|
||||
Published on: 2025-03-14T22:30:51Z
|
||||
|
||||
## 0.1.7 Release Notes
|
||||
|
||||
### Build and Test Agents
|
||||
* Inference: ImageType is now refactored to LlamaStackImageType
|
||||
* Inference: Added tests to measure TTFT
|
||||
* Inference: Bring back usage metrics
|
||||
* Agents: Added endpoint for get agent, list agents and list sessions
|
||||
* Agents: Automated conversion of type hints in client tools to the LiteLLM format
|
||||
* Agents: Deprecated ToolResponseMessage in agent.resume API
|
||||
* Added Provider API for listing and inspecting provider info
|
||||
|
||||
### Agent Evals and Model Customization
|
||||
* Eval: Added new eval benchmarks Math 500 and BFCL v3

### Deploy and Monitoring of Agents
* Telemetry: Fix tracing to work across coroutines
|
||||
|
||||
### Better Engineering
|
||||
* Display code coverage for unit tests
|
||||
* Updated call sites (inference, tool calls, agents) to move to async non blocking calls
|
||||
* Unit tests also run on Python 3.11, 3.12, and 3.13
|
||||
* Added ollama inference to Integration tests CI
|
||||
* Improved documentation across examples, testing, and the CLI, and updated the providers table
|
||||
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.6
|
||||
Published on: 2025-03-08T04:35:08Z
|
||||
|
||||
## 0.1.6 Release Notes
|
||||
|
||||
### Build and Test Agents
|
||||
* Inference: Fixed support for inline vllm provider
|
||||
* (**New**) Agent: Build & Monitor Agent Workflows with Llama Stack + Anthropic's Best Practice [Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Agent_Workflows.ipynb)
|
||||
* (**New**) Agent: Revamped agent [documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/agent.html) with more details and examples
|
||||
* Agent: Unify tools and Python SDK Agents API
|
||||
* Agent: AsyncAgent Python SDK wrapper supporting async client tool calls
|
||||
* Agent: Support python functions without @client_tool decorator as client tools
|
||||
* Agent: Deprecated the `allow_resume_turn` flag and removed the need to specify `tool_prompt_format`
|
||||
* VectorIO: MilvusDB support added
|
||||
|
||||
### Agent Evals and Model Customization
|
||||
* (**New**) Agent: Llama Stack RAG Lifecycle [Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_RAG_Lifecycle.ipynb)
|
||||
* Eval: Documentation for eval, scoring, adding new benchmarks
|
||||
* Eval: Distribution template to run benchmarks on llama & non-llama models
|
||||
* Eval: Ability to register new custom LLM-as-judge scoring functions
|
||||
* (**New**) Looking for contributors for open benchmarks. See [documentation](https://llama-stack.readthedocs.io/en/latest/references/evals_reference/index.html#open-benchmark-contributing-guide) for details.
|
||||
|
||||
### Deploy and Monitoring of Agents
|
||||
* Better support for different log levels across all components for better monitoring
|
||||
|
||||
### Better Engineering
|
||||
* Enhance OpenAPI spec to include Error types across all APIs
|
||||
* Moved all tests to /tests and created unit tests to run on each PR
|
||||
* Removed all dependencies on llama-models repo
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.5.1
|
||||
Published on: 2025-02-28T22:37:44Z
|
||||
|
||||
## 0.1.5.1 Release Notes
|
||||
* Fixes for security risk in https://github.com/meta-llama/llama-stack/pull/1327 and https://github.com/meta-llama/llama-stack/pull/1328
|
||||
|
||||
**Full Changelog**: https://github.com/meta-llama/llama-stack/compare/v0.1.5...v0.1.5.1
|
||||
|
||||
---
|
||||
|
||||
# v0.1.5
|
||||
Published on: 2025-02-28T18:14:01Z
|
||||
|
||||
## 0.1.5 Release Notes
|
||||
### Build Agents
|
||||
* Inference: Support more non-llama models (openai, anthropic, gemini)
|
||||
* Inference: Can use the provider's model name in addition to the HF alias
|
||||
* Inference: Fixed issues with calling tools that weren't specified in the prompt
|
||||
* RAG: Improved the system prompt for RAG; hard-coded rag-tool calling is no longer needed
|
||||
* Embeddings: Added support for Nemo retriever embedding models
|
||||
* Tools: Added support for MCP tools in Ollama Distribution
|
||||
* Distributions: Added new Groq distribution
|
||||
|
||||
### Customize Models
|
||||
* Save post-trained checkpoint in SafeTensor format to allow Ollama inference provider to use the post-trained model
|
||||
|
||||
### Monitor agents
|
||||
* More comprehensive logging of agent steps including client tools
|
||||
* Telemetry inputs/outputs are now structured and queryable
|
||||
* Ability to retrieve agents session, turn, step by ids
|
||||
|
||||
### Better Engineering
|
||||
* Moved executorch Swift code out of this repo into the llama-stack-client-swift repo, similar to kotlin
|
||||
* Move most logging to use logger instead of prints
|
||||
* Completed text /chat-completion and /completion tests
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.4
|
||||
Published on: 2025-02-25T00:02:43Z
|
||||
|
||||
## v0.1.4 Release Notes
|
||||
Here are the key changes coming as part of this release:
|
||||
|
||||
### Build and Test Agents
|
||||
* Inference: Added support for non-llama models
|
||||
* Inference: Added option to list all downloaded models and remove models
|
||||
* Agent: Introduce new api agents.resume_turn to include client side tool execution in the same turn
|
||||
* Agent: AgentConfig introduces new variable “tool_config” that allows for better tool configuration and system prompt overrides
|
||||
* Agent: Added logging for agent step start and completion times
|
||||
* Agent: Added support for logging for tool execution metadata
|
||||
* Embedding: Updated /inference/embeddings to support asymmetric models, truncation and variable sized outputs
|
||||
* Embedding: Updated embedding models for Ollama, Together, and Fireworks with available defaults
|
||||
* VectorIO: Improved performance of sqlite-vec using chunked writes
|
||||
### Agent Evals and Model Customization
|
||||
* Deprecated api /eval-tasks. Use /eval/benchmark instead
|
||||
* Added CPU training support for TorchTune
|
||||
### Deploy and Monitoring of Agents
|
||||
* Consistent view of client and server tool calls in telemetry
|
||||
### Better Engineering
|
||||
* Made tests more data-driven for consistent evaluation
|
||||
* Fixed documentation links and improved API reference generation
|
||||
* Various small fixes for build scripts and system reliability
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.3
|
||||
Published on: 2025-02-14T20:24:32Z
|
||||
|
||||
## v0.1.3 Release
|
||||
|
||||
Here are some key changes that are coming as part of this release.
|
||||
|
||||
### Build and Test Agents
|
||||
Streamlined the initial development experience
|
||||
- Added support for llama stack run --image-type venv
|
||||
- Enhanced vector store options with new sqlite-vec provider and improved Qdrant integration
|
||||
- vLLM improvements for tool calling and logprobs
|
||||
- Better handling of sporadic code_interpreter tool calls
|
||||
|
||||
### Agent Evals
|
||||
Better benchmarking and Agent performance assessment
|
||||
- Renamed eval API /eval-task to /benchmarks
|
||||
- Improved documentation and notebooks for RAG and evals
|
||||
|
||||
### Deploy and Monitoring of Agents
|
||||
Improved production readiness
|
||||
- Added usage metrics collection for chat completions
|
||||
- CLI improvements for provider information
|
||||
- Improved error handling and system reliability
|
||||
- Better model endpoint handling and accessibility
|
||||
- Improved signal handling on distro server
|
||||
|
||||
### Better Engineering
|
||||
Infrastructure and code quality improvements
|
||||
- Faster text-based chat completion tests
|
||||
- Improved testing for non-streaming agent apis
|
||||
- Standardized import formatting with ruff linter
|
||||
- Added conventional commits standard
|
||||
- Fixed documentation parsing issues
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.2
|
||||
Published on: 2025-02-07T22:06:49Z
|
||||
|
||||
# TL;DR
|
||||
- Several stabilizations to development flows after the switch to `uv`
|
||||
- Migrated CI workflows to new OSS repo - [llama-stack-ops](https://github.com/meta-llama/llama-stack-ops)
|
||||
- Added automated rebuilds for ReadTheDocs
|
||||
- Llama Stack server supports HTTPS
|
||||
- Added system prompt overrides support
|
||||
- Several bug fixes and improvements to documentation (check out Kubernetes deployment guide by @terrytangyuan )
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.1
|
||||
Published on: 2025-02-02T02:29:24Z
|
||||
|
||||
A bunch of small / big improvements everywhere including support for Windows, switching to `uv` and many provider improvements.
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.0
|
||||
Published on: 2025-01-24T17:47:47Z
|
||||
|
||||
We are excited to announce a stable API release of Llama Stack, which enables developers to build RAG applications and Agents using tools and safety shields, monitor those agents with telemetry, and evaluate them with scoring functions.
|
||||
|
||||
## Context
|
||||
GenAI application developers need more than just an LLM - they need to integrate tools, connect with their data sources, establish guardrails, and ground the LLM responses effectively. Currently, developers must piece together various tools and APIs, complicating the development lifecycle and increasing costs. The result is that developers are spending more time on these integrations rather than focusing on the application logic itself. The bespoke coupling of components also makes it challenging to adopt state-of-the-art solutions in the rapidly evolving GenAI space. This is particularly difficult for open models like Llama, as best practices are not widely established in the open.
|
||||
|
||||
Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs from both AI developers and partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety.

With Llama Stack, you can easily build a RAG agent which can also search the web, do complex math, and call custom tools. You can use telemetry to inspect those traces, and convert telemetry into evals datasets. And with Llama Stack's plugin architecture and prepackaged distributions, you can choose to run your agent anywhere: in the cloud with our partners, in your own environment using virtualenv or Docker, locally with Ollama, or even on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience.

## Release
After iterating on the APIs for the last 3 months, today we're launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages (v0.1.0). We now have automated tests for providers. These tests make sure that all provider implementations are verified. Developers can now easily and reliably select distributions or providers based on their specific requirements.
|
||||
|
||||
There are example standalone apps in llama-stack-apps.
|
||||
|
||||
|
||||
## Key Features of this release
|
||||
|
||||
- **Unified API Layer**
|
||||
- Inference: Run LLM models
|
||||
- RAG: Store and retrieve knowledge for RAG
|
||||
- Agents: Build multi-step agentic workflows
|
||||
- Tools: Register tools that can be called by the agent
|
||||
- Safety: Apply content filtering and safety policies
|
||||
- Evaluation: Test model and agent quality
|
||||
- Telemetry: Collect and analyze usage data and complex agentic traces
|
||||
- Post Training ( Coming Soon ): Fine tune models for specific use cases
|
||||
|
||||
- **Rich Provider Ecosystem**
|
||||
- Local Development: Meta's Reference, Ollama
|
||||
- Cloud: Fireworks, Together, Nvidia, AWS Bedrock, Groq, Cerebras
|
||||
- On-premises: Nvidia NIM, vLLM, TGI, Dell-TGI
|
||||
- On-device: iOS and Android support
|
||||
|
||||
- **Built for Production**
|
||||
- Pre-packaged distributions for common deployment scenarios
|
||||
- Backwards compatibility across model versions
|
||||
- Comprehensive evaluation capabilities
|
||||
- Full observability and monitoring
|
||||
|
||||
- **Multiple developer interfaces**
|
||||
- CLI: Command line interface
|
||||
- Python SDK
|
||||
- Swift iOS SDK
|
||||
- Kotlin Android SDK
|
||||
|
||||
- **Sample llama stack applications**
|
||||
- Python
|
||||
- iOS
|
||||
- Android
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.1.0rc12
|
||||
Published on: 2025-01-22T22:24:01Z
|
||||
|
||||
|
||||
|
||||
---
|
||||
|
||||
# v0.0.63
|
||||
Published on: 2024-12-18T07:17:43Z
|
||||
|
||||
A small but important bug-fix release to update the URL datatype for the client-SDKs. The issue affected multimodal agentic turns especially.
|
||||
|
||||
**Full Changelog**: https://github.com/meta-llama/llama-stack/compare/v0.0.62...v0.0.63
|
||||
|
||||
---
|
||||
|
216 CONTRIBUTING.md
@ -1,17 +1,91 @@
# Contributing to Llama-Stack
|
||||
# Contributing to Llama Stack
|
||||
We want to make contributing to this project as easy and transparent as
|
||||
possible.
|
||||
|
||||
## Set up your development environment
|
||||
|
||||
We use [uv](https://github.com/astral-sh/uv) to manage python dependencies and virtual environments.
|
||||
You can install `uv` by following this [guide](https://docs.astral.sh/uv/getting-started/installation/).
|
||||
|
||||
You can install the dependencies by running:
|
||||
|
||||
```bash
|
||||
cd llama-stack
|
||||
uv sync --group dev
|
||||
uv pip install -e .
|
||||
source .venv/bin/activate
|
||||
```
|
||||
|
||||
```{note}
|
||||
You can use a specific version of Python with `uv` by adding the `--python <version>` flag (e.g. `--python 3.12`).
|
||||
Otherwise, `uv` will automatically select a Python version according to the `requires-python` section of the `pyproject.toml`.
|
||||
For more info, see the [uv docs around Python versions](https://docs.astral.sh/uv/concepts/python-versions/).
|
||||
```
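
For example, to create the development environment against a specific interpreter (3.12 is used here purely as an illustration), the flag can be combined with the command above:

```bash
# Pin the dev environment to Python 3.12 instead of letting uv pick a version
uv sync --group dev --python 3.12
```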
|
||||
|
||||
Note that you can create a dotenv file `.env` that includes necessary environment variables:
|
||||
```
|
||||
LLAMA_STACK_BASE_URL=http://localhost:8321
|
||||
LLAMA_STACK_CLIENT_LOG=debug
|
||||
LLAMA_STACK_PORT=8321
|
||||
LLAMA_STACK_CONFIG=<provider-name>
|
||||
TAVILY_SEARCH_API_KEY=
|
||||
BRAVE_SEARCH_API_KEY=
|
||||
```
|
||||
|
||||
And then use this dotenv file when running client SDK tests via the following:
|
||||
```bash
|
||||
uv run --env-file .env -- pytest -v tests/integration/inference/test_text_inference.py --text-model=meta-llama/Llama-3.1-8B-Instruct
|
||||
```
|
||||
|
||||
### Pre-commit Hooks
|
||||
|
||||
We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:
|
||||
|
||||
```bash
|
||||
uv run pre-commit install
|
||||
```
|
||||
|
||||
After that, pre-commit hooks will run automatically before each commit.
|
||||
|
||||
Alternatively, if you don't want to install the pre-commit hooks, you can run the checks manually by running:
|
||||
|
||||
```bash
|
||||
uv run pre-commit run --all-files
|
||||
```
|
||||
|
||||
```{caution}
|
||||
Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
|
||||
```
|
||||
|
||||
## Discussions -> Issues -> Pull Requests
|
||||
|
||||
We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).
|
||||
|
||||
If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.
|
||||
|
||||
### Issues
|
||||
We use GitHub issues to track public bugs. Please ensure your description is
|
||||
clear and has sufficient instructions to be able to reproduce the issue.
|
||||
|
||||
Meta has a [bounty program](http://facebook.com/whitehat/info) for the safe
|
||||
disclosure of security bugs. In those cases, please go through the process
|
||||
outlined on that page and do not file a public issue.
|
||||
|
||||
### Contributor License Agreement ("CLA")
|
||||
In order to accept your pull request, we need you to submit a CLA. You only need
|
||||
to do this once to work on any of Meta's open source projects.
|
||||
|
||||
Complete your CLA here: <https://code.facebook.com/cla>
|
||||
|
||||
**I'd like to contribute!**
|
||||
|
||||
All issues are actionable (please report it if they are not). Pick one and start working on it. Thank you.
If you need help or guidance, comment on the issue. Issues that are extra friendly to new contributors are tagged with "contributor friendly".
If you are new to the project, start by looking at the issues tagged with "good first issue". If you're interested, leave a comment on the issue and a triager will assign it to you.
|
||||
|
||||
Please avoid picking up too many issues at once. This helps you stay focused and ensures that others in the community also have opportunities to contribute.
|
||||
- Try to work on only 1–2 issues at a time, especially if you’re still getting familiar with the codebase.
|
||||
- Before taking an issue, check if it’s already assigned or being actively discussed.
|
||||
- If you’re blocked or can’t continue with an issue, feel free to unassign yourself or leave a comment so others can step in.
|
||||
|
||||
**I have a bug!**
|
||||
|
||||
|
@ -41,81 +115,44 @@ If you need help or guidance, comment on the issue. Issues that are extra friend
|
|||
4. Make sure your code lints using `pre-commit`.
|
||||
5. If you haven't already, complete the Contributor License Agreement ("CLA").
|
||||
6. Ensure your pull request follows the [conventional commits format](https://www.conventionalcommits.org/en/v1.0.0/).
|
||||
|
||||
## Contributor License Agreement ("CLA")
|
||||
In order to accept your pull request, we need you to submit a CLA. You only need
|
||||
to do this once to work on any of Meta's open source projects.
|
||||
|
||||
Complete your CLA here: <https://code.facebook.com/cla>
|
||||
|
||||
## Issues
|
||||
We use GitHub issues to track public bugs. Please ensure your description is
|
||||
clear and has sufficient instructions to be able to reproduce the issue.
|
||||
|
||||
Meta has a [bounty program](http://facebook.com/whitehat/info) for the safe
|
||||
disclosure of security bugs. In those cases, please go through the process
|
||||
outlined on that page and do not file a public issue.
|
||||
7. Ensure your pull request follows the [coding style](#coding-style).
|
||||
|
||||
|
||||
## Set up your development environment
|
||||
Please keep pull requests (PRs) small and focused. If you have a large set of changes, consider splitting them into logically grouped, smaller PRs to facilitate review and testing.
|
||||
|
||||
We use [uv](https://github.com/astral-sh/uv) to manage python dependencies and virtual environments.
|
||||
You can install `uv` by following this [guide](https://docs.astral.sh/uv/getting-started/installation/).
|
||||
You can install the dependencies by running:
|
||||
|
||||
```bash
|
||||
$ cd llama-stack
|
||||
$ uv sync --extra dev
|
||||
$ uv pip install -e .
|
||||
$ source .venv/bin/activate
|
||||
```{tip}
|
||||
As a general guideline:
|
||||
- Experienced contributors should try to keep no more than 5 open PRs at a time.
|
||||
- New contributors are encouraged to have only one open PR at a time until they’re familiar with the codebase and process.
|
||||
```
|
||||
|
||||
Note that you can create a dotenv file `.env` that includes necessary environment variables:
|
||||
```
|
||||
LLAMA_STACK_BASE_URL=http://localhost:8321
|
||||
LLAMA_STACK_CLIENT_LOG=debug
|
||||
LLAMA_STACK_PORT=8321
|
||||
LLAMA_STACK_CONFIG=
|
||||
```
|
||||
## Repository guidelines
|
||||
|
||||
And then use this dotenv file when running client SDK tests via the following:
|
||||
```bash
|
||||
$ uv run --env-file .env -- pytest -v tests/client-sdk/inference/test_text_inference.py
|
||||
```
|
||||
### Coding Style
|
||||
|
||||
## Pre-commit Hooks
|
||||
* Comments should provide meaningful insights into the code. Avoid filler comments that simply describe the next step, as they create unnecessary clutter; the same goes for docstrings.
* Prefer comments that clarify surprising behavior and/or relationships between parts of the code rather than explain what the next line of code does.
* When catching exceptions, prefer a specific exception type rather than a broad catch-all like `Exception`.
* Error messages should be prefixed with "Failed to ..."
* 4 spaces for indentation rather than tabs
* When using `# noqa` to suppress a style or linter warning, include a comment explaining the justification for bypassing the check.
* When using `# type: ignore` to suppress a mypy warning, include a comment explaining the justification for bypassing the check.
* Don't use unicode characters in the codebase. ASCII-only is preferred for compatibility and readability.
* Provider configuration classes should be Pydantic models; each field should use `Field` with a `description` of the configuration option, as these descriptions are used to generate the provider documentation (see the sketch after this list).
* When possible, use keyword arguments only when calling functions.
* Llama Stack provides [custom Exception classes](llama_stack/apis/common/errors.py) for certain resources; use them where applicable.
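
To make these guidelines concrete, here is a minimal, hypothetical sketch (the `ExampleProviderConfig` model and `load_config` helper are illustrative only, not part of the codebase). It shows a provider configuration class whose fields carry `Field` descriptions, a specific exception being caught, and an error message prefixed with "Failed to ...":

```python
from pydantic import BaseModel, Field


class ExampleProviderConfig(BaseModel):
    """Configuration for a hypothetical remote inference provider."""

    url: str = Field(description="Base URL of the remote inference endpoint.")
    api_key: str | None = Field(default=None, description="Optional API key used to authenticate requests.")
    timeout: int = Field(default=30, description="Request timeout in seconds.")


def load_config(path: str) -> ExampleProviderConfig:
    # Catch a specific exception rather than a broad `Exception`,
    # and prefix the error message with "Failed to ...".
    try:
        with open(path) as f:
            raw = f.read()
    except FileNotFoundError as e:
        raise RuntimeError(f"Failed to load provider config from {path}") from e
    return ExampleProviderConfig.model_validate_json(raw)
```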
|
||||
|
||||
We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:
|
||||
|
||||
```bash
|
||||
$ uv run pre-commit install
|
||||
```
|
||||
|
||||
After that, pre-commit hooks will run automatically before each commit.
|
||||
|
||||
Alternatively, if you don't want to install the pre-commit hooks, you can run the checks manually by running:
|
||||
|
||||
```bash
|
||||
$ uv run pre-commit run --all-files
|
||||
```
|
||||
|
||||
> [!CAUTION]
|
||||
> Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
|
||||
|
||||
## Adding a new dependency to the project
|
||||
|
||||
To add a new dependency to the project, you can use the `uv` command. For example, to add `foo` to the project, you can run:
|
||||
|
||||
```bash
|
||||
$ uv add foo
|
||||
$ uv sync
|
||||
```
|
||||
|
||||
## Coding Style
|
||||
|
||||
* 4 spaces for indentation rather than tabs
|
||||
* 80 character line length
|
||||
* ...
|
||||
### License
|
||||
By contributing to Llama, you agree that your contributions will be licensed
|
||||
under the LICENSE file in the root directory of this source tree.
|
||||
|
||||
## Common Tasks
|
||||
|
||||
|
@ -123,35 +160,41 @@ Some tips about common tasks you work on while contributing to Llama Stack:
|
|||
|
||||
### Using `llama stack build`
|
||||
|
||||
Building a stack image (conda / docker) will use the production version of the `llama-stack`, `llama-models` and `llama-stack-client` packages. If you are developing with a llama-stack repository checked out and need your code to be reflected in the stack image, set `LLAMA_STACK_DIR` and `LLAMA_MODELS_DIR` to the appropriate checked out directories when running any of the `llama` CLI commands.
|
||||
Building a stack image will use the production version of the `llama-stack` and `llama-stack-client` packages. If you are developing with a llama-stack repository checked out and need your code to be reflected in the stack image, set `LLAMA_STACK_DIR` and `LLAMA_STACK_CLIENT_DIR` to the appropriate checked out directories when running any of the `llama` CLI commands.
|
||||
|
||||
Example:
|
||||
```bash
|
||||
$ cd work/
|
||||
$ git clone https://github.com/meta-llama/llama-stack.git
|
||||
$ git clone https://github.com/meta-llama/llama-models.git
|
||||
$ cd llama-stack
|
||||
$ LLAMA_STACK_DIR=$(pwd) LLAMA_MODELS_DIR=../llama-models llama stack build --template <...>
|
||||
cd work/
|
||||
git clone https://github.com/meta-llama/llama-stack.git
|
||||
git clone https://github.com/meta-llama/llama-stack-client-python.git
|
||||
cd llama-stack
|
||||
LLAMA_STACK_DIR=$(pwd) LLAMA_STACK_CLIENT_DIR=../llama-stack-client-python llama stack build --distro <...>
|
||||
```
|
||||
|
||||
### Updating distribution configurations
|
||||
|
||||
### Updating Provider Configurations
|
||||
If you have made changes to a provider's configuration in any form (introducing a new config key, or
|
||||
changing models, etc.), you should run `./scripts/distro_codegen.py` to re-generate various YAML
|
||||
files as well as the documentation. You should not change `docs/source/.../distributions/` files
|
||||
manually as they are auto-generated.
|
||||
|
||||
If you have made changes to a provider's configuration in any form (introducing a new config key, or changing models, etc.), you should run `python llama_stack/scripts/distro_codegen.py` to re-generate various YAML files as well as the documentation. You should not change `docs/source/.../distributions/` files manually as they are auto-generated.
|
||||
### Updating the provider documentation
|
||||
|
||||
If you have made changes to a provider's configuration, you should run `./scripts/provider_codegen.py`
|
||||
to re-generate the documentation. You should not change `docs/source/.../providers/` files manually
|
||||
as they are auto-generated.
|
||||
Note that the provider "description" field will be used to generate the provider documentation.
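
Putting these steps together, a typical workflow after touching a provider configuration might look like the following sketch (it simply runs the two scripts referenced above; depending on your environment you may prefer to prefix them with `uv run`):

```bash
# Re-generate distribution YAML files and the related documentation
./scripts/distro_codegen.py

# Re-generate the provider documentation from the config "description" fields
./scripts/provider_codegen.py
```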
|
||||
|
||||
### Building the Documentation
|
||||
|
||||
If you are making changes to the documentation at [https://llama-stack.readthedocs.io/en/latest/](https://llama-stack.readthedocs.io/en/latest/), you can use the following command to build the documentation and preview your changes. You will need [Sphinx](https://www.sphinx-doc.org/en/master/) and the readthedocs theme.
|
||||
|
||||
```bash
|
||||
$ cd llama-stack/docs
|
||||
$ uv sync --extra docs
|
||||
|
||||
# This rebuilds the documentation pages.
|
||||
$ uv run make html
|
||||
uv run --group docs make -C docs/ html
|
||||
|
||||
# This will start a local server (usually at http://127.0.0.1:8000) that automatically rebuilds and refreshes when you make changes to the documentation.
|
||||
$ uv run sphinx-autobuild source build/html --write-all
|
||||
uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all
|
||||
```
|
||||
|
||||
### Update API Documentation
|
||||
|
@ -159,12 +202,7 @@ $ uv run sphinx-autobuild source build/html --write-all
|
|||
If you modify or add new API endpoints, update the API documentation accordingly. You can do this by running the following command:
|
||||
|
||||
```bash
|
||||
$ uv sync --extra dev
|
||||
$ uv run ./docs/openapi_generator/run_openapi_generator.sh
|
||||
uv run ./docs/openapi_generator/run_openapi_generator.sh
|
||||
```
|
||||
|
||||
The generated API documentation will be available in `docs/_static/`. Make sure to review the changes before committing.
|
||||
|
||||
## License
|
||||
By contributing to Llama, you agree that your contributions will be licensed
|
||||
under the LICENSE file in the root directory of this source tree.
|
||||
|
|
|
@ -1,6 +1,9 @@
|
|||
include pyproject.toml
|
||||
include distributions/dependencies.json
|
||||
include llama_stack/distribution/*.sh
|
||||
include llama_stack/models/llama/llama3/tokenizer.model
|
||||
include llama_stack/models/llama/llama4/tokenizer.model
|
||||
include llama_stack/core/*.sh
|
||||
include llama_stack/cli/scripts/*.sh
|
||||
include llama_stack/templates/*/*.yaml
|
||||
include llama_stack/distributions/*/*.yaml
|
||||
include llama_stack/providers/tests/test_cases/inference/*.json
|
||||
include llama_stack/models/llama/*/*.md
|
||||
include llama_stack/tests/integration/*.jpg
|
||||
|
|
175 README.md
@ -3,9 +3,85 @@
[](https://pypi.org/project/llama_stack/)
|
||||
[](https://pypi.org/project/llama-stack/)
|
||||
[](https://github.com/meta-llama/llama-stack/blob/main/LICENSE)
|
||||
[](https://discord.gg/llama-stack)
|
||||
[](https://discord.gg/llama-stack)
|
||||
[](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain)
|
||||
[](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain)
|
||||
|
||||
[**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb)
|
||||
[**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)
|
||||
|
||||
|
||||
### ✨🎉 Llama 4 Support 🎉✨
|
||||
We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.
|
||||
|
||||
<details>
|
||||
|
||||
<summary>👋 Click here to see how to run Llama 4 models on Llama Stack </summary>
|
||||
|
||||
\
|
||||
*Note: you need an 8xH100 GPU host to run these models*
|
||||
|
||||
```bash
|
||||
pip install -U llama_stack
|
||||
|
||||
MODEL="Llama-4-Scout-17B-16E-Instruct"
|
||||
# get meta url from llama.com
|
||||
llama model download --source meta --model-id $MODEL --meta-url <META_URL>
|
||||
|
||||
# start a llama stack server
|
||||
INFERENCE_MODEL=meta-llama/$MODEL llama stack build --run --template meta-reference-gpu
|
||||
|
||||
# install client to interact with the server
|
||||
pip install llama-stack-client
|
||||
```
|
||||
### CLI
|
||||
```bash
|
||||
# Run a chat completion
|
||||
MODEL="Llama-4-Scout-17B-16E-Instruct"
|
||||
|
||||
llama-stack-client --endpoint http://localhost:8321 \
|
||||
inference chat-completion \
|
||||
--model-id meta-llama/$MODEL \
|
||||
--message "write a haiku for meta's llama 4 models"
|
||||
|
||||
ChatCompletionResponse(
|
||||
completion_message=CompletionMessage(content="Whispers in code born\nLlama's gentle, wise heartbeat\nFuture's soft unfold", role='assistant', stop_reason='end_of_turn', tool_calls=[]),
|
||||
logprobs=None,
|
||||
metrics=[Metric(metric='prompt_tokens', value=21.0, unit=None), Metric(metric='completion_tokens', value=28.0, unit=None), Metric(metric='total_tokens', value=49.0, unit=None)]
|
||||
)
|
||||
```
|
||||
### Python SDK
|
||||
```python
|
||||
from llama_stack_client import LlamaStackClient
|
||||
|
||||
client = LlamaStackClient(base_url="http://localhost:8321")
|
||||
|
||||
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
|
||||
prompt = "Write a haiku about coding"
|
||||
|
||||
print(f"User> {prompt}")
|
||||
response = client.inference.chat_completion(
|
||||
model_id=model_id,
|
||||
messages=[
|
||||
{"role": "system", "content": "You are a helpful assistant."},
|
||||
{"role": "user", "content": prompt},
|
||||
],
|
||||
)
|
||||
print(f"Assistant> {response.completion_message.content}")
|
||||
```
|
||||
As more providers start supporting Llama 4, you can use them in Llama Stack as well. We are adding to the list. Stay tuned!
|
||||
|
||||
|
||||
</details>
|
||||
|
||||
### 🚀 One-Line Installer 🚀
|
||||
|
||||
To try Llama Stack locally, run:
|
||||
|
||||
```bash
|
||||
curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/scripts/install.sh | bash
|
||||
```
|
||||
|
||||
### Overview
|
||||
|
||||
Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides
|
||||
|
||||
|
@ -33,60 +109,49 @@ By reducing friction and complexity, Llama Stack empowers developers to focus on
|
|||
|
||||
### API Providers
|
||||
Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.
Please check out the [full list](https://llama-stack.readthedocs.io/en/latest/providers/index.html).
|
||||
|
||||
| **API Provider Builder** | **Environments** | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** |
|
||||
|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|
|
||||
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| SambaNova | Hosted | | ✅ | | | |
|
||||
| Cerebras | Hosted | | ✅ | | | |
|
||||
| Fireworks | Hosted | ✅ | ✅ | ✅ | | |
|
||||
| AWS Bedrock | Hosted | | ✅ | | ✅ | |
|
||||
| Together | Hosted | ✅ | ✅ | | ✅ | |
|
||||
| Groq | Hosted | | ✅ | | | |
|
||||
| Ollama | Single Node | | ✅ | | | |
|
||||
| TGI | Hosted and Single Node | | ✅ | | | |
|
||||
| NVIDIA NIM | Hosted and Single Node | | ✅ | | | |
|
||||
| Chroma | Single Node | | | ✅ | | |
|
||||
| PG Vector | Single Node | | | ✅ | | |
|
||||
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | |
|
||||
| vLLM | Hosted and Single Node | | ✅ | | | |
|
||||
| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO |
|
||||
|:--------------------:|:------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:--------:|
|
||||
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| SambaNova | Hosted | | ✅ | | ✅ | | | | |
|
||||
| Cerebras | Hosted | | ✅ | | | | | | |
|
||||
| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | |
|
||||
| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | |
|
||||
| Together | Hosted | ✅ | ✅ | | ✅ | | | | |
|
||||
| Groq | Hosted | | ✅ | | | | | | |
|
||||
| Ollama | Single Node | | ✅ | | | | | | |
|
||||
| TGI | Hosted/Single Node | | ✅ | | | | | | |
|
||||
| NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | | |
|
||||
| ChromaDB | Hosted/Single Node | | | ✅ | | | | | |
|
||||
| Milvus | Hosted/Single Node | | | ✅ | | | | | |
|
||||
| Qdrant | Hosted/Single Node | | | ✅ | | | | | |
|
||||
| Weaviate | Hosted/Single Node | | | ✅ | | | | | |
|
||||
| SQLite-vec | Single Node | | | ✅ | | | | | |
|
||||
| PG Vector | Single Node | | | ✅ | | | | | |
|
||||
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | |
|
||||
| vLLM | Single Node | | ✅ | | | | | | |
|
||||
| OpenAI | Hosted | | ✅ | | | | | | |
|
||||
| Anthropic | Hosted | | ✅ | | | | | | |
|
||||
| Gemini | Hosted | | ✅ | | | | | | |
|
||||
| WatsonX | Hosted | | ✅ | | | | | | |
|
||||
| HuggingFace | Single Node | | | | | | ✅ | | ✅ |
|
||||
| TorchTune | Single Node | | | | | | ✅ | | |
|
||||
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ |
|
||||
| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ |
|
||||
|
||||
> **Note**: Additional providers are available through external packages. See [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation.
|
||||
|
||||
### Distributions
|
||||
|
||||
A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (eg. ollama) and seamlessly transition to production (eg. Fireworks) without changing your application code. Here are some of the distributions we support:
|
||||
A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (e.g. Ollama) and seamlessly transition to production (e.g. Fireworks) without changing your application code.
Here are some of the distributions we support:
|
||||
|
||||
| **Distribution** | **Llama Stack Docker** | Start This Distribution |
|
||||
|:---------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|
|
||||
| Starter Distribution | [llamastack/distribution-starter](https://hub.docker.com/repository/docker/llamastack/distribution-starter/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/starter.html) |
|
||||
| Meta Reference | [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html) |
|
||||
| Meta Reference Quantized | [llamastack/distribution-meta-reference-quantized-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-quantized-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-quantized-gpu.html) |
|
||||
| SambaNova | [llamastack/distribution-sambanova](https://hub.docker.com/repository/docker/llamastack/distribution-sambanova/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/sambanova.html) |
|
||||
| Cerebras | [llamastack/distribution-cerebras](https://hub.docker.com/repository/docker/llamastack/distribution-cerebras/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/cerebras.html) |
|
||||
| Ollama | [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/ollama.html) |
|
||||
| TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html) |
|
||||
| Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html) |
|
||||
| Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html) |
|
||||
| vLLM | [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) |
|
||||
|
||||
### Installation
|
||||
|
||||
You have two ways to install this repository:
|
||||
|
||||
* **Install as a package**:
|
||||
You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command:
|
||||
```bash
|
||||
pip install llama-stack
|
||||
```
|
||||
|
||||
* **Install from source**:
|
||||
If you prefer to install from the source code, we recommend using [uv](https://github.com/astral-sh/uv).
|
||||
Then, run the following commands:
|
||||
```bash
|
||||
git clone git@github.com:meta-llama/llama-stack.git
|
||||
cd llama-stack
|
||||
|
||||
uv sync
|
||||
uv pip install -e .
|
||||
```
|
||||
| PostgreSQL | [llamastack/distribution-postgres-demo](https://hub.docker.com/repository/docker/llamastack/distribution-postgres-demo/general) | |
|
||||
|
||||
### Documentation
|
||||
|
||||
|
@ -115,3 +180,17 @@ Please checkout our [Documentation](https://llama-stack.readthedocs.io/en/latest
|
|||
Check out our client SDKs for connecting to a Llama Stack server in your preferred language; you can choose from the [python](https://github.com/meta-llama/llama-stack-client-python), [typescript](https://github.com/meta-llama/llama-stack-client-typescript), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) SDKs to quickly build your applications.
|
||||
|
||||
You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
|
||||
|
||||
|
||||
## 🌟 GitHub Star History
|
||||
## Star History
|
||||
|
||||
[](https://www.star-history.com/#meta-llama/llama-stack&Date)
|
||||
|
||||
## ✨ Contributors
|
||||
|
||||
Thanks to all of our amazing contributors!
|
||||
|
||||
<a href="https://github.com/meta-llama/llama-stack/graphs/contributors">
|
||||
<img src="https://contrib.rocks/image?repo=meta-llama/llama-stack" />
|
||||
</a>
|
21 coverage.svg (new file)
@ -0,0 +1,21 @@
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" width="99" height="20">
|
||||
<linearGradient id="b" x2="0" y2="100%">
|
||||
<stop offset="0" stop-color="#bbb" stop-opacity=".1"/>
|
||||
<stop offset="1" stop-opacity=".1"/>
|
||||
</linearGradient>
|
||||
<mask id="a">
|
||||
<rect width="99" height="20" rx="3" fill="#fff"/>
|
||||
</mask>
|
||||
<g mask="url(#a)">
|
||||
<path fill="#555" d="M0 0h63v20H0z"/>
|
||||
<path fill="#fe7d37" d="M63 0h36v20H63z"/>
|
||||
<path fill="url(#b)" d="M0 0h99v20H0z"/>
|
||||
</g>
|
||||
<g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,sans-serif" font-size="11">
|
||||
<text x="31.5" y="15" fill="#010101" fill-opacity=".3">coverage</text>
|
||||
<text x="31.5" y="14">coverage</text>
|
||||
<text x="80" y="15" fill="#010101" fill-opacity=".3">44%</text>
|
||||
<text x="80" y="14">44%</text>
|
||||
</g>
|
||||
</svg>
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/bedrock/build.yaml
|
|
@ -1,15 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: distribution-bedrock
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run.yaml:/root/llamastack-run-bedrock.yaml
|
||||
ports:
|
||||
- "8321:8321"
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --yaml_config /root/llamastack-run-bedrock.yaml"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/bedrock/run.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/cerebras/build.yaml
|
|
@ -1,16 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: llamastack/distribution-cerebras
|
||||
network_mode: "host"
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run.yaml:/root/llamastack-run-cerebras.yaml
|
||||
ports:
|
||||
- "8321:8321"
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --yaml_config /root/llamastack-run-cerebras.yaml"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/cerebras/run.yaml
|
|
@ -1,50 +0,0 @@
|
|||
services:
|
||||
text-generation-inference:
|
||||
image: registry.dell.huggingface.co/enterprise-dell-inference-meta-llama-meta-llama-3.1-8b-instruct
|
||||
network_mode: "host"
|
||||
volumes:
|
||||
- $HOME/.cache/huggingface:/data
|
||||
ports:
|
||||
- "5009:5009"
|
||||
devices:
|
||||
- nvidia.com/gpu=all
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=0,1,2,3,4
|
||||
- NUM_SHARD=4
|
||||
- MAX_BATCH_PREFILL_TOKENS=32768
|
||||
- MAX_INPUT_TOKENS=8000
|
||||
- MAX_TOTAL_TOKENS=8192
|
||||
command: []
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
# that's the closest analogue to --gpus; provide
|
||||
# an integer amount of devices or 'all'
|
||||
count: all
|
||||
# Devices are reserved using a list of capabilities, making
|
||||
# capabilities the only required field. A device MUST
|
||||
# satisfy all the requested capabilities for a successful
|
||||
# reservation.
|
||||
capabilities: [gpu]
|
||||
runtime: nvidia
|
||||
llamastack:
|
||||
depends_on:
|
||||
text-generation-inference:
|
||||
condition: service_healthy
|
||||
image: llamastack/distribution-tgi
|
||||
network_mode: "host"
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
# Link to TGI run.yaml file
|
||||
- ./run.yaml:/root/my-run.yaml
|
||||
ports:
|
||||
- "8321:8321"
|
||||
# Hack: wait for TGI server to start before starting docker
|
||||
entrypoint: bash -c "sleep 60; python -m llama_stack.distribution.server.server --yaml_config /root/my-run.yaml"
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
@ -1,44 +0,0 @@
|
|||
version: '2'
|
||||
image_name: local
|
||||
container_image: null
|
||||
conda_env: local
|
||||
apis:
|
||||
- shields
|
||||
- agents
|
||||
- models
|
||||
- memory
|
||||
- memory_banks
|
||||
- inference
|
||||
- safety
|
||||
providers:
|
||||
inference:
|
||||
- provider_id: tgi0
|
||||
provider_type: remote::tgi
|
||||
config:
|
||||
url: http://127.0.0.1:80
|
||||
safety:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::llama-guard
|
||||
config:
|
||||
model: Llama-Guard-3-1B
|
||||
excluded_categories: []
|
||||
- provider_id: meta1
|
||||
provider_type: inline::prompt-guard
|
||||
config:
|
||||
model: Prompt-Guard-86M
|
||||
memory:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::faiss
|
||||
config: {}
|
||||
agents:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference
|
||||
config:
|
||||
persistence_store:
|
||||
namespace: null
|
||||
type: sqlite
|
||||
db_path: ~/.llama/runtime/kvstore.db
|
||||
telemetry:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference
|
||||
config: {}
|
|
@ -1,625 +0,0 @@
|
|||
{
|
||||
"bedrock": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"boto3",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn"
|
||||
],
|
||||
"cerebras": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"cerebras_cloud_sdk",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"ci-tests": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"fireworks-ai",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"sqlite-vec",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"dell": [
|
||||
"aiohttp",
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"huggingface_hub",
|
||||
"matplotlib",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"dev": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"fireworks-ai",
|
||||
"httpx",
|
||||
"litellm",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"sqlite-vec",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"fireworks": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"fireworks-ai",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"groq": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"litellm",
|
||||
"matplotlib",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn"
|
||||
],
|
||||
"hf-endpoint": [
|
||||
"aiohttp",
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"huggingface_hub",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn"
|
||||
],
|
||||
"hf-serverless": [
|
||||
"aiohttp",
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"huggingface_hub",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"meta-reference-gpu": [
|
||||
"accelerate",
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"fairscale",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"lm-format-enforcer",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentence-transformers",
|
||||
"sentencepiece",
|
||||
"torch",
|
||||
"torchvision",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"zmq"
|
||||
],
|
||||
"meta-reference-quantized-gpu": [
|
||||
"accelerate",
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"fairscale",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fbgemm-gpu",
|
||||
"fire",
|
||||
"httpx",
|
||||
"lm-format-enforcer",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentence-transformers",
|
||||
"sentencepiece",
|
||||
"torch",
|
||||
"torchao==0.5.0",
|
||||
"torchvision",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"zmq"
|
||||
],
|
||||
"nvidia": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn"
|
||||
],
|
||||
"ollama": [
|
||||
"aiohttp",
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"ollama",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"sqlite-vec",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn"
|
||||
],
|
||||
"remote-vllm": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"sambanova": [
|
||||
"aiosqlite",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn"
|
||||
],
|
||||
"tgi": [
|
||||
"aiohttp",
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"huggingface_hub",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"together": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"together",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
],
|
||||
"vllm-gpu": [
|
||||
"aiosqlite",
|
||||
"autoevals",
|
||||
"blobfile",
|
||||
"chardet",
|
||||
"chromadb-client",
|
||||
"datasets",
|
||||
"faiss-cpu",
|
||||
"fastapi",
|
||||
"fire",
|
||||
"httpx",
|
||||
"matplotlib",
|
||||
"mcp",
|
||||
"nltk",
|
||||
"numpy",
|
||||
"openai",
|
||||
"opentelemetry-exporter-otlp-proto-http",
|
||||
"opentelemetry-sdk",
|
||||
"pandas",
|
||||
"pillow",
|
||||
"psycopg2-binary",
|
||||
"pymongo",
|
||||
"pypdf",
|
||||
"redis",
|
||||
"requests",
|
||||
"scikit-learn",
|
||||
"scipy",
|
||||
"sentencepiece",
|
||||
"tqdm",
|
||||
"transformers",
|
||||
"uvicorn",
|
||||
"vllm",
|
||||
"sentence-transformers --no-deps",
|
||||
"torch torchvision --index-url https://download.pytorch.org/whl/cpu"
|
||||
]
|
||||
}
|
|
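The distribution lists above are flat arrays of pip requirement strings, and a few entries carry extra installer flags along with the package names (for example `sentence-transformers --no-deps` and the CPU-only `torch torchvision --index-url ...` pair). As a rough sketch only (the actual build scripts are not part of this diff), such entries can be forwarded to pip verbatim, relying on word-splitting to turn each string into separate arguments:

```bash
# Hedged sketch; the real build tooling is not shown in this diff.
# Each entry is a complete pip argument string, so the unquoted expansion
# below intentionally splits it into package names plus flags.
while IFS= read -r entry; do
  [ -n "$entry" ] && pip install $entry
done <<'EOF'
sentence-transformers --no-deps
torch torchvision --index-url https://download.pytorch.org/whl/cpu
EOF
```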
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/fireworks/build.yaml
|
|
@ -1,14 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: llamastack/distribution-fireworks
|
||||
ports:
|
||||
- "8321:8321"
|
||||
environment:
|
||||
- FIREWORKS_API_KEY=${FIREWORKS_API_KEY}
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --template fireworks"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
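A usage note, not taken from this diff: a compose file like the one above only needs the provider key exported before it is brought up, since the service forwards `FIREWORKS_API_KEY` straight from the host environment.

```bash
# Assumes the file above is saved as compose.yaml in the current directory.
export FIREWORKS_API_KEY=your-fireworks-key   # placeholder value
docker compose up -d                          # Llama Stack is published on port 8321 per the mapping above
```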
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/fireworks/run.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/meta-reference-gpu/build.yaml
|
|
@ -1,34 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: llamastack/distribution-meta-reference-gpu
|
||||
network_mode: "host"
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run.yaml:/root/my-run.yaml
|
||||
ports:
|
||||
- "8321:8321"
|
||||
devices:
|
||||
- nvidia.com/gpu=all
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=0
|
||||
command: []
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
# that's the closest analogue to --gpus; provide
|
||||
# an integer amount of devices or 'all'
|
||||
count: 1
|
||||
# Devices are reserved using a list of capabilities, making
|
||||
# capabilities the only required field. A device MUST
|
||||
# satisfy all the requested capabilities for a successful
|
||||
# reservation.
|
||||
capabilities: [gpu]
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
||||
runtime: nvidia
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --yaml_config /root/my-run.yaml"
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/meta-reference-gpu/run-with-safety.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/meta-reference-gpu/run.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/meta-reference-quantized-gpu/build.yaml
|
|
@ -1,35 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: llamastack/distribution-meta-reference-quantized-gpu
|
||||
network_mode: "host"
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run.yaml:/root/my-run.yaml
|
||||
ports:
|
||||
- "8321:8321"
|
||||
devices:
|
||||
- nvidia.com/gpu=all
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=0
|
||||
command: []
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
# that's the closest analogue to --gpus; provide
|
||||
# an integer amount of devices or 'all'
|
||||
count: 1
|
||||
# Devices are reserved using a list of capabilities, making
|
||||
# capabilities the only required field. A device MUST
|
||||
# satisfy all the requested capabilities for a successful
|
||||
# reservation.
|
||||
capabilities: [gpu]
|
||||
runtime: nvidia
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --yaml_config /root/my-run.yaml"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
@ -1,58 +0,0 @@
|
|||
version: '2'
|
||||
image_name: local
|
||||
container_image: null
|
||||
conda_env: local
|
||||
apis:
|
||||
- shields
|
||||
- agents
|
||||
- models
|
||||
- memory
|
||||
- memory_banks
|
||||
- inference
|
||||
- safety
|
||||
providers:
|
||||
inference:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference-quantized
|
||||
config:
|
||||
model: Llama3.2-3B-Instruct:int4-qlora-eo8
|
||||
quantization:
|
||||
type: int4
|
||||
torch_seed: null
|
||||
max_seq_len: 2048
|
||||
max_batch_size: 1
|
||||
- provider_id: meta1
|
||||
provider_type: inline::meta-reference-quantized
|
||||
config:
|
||||
# not a quantized model !
|
||||
model: Llama-Guard-3-1B
|
||||
quantization: null
|
||||
torch_seed: null
|
||||
max_seq_len: 2048
|
||||
max_batch_size: 1
|
||||
safety:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::llama-guard
|
||||
config:
|
||||
model: Llama-Guard-3-1B
|
||||
excluded_categories: []
|
||||
- provider_id: meta1
|
||||
provider_type: inline::prompt-guard
|
||||
config:
|
||||
model: Prompt-Guard-86M
|
||||
memory:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference
|
||||
config: {}
|
||||
agents:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference
|
||||
config:
|
||||
persistence_store:
|
||||
namespace: null
|
||||
type: sqlite
|
||||
db_path: ~/.llama/runtime/kvstore.db
|
||||
telemetry:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference
|
||||
config: {}
|
|
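This run.yaml pairs an int4-quantized inference provider with an unquantized Llama Guard model for safety. A hedged sketch of serving it directly, modeled on the compose entrypoints that appear elsewhere in this diff (note that those entrypoints spell the flag both `--yaml_config` and `--yaml-config`, so check the CLI version you have installed):

```bash
# Sketch only; mirrors the container entrypoints shown elsewhere in this diff.
python -m llama_stack.distribution.server.server --yaml_config ./run.yaml
```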
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/ollama/build.yaml
|
|
@ -1,71 +0,0 @@
|
|||
services:
|
||||
ollama:
|
||||
image: ollama/ollama:latest
|
||||
network_mode: ${NETWORK_MODE:-bridge}
|
||||
volumes:
|
||||
- ~/.ollama:/root/.ollama
|
||||
ports:
|
||||
- "11434:11434"
|
||||
environment:
|
||||
OLLAMA_DEBUG: 1
|
||||
command: []
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
memory: 8G # Set maximum memory
|
||||
reservations:
|
||||
memory: 8G # Set minimum memory reservation
|
||||
# healthcheck:
|
||||
# # ugh, no CURL in ollama image
|
||||
# test: ["CMD", "curl", "-f", "http://ollama:11434"]
|
||||
# interval: 10s
|
||||
# timeout: 5s
|
||||
# retries: 5
|
||||
|
||||
ollama-init:
|
||||
image: ollama/ollama:latest
|
||||
depends_on:
|
||||
- ollama
|
||||
# condition: service_healthy
|
||||
network_mode: ${NETWORK_MODE:-bridge}
|
||||
environment:
|
||||
- OLLAMA_HOST=ollama
|
||||
- INFERENCE_MODEL=${INFERENCE_MODEL}
|
||||
- SAFETY_MODEL=${SAFETY_MODEL:-}
|
||||
volumes:
|
||||
- ~/.ollama:/root/.ollama
|
||||
- ./pull-models.sh:/pull-models.sh
|
||||
entrypoint: ["/pull-models.sh"]
|
||||
|
||||
llamastack:
|
||||
depends_on:
|
||||
ollama:
|
||||
condition: service_started
|
||||
ollama-init:
|
||||
condition: service_started
|
||||
image: ${LLAMA_STACK_IMAGE:-llamastack/distribution-ollama}
|
||||
network_mode: ${NETWORK_MODE:-bridge}
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
# Link to ollama run.yaml file
|
||||
- ~/local/llama-stack/:/app/llama-stack-source
|
||||
- ./run${SAFETY_MODEL:+-with-safety}.yaml:/root/my-run.yaml
|
||||
ports:
|
||||
- "${LLAMA_STACK_PORT:-5001}:${LLAMA_STACK_PORT:-5001}"
|
||||
environment:
|
||||
- INFERENCE_MODEL=${INFERENCE_MODEL}
|
||||
- SAFETY_MODEL=${SAFETY_MODEL:-}
|
||||
- OLLAMA_URL=http://ollama:11434
|
||||
entrypoint: >
|
||||
python -m llama_stack.distribution.server.server /root/my-run.yaml \
|
||||
--port ${LLAMA_STACK_PORT:-5001}
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 10s
|
||||
max_attempts: 3
|
||||
window: 60s
|
||||
volumes:
|
||||
ollama:
|
||||
ollama-init:
|
||||
llamastack:
|
|
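A hedged usage note for the compose file above: the model identifiers below are placeholders, but they show how the optional safety stack is toggled purely through environment variables.

```bash
# Placeholder model tags; substitute whatever your Ollama install serves.
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export SAFETY_MODEL=llama-guard3:1b        # leave unset to skip the safety run file
docker compose up -d
# With SAFETY_MODEL set, ${SAFETY_MODEL:+-with-safety} expands to "-with-safety",
# so ./run-with-safety.yaml is mounted into the llamastack container; when it is
# unset or empty, plain ./run.yaml is used instead.
```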
@ -1,18 +0,0 @@
|
|||
#!/bin/sh
|
||||
|
||||
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
||||
# All rights reserved.
|
||||
#
|
||||
# This source code is licensed under the terms described in the LICENSE file in
|
||||
# the root directory of this source tree.
|
||||
|
||||
echo "Preloading (${INFERENCE_MODEL}, ${SAFETY_MODEL})..."
|
||||
for model in ${INFERENCE_MODEL} ${SAFETY_MODEL}; do
|
||||
echo "Preloading $model..."
|
||||
if ! ollama run "$model"; then
|
||||
echo "Failed to pull and run $model"
|
||||
exit 1
|
||||
fi
|
||||
done
|
||||
|
||||
echo "All models pulled successfully"
|
|
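The preload script above can also be exercised by hand. Because the `for` loop expands `${INFERENCE_MODEL} ${SAFETY_MODEL}` unquoted, an empty safety variable simply drops out of the list. A small sketch with placeholder values:

```bash
# Placeholder tag and host; outside compose, OLLAMA_HOST points at your local daemon.
export OLLAMA_HOST=localhost
INFERENCE_MODEL=llama3.2:3b SAFETY_MODEL= ./pull-models.sh
# Only the inference model is preloaded, since the empty SAFETY_MODEL vanishes
# from the loop's word list.
```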
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/ollama/run-with-safety.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/ollama/run.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/nvidia/build.yaml
|
|
@ -1,19 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: distribution-nvidia:dev
|
||||
network_mode: "host"
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run.yaml:/root/llamastack-run-nvidia.yaml
|
||||
ports:
|
||||
- "8321:8321"
|
||||
environment:
|
||||
- INFERENCE_MODEL=${INFERENCE_MODEL:-Llama3.1-8B-Instruct}
|
||||
- NVIDIA_API_KEY=${NVIDIA_API_KEY:-}
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --yaml-config /root/llamastack-run-nvidia.yaml"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/nvidia/run.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/remote-vllm/build.yaml
|
|
@ -1,100 +0,0 @@
|
|||
services:
|
||||
vllm-inference:
|
||||
image: vllm/vllm-openai:latest
|
||||
volumes:
|
||||
- $HOME/.cache/huggingface:/root/.cache/huggingface
|
||||
network_mode: ${NETWORK_MODE:-bridged}
|
||||
ports:
|
||||
- "${VLLM_INFERENCE_PORT:-5100}:${VLLM_INFERENCE_PORT:-5100}"
|
||||
devices:
|
||||
- nvidia.com/gpu=all
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=${VLLM_INFERENCE_GPU:-0}
|
||||
- HUGGING_FACE_HUB_TOKEN=$HF_TOKEN
|
||||
command: >
|
||||
--gpu-memory-utilization 0.75
|
||||
--model ${VLLM_INFERENCE_MODEL:-meta-llama/Llama-3.2-3B-Instruct}
|
||||
--enforce-eager
|
||||
--max-model-len 8192
|
||||
--max-num-seqs 16
|
||||
--port ${VLLM_INFERENCE_PORT:-5100}
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost:${VLLM_INFERENCE_PORT:-5100}/v1/health"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 5
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
capabilities: [gpu]
|
||||
runtime: nvidia
|
||||
|
||||
# A little trick:
|
||||
# if VLLM_SAFETY_MODEL is set, we will create a service for the safety model
|
||||
# otherwise, the entry will end in a hyphen which gets ignored by docker compose
|
||||
vllm-${VLLM_SAFETY_MODEL:+safety}:
|
||||
image: vllm/vllm-openai:latest
|
||||
volumes:
|
||||
- $HOME/.cache/huggingface:/root/.cache/huggingface
|
||||
network_mode: ${NETWORK_MODE:-bridged}
|
||||
ports:
|
||||
- "${VLLM_SAFETY_PORT:-5101}:${VLLM_SAFETY_PORT:-5101}"
|
||||
devices:
|
||||
- nvidia.com/gpu=all
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=${VLLM_SAFETY_GPU:-1}
|
||||
- HUGGING_FACE_HUB_TOKEN=$HF_TOKEN
|
||||
command: >
|
||||
--gpu-memory-utilization 0.75
|
||||
--model ${VLLM_SAFETY_MODEL}
|
||||
--enforce-eager
|
||||
--max-model-len 8192
|
||||
--max-num-seqs 16
|
||||
--port ${VLLM_SAFETY_PORT:-5101}
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost:${VLLM_SAFETY_PORT:-5101}/v1/health"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 5
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
capabilities: [gpu]
|
||||
runtime: nvidia
|
||||
llamastack:
|
||||
depends_on:
|
||||
- vllm-inference:
|
||||
condition: service_healthy
|
||||
- vllm-${VLLM_SAFETY_MODEL:+safety}:
|
||||
condition: service_healthy
|
||||
# image: llamastack/distribution-remote-vllm
|
||||
image: llamastack/distribution-remote-vllm:test-0.0.52rc3
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run${VLLM_SAFETY_MODEL:+-with-safety}.yaml:/root/llamastack-run-remote-vllm.yaml
|
||||
network_mode: ${NETWORK_MODE:-bridged}
|
||||
environment:
|
||||
- VLLM_URL=http://vllm-inference:${VLLM_INFERENCE_PORT:-5100}/v1
|
||||
- VLLM_SAFETY_URL=http://vllm-safety:${VLLM_SAFETY_PORT:-5101}/v1
|
||||
- INFERENCE_MODEL=${INFERENCE_MODEL:-meta-llama/Llama-3.2-3B-Instruct}
|
||||
- MAX_TOKENS=${MAX_TOKENS:-4096}
|
||||
- SQLITE_STORE_DIR=${SQLITE_STORE_DIR:-$HOME/.llama/distributions/remote-vllm}
|
||||
- SAFETY_MODEL=${SAFETY_MODEL:-meta-llama/Llama-Guard-3-1B}
|
||||
ports:
|
||||
- "${LLAMA_STACK_PORT:-5001}:${LLAMA_STACK_PORT:-5001}"
|
||||
# Hack: wait for vLLM server to start before starting docker
|
||||
entrypoint: bash -c "sleep 60; python -m llama_stack.distribution.server.server --yaml_config /root/llamastack-run-remote-vllm.yaml --port 5001"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
||||
volumes:
|
||||
vllm-inference:
|
||||
vllm-safety:
|
||||
llamastack:
|
|
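The "little trick" comment in the file above leans on POSIX `${VAR:+word}` expansion, both for the optional safety service name and for the `run${VLLM_SAFETY_MODEL:+-with-safety}.yaml` mount. A quick demonstration of the expansion itself (the model value is only a placeholder):

```bash
VLLM_SAFETY_MODEL=meta-llama/Llama-Guard-3-1B   # placeholder
echo "vllm-${VLLM_SAFETY_MODEL:+safety}"        # -> vllm-safety
unset VLLM_SAFETY_MODEL
echo "vllm-${VLLM_SAFETY_MODEL:+safety}"        # -> vllm-  (the dangling name the comment says compose ignores)
```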
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/remote-vllm/run-with-safety.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/remote-vllm/run.yaml
|
|
@ -1,9 +0,0 @@
|
|||
name: runpod
|
||||
distribution_spec:
|
||||
description: Use Runpod for running LLM inference
|
||||
providers:
|
||||
inference: remote::runpod
|
||||
memory: meta-reference
|
||||
safety: meta-reference
|
||||
agents: meta-reference
|
||||
telemetry: meta-reference
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/sambanova/build.yaml
|
|
@ -1,16 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: llamastack/distribution-sambanova
|
||||
network_mode: "host"
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run.yaml:/root/llamastack-run-sambanova.yaml
|
||||
ports:
|
||||
- "5000:5000"
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --yaml_config /root/llamastack-run-sambanova.yaml"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/sambanova/run.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/tgi/build.yaml
|
|
@ -1,103 +0,0 @@
|
|||
services:
|
||||
tgi-inference:
|
||||
image: ghcr.io/huggingface/text-generation-inference:latest
|
||||
volumes:
|
||||
- $HOME/.cache/huggingface:/data
|
||||
network_mode: ${NETWORK_MODE:-bridged}
|
||||
ports:
|
||||
- "${TGI_INFERENCE_PORT:-8080}:${TGI_INFERENCE_PORT:-8080}"
|
||||
devices:
|
||||
- nvidia.com/gpu=all
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=${TGI_INFERENCE_GPU:-0}
|
||||
- HF_TOKEN=$HF_TOKEN
|
||||
- HF_HOME=/data
|
||||
- HF_DATASETS_CACHE=/data
|
||||
- HF_MODULES_CACHE=/data
|
||||
- HF_HUB_CACHE=/data
|
||||
command: >
|
||||
--dtype bfloat16
|
||||
--usage-stats off
|
||||
--sharded false
|
||||
--model-id ${TGI_INFERENCE_MODEL:-meta-llama/Llama-3.2-3B-Instruct}
|
||||
--port ${TGI_INFERENCE_PORT:-8080}
|
||||
--cuda-memory-fraction 0.75
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://tgi-inference:${TGI_INFERENCE_PORT:-8080}/health"]
|
||||
interval: 5s
|
||||
timeout: 5s
|
||||
retries: 30
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
capabilities: [gpu]
|
||||
runtime: nvidia
|
||||
|
||||
tgi-${TGI_SAFETY_MODEL:+safety}:
|
||||
image: ghcr.io/huggingface/text-generation-inference:latest
|
||||
volumes:
|
||||
- $HOME/.cache/huggingface:/data
|
||||
network_mode: ${NETWORK_MODE:-bridged}
|
||||
ports:
|
||||
- "${TGI_SAFETY_PORT:-8081}:${TGI_SAFETY_PORT:-8081}"
|
||||
devices:
|
||||
- nvidia.com/gpu=all
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=${TGI_SAFETY_GPU:-1}
|
||||
- HF_TOKEN=$HF_TOKEN
|
||||
- HF_HOME=/data
|
||||
- HF_DATASETS_CACHE=/data
|
||||
- HF_MODULES_CACHE=/data
|
||||
- HF_HUB_CACHE=/data
|
||||
command: >
|
||||
--dtype bfloat16
|
||||
--usage-stats off
|
||||
--sharded false
|
||||
--model-id ${TGI_SAFETY_MODEL:-meta-llama/Llama-Guard-3-1B}
|
||||
--port ${TGI_SAFETY_PORT:-8081}
|
||||
--cuda-memory-fraction 0.75
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://tgi-safety:${TGI_SAFETY_PORT:-8081}/health"]
|
||||
interval: 5s
|
||||
timeout: 5s
|
||||
retries: 30
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
capabilities: [gpu]
|
||||
runtime: nvidia
|
||||
|
||||
llamastack:
|
||||
depends_on:
|
||||
tgi-inference:
|
||||
condition: service_healthy
|
||||
tgi-${TGI_SAFETY_MODEL:+safety}:
|
||||
condition: service_healthy
|
||||
image: llamastack/distribution-tgi:test-0.0.52rc3
|
||||
network_mode: ${NETWORK_MODE:-bridged}
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run${TGI_SAFETY_MODEL:+-with-safety}.yaml:/root/my-run.yaml
|
||||
ports:
|
||||
- "${LLAMA_STACK_PORT:-5001}:${LLAMA_STACK_PORT:-5001}"
|
||||
# Hack: wait for TGI server to start before starting docker
|
||||
entrypoint: bash -c "sleep 60; python -m llama_stack.distribution.server.server --yaml_config /root/my-run.yaml"
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
||||
environment:
|
||||
- TGI_URL=http://tgi-inference:${TGI_INFERENCE_PORT:-8080}
|
||||
- SAFETY_TGI_URL=http://tgi-safety:${TGI_SAFETY_PORT:-8081}
|
||||
- INFERENCE_MODEL=${INFERENCE_MODEL:-meta-llama/Llama-3.2-3B-Instruct}
|
||||
- SAFETY_MODEL=${SAFETY_MODEL:-meta-llama/Llama-Guard-3-1B}
|
||||
|
||||
volumes:
|
||||
tgi-inference:
|
||||
tgi-safety:
|
||||
llamastack:
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/tgi/run-with-safety.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/tgi/run.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/together/build.yaml
|
|
@ -1,14 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: llamastack/distribution-together
|
||||
ports:
|
||||
- "8321:8321"
|
||||
environment:
|
||||
- TOGETHER_API_KEY=${TOGETHER_API_KEY}
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --template together"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/together/run.yaml
|
|
@ -1 +0,0 @@
|
|||
../../llama_stack/templates/inline-vllm/build.yaml
|
|
@ -1,35 +0,0 @@
|
|||
services:
|
||||
llamastack:
|
||||
image: llamastack/distribution-inline-vllm
|
||||
network_mode: "host"
|
||||
volumes:
|
||||
- ~/.llama:/root/.llama
|
||||
- ./run.yaml:/root/my-run.yaml
|
||||
ports:
|
||||
- "8321:8321"
|
||||
devices:
|
||||
- nvidia.com/gpu=all
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=0
|
||||
command: []
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
# that's the closest analogue to --gpus; provide
|
||||
# an integer amount of devices or 'all'
|
||||
count: 1
|
||||
# Devices are reserved using a list of capabilities, making
|
||||
# capabilities the only required field. A device MUST
|
||||
# satisfy all the requested capabilities for a successful
|
||||
# reservation.
|
||||
capabilities: [gpu]
|
||||
runtime: nvidia
|
||||
entrypoint: bash -c "python -m llama_stack.distribution.server.server --yaml_config /root/my-run.yaml"
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
delay: 3s
|
||||
max_attempts: 5
|
||||
window: 60s
|
|
@ -1,66 +0,0 @@
|
|||
version: '2'
|
||||
image_name: local
|
||||
container_image: null
|
||||
conda_env: local
|
||||
apis:
|
||||
- shields
|
||||
- agents
|
||||
- models
|
||||
- memory
|
||||
- memory_banks
|
||||
- inference
|
||||
- safety
|
||||
providers:
|
||||
inference:
|
||||
- provider_id: vllm-inference
|
||||
provider_type: inline::vllm
|
||||
config:
|
||||
model: Llama3.2-3B-Instruct
|
||||
tensor_parallel_size: 1
|
||||
gpu_memory_utilization: 0.4
|
||||
enforce_eager: true
|
||||
max_tokens: 4096
|
||||
- provider_id: vllm-inference-safety
|
||||
provider_type: inline::vllm
|
||||
config:
|
||||
model: Llama-Guard-3-1B
|
||||
tensor_parallel_size: 1
|
||||
gpu_memory_utilization: 0.2
|
||||
enforce_eager: true
|
||||
max_tokens: 4096
|
||||
safety:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::llama-guard
|
||||
config:
|
||||
model: Llama-Guard-3-1B
|
||||
excluded_categories: []
|
||||
# Uncomment to use prompt guard
|
||||
# - provider_id: meta1
|
||||
# provider_type: inline::prompt-guard
|
||||
# config:
|
||||
# model: Prompt-Guard-86M
|
||||
memory:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference
|
||||
config: {}
|
||||
# Uncomment to use pgvector
|
||||
# - provider_id: pgvector
|
||||
# provider_type: remote::pgvector
|
||||
# config:
|
||||
# host: 127.0.0.1
|
||||
# port: 5432
|
||||
# db: postgres
|
||||
# user: postgres
|
||||
# password: mysecretpassword
|
||||
agents:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference
|
||||
config:
|
||||
persistence_store:
|
||||
namespace: null
|
||||
type: sqlite
|
||||
db_path: ~/.llama/runtime/agents_store.db
|
||||
telemetry:
|
||||
- provider_id: meta0
|
||||
provider_type: inline::meta-reference
|
||||
config: {}
|
|
@ -2,6 +2,14 @@
|
|||
|
||||
Here's a collection of comprehensive guides, examples, and resources for building AI applications with Llama Stack. For the complete documentation, visit our [ReadTheDocs page](https://llama-stack.readthedocs.io/en/latest/index.html).
|
||||
|
||||
## Render locally
|
||||
|
||||
From the llama-stack root directory, run the following command to render the docs locally:
|
||||
```bash
|
||||
uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all
|
||||
```
|
||||
You can then open the docs in your browser at http://localhost:8000.
|
||||
|
||||
## Content
|
||||
|
||||
Try out Llama Stack's capabilities through our detailed Jupyter notebooks:
|
17
docs/_static/css/my_theme.css
vendored
|
@ -16,3 +16,20 @@
|
|||
.hide-title h1 {
|
||||
display: none;
|
||||
}
|
||||
|
||||
h2, h3, h4 {
|
||||
font-weight: normal;
|
||||
}
|
||||
html[data-theme="dark"] .rst-content div[class^="highlight"] {
|
||||
background-color: #0b0b0b;
|
||||
}
|
||||
pre {
|
||||
white-space: pre-wrap !important;
|
||||
word-break: break-all;
|
||||
}
|
||||
|
||||
[data-theme="dark"] .mermaid {
|
||||
background-color: #f4f4f6 !important;
|
||||
border-radius: 6px;
|
||||
padding: 0.5em;
|
||||
}
|
||||
|
|
32
docs/_static/js/detect_theme.js
vendored
Normal file
|
@ -0,0 +1,32 @@
|
|||
document.addEventListener("DOMContentLoaded", function () {
|
||||
const prefersDark = window.matchMedia("(prefers-color-scheme: dark)").matches;
|
||||
const htmlElement = document.documentElement;
|
||||
|
||||
// Check if theme is saved in localStorage
|
||||
const savedTheme = localStorage.getItem("sphinx-rtd-theme");
|
||||
|
||||
if (savedTheme) {
|
||||
// Use the saved theme preference
|
||||
htmlElement.setAttribute("data-theme", savedTheme);
|
||||
document.body.classList.toggle("dark", savedTheme === "dark");
|
||||
} else {
|
||||
// Fall back to system preference
|
||||
const theme = prefersDark ? "dark" : "light";
|
||||
htmlElement.setAttribute("data-theme", theme);
|
||||
document.body.classList.toggle("dark", theme === "dark");
|
||||
// Save initial preference
|
||||
localStorage.setItem("sphinx-rtd-theme", theme);
|
||||
}
|
||||
|
||||
// Listen for theme changes from the existing toggle
|
||||
const observer = new MutationObserver(function(mutations) {
|
||||
mutations.forEach(function(mutation) {
|
||||
if (mutation.attributeName === "data-theme") {
|
||||
const currentTheme = htmlElement.getAttribute("data-theme");
|
||||
localStorage.setItem("sphinx-rtd-theme", currentTheme);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
observer.observe(htmlElement, { attributes: true });
|
||||
});
|
14
docs/_static/js/keyboard_shortcuts.js
vendored
Normal file
|
@ -0,0 +1,14 @@
|
|||
document.addEventListener('keydown', function(event) {
|
||||
// command+K or ctrl+K
|
||||
if ((event.metaKey || event.ctrlKey) && event.key === 'k') {
|
||||
event.preventDefault();
|
||||
document.querySelector('.search-input, .search-field, input[name="q"]').focus();
|
||||
}
|
||||
|
||||
// forward slash
|
||||
if (event.key === '/' &&
|
||||
!event.target.matches('input, textarea, select')) {
|
||||
event.preventDefault();
|
||||
document.querySelector('.search-input, .search-field, input[name="q"]').focus();
|
||||
}
|
||||
});
|
11644
docs/_static/llama-stack-spec.html
vendored
File diff suppressed because it is too large
8842
docs/_static/llama-stack-spec.yaml
vendored
File diff suppressed because it is too large
BIN
docs/_static/providers/vector_io/read_time_comparison_sqlite-vec-faiss.png
vendored
Normal file
Binary file not shown. Size: 33 KiB
BIN
docs/_static/providers/vector_io/write_time_comparison_sqlite-vec-faiss.png
vendored
Normal file
Binary file not shown. Size: 37 KiB
BIN
docs/_static/providers/vector_io/write_time_sequence_sqlite-vec-faiss.png
vendored
Normal file
Binary file not shown. Size: 56 KiB
|
@ -4,6 +4,21 @@
|
|||
# This source code is licensed under the terms described in the LICENSE file in
|
||||
# the root directory of this source tree.
|
||||
|
||||
import os
|
||||
import time
|
||||
|
||||
|
||||
def pytest_collection_modifyitems(items):
|
||||
for item in items:
|
||||
item.name = item.name.replace(' ', '_')
|
||||
|
||||
|
||||
def pytest_runtest_teardown(item):
|
||||
interval_seconds = os.getenv("LLAMA_STACK_TEST_INTERVAL_SECONDS")
|
||||
if interval_seconds:
|
||||
time.sleep(float(interval_seconds))
|
||||
|
||||
|
||||
def pytest_configure(config):
|
||||
config.option.tbstyle = "short"
|
||||
config.option.disable_warnings = True
|
||||
|
|
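The teardown hook above sleeps between tests whenever `LLAMA_STACK_TEST_INTERVAL_SECONDS` is set, which is useful against rate-limited providers. A hedged invocation example (the test path and filter are illustrative, not taken from this diff):

```bash
LLAMA_STACK_TEST_INTERVAL_SECONDS=0.5 pytest tests/ -k inference
```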
File diff suppressed because one or more lines are too long
878
docs/getting_started_llama4.ipynb
Normal file
File diff suppressed because one or more lines are too long
909
docs/getting_started_llama_api.ipynb
Normal file
File diff suppressed because one or more lines are too long
Some files were not shown because too many files have changed in this diff.