Compare commits

...

186 commits

Author SHA1 Message Date
Ashwin Bharambe
9191005ca1
fix(ci): dump server/container logs when tests fail (#3873)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 14s
API Conformance Tests / check-schema-compatibility (push) Successful in 14s
Python Package Build Test / build (3.12) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 17s
Test Llama Stack Build / generate-matrix (push) Successful in 20s
Unit Tests / unit-tests (3.13) (push) Failing after 18s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 25s
Unit Tests / unit-tests (3.12) (push) Failing after 36s
Test Llama Stack Build / build (push) Failing after 12s
UI Tests / ui-tests (22) (push) Successful in 1m1s
Pre-commit / pre-commit (push) Successful in 2m5s
Output last 100 lines of server.log or docker container logs when
integration tests fail to aid debugging.
2025-10-20 22:28:55 -07:00
Ashwin Bharambe
0e96279bee
chore(cleanup)!: remove tool_runtime.rag_tool (#3871)
Kill the `builtin::rag` tool group completely since it is no longer
targeted. We use the Responses implementation for knowledge_search which
uses the `openai_vector_stores` pathway.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-20 22:26:21 -07:00
Ashwin Bharambe
5aaf1a8bca
fix(ci): improve workflow logging and bot notifications (#3872)
## Summary
- Link pre-commit bot comment to workflow run instead of PR for better
debugging
- Dump docker container logs before removal to ensure logs are actually
captured

## Changes
1. **Pre-commit bot**: Changed the initial bot comment to link
"pre-commit hooks" text to the actual workflow run URL instead of just
having the PR number auto-link
2. **Docker logs**: Moved docker container log dumping from GitHub
Actions to the integration-tests.sh script's stop_container() function,
ensuring logs are captured before container removal

## Test plan
- Pre-commit bot comment will now have a clickable link to the workflow
run
- Docker container logs will be successfully captured in CI runs
2025-10-20 22:08:15 -07:00
Ashwin Bharambe
122de785c4
chore(cleanup)!: kill vector_db references as far as possible (#3864)
There should not be "vector db" anywhere.
2025-10-20 20:06:16 -07:00
ehhuang
444f6c88f3
chore: remove build.py (#3869)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 3s
Test llama stack list-deps / list-deps-from-config (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 20s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 23s
Test llama stack list-deps / list-deps (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 57s
Pre-commit / pre-commit (push) Successful in 1m52s
# What does this PR do?


## Test Plan
CI
2025-10-20 16:28:15 -07:00
Charlie Doern
6a13a99e77
chore: add beta group to stainless (#3866)
# What does this PR do?

similarly to `alpha:` move `v1beta` routes under a `beta` group so the
client will have `client.beta`

From what I can tell, the openapi.stainless.yml file is hand written
while the openapi.yml file is generated and copied using the shell
script so I did this by hand.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-20 16:26:06 -07:00
ehhuang
407bade359
chore: migrate stack build (#3867)
# What does this PR do?
Just use editable install here. Not sure about the USE_COPY_NOT_MOUNT
that was used in original scripts and if that's needed.

## Test Plan
<img width="1008" height="587" alt="image"
src="https://github.com/user-attachments/assets/7ddf8e31-2635-45d3-b79c-1b898eefbf07"
/>

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3867).
* #3869
* __->__ #3867
2025-10-20 16:22:48 -07:00
ehhuang
ffeb86385c
chore: fix main (#3868)
# What does this PR do?
dup entry was added for some reason

## Test Plan
2025-10-20 16:01:03 -07:00
ehhuang
b215eb5944
chore: skip shutdown if otel_endpoint is not set (#3865)
# What does this PR do?
rid following error when ctrl+c'd server

│
/Users/erichuang/projects/lst3/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py:92
in │
│ shutdown │
│ │
│ 89 │ │ pass │
│ 90 │ │
│ 91 │ async def shutdown(self) -> None: │
│ ❱ 92 │ │ trace.get_tracer_provider().force_flush() │
│ 93 │ │
│ 94 │ async def log_event(self, event: Event, ttl_seconds: int =
604800) -> None: │
│ 95 │ │ if isinstance(event, UnstructuredLogEvent): │

╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'ProxyTracerProvider' object has no attribute
'force_flush'

## Test Plan
2025-10-20 15:48:37 -07:00
dependabot[bot]
d9274d199e
chore(ui-deps): bump @types/node from 24.3.0 to 24.8.1 in /llama_stack/ui (#3851)
Bumps
[@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node)
from 24.3.0 to 24.8.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/node&package-manager=npm_and_yarn&previous-version=24.3.0&new-version=24.8.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:11:36 -07:00
dependabot[bot]
ec364499f5
chore(ui-deps): bump @tailwindcss/postcss from 4.1.6 to 4.1.14 in /llama_stack/ui (#3850)
Bumps
[@tailwindcss/postcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss)
from 4.1.6 to 4.1.14.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/releases"><code>@​tailwindcss/postcss</code>'s
releases</a>.</em></p>
<blockquote>
<h2>v4.1.14</h2>
<h3>Fixed</h3>
<ul>
<li>Handle <code>'</code> syntax in ClojureScript when extracting
classes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18888">#18888</a>)</li>
<li>Handle <code>@variant</code> inside <code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18885">#18885</a>)</li>
<li>Merge suggestions when using <code>@utility</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18900">#18900</a>)</li>
<li>Ensure that file system watchers created when using the CLI are
always cleaned up (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18905">#18905</a>)</li>
<li>Do not generate <code>grid-column</code> utilities when configuring
<code>grid-column-start</code> or <code>grid-column-end</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18907">#18907</a>)</li>
<li>Do not generate <code>grid-row</code> utilities when configuring
<code>grid-row-start</code> or <code>grid-row-end</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18907">#18907</a>)</li>
<li>Prevent duplicate CSS when overwriting a static utility with a theme
key (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18056">#18056</a>)</li>
<li>Show Lightning CSS warnings (if any) when optimizing/minifying (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18918">#18918</a>)</li>
<li>Use <code>default</code> export condition for
<code>@tailwindcss/vite</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18948">#18948</a>)</li>
<li>Re-throw errors from PostCSS nodes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18373">#18373</a>)</li>
<li>Detect classes in markdown inline directives (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18967">#18967</a>)</li>
<li>Ensure files with only <code>@theme</code> produce no output when
built (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18979">#18979</a>)</li>
<li>Support Maud templates when extracting classes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18988">#18988</a>)</li>
<li>Upgrade: Do not migrate <code>variant = 'outline'</code> during
upgrades (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18922">#18922</a>)</li>
<li>Upgrade: Show version mismatch (if any) when running upgrade tool
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19028">#19028</a>)</li>
<li>Upgrade: Ensure first class inside <code>className</code> is
migrated (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19031">#19031</a>)</li>
<li>Upgrade: Migrate classes inside <code>*ClassName</code> and
<code>*Class</code> attributes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19031">#19031</a>)</li>
</ul>
<h2>v4.1.13</h2>
<h3>Changed</h3>
<ul>
<li>Drop warning from browser build (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18731">#18731</a>)</li>
<li>Drop exact duplicate declarations when emitting CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18809">#18809</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Don't transition <code>visibility</code> when using
<code>transition</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18795">#18795</a>)</li>
<li>Discard matched variants with unknown named values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Discard matched variants with non-string values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Show suggestions for known <code>matchVariant</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18798">#18798</a>)</li>
<li>Replace deprecated <code>clip</code> with <code>clip-path</code> in
<code>sr-only</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18769">#18769</a>)</li>
<li>Hide internal fields from completions in <code>matchUtilities</code>
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18820">#18820</a>)</li>
<li>Ignore <code>.vercel</code> folders by default (can be overridden by
<code>@source …</code> rules) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18855">#18855</a>)</li>
<li>Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18869">#18869</a>)</li>
<li>Do not allow custom variants to start or end with a <code>-</code>
or <code>_</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18867">#18867</a>,
<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18872">#18872</a>)</li>
<li>Upgrade: Migrate <code>aria</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18815">#18815</a>)</li>
<li>Upgrade: Migrate <code>data</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18816">#18816</a>)</li>
<li>Upgrade: Migrate <code>supports</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18817">#18817</a>)</li>
</ul>
<h2>v4.1.12</h2>
<h3>Fixed</h3>
<ul>
<li>Don't consider the global important state in <code>@apply</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18404">#18404</a>)</li>
<li>Add missing suggestions for <code>flex-&lt;number&gt;</code>
utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18642">#18642</a>)</li>
<li>Fix trailing <code>)</code> from interfering with extraction in
Clojure keywords (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
<li>Detect classes inside Elixir charlist, word list, and string sigils
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18432">#18432</a>)</li>
<li>Track source locations through <code>@plugin</code> and
<code>@config</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18345">#18345</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md"><code>@​tailwindcss/postcss</code>'s
changelog</a>.</em></p>
<blockquote>
<h2>[4.1.14] - 2025-10-01</h2>
<h3>Fixed</h3>
<ul>
<li>Handle <code>'</code> syntax in ClojureScript when extracting
classes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18888">#18888</a>)</li>
<li>Handle <code>@variant</code> inside <code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18885">#18885</a>)</li>
<li>Merge suggestions when using <code>@utility</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18900">#18900</a>)</li>
<li>Ensure that file system watchers created when using the CLI are
always cleaned up (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18905">#18905</a>)</li>
<li>Do not generate <code>grid-column</code> utilities when configuring
<code>grid-column-start</code> or <code>grid-column-end</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18907">#18907</a>)</li>
<li>Do not generate <code>grid-row</code> utilities when configuring
<code>grid-row-start</code> or <code>grid-row-end</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18907">#18907</a>)</li>
<li>Prevent duplicate CSS when overwriting a static utility with a theme
key (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18056">#18056</a>)</li>
<li>Show Lightning CSS warnings (if any) when optimizing/minifying (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18918">#18918</a>)</li>
<li>Use <code>default</code> export condition for
<code>@tailwindcss/vite</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18948">#18948</a>)</li>
<li>Re-throw errors from PostCSS nodes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18373">#18373</a>)</li>
<li>Detect classes in markdown inline directives (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18967">#18967</a>)</li>
<li>Ensure files with only <code>@theme</code> produce no output when
built (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18979">#18979</a>)</li>
<li>Support Maud templates when extracting classes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18988">#18988</a>)</li>
<li>Upgrade: Do not migrate <code>variant = 'outline'</code> during
upgrades (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18922">#18922</a>)</li>
<li>Upgrade: Show version mismatch (if any) when running upgrade tool
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19028">#19028</a>)</li>
<li>Upgrade: Ensure first class inside <code>className</code> is
migrated (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19031">#19031</a>)</li>
<li>Upgrade: Migrate classes inside <code>*ClassName</code> and
<code>*Class</code> attributes (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/19031">#19031</a>)</li>
</ul>
<h2>[4.1.13] - 2025-09-03</h2>
<h3>Changed</h3>
<ul>
<li>Drop warning from browser build (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18731">#18731</a>)</li>
<li>Drop exact duplicate declarations when emitting CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/18809">#18809</a>)</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Don't transition <code>visibility</code> when using
<code>transition</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18795">#18795</a>)</li>
<li>Discard matched variants with unknown named values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Discard matched variants with non-string values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18799">#18799</a>)</li>
<li>Show suggestions for known <code>matchVariant</code> values (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18798">#18798</a>)</li>
<li>Replace deprecated <code>clip</code> with <code>clip-path</code> in
<code>sr-only</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18769">#18769</a>)</li>
<li>Hide internal fields from completions in <code>matchUtilities</code>
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18820">#18820</a>)</li>
<li>Ignore <code>.vercel</code> folders by default (can be overridden by
<code>@source …</code> rules) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18855">#18855</a>)</li>
<li>Consider variants starting with <code>@-</code> to be invalid (e.g.
<code>@-2xl:flex</code>) (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18869">#18869</a>)</li>
<li>Do not allow custom variants to start or end with a <code>-</code>
or <code>_</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18867">#18867</a>,
<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18872">#18872</a>)</li>
<li>Upgrade: Migrate <code>aria</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18815">#18815</a>)</li>
<li>Upgrade: Migrate <code>data</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18816">#18816</a>)</li>
<li>Upgrade: Migrate <code>supports</code> theme keys to
<code>@custom-variant</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18817">#18817</a>)</li>
</ul>
<h2>[4.1.12] - 2025-08-13</h2>
<h3>Fixed</h3>
<ul>
<li>Don't consider the global important state in <code>@apply</code> (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18404">#18404</a>)</li>
<li>Add missing suggestions for <code>flex-&lt;number&gt;</code>
utilities (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/18642">#18642</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b67cbcf6cc"><code>b67cbcf</code></a>
Prepare v4.1.14 release (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19037">#19037</a>)</li>
<li><a
href="b497e1eaf3"><code>b497e1e</code></a>
Add <code>Upgrading from Tailwind CSS v…</code> when running upgrade
tool (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19026">#19026</a>)</li>
<li><a
href="210575a6a5"><code>210575a</code></a>
Update dedent 1.6.0 → 1.7.0 (minor) (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/19010">#19010</a>)</li>
<li><a
href="d0f7f82787"><code>d0f7f82</code></a>
Add plugin option documentation to the postcss plugin readme (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18940">#18940</a>)</li>
<li><a
href="5b8136e838"><code>5b8136e</code></a>
Re-throw errors from PostCSS nodes (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18373">#18373</a>)</li>
<li><a
href="1334c99db8"><code>1334c99</code></a>
Prepare v4.1.13 release (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18868">#18868</a>)</li>
<li><a
href="6791e8133c"><code>6791e81</code></a>
Prepare v4.1.12 release (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18728">#18728</a>)</li>
<li><a
href="492304212f"><code>4923042</code></a>
Allow users to disable url rewriting in the PostCSS plugin (<a
href="https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss/issues/18321">#18321</a>)</li>
<li><a
href="88b9f15b65"><code>88b9f15</code></a>
Center the dropdown icon added to an input with a paired datalist in
Chrome (...</li>
<li><a
href="9169d73aad"><code>9169d73</code></a>
update READMEs</li>
<li>Additional commits viewable in <a
href="https://github.com/tailwindlabs/tailwindcss/commits/v4.1.14/packages/@tailwindcss-postcss">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@tailwindcss/postcss&package-manager=npm_and_yarn&previous-version=4.1.6&new-version=4.1.14)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:11:24 -07:00
dependabot[bot]
6a74894e22
chore(python-deps): bump fastapi from 0.116.1 to 0.119.0 (#3845)
Bumps [fastapi](https://github.com/fastapi/fastapi) from 0.116.1 to
0.119.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/fastapi/fastapi/releases">fastapi's
releases</a>.</em></p>
<blockquote>
<h2>0.119.0</h2>
<p>FastAPI now (temporarily) supports both Pydantic v2 models and
<code>pydantic.v1</code> models at the same time in the same app, to
make it easier for any FastAPI apps still using Pydantic v1 to gradually
but quickly <strong>migrate to Pydantic v2</strong>.</p>
<pre lang="Python"><code>from fastapi import FastAPI
from pydantic import BaseModel as BaseModelV2
from pydantic.v1 import BaseModel
<p>class Item(BaseModel):<br />
name: str<br />
description: str | None = None</p>
<p>class ItemV2(BaseModelV2):<br />
title: str<br />
summary: str | None = None</p>
<p>app = FastAPI()</p>
<p><a
href="https://github.com/app"><code>@​app</code></a>.post(&quot;/items/&quot;,
response_model=ItemV2)<br />
def create_item(item: Item):<br />
return {&quot;title&quot;: item.name, &quot;summary&quot;:
item.description}<br />
</code></pre></p>
<p>Adding this feature was a big effort with the main objective of
making it easier for the few applications still stuck in Pydantic v1 to
migrate to Pydantic v2.</p>
<p>And with this, support for <strong>Pydantic v1 is now
deprecated</strong> and will be <strong>removed</strong> from FastAPI in
a future version soon.</p>
<p><strong>Note</strong>: have in mind that the Pydantic team already
stopped supporting Pydantic v1 for recent versions of Python, starting
with Python 3.14.</p>
<p>You can read in the docs more about how to <a
href="https://fastapi.tiangolo.com/how-to/migrate-from-pydantic-v1-to-pydantic-v2/">Migrate
from Pydantic v1 to Pydantic v2</a>.</p>
<h3>Features</h3>
<ul>
<li> Add support for <code>from pydantic.v1 import BaseModel</code>,
mixed Pydantic v1 and v2 models in the same app. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14168">#14168</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h2>0.118.3</h2>
<h3>Upgrades</h3>
<ul>
<li>⬆️ Add support for Python 3.14. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14165">#14165</a>
by <a
href="https://github.com/svlandeg"><code>@​svlandeg</code></a>.</li>
</ul>
<h2>0.118.2</h2>
<h3>Fixes</h3>
<ul>
<li>🐛 Fix tagged discriminated union not recognized as body field. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/12942">#12942</a>
by <a
href="https://github.com/frankie567"><code>@​frankie567</code></a>.</li>
</ul>
<h3>Internal</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2e721e1b02"><code>2e721e1</code></a>
🔖 Release version 0.119.0</li>
<li><a
href="fc7a0686af"><code>fc7a068</code></a>
📝 Update release notes</li>
<li><a
href="3a3879b2c3"><code>3a3879b</code></a>
📝 Update release notes</li>
<li><a
href="d34918abf0"><code>d34918a</code></a>
 Add support for <code>from pydantic.v1 import BaseModel</code>, mixed
Pydantic v1 and ...</li>
<li><a
href="352dbefc63"><code>352dbef</code></a>
🔖 Release version 0.118.3</li>
<li><a
href="96e7d6eaa4"><code>96e7d6e</code></a>
📝 Update release notes</li>
<li><a
href="3611c3fc5b"><code>3611c3f</code></a>
⬆️ Add support for Python 3.14 (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14165">#14165</a>)</li>
<li><a
href="942fce394b"><code>942fce3</code></a>
🔖 Release version 0.118.2</li>
<li><a
href="13b067c9b6"><code>13b067c</code></a>
📝 Update release notes</li>
<li><a
href="185cecd891"><code>185cecd</code></a>
🐛 Fix tagged discriminated union not recognized as body field (<a
href="https://redirect.github.com/fastapi/fastapi/issues/12942">#12942</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/fastapi/fastapi/compare/0.116.1...0.119.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=fastapi&package-manager=uv&previous-version=0.116.1&new-version=0.119.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:11:11 -07:00
dependabot[bot]
5aafce4ff3
chore(python-deps): bump weaviate-client from 4.16.9 to 4.17.0 (#3844)
Bumps
[weaviate-client](https://github.com/weaviate/weaviate-python-client)
from 4.16.9 to 4.17.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/releases">weaviate-client's
releases</a>.</em></p>
<blockquote>
<h2>v4.16.10</h2>
<h2>What's Changed</h2>
<ul>
<li>Add uncompressed quantitizer factory by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1800">weaviate/weaviate-python-client#1800</a></li>
<li>Add support for groups by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1778">weaviate/weaviate-python-client#1778</a></li>
<li>feat: add overwrite_alias to backup restore by <a
href="https://github.com/bevzzz"><code>@​bevzzz</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1808">weaviate/weaviate-python-client#1808</a></li>
<li>Add Multi2vec-aws and text2vec-morph by <a
href="https://github.com/dirkkul"><code>@​dirkkul</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1820">weaviate/weaviate-python-client#1820</a></li>
<li>Add support for exists on aliases. by <a
href="https://github.com/jfrancoa"><code>@​jfrancoa</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1813">weaviate/weaviate-python-client#1813</a></li>
<li>Add note re GPT4All deprecation by <a
href="https://github.com/databyjp"><code>@​databyjp</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1825">weaviate/weaviate-python-client#1825</a></li>
<li>Update setup.cfg with min weaviate agents version by <a
href="https://github.com/cdpierse"><code>@​cdpierse</code></a> in <a
href="https://redirect.github.com/weaviate/weaviate-python-client/pull/1826">weaviate/weaviate-python-client#1826</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.9...v4.16.10">https://github.com/weaviate/weaviate-python-client/compare/v4.16.9...v4.16.10</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/weaviate/weaviate-python-client/blob/main/docs/changelog.rst">weaviate-client's
changelog</a>.</em></p>
<blockquote>
<h2>Version 4.17.0</h2>
<p>This minor version includes:
- Remove support for Weaviate versions &lt; 1.27. Please update your
Weaviate instances
- Support for new 1.33 features:
- OIDC group support in RBAC
- Uncompressed quantizer
- ContainsNone and Not filter operators
- Add support for <code>verbosity</code> and <code>reasoning
effort</code> for generative-openai module
- Add alias.exists method
- Add multi2vec-aws and text2vec-morph modules
- Add support for max_tokens for generative-aws module
- Fix weaviate client installation with other packages depending on
grpc-health-checking</p>
<h2>Version 4.16.10</h2>
<p>This patch version includes:
- Addition of helper to create an uncompressed quantizer for use when
not using default compression
- Support for <code>overwrite_alias</code> option to backup
create/restore
- Support for OIDC groups
- Addition of <code>multi2vec-aws</code> and <code>text2vec-morph</code>
modules
- Support for <code>alias.exists</code> method
- Update to <code>weaviate-agents-client</code> dependency for GA
release of agents</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7acf5c096a"><code>7acf5c0</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1838">#1838</a>
from weaviate/fix_tests</li>
<li><a
href="960559d788"><code>960559d</code></a>
Remove unneeded version checks</li>
<li><a
href="7cc1861b6c"><code>7cc1861</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1837">#1837</a>
from weaviate/changelog_417</li>
<li><a
href="3e124e9dfc"><code>3e124e9</code></a>
Small cleanup in version checking</li>
<li><a
href="e1859f17a7"><code>e1859f1</code></a>
Add changelog for 4.17.0</li>
<li><a
href="1e71c7832e"><code>1e71c78</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1827">#1827</a>
from weaviate/gen_openai_params</li>
<li><a
href="9a4bedfc7b"><code>9a4bedf</code></a>
Fix enum selection</li>
<li><a
href="033542fa8c"><code>033542f</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1824">#1824</a>
from weaviate/dependabot/pip/pydoclint-0.7.3</li>
<li><a
href="158889e6d4"><code>158889e</code></a>
Merge pull request <a
href="https://redirect.github.com/weaviate/weaviate-python-client/issues/1823">#1823</a>
from weaviate/dependabot/pip/polars-gte-0.20.26-and-...</li>
<li><a
href="65191bb1e4"><code>65191bb</code></a>
Merge branch 'dev/1.33'</li>
<li>Additional commits viewable in <a
href="https://github.com/weaviate/weaviate-python-client/compare/v4.16.9...v4.17.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=weaviate-client&package-manager=uv&previous-version=4.16.9&new-version=4.17.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 15:10:31 -07:00
ehhuang
5678c25b9d
chore: remove dead code (#3863)
# What does this PR do?


## Test Plan
2025-10-20 15:04:57 -07:00
dependabot[bot]
7294385df3
chore(github-deps): bump actions/setup-node from 5.0.0 to 6.0.0 (#3843)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from
5.0.0 to 6.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-node/releases">actions/setup-node's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<p><strong>Breaking Changes</strong></p>
<ul>
<li>Limit automatic caching to npm, update workflows and documentation
by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-node/pull/1374">actions/setup-node#1374</a></li>
</ul>
<p><strong>Dependency Upgrades</strong></p>
<ul>
<li>Upgrade ts-jest from 29.1.2 to 29.4.1 and document breaking changes
in v5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1336">#1336</a></li>
<li>Upgrade prettier from 2.8.8 to 3.6.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1334">#1334</a></li>
<li>Upgrade actions/publish-action from 0.3.0 to 0.4.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-node/pull/1362">#1362</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-node/compare/v5...v6.0.0">https://github.com/actions/setup-node/compare/v5...v6.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2028fbc5c2"><code>2028fbc</code></a>
Limit automatic caching to npm, update workflows and documentation (<a
href="https://redirect.github.com/actions/setup-node/issues/1374">#1374</a>)</li>
<li><a
href="13427813f7"><code>1342781</code></a>
Bump actions/publish-action from 0.3.0 to 0.4.0 (<a
href="https://redirect.github.com/actions/setup-node/issues/1362">#1362</a>)</li>
<li><a
href="89d709d423"><code>89d709d</code></a>
Bump prettier from 2.8.8 to 3.6.2 (<a
href="https://redirect.github.com/actions/setup-node/issues/1334">#1334</a>)</li>
<li><a
href="cd2651c462"><code>cd2651c</code></a>
Bump ts-jest from 29.1.2 to 29.4.1 (<a
href="https://redirect.github.com/actions/setup-node/issues/1336">#1336</a>)</li>
<li>See full diff in <a
href="a0853c2454...2028fbc5c2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-node&package-manager=github_actions&previous-version=5.0.0&new-version=6.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 14:59:39 -07:00
dependabot[bot]
8943335e0b
chore(github-deps): bump astral-sh/setup-uv from 7.0.0 to 7.1.0 (#3842)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
7.0.0 to 7.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.0 🌈 Support all the use cases</h2>
<h2>Changes</h2>
<p><strong>Support all the use cases!!!</strong>
... well, that we know of.</p>
<p>This release adds support for some use cases that most users don't
encounter but are useful for e.g. people running Gitea.</p>
<p>The input <code>resolution-strategy</code> lets you use the lowest
possible version of uv from a version range. Useful if you want to test
your tool with different versions of uv.</p>
<p>If you use <code>activate-environment</code> the path to the
activated venv is now also exposed under the output
<code>venv</code>.</p>
<p>Downloaded python installations can now also be uploaded to the
GitHub Actions cache backend. Useful if you are running in
<code>act</code> and have configured your own backend and don't want to
download python again, and again over a slow internet connection.</p>
<p>Finally the path to installed python interpreters is now added to the
<code>PATH</code> on Windows.</p>
<h2>🚀 Enhancements</h2>
<ul>
<li>Add resolution-strategy input to support oldest compatible version
selection @<a
href="https://github.com/apps/copilot-swe-agent">copilot-swe-agent[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/631">#631</a>)</li>
<li>Add value of UV_PYTHON_INSTALL_DIR to path <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/628">#628</a>)</li>
<li>Set output venv when activate-environment is used <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/627">#627</a>)</li>
<li>Cache python installs <a
href="https://github.com/merlinz01"><code>@​merlinz01</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/621">#621</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>Add copilot-instructions.md <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/630">#630</a>)</li>
<li>chore: update known checksums for 0.9.2 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/626">#626</a>)</li>
<li>chore: update known checksums for 0.9.1 @<a
href="https://github.com/apps/github-actions">github-actions[bot]</a>
(<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/625">#625</a>)</li>
<li>Fall back to PR for updating known versions <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/623">#623</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Split up documentation <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/632">#632</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>Bump deps <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/633">#633</a>)</li>
<li>Bump github/codeql-action from 3.30.6 to 4.30.7 @<a
href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/614">#614</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3259c6206f"><code>3259c62</code></a>
Bump deps (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/633">#633</a>)</li>
<li><a
href="bf8e8ed895"><code>bf8e8ed</code></a>
Split up documentation (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/632">#632</a>)</li>
<li><a
href="9c6b5e9fb5"><code>9c6b5e9</code></a>
Add resolution-strategy input to support oldest compatible version
selection ...</li>
<li><a
href="a5129e99f4"><code>a5129e9</code></a>
Add copilot-instructions.md (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/630">#630</a>)</li>
<li><a
href="d18bcc753a"><code>d18bcc7</code></a>
Add value of UV_PYTHON_INSTALL_DIR to path (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/628">#628</a>)</li>
<li><a
href="bd1f875aba"><code>bd1f875</code></a>
Set output venv when activate-environment is used (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/627">#627</a>)</li>
<li><a
href="1a91c3851d"><code>1a91c38</code></a>
chore: update known checksums for 0.9.2 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/626">#626</a>)</li>
<li><a
href="c79f606987"><code>c79f606</code></a>
chore: update known checksums for 0.9.1 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/625">#625</a>)</li>
<li><a
href="e0249f1599"><code>e0249f1</code></a>
Fall back to PR for updating known versions (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/623">#623</a>)</li>
<li><a
href="6d2eb15b49"><code>6d2eb15</code></a>
Cache python installs (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/621">#621</a>)</li>
<li>Additional commits viewable in <a
href="eb1897b8dc...3259c6206f">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=7.0.0&new-version=7.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 14:59:35 -07:00
dependabot[bot]
e7f4ddcc86
chore(github-deps): bump actions/checkout from 4.2.2 to 5.0.0 (#3841)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2
to 5.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2238">actions/checkout#2238</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v5.0.0">https://github.com/actions/checkout/compare/v4...v5.0.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
<li>Prepare release v4.3.0 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2237">actions/checkout#2237</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/motss"><code>@​motss</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li><a href="https://github.com/mouismail"><code>@​mouismail</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li><a href="https://github.com/benwells"><code>@​benwells</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v4.3.0">https://github.com/actions/checkout/compare/v4...v4.3.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
<li>README: Suggest <code>user.email</code> to be
<code>41898282+github-actions[bot]@users.noreply.github.com</code> by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1707">actions/checkout#1707</a></li>
</ul>
<h2>v4.1.4</h2>
<ul>
<li>Disable <code>extensions.worktreeConfig</code> when disabling
<code>sparse-checkout</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1692">actions/checkout#1692</a></li>
<li>Add dependabot config by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1688">actions/checkout#1688</a></li>
<li>Bump the minor-actions-dependencies group with 2 updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1693">actions/checkout#1693</a></li>
<li>Bump word-wrap from 1.2.3 to 1.2.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1643">actions/checkout#1643</a></li>
</ul>
<h2>v4.1.3</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="08c6903cd8"><code>08c6903</code></a>
Prepare v5.0.0 release (<a
href="https://redirect.github.com/actions/checkout/issues/2238">#2238</a>)</li>
<li><a
href="9f265659d3"><code>9f26565</code></a>
Update actions checkout to use node 24 (<a
href="https://redirect.github.com/actions/checkout/issues/2226">#2226</a>)</li>
<li><a
href="08eba0b27e"><code>08eba0b</code></a>
Prepare release v4.3.0 (<a
href="https://redirect.github.com/actions/checkout/issues/2237">#2237</a>)</li>
<li><a
href="631c7dc4f8"><code>631c7dc</code></a>
Update package dependencies (<a
href="https://redirect.github.com/actions/checkout/issues/2236">#2236</a>)</li>
<li><a
href="8edcb1bdb4"><code>8edcb1b</code></a>
Update CODEOWNERS for actions (<a
href="https://redirect.github.com/actions/checkout/issues/2224">#2224</a>)</li>
<li><a
href="09d2acae67"><code>09d2aca</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/2194">#2194</a>)</li>
<li><a
href="85e6279cec"><code>85e6279</code></a>
Adjust positioning of user email note and permissions heading (<a
href="https://redirect.github.com/actions/checkout/issues/2044">#2044</a>)</li>
<li><a
href="009b9ae9e4"><code>009b9ae</code></a>
Documentation update - add recommended permissions to Readme (<a
href="https://redirect.github.com/actions/checkout/issues/2043">#2043</a>)</li>
<li><a
href="cbb722410c"><code>cbb7224</code></a>
Update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1977">#1977</a>)</li>
<li><a
href="3b9b8c884f"><code>3b9b8c8</code></a>
docs: update README.md (<a
href="https://redirect.github.com/actions/checkout/issues/1971">#1971</a>)</li>
<li>See full diff in <a
href="https://github.com/actions/checkout/compare/v4.2.2...08c6903cd8c0fde910a37f88322edcfb5dd907a8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=4.2.2&new-version=5.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 14:59:28 -07:00
ehhuang
ab2d5febb4
chore: install client first (#3862)
# What does this PR do?
mirrors build_container.sh

trying to resolve: 

0.105 + [ editable = editable ]
0.105 + [ ! -d /workspace/llama-stack ]
0.105 + uv pip install --no-cache-dir -e /workspace/llama-stack
0.261 Using Python 3.12.12 environment at: /usr/local
0.479   × No solution found when resolving dependencies:
0.479   ╰─▶ Because only llama-stack-client<=0.2.23 is available and
0.479 llama-stack==0.3.0rc4 depends on llama-stack-client>=0.3.0rc4, we
can
0.479       conclude that llama-stack==0.3.0rc4 cannot be used.
0.479 And because only llama-stack==0.3.0rc4 is available and you
require
0.479 llama-stack, we can conclude that your requirements are
unsatisfiable.
------

## Test Plan
2025-10-20 14:56:45 -07:00
Ashwin Bharambe
94faec7bc5
chore(yaml)!: move registered resources to a sub-key (#3861)
**NOTE: this is a backwards incompatible change to the run-configs.**

A small QOL update, but this will prove useful when I do a rename for
"vector_dbs" to "vector_stores" next.

Moves all the `models, shields, ...` keys in run-config under a
`registered_resources` sub-key.
2025-10-20 14:52:48 -07:00
Ashwin Bharambe
483d53cc37
feat(stainless): add stainless source of truth config (#3860)
Source of truth for Stainless should be in this repository.

This was long due.
2025-10-20 14:32:20 -07:00
Francisco Arceo
48581bf651
chore: Updating how default embedding model is set in stack (#3818)
# What does this PR do?

Refactor setting default vector store provider and embedding model to
use an optional `vector_stores` config in the `StackRunConfig` and clean
up code to do so (had to add back in some pieces of VectorDB). Also
added remote Qdrant and Weaviate to starter distro (based on other PR
where inference providers were added for UX).

New config is simply (default for Starter distro):

```yaml
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```

## Test Plan
CI and Unit tests.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-20 14:22:45 -07:00
Ashwin Bharambe
2c43285e22
feat(stores)!: use backend storage references instead of configs (#3697)
**This PR changes configurations in a backward incompatible way.**

Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.

## Key Changes

- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.

## Migration

Before:
```yaml
metadata_store:
  type: sqlite
  db_path: ~/.llama/distributions/foo/registry.db
inference_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
conversations_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
```

After:
```yaml
storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/distributions/foo/kvstore.db
    sql_default:
      type: sql_postgres
      host: ${env.POSTGRES_HOST}
      port: ${env.POSTGRES_PORT}
      db: ${env.POSTGRES_DB}
      user: ${env.POSTGRES_USER}
      password: ${env.POSTGRES_PASSWORD}
  stores:
    metadata:
      backend: kv_default
      namespace: registry
    inference:
      backend: sql_default
      table_name: inference_store
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      backend: sql_default
      table_name: openai_conversations
```

Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:

```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      kvstore:
        type: sqlite
        db_path: ~/.llama/distributions/foo/chroma.db
```

to:

```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      persistence:
        backend: kv_default
        namespace: vector_io::chroma_remote
```

Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
2025-10-20 13:20:09 -07:00
Shabana Baig
add64e8e2a
feat: Add instructions parameter in response object (#3741)
# Problem
The current inline provider appends the user provided instructions to
messages as a system prompt, but the returned response object does not
contain the instructions field (as specified in the OpenAI responses
spec).

# What does this PR do?
This pull request adds the instruction field to the response object
definition and updates the inline provider. It also ensures that
instructions from previous response is not carried over to the next
response (as specified in the openAI spec).

Closes #[3566](https://github.com/llamastack/llama-stack/issues/3566)

## Test Plan

- Tested manually for change in model response w.r.t supplied
instructions field.
- Added unit test to check that the instructions from previous response
is not carried over to the next response.
- Added integration tests to check instructions parameter in the
returned response object.
- Added new recordings for the integration tests.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-20 13:10:37 -07:00
Derek Higgins
1f38359d95
fix: nested claims mapping in OAuth2 token validation (#3814)
fix: nested claims mapping in OAuth2 token validation
    
The get_attributes_from_claims function was only checking for top-level
claim keys, causing token validation to fail when using nested claims
like "resource_access.llamastack.roles" (common in Keycloak JWT tokens).
    
Updated the function to support dot notation for traversing nested claim
structures. Give precedence to dot notation over literal keys with dots
in claims mapping.
    
Added test coverage.
    
Closes: #3812

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-20 12:34:55 -07:00
dependabot[bot]
08cbb69ef7
chore(python-deps): bump sqlalchemy from 2.0.41 to 2.0.44 (#3848)
Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 2.0.41
to 2.0.44.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/sqlalchemy/sqlalchemy/releases">sqlalchemy's
releases</a>.</em></p>
<blockquote>
<h1>2.0.44</h1>
<p>Released: October 10, 2025</p>
<h2>platform</h2>
<ul>
<li><strong>[platform] [bug]</strong> Unblocked automatic greenlet
installation for Python 3.14 now that
there are greenlet wheels on pypi for python 3.14.</li>
</ul>
<h2>orm</h2>
<ul>
<li>
<p><strong>[orm] [usecase]</strong> The way ORM Annotated Declarative
interprets Python <a href="https://peps.python.org/pep-0695">PEP 695</a>
type aliases
in <code>Mapped[]</code> annotations has been refined to expand the
lookup scheme. A
<a href="https://peps.python.org/pep-0695">PEP 695</a> type can now be
resolved based on either its direct presence in
<code>_orm.registry.type_annotation_map</code> or its immediate resolved
value, as long as a recursive lookup across multiple <a
href="https://peps.python.org/pep-0695">PEP 695</a> types is
not required for it to resolve. This change reverses part of the
restrictions introduced in 2.0.37 as part of <a
href="https://www.sqlalchemy.org/trac/ticket/11955">#11955</a>, which
deprecated (and disallowed in 2.1) the ability to resolve any <a
href="https://peps.python.org/pep-0695">PEP 695</a>
type that was not explicitly present in
<code>_orm.registry.type_annotation_map</code>. Recursive lookups of
<a href="https://peps.python.org/pep-0695">PEP 695</a> types remains
deprecated in 2.0 and disallowed in version 2.1,
as do implicit lookups of <code>NewType</code> types without an entry in
<code>_orm.registry.type_annotation_map</code>.</p>
<p>Additionally, new support has been added for generic <a
href="https://peps.python.org/pep-0695">PEP 695</a> aliases that
refer to <a href="https://peps.python.org/pep-0593">PEP 593</a>
<code>Annotated</code> constructs containing
<code>_orm.mapped_column()</code> configurations. See the sections below
for
examples.</p>
<p>References: <a
href="https://www.sqlalchemy.org/trac/ticket/12829">#12829</a></p>
</li>
<li>
<p><strong>[orm] [bug]</strong> Fixed a caching issue where
<code>_orm.with_loader_criteria()</code> would
incorrectly reuse cached bound parameter values when used with
<code>_sql.CompoundSelect</code> constructs such as
<code>_sql.union()</code>. The
issue was caused by the cache key for compound selects not including the
execution options that are part of the <code>_sql.Executable</code> base
class,
which <code>_orm.with_loader_criteria()</code> uses to apply its
criteria
dynamically. The fix ensures that compound selects and other executable
constructs properly include execution options in their cache key
traversal.</p>
<p>References: <a
href="https://www.sqlalchemy.org/trac/ticket/12905">#12905</a></p>
</li>
</ul>
<h2>engine</h2>
<ul>
<li><strong>[engine] [bug]</strong> Implemented initial support for
free-threaded Python by adding new tests
and reworking the test harness to include Python 3.13t and Python 3.14t
in</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/sqlalchemy/sqlalchemy/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=sqlalchemy&package-manager=uv&previous-version=2.0.41&new-version=2.0.44)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 12:34:11 -07:00
dependabot[bot]
112a974005
chore(python-deps): bump ruff from 0.9.10 to 0.14.1 (#3846)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.9.10 to 0.14.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.14.1</h2>
<h2>Release Notes</h2>
<p>Released on 2025-10-16.</p>
<h3>Preview features</h3>
<ul>
<li>[formatter] Remove parentheses around multiple exception types on
Python 3.14+ (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20768">#20768</a>)</li>
<li>[<code>flake8-bugbear</code>] Omit annotation in preview fix for
<code>B006</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20877">#20877</a>)</li>
<li>[<code>flake8-logging-format</code>] Avoid dropping implicitly
concatenated pieces in the <code>G004</code> fix (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20793">#20793</a>)</li>
<li>[<code>pydoclint</code>] Implement
<code>docstring-extraneous-parameter</code> (<code>DOC102</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20376">#20376</a>)</li>
<li>[<code>pyupgrade</code>] Extend <code>UP019</code> to detect
<code>typing_extensions.Text</code> (<code>UP019</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20825">#20825</a>)</li>
<li>[<code>pyupgrade</code>] Fix false negative for <code>TypeVar</code>
with default argument in <code>non-pep695-generic-class</code>
(<code>UP046</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20660">#20660</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>Fix false negatives in <code>Truthiness::from_expr</code> for
lambdas, generators, and f-strings (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20704">#20704</a>)</li>
<li>Fix syntax error false positives for escapes and quotes in f-strings
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20867">#20867</a>)</li>
<li>Fix syntax error false positives on parenthesized context managers
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20846">#20846</a>)</li>
<li>[<code>fastapi</code>] Fix false positives for path parameters that
FastAPI doesn't recognize (<code>FAST003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20687">#20687</a>)</li>
<li>[<code>flake8-pyi</code>] Fix operator precedence by adding
parentheses when needed (<code>PYI061</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20508">#20508</a>)</li>
<li>[<code>ruff</code>] Suppress diagnostic for f-string interpolations
with debug text (<code>RUF010</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20525">#20525</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>airflow</code>] Add warning to
<code>airflow.datasets.DatasetEvent</code> usage (<code>AIR301</code>)
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20551">#20551</a>)</li>
<li>[<code>flake8-bugbear</code>] Mark <code>B905</code> and
<code>B912</code> fixes as unsafe (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20695">#20695</a>)</li>
<li>Use <code>DiagnosticTag</code> for more rules - changes display in
editors (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20758">#20758</a>,<a
href="https://redirect.github.com/astral-sh/ruff/pull/20734">#20734</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Update Python compatibility from 3.13 to 3.14 in README.md (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20852">#20852</a>)</li>
<li>Update <code>lint.flake8-type-checking.quoted-annotations</code>
docs (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20765">#20765</a>)</li>
<li>Update setup instructions for Zed 0.208.0+ (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20902">#20902</a>)</li>
<li>[<code>flake8-datetimez</code>] Clarify docs for several rules (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20778">#20778</a>)</li>
<li>Fix typo in <code>RUF015</code> description (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20873">#20873</a>)</li>
</ul>
<h3>Other changes</h3>
<ul>
<li>Reduce binary size (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20863">#20863</a>)</li>
<li>Improved error recovery for unclosed strings (including f- and
t-strings) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20848">#20848</a>)</li>
</ul>
<h3>Contributors</h3>
<ul>
<li><a href="https://github.com/ntBre"><code>@​ntBre</code></a></li>
<li><a
href="https://github.com/Paillat-dev"><code>@​Paillat-dev</code></a></li>
<li><a href="https://github.com/terror"><code>@​terror</code></a></li>
<li><a
href="https://github.com/pieterh-oai"><code>@​pieterh-oai</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a href="https://github.com/TaKO8Ki"><code>@​TaKO8Ki</code></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.14.1</h2>
<p>Released on 2025-10-16.</p>
<h3>Preview features</h3>
<ul>
<li>[formatter] Remove parentheses around multiple exception types on
Python 3.14+ (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20768">#20768</a>)</li>
<li>[<code>flake8-bugbear</code>] Omit annotation in preview fix for
<code>B006</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20877">#20877</a>)</li>
<li>[<code>flake8-logging-format</code>] Avoid dropping implicitly
concatenated pieces in the <code>G004</code> fix (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20793">#20793</a>)</li>
<li>[<code>pydoclint</code>] Implement
<code>docstring-extraneous-parameter</code> (<code>DOC102</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20376">#20376</a>)</li>
<li>[<code>pyupgrade</code>] Extend <code>UP019</code> to detect
<code>typing_extensions.Text</code> (<code>UP019</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20825">#20825</a>)</li>
<li>[<code>pyupgrade</code>] Fix false negative for <code>TypeVar</code>
with default argument in <code>non-pep695-generic-class</code>
(<code>UP046</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20660">#20660</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>Fix false negatives in <code>Truthiness::from_expr</code> for
lambdas, generators, and f-strings (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20704">#20704</a>)</li>
<li>Fix syntax error false positives for escapes and quotes in f-strings
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20867">#20867</a>)</li>
<li>Fix syntax error false positives on parenthesized context managers
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20846">#20846</a>)</li>
<li>[<code>fastapi</code>] Fix false positives for path parameters that
FastAPI doesn't recognize (<code>FAST003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20687">#20687</a>)</li>
<li>[<code>flake8-pyi</code>] Fix operator precedence by adding
parentheses when needed (<code>PYI061</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20508">#20508</a>)</li>
<li>[<code>ruff</code>] Suppress diagnostic for f-string interpolations
with debug text (<code>RUF010</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20525">#20525</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>airflow</code>] Add warning to
<code>airflow.datasets.DatasetEvent</code> usage (<code>AIR301</code>)
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/20551">#20551</a>)</li>
<li>[<code>flake8-bugbear</code>] Mark <code>B905</code> and
<code>B912</code> fixes as unsafe (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20695">#20695</a>)</li>
<li>Use <code>DiagnosticTag</code> for more rules - changes display in
editors (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20758">#20758</a>,<a
href="https://redirect.github.com/astral-sh/ruff/pull/20734">#20734</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Update Python compatibility from 3.13 to 3.14 in README.md (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20852">#20852</a>)</li>
<li>Update <code>lint.flake8-type-checking.quoted-annotations</code>
docs (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20765">#20765</a>)</li>
<li>Update setup instructions for Zed 0.208.0+ (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20902">#20902</a>)</li>
<li>[<code>flake8-datetimez</code>] Clarify docs for several rules (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20778">#20778</a>)</li>
<li>Fix typo in <code>RUF015</code> description (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20873">#20873</a>)</li>
</ul>
<h3>Other changes</h3>
<ul>
<li>Reduce binary size (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20863">#20863</a>)</li>
<li>Improved error recovery for unclosed strings (including f- and
t-strings) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/20848">#20848</a>)</li>
</ul>
<h3>Contributors</h3>
<ul>
<li><a href="https://github.com/ntBre"><code>@​ntBre</code></a></li>
<li><a
href="https://github.com/Paillat-dev"><code>@​Paillat-dev</code></a></li>
<li><a href="https://github.com/terror"><code>@​terror</code></a></li>
<li><a
href="https://github.com/pieterh-oai"><code>@​pieterh-oai</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a href="https://github.com/TaKO8Ki"><code>@​TaKO8Ki</code></a></li>
<li><a
href="https://github.com/ageorgou"><code>@​ageorgou</code></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2bffef5966"><code>2bffef5</code></a>
Bump 0.14.1 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20925">#20925</a>)</li>
<li><a
href="e64d772788"><code>e64d772</code></a>
Standardize syntax error construction (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20903">#20903</a>)</li>
<li><a
href="03696687ea"><code>0369668</code></a>
[<code>pydoclint</code>] Implement
<code>docstring-extraneous-parameter</code> (<code>DOC102</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20376">#20376</a>)</li>
<li><a
href="058fc37542"><code>058fc37</code></a>
[ty] Fix panic 'missing root' when handling completion request (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20917">#20917</a>)</li>
<li><a
href="ec9faa34be"><code>ec9faa3</code></a>
[ty] Run file watching tests serial when using nextest (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20918">#20918</a>)</li>
<li><a
href="7155a62e5c"><code>7155a62</code></a>
[ty] Add version hint for failed stdlib attribute accesses (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20909">#20909</a>)</li>
<li><a
href="a67e0690f2"><code>a67e069</code></a>
More CI improvements (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20920">#20920</a>)</li>
<li><a
href="6a1e91ce97"><code>6a1e91c</code></a>
[ty] Check typeshed VERSIONS for parent modules when reporting failed
stdlib ...</li>
<li><a
href="3db5d5906e"><code>3db5d59</code></a>
Don't use codspeed or depot runners in CI jobs on forks (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20894">#20894</a>)</li>
<li><a
href="d23826ce46"><code>d23826c</code></a>
[ty] cache Type::is_redundant_with (<a
href="https://redirect.github.com/astral-sh/ruff/issues/20477">#20477</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.9.10...0.14.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=uv&previous-version=0.9.10&new-version=0.14.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-20 12:33:44 -07:00
ehhuang
9936f33f7e
chore: disable telemetry if otel endpoint isn't set (#3859)
# What does this PR do?

removes error:
ConnectionError: HTTPConnectionPool(host='localhost', port=4318): Max
retries exceeded with url: /v1/traces
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object
at 0x10fd98e60>: Failed to establish a
         new connection: [Errno 61] Connection refused'))


## Test Plan
uv run llama stack run starter
curl http://localhost:8321/v1/models
observe no error in server logs
2025-10-20 11:42:57 -07:00
ehhuang
359df3a37c
chore: update doc (#3857)
# What does this PR do?
follows https://github.com/llamastack/llama-stack/pull/3839

## Test Plan
2025-10-20 10:33:21 -07:00
ehhuang
21772de5d3
chore: use dockerfile for building containers (#3839)
# What does this PR do?

relates to #2878 

We introduce a Containerfile which is used to replaced the `llama stack
build` command (removal in a separate PR).

```
llama stack build --distro starter --image-type venv --run
```
is replaced by
```
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```


- See the updated workflow files for e2e workflow.

## Test Plan
CI
```
❯ docker build . -f docker/Dockerfile --build-arg DISTRO_NAME=starter --build-arg INSTALL_MODE=editable --tag test_starter
❯ docker run -p 8321:8321 test_starter
❯ curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```





---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3839).
* #3855
* __->__ #3839
2025-10-20 10:23:01 -07:00
Charlie Doern
573e783ff0
docs: fix sidebar of Detailed Tutorial (#3856)
# What does this PR do?

the sidebar currently has an extra `ii. Run the Script` because its
incorrectly put into the doc as an H3 not an H4 (like the other ones)


<img width="239" height="218" alt="Screenshot 2025-10-20 at 1 04 54 PM"
src="https://github.com/user-attachments/assets/eb8cb26e-7ea9-4b61-9101-d64965b39647"
/>

Fix this which will update the sidebar

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-20 13:10:50 -04:00
Jiayi Ni
165b8b07f4
docs: Documentation update for NVIDIA Inference Provider (#3840)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
- Fix examples in the NVIDIA inference documentation to align with
current API requirements.

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
N/A
2025-10-20 09:51:43 -07:00
dependabot[bot]
f675fdda0f
chore(ui-deps): bump jest and @types/jest in /llama_stack/ui (#3853)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Python Package Build Test / build (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 32s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 33s
Test External API and Providers / test-external (venv) (push) Failing after 45s
Vector IO Integration Tests / test-matrix (push) Failing after 47s
API Conformance Tests / check-schema-compatibility (push) Successful in 55s
UI Tests / ui-tests (22) (push) Successful in 2m14s
Pre-commit / pre-commit (push) Successful in 3m28s
Bumps [jest](https://github.com/jestjs/jest/tree/HEAD/packages/jest) and
[@types/jest](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/jest).
These dependencies needed to be updated together.
Updates `jest` from 29.7.0 to 30.2.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/releases">jest's
releases</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore &amp; Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
<h2>30.1.2</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Correct snapshot header regexp to
work with newline across OSes (<a
href="https://redirect.github.com/jestjs/jest/pull/15803">#15803</a>)</li>
</ul>
<h2>30.1.1</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
warning not handling Windows end-of-line sequences (<a
href="https://redirect.github.com/jestjs/jest/pull/15800">#15800</a>)</li>
</ul>
<h2>30.1.0</h2>
<h2>Features</h2>
<ul>
<li><code>[jest-leak-detector]</code> Configurable GC aggressiveness
regarding to V8 heap snapshot generation (<a
href="https://redirect.github.com/jestjs/jest/pull/15793/">#15793</a>)</li>
<li><code>[jest-runtime]</code> Reduce redundant ReferenceError
messages</li>
<li><code>[jest-core]</code> Include test modules that failed to load
when --onlyFailures is active</li>
</ul>
<h3>Fixes</h3>
<ul>
<li>`[jest-snapshot-utils] Fix deprecated goo.gl snapshot guide link not
getting replaced with fully canonical URL (<a
href="https://redirect.github.com/jestjs/jest/pull/15787">#15787</a>)</li>
<li><code>[jest-circus]</code> Fix <code>it.concurrent</code> not
working with <code>describe.skip</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15765">#15765</a>)</li>
<li><code>[jest-snapshot]</code> Fix mangled inline snapshot updates
when used with Prettier 3 and CRLF line endings</li>
<li><code>[jest-runtime]</code> Importing from
<code>@jest/globals</code> in more than one file no longer breaks
relative paths (<a
href="https://redirect.github.com/jestjs/jest/issues/15772">#15772</a>)</li>
</ul>
<h1>Chore</h1>
<ul>
<li><code>[expect]</code> Update docblock for <code>toContain()</code>
to display info on substring check (<a
href="https://redirect.github.com/jestjs/jest/pull/15789">#15789</a>)</li>
</ul>
<h2>30.0.2</h2>
<h2>What's Changed</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/blob/main/CHANGELOG.md">jest's
changelog</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore &amp; Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-matcher-utils]</code> Fix infinite recursion with
self-referential getters in <code>deepCyclicCopyReplaceable</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15831">#15831</a>)</li>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
<h2>30.1.2</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Correct snapshot header regexp to
work with newline across OSes (<a
href="https://redirect.github.com/jestjs/jest/pull/15803">#15803</a>)</li>
</ul>
<h2>30.1.1</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
warning not handling Windows end-of-line sequences (<a
href="https://redirect.github.com/jestjs/jest/pull/15800">#15800</a>)</li>
<li><code>[jest-snapshot-utils]</code> Improve messaging about goo.gl
snapshot link change (<a
href="https://redirect.github.com/jestjs/jest/pull/15821">#15821</a>)</li>
</ul>
<h2>30.1.0</h2>
<h2>Features</h2>
<ul>
<li><code>[jest-leak-detector]</code> Configurable GC aggressiveness
regarding to V8 heap snapshot generation (<a
href="https://redirect.github.com/jestjs/jest/pull/15793/">#15793</a>)</li>
<li><code>[jest-runtime]</code> Reduce redundant ReferenceError
messages</li>
<li><code>[jest-core]</code> Include test modules that failed to load
when --onlyFailures is active</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
guide link not getting replaced with fully canonical URL (<a
href="https://redirect.github.com/jestjs/jest/pull/15787">#15787</a>)</li>
<li><code>[jest-circus]</code> Fix <code>it.concurrent</code> not
working with <code>describe.skip</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15765">#15765</a>)</li>
<li><code>[jest-snapshot]</code> Fix mangled inline snapshot updates
when used with Prettier 3 and CRLF line endings</li>
<li><code>[jest-runtime]</code> Importing from
<code>@jest/globals</code> in more than one file no longer breaks
relative paths (<a
href="https://redirect.github.com/jestjs/jest/issues/15772">#15772</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="855864e3f9"><code>855864e</code></a>
v30.2.0</li>
<li><a
href="da9b532f04"><code>da9b532</code></a>
v30.1.3</li>
<li><a
href="ebfa31cc97"><code>ebfa31c</code></a>
v30.1.2</li>
<li><a
href="d347c0f3f8"><code>d347c0f</code></a>
v30.1.1</li>
<li><a
href="4d5f41d088"><code>4d5f41d</code></a>
v30.1.0</li>
<li><a
href="22236cf58b"><code>22236cf</code></a>
v30.0.5</li>
<li><a
href="f4296d2bc8"><code>f4296d2</code></a>
v30.0.4</li>
<li><a
href="d4a6c94daf"><code>d4a6c94</code></a>
v30.0.3</li>
<li><a
href="393acbfac3"><code>393acbf</code></a>
v30.0.2</li>
<li><a
href="5ce865b406"><code>5ce865b</code></a>
v30.0.1</li>
<li>Additional commits viewable in <a
href="https://github.com/jestjs/jest/commits/v30.2.0/packages/jest">compare
view</a></li>
</ul>
</details>
<br />

Updates `@types/jest` from 29.5.14 to 30.0.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/jest">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-18 21:57:57 -04:00
dependabot[bot]
7a256895aa
chore(ui-deps): bump jest-environment-jsdom from 30.1.2 to 30.2.0 in /llama_stack/ui (#3852)
Bumps
[jest-environment-jsdom](https://github.com/jestjs/jest/tree/HEAD/packages/jest-environment-jsdom)
from 30.1.2 to 30.2.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/releases">jest-environment-jsdom's
releases</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore &amp; Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/blob/main/CHANGELOG.md">jest-environment-jsdom's
changelog</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore &amp; Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-matcher-utils]</code> Fix infinite recursion with
self-referential getters in <code>deepCyclicCopyReplaceable</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15831">#15831</a>)</li>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="855864e3f9"><code>855864e</code></a>
v30.2.0</li>
<li>See full diff in <a
href="https://github.com/jestjs/jest/commits/v30.2.0/packages/jest-environment-jsdom">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=jest-environment-jsdom&package-manager=npm_and_yarn&previous-version=30.1.2&new-version=30.2.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-18 21:53:58 -04:00
dependabot[bot]
83d2193077
chore(ui-deps): bump eslint-config-next from 15.5.2 to 15.5.6 in /llama_stack/ui (#3849)
Bumps
[eslint-config-next](https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next)
from 15.5.2 to 15.5.6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">eslint-config-next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.6</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Turbopack: don't define process.cwd() in node_modules <a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83452">#83452</a></li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/mischnic"><code>@​mischnic</code></a> for
helping!</p>
<h2>v15.5.5</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>Split code-frame into separate compiled package (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84238">#84238</a>)</li>
<li>Add deprecation warning to Runtime config (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84650">#84650</a>)</li>
<li>fix: unstable_cache should perform blocking revalidation during ISR
revalidation (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84716">#84716</a>)</li>
<li>feat: <code>experimental.middlewareClientMaxBodySize</code> body
cloning limit (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84722">#84722</a>)</li>
<li>fix: missing next/link types with typedRoutes (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84779">#84779</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>docs: early October improvements and fixes (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84334">#84334</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/devjiwonchoi"><code>@​devjiwonchoi</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, and <a
href="https://github.com/icyJoseph"><code>@​icyJoseph</code></a> for
helping!</p>
<h2>v15.5.4</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: ensure onRequestError is invoked when otel enabled (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83343">#83343</a>)</li>
<li>fix: devtools initial position should be from next config (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83571">#83571</a>)</li>
<li>[devtool] fix overlay styles are missing (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83721">#83721</a>)</li>
<li>Turbopack: don't match dynamic pattern for node_modules packages (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83176">#83176</a>)</li>
<li>Turbopack: don't treat metadata routes as RSC (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82911">#82911</a>)</li>
<li>[turbopack] Improve handling of symlink resolution errors in
track_glob and read_glob (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83357">#83357</a>)</li>
<li>Turbopack: throw large static metadata error earlier (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/82939">#82939</a>)</li>
<li>fix: error overlay not closing when backdrop clicked (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83981">#83981</a>)</li>
<li>Turbopack: flush Node.js worker IPC on error (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/84077">#84077</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>[CNA] use linter preference (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83194">#83194</a>)</li>
<li>CI: use KV for test timing data (<a
href="https://github.com/vercel/next.js/tree/HEAD/packages/eslint-config-next/issues/83745">#83745</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="55ef0e3ebc"><code>55ef0e3</code></a>
v15.5.6</li>
<li><a
href="81f530db26"><code>81f530d</code></a>
v15.5.5</li>
<li><a
href="40f1d7814d"><code>40f1d78</code></a>
v15.5.4</li>
<li><a
href="07d1cbc9c6"><code>07d1cbc</code></a>
v15.5.3</li>
<li>See full diff in <a
href="https://github.com/vercel/next.js/commits/v15.5.6/packages/eslint-config-next">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=eslint-config-next&package-manager=npm_and_yarn&previous-version=15.5.2&new-version=15.5.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-18 21:52:17 -04:00
ehhuang
316b76db7a
chore: add telemetry setup to install.sh (#3821)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Installer CI / lint (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Installer CI / smoke-test-on-dev (push) Failing after 11s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
Vector IO Integration Tests / test-matrix (push) Failing after 18s
Test External API and Providers / test-external (venv) (push) Failing after 17s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 44s
UI Tests / ui-tests (22) (push) Successful in 1m28s
Pre-commit / pre-commit (push) Successful in 2m27s
# What does this PR do?


## Test Plan

.venv ❯ sh ./scripts/install.sh 
⚠️  Found existing container(s) for 'ollama-server', removing...
⚠️  Found existing container(s) for 'llama-stack', removing...
⚠️  Found existing container(s) for 'jaeger', removing...
⚠️  Found existing container(s) for 'otel-collector', removing...
⚠️  Found existing container(s) for 'prometheus', removing...
⚠️  Found existing container(s) for 'grafana', removing...
📡 Starting telemetry stack...
🦙 Starting Ollama...
 Waiting for Ollama daemon...

📦 Ensuring model is pulled: llama3.2:3b...
🦙 Starting Llama Stack...
 Waiting for Llama Stack API...
..

🎉 Llama Stack is ready!
👉  API endpoint: http://localhost:8321
📖 Documentation:
https://llamastack.github.io/latest/references/api_reference/index.html
💻 To access the llama stack CLI, exec into the container:
   docker exec -ti llama-stack bash
📡 Telemetry dashboards:
   Jaeger UI:      http://localhost:16686
   Prometheus UI:  http://localhost:9090
   Grafana UI:     http://localhost:3000 (admin/admin)
   OTEL Collector: http://localhost:4318
🐛 Report an issue @ https://github.com/llamastack/llama-stack/issues if
you think it's a bug
2025-10-18 06:05:56 -07:00
Charlie Doern
b11bcfde11
refactor(build): rework CLI commands and build process (1/2) (#2974)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 22s
Test llama stack list-deps / show-single-provider (push) Failing after 53s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 18s
Python Package Build Test / build (3.13) (push) Failing after 24s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 26s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 27s
Unit Tests / unit-tests (3.12) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (push) Failing after 44s
API Conformance Tests / check-schema-compatibility (push) Successful in 52s
Test llama stack list-deps / generate-matrix (push) Successful in 52s
Test Llama Stack Build / build (push) Failing after 29s
Test External API and Providers / test-external (venv) (push) Failing after 53s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1m2s
Unit Tests / unit-tests (3.13) (push) Failing after 1m30s
Test llama stack list-deps / list-deps-from-config (push) Failing after 1m59s
Test llama stack list-deps / list-deps (push) Failing after 1m10s
UI Tests / ui-tests (22) (push) Successful in 2m26s
Pre-commit / pre-commit (push) Successful in 3m8s
# What does this PR do?

This PR does a few things outlined in #2878 namely:
1. adds `llama stack list-deps` a command which simply takes the build
logic and instead of executing one of the `build_...` scripts, it
displays all of the providers' dependencies using the `module` and `uv`.
2. deprecated `llama stack build` in favor of `llama stack list-deps`
3. updates all tests to use `list-deps` alongside `build`.

PR 2/2 will migrate `llama stack run`'s default behavior to be `llama
stack build --run` and use the new `list-deps` command under the hood
before running the server.

examples of `llama stack list-deps starter`

```
llama stack list-deps starter --format json
{
  "name": "starter",
  "description": "Quick start template for running Llama Stack with several popular providers. This distribution is intended for CPU-only environments.",
  "apis": [
    {
      "api": "inference",
      "provider": "remote::cerebras"
    },
    {
      "api": "inference",
      "provider": "remote::ollama"
    },
    {
      "api": "inference",
      "provider": "remote::vllm"
    },
    {
      "api": "inference",
      "provider": "remote::tgi"
    },
    {
      "api": "inference",
      "provider": "remote::fireworks"
    },
    {
      "api": "inference",
      "provider": "remote::together"
    },
    {
      "api": "inference",
      "provider": "remote::bedrock"
    },
    {
      "api": "inference",
      "provider": "remote::nvidia"
    },
    {
      "api": "inference",
      "provider": "remote::openai"
    },
    {
      "api": "inference",
      "provider": "remote::anthropic"
    },
    {
      "api": "inference",
      "provider": "remote::gemini"
    },
    {
      "api": "inference",
      "provider": "remote::vertexai"
    },
    {
      "api": "inference",
      "provider": "remote::groq"
    },
    {
      "api": "inference",
      "provider": "remote::sambanova"
    },
    {
      "api": "inference",
      "provider": "remote::azure"
    },
    {
      "api": "inference",
      "provider": "inline::sentence-transformers"
    },
    {
      "api": "vector_io",
      "provider": "inline::faiss"
    },
    {
      "api": "vector_io",
      "provider": "inline::sqlite-vec"
    },
    {
      "api": "vector_io",
      "provider": "inline::milvus"
    },
    {
      "api": "vector_io",
      "provider": "remote::chromadb"
    },
    {
      "api": "vector_io",
      "provider": "remote::pgvector"
    },
    {
      "api": "files",
      "provider": "inline::localfs"
    },
    {
      "api": "safety",
      "provider": "inline::llama-guard"
    },
    {
      "api": "safety",
      "provider": "inline::code-scanner"
    },
    {
      "api": "agents",
      "provider": "inline::meta-reference"
    },
    {
      "api": "telemetry",
      "provider": "inline::meta-reference"
    },
    {
      "api": "post_training",
      "provider": "inline::torchtune-cpu"
    },
    {
      "api": "eval",
      "provider": "inline::meta-reference"
    },
    {
      "api": "datasetio",
      "provider": "remote::huggingface"
    },
    {
      "api": "datasetio",
      "provider": "inline::localfs"
    },
    {
      "api": "scoring",
      "provider": "inline::basic"
    },
    {
      "api": "scoring",
      "provider": "inline::llm-as-judge"
    },
    {
      "api": "scoring",
      "provider": "inline::braintrust"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::brave-search"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::tavily-search"
    },
    {
      "api": "tool_runtime",
      "provider": "inline::rag-runtime"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::model-context-protocol"
    },
    {
      "api": "batches",
      "provider": "inline::reference"
    }
  ],
  "pip_dependencies": [
    "pandas",
    "opentelemetry-exporter-otlp-proto-http",
    "matplotlib",
    "opentelemetry-sdk",
    "sentence-transformers",
    "datasets",
    "pymilvus[milvus-lite]>=2.4.10",
    "codeshield",
    "scipy",
    "torchvision",
    "tree_sitter",
    "h11>=0.16.0",
    "aiohttp",
    "pymongo",
    "tqdm",
    "pythainlp",
    "pillow",
    "torch",
    "emoji",
    "grpcio>=1.67.1,<1.71.0",
    "fireworks-ai",
    "langdetect",
    "psycopg2-binary",
    "asyncpg",
    "redis",
    "together",
    "torchao>=0.12.0",
    "openai",
    "sentencepiece",
    "aiosqlite",
    "google-cloud-aiplatform",
    "faiss-cpu",
    "numpy",
    "sqlite-vec",
    "nltk",
    "scikit-learn",
    "mcp>=1.8.1",
    "transformers",
    "boto3",
    "huggingface_hub",
    "ollama",
    "autoevals",
    "sqlalchemy[asyncio]",
    "torchtune>=0.5.0",
    "chromadb-client",
    "pypdf",
    "requests",
    "anthropic",
    "chardet",
    "aiosqlite",
    "fastapi",
    "fire",
    "httpx",
    "uvicorn",
    "opentelemetry-sdk",
    "opentelemetry-exporter-otlp-proto-http"
  ]
}
```

<img width="1500" height="420" alt="Screenshot 2025-10-16 at 5 53 03 PM"
src="https://github.com/user-attachments/assets/765929fb-93e2-44d7-9c3d-8918b70fc721"
/>

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-17 19:52:14 -07:00
Emilio Garcia
943558af36
test(telemetry): Telemetry Tests (#3805)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 10s
Python Package Build Test / build (3.13) (push) Failing after 10s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 11s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 20s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Test External API and Providers / test-external (venv) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (push) Failing after 30s
API Conformance Tests / check-schema-compatibility (push) Successful in 38s
UI Tests / ui-tests (22) (push) Successful in 1m32s
Pre-commit / pre-commit (push) Successful in 3m16s
# What does this PR do?
Adds a test and a standardized way to build future tests out for
telemetry in llama stack.
Contributes to https://github.com/llamastack/llama-stack/issues/3806

## Test Plan
This is the test plan 😎
2025-10-17 10:43:33 -07:00
Alexey Rybak
224c99560c
docs: update docstrings for better formatting (#3838)
# What does this PR do?
Updates docstrings for Conversations and Eval APIs to render better in
the docs nav sidebar.

Before: 
<img width="363" height="233" alt="Screenshot 2025-10-17 at 9 52 17 AM"
src="https://github.com/user-attachments/assets/3a77f9e3-3b03-43ae-8584-a21d1f44d54d"
/>

After:
<img width="410" height="206" alt="Screenshot 2025-10-17 at 9 52 11 AM"
src="https://github.com/user-attachments/assets/fa5d428d-2bde-4453-84fd-9aceebe712e8"
/>


## Test Plan
* Manual testing
2025-10-17 10:41:50 -07:00
Alexey Rybak
c9f0bebcb7
chore: update API leveling docs with deprecation flag (#3837)
# What does this PR do?
Adds information on the `deprecated=True` flags to the documentation for
extra clarity.

## Test Plan
* Manual testing
2025-10-17 10:17:58 -07:00
Ashwin Bharambe
a701f68bd7
feat(ci): enable docker based server tests (#3833)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (push) Failing after 22s
API Conformance Tests / check-schema-compatibility (push) Successful in 31s
UI Tests / ui-tests (22) (push) Successful in 1m35s
Pre-commit / pre-commit (push) Successful in 2m27s
2025-10-17 09:19:25 +02:00
Ashwin Bharambe
4c9d944380
fix(perf): make batches tests finish 30x faster (#3834)
In replay mode, inference is instantenous. We don't need to wait 15
seconds for the batch to be done. Fixing polling to do exp backoff makes
things work super fast.
2025-10-17 09:16:44 +02:00
Ashwin Bharambe
cd152f4240
feat(ci): add support for docker:distro in tests (#3832)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test Llama Stack Build / build-single-provider (push) Failing after 9s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Vector IO Integration Tests / test-matrix (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 12s
API Conformance Tests / check-schema-compatibility (push) Successful in 19s
Test Llama Stack Build / build (push) Failing after 7s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 26s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 25s
Python Package Build Test / build (3.12) (push) Failing after 33s
UI Tests / ui-tests (22) (push) Successful in 1m26s
Pre-commit / pre-commit (push) Successful in 2m18s
Also a critical bug fix so test recordings can be found inside docker
2025-10-16 19:33:13 -07:00
ehhuang
b3099d40e2
fix(telemetry): remove dependency on old telemetry config (#3830)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 10s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 12s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Python Package Build Test / build (3.12) (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Python Package Build Test / build (3.13) (push) Failing after 13s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 21s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 57s
Vector IO Integration Tests / test-matrix (push) Failing after 1m13s
API Conformance Tests / check-schema-compatibility (push) Successful in 1m22s
UI Tests / ui-tests (22) (push) Successful in 1m33s
Pre-commit / pre-commit (push) Successful in 1m55s
# What does this PR do?
old telemetry config was removed in #3815

## Test Plan

❯ OTEL_SERVICE_NAME=aloha
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 uv run llama stack run
starter
<img width="1888" height="605" alt="image"
src="https://github.com/user-attachments/assets/dd5cc9f0-213a-4dc6-9385-f61a3a13b4c3"
/>
2025-10-16 12:05:10 -07:00
ehhuang
07ff15d917
chore: distrogen enables telemetry by default (#3828)
# What does this PR do?
leftover from #3815

## Test Plan
CI


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3828).
* #3830
* __->__ #3828
2025-10-16 11:29:51 -07:00
Charlie Doern
f22aaef42f
chore!: remove telemetry API usage (#3815)
# What does this PR do?

remove telemetry as a providable API from the codebase. This includes
removing it from generated distributions but also the provider registry,
the router, etc

since `setup_logger` is tied pretty strictly to `Api.telemetry` being in
impls we still need an "instantiated provider" in our implementations.
However it should not be auto-routed or provided. So in
validate_and_prepare_providers (called from resolve_impls) I made it so
that if run_config.telemetry.enabled, we set up the meta-reference
"provider" internally to be used so that log_event will work when
called.

This is the neatest way I think we can remove telemetry from the
provider configs but also not need to rip apart the whole "telemetry is
a provider" logic just yet, but we can do it internally later without
disrupting users.

so telemetry is removed from the registry such that if a user puts
`telemetry:` as an API in their build/run config it will err out, but
can still be used by us internally as we go through this transition.


relates to #3806

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-16 10:39:32 -07:00
slekkala1
8c5705d39e
fix: test id not being set in headers (#3827)
# What does this PR do?
When stack config is set to server in docker
STACK_CONFIG_ARG=--stack-config=http://localhost:8321, the env variable
was not getting correctly set and test id not set, causing
This is needed for test-and-cut to work 
E openai.BadRequestError: Error code: 400 - {'detail': 'Invalid value:
Test ID is required for file ID allocation'}



5286461406

## Test Plan
CI
2025-10-16 10:29:07 -07:00
Bill Murdock
c19eb9854d
docs: Document known limitations of Responses (#3776)
# What does this PR do?

Adds a subpage of the OpenAI compatibility page in the documentation.
This subpage documents known limitations of the Responses API.

<!-- If resolving an issue, uncomment and update the line below -->

Closes #3575

---------

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
2025-10-16 10:26:23 -07:00
Ashwin Bharambe
185de61d8e
fix(openai_mixin): no yelling for model listing if API keys are not provided (#3826)
As indicated in the title. Our `starter` distribution enables all remote
providers _very intentionally_ because we believe it creates an easier,
more welcoming experience to new folks using the software. If we do
that, and then slam the logs with errors making them question their life
choices, it is not so good :)

Note that this fix is limited in scope. If you ever try to actually
instantiate the OpenAI client from a code path without an API key being
present, you deserve to fail hard.

## Test Plan

Run `llama stack run starter` with `OPENAI_API_KEY` set. No more wall of
text, just one message saying "listed 96 models".
2025-10-16 10:12:13 -07:00
Ashwin Bharambe
07fc8013eb
fix(tests): reduce some test noise (#3825)
a bunch of logger.info()s are good for server code to help debug in
production, but we don't want them killing our unit test output :)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-16 09:52:16 -07:00
Sébastien Han
0c368492b7
chore: update agent call (#3824)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (push) Failing after 11s
API Conformance Tests / check-schema-compatibility (push) Successful in 17s
UI Tests / ui-tests (22) (push) Successful in 1m49s
Pre-commit / pre-commit (push) Successful in 2m51s
followup on https://github.com/llamastack/llama-stack/pull/3810

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-16 16:04:43 +02:00
Derek Higgins
edb8afb219
chore: remove test_cases/openai/responses.json (#3823)
Its unused

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-16 06:59:29 -07:00
Ashwin Bharambe
f70aa99c97
fix(models)!: always prefix models with provider_id when registering (#3822)
**!!BREAKING CHANGE!!**

The lookup is also straightforward -- we always look for this identifier
and don't try to find a match for something without the provider_id
prefix.

Note that, this ideally means we need to update the `register_model()`
API also (we should kill "identifier" from there) but I am not doing
that as part of this PR.

## Test Plan

Existing unit tests
2025-10-16 06:47:39 -07:00
Ashwin Bharambe
f205ab6f6c
fix(responses): fixes, re-record tests (#3820)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 17s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 1m43s
Wanted to re-enable Responses CI but it seems to hang for some reason
due to some interactions with conversations_store or responses_store.

## Test Plan

```
# library client
./scripts/integration-tests.sh --stack-config ci-tests --suite responses

# server
./scripts/integration-tests.sh --stack-config server:ci-tests --suite responses
```
2025-10-15 16:37:42 -07:00
slekkala1
99141c29b1
feat: Add responses and safety impl extra_body (#3781)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s
Test External API and Providers / test-external (venv) (push) Failing after 8s
Test Llama Stack Build / build (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
API Conformance Tests / check-schema-compatibility (push) Successful in 19s
UI Tests / ui-tests (22) (push) Successful in 37s
Pre-commit / pre-commit (push) Successful in 1m33s
# What does this PR do?

Have closed the previous PR due to merge conflicts with multiple PRs
Addressed all comments from
https://github.com/llamastack/llama-stack/pull/3768 (sorry for carrying
over to this one)


## Test Plan
Added UTs and integration tests
2025-10-15 15:01:37 -07:00
Ashwin Bharambe
8e7e0ddfec
fix(responses): use conversation items when no stored messages exist (#3819)
Handle a base case when no stored messages exist because no Response
call has been made.

## Test Plan

```
./scripts/integration-tests.sh --stack-config server:ci-tests \
   --suite responses   --inference-mode record-if-missing --pattern test_conversation_responses
```
2025-10-15 14:43:44 -07:00
ehhuang
6ba9db3929
chore!: BREAKING CHANGE: remove sqlite from telemetry config (#3808)
# What does this PR do?
- Removed sqlite sink from telemetry config.
- Removed related code
- Updated doc related to telemetry

## Test Plan
CI
2025-10-15 14:24:45 -07:00
Ashwin Bharambe
0a96a7faa5
fix(responses): fix subtle bugs in non-function tool calling (#3817)
We were generating "FunctionToolCall" items even for MCP (and
file-search, etc.) server-side calls. ID mismatches, etc. galore.
2025-10-15 13:57:37 -07:00
ehhuang
d709eeb33f
chore: mark recordings as generated files (#3816)
# What does this PR do?


## Test Plan
<img width="1506" height="653" alt="image"
src="https://github.com/user-attachments/assets/6c28b8e8-effe-41ab-8e31-72482c05662d"
/>
2025-10-15 11:06:42 -07:00
Sumanth Kamenani
bc8b377a7c
fix(vector-io): handle missing document_id in insert_chunks (#3521)
Fixed KeyError when chunks don't have document_id in metadata or
chunk_metadata. Updated logging to safely extract document_id using
getattr and RAG memory to handle different document_id locations. Added
test for missing document_id scenarios.

Fixes issue #3494 where /v1/vector-io/insert would crash with KeyError.
Fixed KeyError when chunks don't have document_id in metadata or
chunk_metadata. Updated logging to safely extract document_id using
getattr and RAG memory to handle different document_id locations. Added
test for missing document_id scenarios.

 # What does this PR do?

Fixes a KeyError crash in `/v1/vector-io/insert` when chunks are missing
`document_id` fields. The API
was failing even though `document_id` is optional according to the
schema.

  Closes #3494

  ## Test Plan

  **Before fix:**
  - POST to `/v1/vector-io/insert` with chunks → 500 KeyError
  - Happened regardless of where `document_id` was placed

  **After fix:**
  - Same request works fine → 200 OK
  - Tested with Postman using FAISS backend
  - Added unit test covering missing `document_id` scenarios
2025-10-15 11:02:48 -07:00
Ashwin Bharambe
e9b4278a51
feat(responses)!: improve responses + conversations implementations (#3810)
This PR updates the Conversation item related types and improves a
couple critical parts of the implemenation:

- it creates a streaming output item for the final assistant message
output by
  the model. until now we only added content parts and included that
  message in the final response.

- rewrites the conversation update code completely to account for items
  other than messages (tool calls, outputs, etc.)

## Test Plan

Used the test script from
https://github.com/llamastack/llama-stack-client-python/pull/281 for
this

```
TEST_API_BASE_URL=http://localhost:8321/v1 \
  pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```
2025-10-15 09:36:11 -07:00
Juan Pérez de Algaba
add8cd801b
feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys (#3813)
# Add support for Google Gemini `gemini-embedding-001` embedding model
and correctly registers model type

MR message created with the assistance of Claude-4.5-sonnet

This resolves https://github.com/llamastack/llama-stack/issues/3755

## What does this PR do?

This PR adds support for the `gemini-embedding-001` Google embedding
model to the llama-stack Gemini provider. This model provides
high-dimensional embeddings (3072 dimensions) compared to the existing
`text-embedding-004` model (768 dimensions). Old embeddings models (such
as text-embedding-004) will be deprecated soon according to Google
([Link](https://developers.googleblog.com/en/gemini-embedding-available-gemini-api/))

## Problem

The Gemini provider only supported the `text-embedding-004` embedding
model. The newer `gemini-embedding-001` model, which provides
higher-dimensional embeddings for improved semantic representation, was
not available through llama-stack.

## Solution

This PR consists of three commits that implement, fix the model
registration, and enable embedding generation:

### Commit 1: Initial addition of gemini-embedding-001

Added metadata for `gemini-embedding-001` to the
`embedding_model_metadata` dictionary:

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # NEW
}
```

**Issue discovered:** The model was not being registered correctly
because the dictionary keys didn't match the model IDs returned by
Gemini's API.

### Commit 2: Fix model ID matching with `models/` prefix

Updated both dictionary keys to include the `models/` prefix to match
Gemini's OpenAI-compatible API response format:

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},      # UPDATED
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # UPDATED
}
```

**Root cause:** Gemini's OpenAI-compatible API returns model IDs with
the `models/` prefix (e.g., `models/text-embedding-004`). The
`OpenAIMixin.list_models()` method directly matches these IDs against
the `embedding_model_metadata` dictionary keys. Without the prefix, the
models were being registered as LLMs instead of embedding models.

### Commit 3: Fix embedding generation for providers without usage stats

Fixed a bug in `OpenAIMixin.openai_embeddings()` that prevented
embedding generation for providers (like Gemini) that don't return usage
statistics:

```python
# Before (Line 351-354):
usage = OpenAIEmbeddingUsage(
    prompt_tokens=response.usage.prompt_tokens,  # ← Crashed with AttributeError
    total_tokens=response.usage.total_tokens,
)

# After (Lines 351-362):
if response.usage:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=response.usage.prompt_tokens,
        total_tokens=response.usage.total_tokens,
    )
else:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=0,  # Default when not provided
        total_tokens=0,   # Default when not provided
    )
```

**Impact:** This fix enables embedding generation for **all** Gemini
embedding models, not just the newly added one.

## Changes

### Modified Files

**`llama_stack/providers/remote/inference/gemini/gemini.py`**
- Line 17: Updated `text-embedding-004` key to
`models/text-embedding-004`
- Line 18: Added `models/gemini-embedding-001` with correct metadata

**`llama_stack/providers/utils/inference/openai_mixin.py`**
- Lines 351-362: Added null check for `response.usage` to handle
providers without usage statistics

## Key Technical Details

### Model ID Matching Flow

1. `list_provider_model_ids()` calls Gemini's `/v1/models` endpoint
2. API returns model IDs like: `models/text-embedding-004`,
`models/gemini-embedding-001`
3. `OpenAIMixin.list_models()` (line 410) checks: `if metadata :=
self.embedding_model_metadata.get(provider_model_id)`
4. If matched, registers as `model_type: "embedding"` with metadata;
otherwise registers as `model_type: "llm"`

### Why Both Keys Needed the Prefix

The `text-embedding-004` model was already working because there was
likely separate configuration or manual registration handling it. For
auto-discovery to work correctly for **both** models, both keys must
match the API's model ID format exactly.

## How to test this PR

Verified the changes by:

1. **Model Auto-Discovery**: Started llama-stack server and confirmed
models are auto-discovered from Gemini API

2. **Model Registration**: Confirmed both embedding models are correctly
registered and visible
```bash
curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'
```

**Results:**
-  `gemini/models/text-embedding-004` - 768 dimensions - `model_type:
"embedding"`
-  `gemini/models/gemini-embedding-001` - 3072 dimensions -
`model_type: "embedding"`

3. **Before Fix (Commit 1)**: Models appeared as `model_type: "llm"`
without embedding metadata

4. **After Fix (Commit 2)**: Models correctly identified as `model_type:
"embedding"` with proper metadata

5. **Generate Embeddings**: Verified embedding generation works
```bash
curl -X POST http://localhost:8325/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
  jq '.data[0].embedding | length'
```
2025-10-15 12:22:10 -04:00
slekkala1
ce8ea2f505
chore: Support embedding params from metadata for Vector Store (#3811)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m34s
# What does this PR do?
Support reading embedding model and dimensions from metadata for vector
store

## Test Plan
Unit Tests
2025-10-15 15:53:36 +02:00
Francisco Arceo
ef4bc70bbe
feat: Enable setting a default embedding model in the stack (#3803)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do?

Enables automatic embedding model detection for vector stores and by
using a `default_configured` boolean that can be defined in the
`run.yaml`.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
- Unit tests
- Integration tests
- Simple example below:

Spin up the stack:
```bash
uv run llama stack build --distro starter --image-type venv --run
```
Then test with OpenAI's client:
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
```
Previously you needed:

```python
vs = client.vector_stores.create(
    extra_body={
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
    }
)
```

The `extra_body` is now unnecessary.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-14 18:25:13 -07:00
Jiayi Ni
d875e427bf
refactor: use extra_body to pass in input_type params for asymmetric embedding models for NVIDIA Inference Provider (#3804)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 9s
API Conformance Tests / check-schema-compatibility (push) Successful in 16s
UI Tests / ui-tests (22) (push) Successful in 33s
Pre-commit / pre-commit (push) Successful in 1m33s
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Previously, the NVIDIA inference provider implemented a custom
`openai_embeddings` method with a hardcoded `input_type="query"`
parameter, which is required by NVIDIA asymmetric embedding
models([https://github.com/llamastack/llama-stack/pull/3205](https://github.com/llamastack/llama-stack/pull/3205)).
Recently `extra_body` parameter is added to the embeddings API
([https://github.com/llamastack/llama-stack/pull/3794](https://github.com/llamastack/llama-stack/pull/3794)).
So, this PR updates the NVIDIA inference provider to use the base
`OpenAIMixin.openai_embeddings` method instead and pass the `input_type`
through the `extra_body` parameter for asymmetric embedding models.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Run the following command for the ```embedding_model```:
```nvidia/llama-3.2-nv-embedqa-1b-v2```, ```nvidia/nv-embedqa-e5-v5```,
```nvidia/nv-embedqa-mistral-7b-v2```, and
```snowflake/arctic-embed-l```.
```
pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model={embedding_model} --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" --inference-mode=record
```
2025-10-14 13:52:55 -07:00
ehhuang
866c13cdc2
chore(api)!: BREAKING CHANGE: remove ALL telemetry APIs (#3740)
# What does this PR do?
As discussed on discord, we do not need to reinvent the wheel for
telemetry. Instead we'll lean into the canonical OTEL stack.
Logs/traces/metrics will still be sent via OTEL - they just won't be
stored on, queried through Stack.

This is the first of many PRs to remove telemetry API from Stack.
1) removed webmethod decorators to remove from API spec
2) removed tests as @iamemilio is adding them on otel directly.

## Test Plan
2025-10-14 13:48:40 -07:00
Bill Murdock
15900472ad
docs: Update CONTRIBUTING: py 3.12 and pre-commit==4.3.0 (#3807)
# What does this PR do?

Updates CONTRIBUTING.md with the following changes:
- Use Python 3.12 (and why)
- Use pre-commit==4.3.0
- Recommend using -v with pre-commit to get detailed info about why it
is failing if it fails.
- Instructs users to go to the docs/ directory before rebuilding the
docs (it doesn't work unless you do that).

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
2025-10-14 15:47:38 -04:00
IAN MILLER
007efa6eb5
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this PR is to replace the Llama Stack's default embedding
model by nomic-embed-text-v1.5.

These are the key reasons why Llama Stack community decided to switch
from all-MiniLM-L6-v2 to nomic-embed-text-v1.5:
1. The training data for
[all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data)
includes a lot of data sets with various licensing terms, so it is
tricky to know when/whether it is appropriate to use this model for
commercial applications.
2. The model is not particularly competitive on major benchmarks. For
example, if you look at the [MTEB
Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click
on Miscellaneous/BEIR to see English information retrieval accuracy, you
see that the top of the leaderboard is dominated by enormous models but
also that there are many, many models of relatively modest size whith
much higher Retrieval scores. If you want to look closely at the data, I
recommend clicking "Download Table" because it is easier to browse that
way.

More discussion info can be founded
[here](https://github.com/llamastack/llama-stack/issues/2418)

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2418 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI wokrflow

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-10-14 10:44:20 -04:00
Cesare Pompeiano
0dbf79c328
fix: Fixed WatsonX remote inference provider (#3801)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 9s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 9s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Test Llama Stack Build / build (push) Failing after 31s
UI Tests / ui-tests (22) (push) Successful in 46s
Pre-commit / pre-commit (push) Successful in 2m13s
# What does this PR do?
This PR fixes issues with the WatsonX provider so it works correctly
with LiteLLM.

The main problem was that WatsonX requests failed because the provider
data validator didn’t properly handle the API key and project ID. This
was fixed by updating the WatsonXProviderDataValidator and ensuring the
provider data is loaded correctly.

The openai_chat_completion method was also updated to match the behavior
of other providers while adding WatsonX-specific fields like project_id.
It still calls await super().openai_chat_completion.__func__(self,
params) to keep the existing setup and tracing logic.

After these changes, WatsonX requests now run correctly.

## Test Plan
The changes were tested by running chat completion requests and
confirming that credentials and project parameters are passed correctly.
I have tested with my WatsonX credentials, by using the cli with `uv run
llama-stack-client inference chat-completion --session`

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-10-14 14:52:32 +02:00
Sébastien Han
1136daf310
fix: replace python-jose with PyJWT for JWT handling (#3756)
# What does this PR do?

This commit migrates the authentication system from python-jose to PyJWT
to eliminate the dependency on the archived rsa package. The migration
includes:

- Refactored OAuth2TokenAuthProvider to use PyJWT's PyJWKClient for
clean JWKS handling
- Removed manual JWKS fetching, caching and key extraction logic in
favor of PyJWT's built-in functionality

The new implementation is cleaner, more maintainable, and follows PyJWT
best practices while maintaining full backward compatibility.

## Test Plan

Unit tests. Auth CI.

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-14 09:35:48 +02:00
Francisco Arceo
968c364a3e
chore: Auto-detect Provider ID when only 1 Vector Store Provider avai… (#3802)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 8s
API Conformance Tests / check-schema-compatibility (push) Successful in 18s
UI Tests / ui-tests (22) (push) Successful in 29s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
2 main changes:

1. Remove `provider_id` requirement in call to vector stores and
2. Removes "register first embedding model" logic 
   - Now forces embedding model id as required on Vector Store creation

Simplifies the UX for OpenAI to:

```python
vs = client.vector_stores.create(
    name="my_citations_db",
    extra_body={
        "embedding_model": "ollama/nomic-embed-text:latest",
    }
)
```


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-13 10:25:36 -07:00
Derek Higgins
642126e13b
fix: record job checking wrong directory (#3799)
Fixed CI job to check the correct directory for file changes Artifacts
are now stored in multiple directories not just
./tests/integration/recordings

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-13 09:55:55 -07:00
raghotham
b95f095a54
feat: Allow :memory: for kvstore (#3696)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m21s
## Test Plan
added unit tests
2025-10-13 11:19:27 +02:00
Ashwin Bharambe
ecc8a554d2
feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m23s
Applies the same pattern from
https://github.com/llamastack/llama-stack/pull/3777 to embeddings and
vector_stores.create() endpoints.

This should _not_ be a breaking change since (a) our tests were already
using the `extra_body` parameter when passing in to the backend (b) but
the backend probably wasn't extracting the parameters correctly. This PR
will fix that.

Updated APIs: `openai_embeddings(), openai_create_vector_store(),
openai_create_vector_store_file_batch()`
2025-10-12 19:01:52 -07:00
slekkala1
3bb6ef351b
chore!: Safety api refactoring to use OpenAIMessageParam (#3796)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do?
Remove usage of deprecated `Message` from Safety apis


## Test Plan
CI
2025-10-12 08:01:00 -07:00
dependabot[bot]
82cbcada39
chore(ui-deps): bump lucide-react from 0.542.0 to 0.545.0 in /llama_stack/ui (#3788)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m26s
Bumps
[lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react)
from 0.542.0 to 0.545.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/lucide-icons/lucide/releases">lucide-react's
releases</a>.</em></p>
<blockquote>
<h2>Version 0.545.0</h2>
<h2>What's Changed</h2>
<ul>
<li>fix(icons): changed <code>flame</code> icon by <a
href="https://github.com/jamiemlaw"><code>@​jamiemlaw</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3600">lucide-icons/lucide#3600</a></li>
<li>fix(icons): arcified <code>square-m</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3549">lucide-icons/lucide#3549</a></li>
<li>chore(deps-dev): bump vite from 6.3.5 to 6.3.6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3611">lucide-icons/lucide#3611</a></li>
<li>fix(icons): changed <code>combine</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3200">lucide-icons/lucide#3200</a></li>
<li>fix(icons): changed <code>building-2</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3509">lucide-icons/lucide#3509</a></li>
<li>chore(deps): bump devalue from 5.1.1 to 5.3.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3638">lucide-icons/lucide#3638</a></li>
<li>feat(icons): Add <code>motorbike</code> icon by <a
href="https://github.com/jamiemlaw"><code>@​jamiemlaw</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3371">lucide-icons/lucide#3371</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.544.0...0.545.0">https://github.com/lucide-icons/lucide/compare/0.544.0...0.545.0</a></p>
<h2>Version 0.544.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update lucide-static documentation about raw string imports by
<a href="https://github.com/pascalduez"><code>@​pascalduez</code></a> in
<a
href="https://redirect.github.com/lucide-icons/lucide/pull/3524">lucide-icons/lucide#3524</a></li>
<li>feat(icons): added <code>ev-charger</code> icon by <a
href="https://github.com/UsamaKhan"><code>@​UsamaKhan</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/2781">lucide-icons/lucide#2781</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/pascalduez"><code>@​pascalduez</code></a> made
their first contribution in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3524">lucide-icons/lucide#3524</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.543.0...0.544.0">https://github.com/lucide-icons/lucide/compare/0.543.0...0.544.0</a></p>
<h2>Version 0.543.0</h2>
<h2>What's Changed</h2>
<ul>
<li>feat(preview-comment): put x-ray at top if there are more than 7
changed icons to prevent them from being cut of by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3589">lucide-icons/lucide#3589</a></li>
<li>fix(icons): changed <code>church</code> icon by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/2971">lucide-icons/lucide#2971</a></li>
<li>chore(metadata): Added tags to <code>messages-square</code> by <a
href="https://github.com/jamiemlaw"><code>@​jamiemlaw</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3529">lucide-icons/lucide#3529</a></li>
<li>fix(icons): Optimise <code>bug</code> icons by <a
href="https://github.com/jamiemlaw"><code>@​jamiemlaw</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3574">lucide-icons/lucide#3574</a></li>
<li>fix(icons): changed list/text &amp; derived icons by <a
href="https://github.com/karsa-mistmere"><code>@​karsa-mistmere</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3568">lucide-icons/lucide#3568</a></li>
<li>fix(icons): changed <code>panel-top-bottom-dashed</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3584">lucide-icons/lucide#3584</a></li>
<li>fix(icons): changed <code>message-square-quote</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3550">lucide-icons/lucide#3550</a></li>
<li>fix(meta): added tag to <code>ship</code> metadata by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3559">lucide-icons/lucide#3559</a></li>
<li>fix(meta): add tags to <code>id-card-lanyard</code> metadata by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3534">lucide-icons/lucide#3534</a></li>
<li>fix(icons): changed <code>calendar-cog</code> icon by <a
href="https://github.com/jguddas"><code>@​jguddas</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3583">lucide-icons/lucide#3583</a></li>
<li>chore(deps): bump astro from 5.5.2 to 5.13.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3564">lucide-icons/lucide#3564</a></li>
<li>feat(packages): add new package for flutter by <a
href="https://github.com/vqh2602"><code>@​vqh2602</code></a> in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3536">lucide-icons/lucide#3536</a></li>
<li>feat(icons): added <code>house-heart</code> icon by <a
href="https://github.com/danielbayley"><code>@​danielbayley</code></a>
in <a
href="https://redirect.github.com/lucide-icons/lucide/pull/3239">lucide-icons/lucide#3239</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/lucide-icons/lucide/compare/0.542.0...0.543.0">https://github.com/lucide-icons/lucide/compare/0.542.0...0.543.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1cfb3ff70e"><code>1cfb3ff</code></a>
chore(deps-dev): bump vite from 6.3.5 to 6.3.6 (<a
href="https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react/issues/3611">#3611</a>)</li>
<li>See full diff in <a
href="https://github.com/lucide-icons/lucide/commits/0.545.0/packages/lucide-react">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=lucide-react&package-manager=npm_and_yarn&previous-version=0.542.0&new-version=0.545.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 21:40:48 -04:00
dependabot[bot]
e94840d298
chore(ui-deps): bump framer-motion from 12.23.12 to 12.23.24 in /llama_stack/ui (#3792)
Bumps [framer-motion](https://github.com/motiondivision/motion) from
12.23.12 to 12.23.24.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/motiondivision/motion/blob/main/CHANGELOG.md">framer-motion's
changelog</a>.</em></p>
<blockquote>
<h2>[12.23.24] 2025-10-10</h2>
<h3>Fixed</h3>
<ul>
<li>Ensure that when a component remounts, it continues to fire
animations even when <code>initial={false}</code>.</li>
</ul>
<h2>[12.23.23] 2025-10-10</h2>
<h3>Added</h3>
<ul>
<li>Exporting <code>PresenceChild</code> and <code>PopChild</code> type
for internal use.</li>
</ul>
<h2>[12.23.22] 2025-09-25</h2>
<h3>Added</h3>
<ul>
<li>Exporting <code>HTMLElements</code> and <code>useComposedRefs</code>
type for internal use.</li>
</ul>
<h2>[12.23.21] 2025-09-24</h2>
<h3>Fixed</h3>
<ul>
<li>Fixing main-thread <code>scroll</code> with animations that contain
<code>delay</code>.</li>
</ul>
<h2>[12.23.20] 2025-09-24</h2>
<h3>Fixed</h3>
<ul>
<li>Suppress non-animatable value warning for instant animations.</li>
</ul>
<h2>[12.23.19] 2025-09-23</h2>
<h3>Fixed</h3>
<ul>
<li>Remove support for changing <code>ref</code> prop.</li>
</ul>
<h2>[12.23.18] 2025-09-19</h2>
<h3>Fixed</h3>
<ul>
<li><code>&lt;motion /&gt;</code> components now support changing
<code>ref</code> prop.</li>
</ul>
<h2>[12.23.17] 2025-09-19</h2>
<h3>Fixed</h3>
<ul>
<li>Ensure <code>animate()</code> <code>onComplete</code> only fires
once, when all values are complete.</li>
</ul>
<h2>[12.23.16] 2025-09-19</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b5df740a46"><code>b5df740</code></a>
v12.23.24</li>
<li><a
href="808ebce630"><code>808ebce</code></a>
Updating changelog</li>
<li><a
href="237eee2246"><code>237eee2</code></a>
v12.23.23</li>
<li><a
href="834965c803"><code>834965c</code></a>
Updating changelog</li>
<li><a
href="40690864e9"><code>4069086</code></a>
Update README.md</li>
<li><a
href="6da6b61e94"><code>6da6b61</code></a>
Update README.md with new sponsor links</li>
<li><a
href="e36683149d"><code>e366831</code></a>
Update README.md</li>
<li><a
href="7796f4f1e0"><code>7796f4f</code></a>
Update Gold section with new links and images</li>
<li><a
href="d1bb93757c"><code>d1bb937</code></a>
Update sponsor section in README.md</li>
<li><a
href="97fba16059"><code>97fba16</code></a>
Update sponsorship logos in README</li>
<li>Additional commits viewable in <a
href="https://github.com/motiondivision/motion/compare/v12.23.12...v12.23.24">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=framer-motion&package-manager=npm_and_yarn&previous-version=12.23.12&new-version=12.23.24)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 21:36:01 -04:00
dependabot[bot]
25ea94fcf7
chore(ui-deps): bump eslint from 9.26.0 to 9.37.0 in /llama_stack/ui (#3791)
Bumps [eslint](https://github.com/eslint/eslint) from 9.26.0 to 9.37.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/eslint/eslint/releases">eslint's
releases</a>.</em></p>
<blockquote>
<h2>v9.37.0</h2>
<h2>Features</h2>
<ul>
<li><a
href="39f7fb493a"><code>39f7fb4</code></a>
feat: <code>preserve-caught-error</code> should recognize all static
&quot;cause&quot; keys (<a
href="https://redirect.github.com/eslint/eslint/issues/20163">#20163</a>)
(Pixel998)</li>
<li><a
href="f81eabc584"><code>f81eabc</code></a>
feat: support TS syntax in <code>no-restricted-imports</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/19562">#19562</a>)
(Nitin Kumar)</li>
</ul>
<h2>Bug Fixes</h2>
<ul>
<li><a
href="a129cced7a"><code>a129cce</code></a>
fix: correct <code>no-loss-of-precision</code> false positives for
leading zeros (<a
href="https://redirect.github.com/eslint/eslint/issues/20164">#20164</a>)
(Francesco Trotta)</li>
<li><a
href="09e04fcc3f"><code>09e04fc</code></a>
fix: add missing AST token types (<a
href="https://redirect.github.com/eslint/eslint/issues/20172">#20172</a>)
(Pixel998)</li>
<li><a
href="861c6da2bd"><code>861c6da</code></a>
fix: correct <code>ESLint</code> typings (<a
href="https://redirect.github.com/eslint/eslint/issues/20122">#20122</a>)
(Pixel998)</li>
</ul>
<h2>Documentation</h2>
<ul>
<li><a
href="b950359c5f"><code>b950359</code></a>
docs: fix typos across the docs (<a
href="https://redirect.github.com/eslint/eslint/issues/20182">#20182</a>)
(루밀LuMir)</li>
<li><a
href="42498a2798"><code>42498a2</code></a>
docs: improve ToC accessibility by hiding non-semantic character (<a
href="https://redirect.github.com/eslint/eslint/issues/20181">#20181</a>)
(Percy Ma)</li>
<li><a
href="29ea092b93"><code>29ea092</code></a>
docs: Update README (GitHub Actions Bot)</li>
<li><a
href="5c97a04578"><code>5c97a04</code></a>
docs: show <code>availableUntil</code> in deprecated rule banner (<a
href="https://redirect.github.com/eslint/eslint/issues/20170">#20170</a>)
(Pixel998)</li>
<li><a
href="90a71bf502"><code>90a71bf</code></a>
docs: update <code>README</code> files to add badge and instructions (<a
href="https://redirect.github.com/eslint/eslint/issues/20115">#20115</a>)
(루밀LuMir)</li>
<li><a
href="1603ae1526"><code>1603ae1</code></a>
docs: update references from <code>master</code> to <code>main</code>
(<a
href="https://redirect.github.com/eslint/eslint/issues/20153">#20153</a>)
(루밀LuMir)</li>
</ul>
<h2>Chores</h2>
<ul>
<li><a
href="afe8a13469"><code>afe8a13</code></a>
chore: update <code>@eslint/js</code> dependency to version 9.37.0 (<a
href="https://redirect.github.com/eslint/eslint/issues/20183">#20183</a>)
(Francesco Trotta)</li>
<li><a
href="abee4ca1fa"><code>abee4ca</code></a>
chore: package.json update for <code>@​eslint/js</code> release
(Jenkins)</li>
<li><a
href="fc9381f6ca"><code>fc9381f</code></a>
chore: fix typos in comments (<a
href="https://redirect.github.com/eslint/eslint/issues/20175">#20175</a>)
(overlookmotel)</li>
<li><a
href="e1574a22d3"><code>e1574a2</code></a>
chore: unpin jiti (<a
href="https://redirect.github.com/eslint/eslint/issues/20173">#20173</a>)
(renovate[bot])</li>
<li><a
href="e1ac05e2fa"><code>e1ac05e</code></a>
refactor: mark <code>ESLint.findConfigFile()</code> as
<code>async</code>, add missing docs (<a
href="https://redirect.github.com/eslint/eslint/issues/20157">#20157</a>)
(Pixel998)</li>
<li><a
href="347906d627"><code>347906d</code></a>
chore: update eslint (<a
href="https://redirect.github.com/eslint/eslint/issues/20149">#20149</a>)
(renovate[bot])</li>
<li><a
href="0cb5897e24"><code>0cb5897</code></a>
test: remove tmp dir created for circular fixes in multithread mode test
(<a
href="https://redirect.github.com/eslint/eslint/issues/20146">#20146</a>)
(Milos Djermanovic)</li>
<li><a
href="bb995665e3"><code>bb99566</code></a>
ci: pin <code>jiti</code> to version 2.5.1 (<a
href="https://redirect.github.com/eslint/eslint/issues/20151">#20151</a>)
(Pixel998)</li>
<li><a
href="177f669adc"><code>177f669</code></a>
perf: improve worker count calculation for <code>&quot;auto&quot;</code>
concurrency (<a
href="https://redirect.github.com/eslint/eslint/issues/20067">#20067</a>)
(Francesco Trotta)</li>
<li><a
href="448b57bca3"><code>448b57b</code></a>
chore: Mark deprecated formatting rules as available until v11.0.0 (<a
href="https://redirect.github.com/eslint/eslint/issues/20144">#20144</a>)
(Milos Djermanovic)</li>
</ul>
<h2>v9.36.0</h2>
<h2>Features</h2>
<ul>
<li><a
href="47afcf668d"><code>47afcf6</code></a>
feat: correct <code>preserve-caught-error</code> edge cases (<a
href="https://redirect.github.com/eslint/eslint/issues/20109">#20109</a>)
(Francesco Trotta)</li>
</ul>
<h2>Bug Fixes</h2>
<ul>
<li><a
href="75b74d865d"><code>75b74d8</code></a>
fix: add missing rule option types (<a
href="https://redirect.github.com/eslint/eslint/issues/20127">#20127</a>)
(ntnyq)</li>
<li><a
href="1c0d85049e"><code>1c0d850</code></a>
fix: update <code>eslint-all.js</code> to use <code>Object.freeze</code>
for <code>rules</code> object (<a
href="https://redirect.github.com/eslint/eslint/issues/20116">#20116</a>)
(루밀LuMir)</li>
<li><a
href="7d61b7fadc"><code>7d61b7f</code></a>
fix: add missing scope types to <code>Scope.type</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/20110">#20110</a>)
(Pixel998)</li>
<li><a
href="7a670c301b"><code>7a670c3</code></a>
fix: correct rule option typings in <code>rules.d.ts</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/20084">#20084</a>)
(Pixel998)</li>
</ul>
<h2>Documentation</h2>
<ul>
<li><a
href="b73ab12acd"><code>b73ab12</code></a>
docs: update examples to use <code>defineConfig</code> (<a
href="https://redirect.github.com/eslint/eslint/issues/20131">#20131</a>)
(sethamus)</li>
<li><a
href="31d9392699"><code>31d9392</code></a>
docs: fix typos (<a
href="https://redirect.github.com/eslint/eslint/issues/20118">#20118</a>)
(Pixel998)</li>
<li><a
href="c7f861b3f8"><code>c7f861b</code></a>
docs: Update README (GitHub Actions Bot)</li>
<li><a
href="6b0c08b106"><code>6b0c08b</code></a>
docs: Update README (GitHub Actions Bot)</li>
<li><a
href="91f97c5046"><code>91f97c5</code></a>
docs: Update README (GitHub Actions Bot)</li>
</ul>
<h2>Chores</h2>
<ul>
<li><a
href="12411e8d45"><code>12411e8</code></a>
chore: upgrade <code>@​eslint/js</code><a
href="https://github.com/9"><code>@​9</code></a>.36.0 (<a
href="https://redirect.github.com/eslint/eslint/issues/20139">#20139</a>)
(Milos Djermanovic)</li>
<li><a
href="488cba6b39"><code>488cba6</code></a>
chore: package.json update for <code>@​eslint/js</code> release
(Jenkins)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d5d1bdf5fd"><code>d5d1bdf</code></a>
9.37.0</li>
<li><a
href="94865ff41c"><code>94865ff</code></a>
Build: changelog update for 9.37.0</li>
<li><a
href="afe8a13469"><code>afe8a13</code></a>
chore: update <code>@eslint/js</code> dependency to version 9.37.0 (<a
href="https://redirect.github.com/eslint/eslint/issues/20183">#20183</a>)</li>
<li><a
href="abee4ca1fa"><code>abee4ca</code></a>
chore: package.json update for <code>@​eslint/js</code> release</li>
<li><a
href="b950359c5f"><code>b950359</code></a>
docs: fix typos across the docs (<a
href="https://redirect.github.com/eslint/eslint/issues/20182">#20182</a>)</li>
<li><a
href="42498a2798"><code>42498a2</code></a>
docs: improve ToC accessibility by hiding non-semantic character (<a
href="https://redirect.github.com/eslint/eslint/issues/20181">#20181</a>)</li>
<li><a
href="fc9381f6ca"><code>fc9381f</code></a>
chore: fix typos in comments (<a
href="https://redirect.github.com/eslint/eslint/issues/20175">#20175</a>)</li>
<li><a
href="e1574a22d3"><code>e1574a2</code></a>
chore: unpin jiti (<a
href="https://redirect.github.com/eslint/eslint/issues/20173">#20173</a>)</li>
<li><a
href="29ea092b93"><code>29ea092</code></a>
docs: Update README</li>
<li><a
href="a129cced7a"><code>a129cce</code></a>
fix: correct <code>no-loss-of-precision</code> false positives for
leading zeros (<a
href="https://redirect.github.com/eslint/eslint/issues/20164">#20164</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/eslint/eslint/compare/v9.26.0...v9.37.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=eslint&package-manager=npm_and_yarn&previous-version=9.26.0&new-version=9.37.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 18:00:29 -07:00
dependabot[bot]
190b96ea62
chore(ui-deps): bump @types/react-dom from 19.2.0 to 19.2.1 in /llama_stack/ui (#3789)
Bumps
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom)
from 19.2.0 to 19.2.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/react-dom&package-manager=npm_and_yarn&previous-version=19.2.0&new-version=19.2.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 18:00:22 -07:00
dependabot[bot]
4fb39f0a6a
chore(ui-deps): bump @types/react from 19.2.0 to 19.2.2 in /llama_stack/ui (#3790)
Bumps
[@types/react](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react)
from 19.2.0 to 19.2.2.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@types/react&package-manager=npm_and_yarn&previous-version=19.2.0&new-version=19.2.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 18:00:18 -07:00
dependabot[bot]
cfd2e303db
chore(python-deps): bump black from 25.1.0 to 25.9.0 (#3783)
Bumps [black](https://github.com/psf/black) from 25.1.0 to 25.9.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/psf/black/releases">black's
releases</a>.</em></p>
<blockquote>
<h2>25.9.0</h2>
<h3>Highlights</h3>
<ul>
<li>Remove support for pre-python 3.7 <code>await/async</code> as soft
keywords/variable names
(<a
href="https://redirect.github.com/psf/black/issues/4676">#4676</a>)</li>
</ul>
<h3>Stable style</h3>
<ul>
<li>Fix crash while formatting a long <code>del</code> statement
containing tuples (<a
href="https://redirect.github.com/psf/black/issues/4628">#4628</a>)</li>
<li>Fix crash while formatting expressions using the walrus operator in
complex <code>with</code>
statements (<a
href="https://redirect.github.com/psf/black/issues/4630">#4630</a>)</li>
<li>Handle <code># fmt: skip</code> followed by a comment at the end of
file (<a
href="https://redirect.github.com/psf/black/issues/4635">#4635</a>)</li>
<li>Fix crash when a tuple appears in the <code>as</code> clause of a
<code>with</code> statement (<a
href="https://redirect.github.com/psf/black/issues/4634">#4634</a>)</li>
<li>Fix crash when tuple is used as a context manager inside a
<code>with</code> statement (<a
href="https://redirect.github.com/psf/black/issues/4646">#4646</a>)</li>
<li>Fix crash when formatting a <code>\</code> followed by a
<code>\r</code> followed by a comment (<a
href="https://redirect.github.com/psf/black/issues/4663">#4663</a>)</li>
<li>Fix crash on a <code>\\r\n</code> (<a
href="https://redirect.github.com/psf/black/issues/4673">#4673</a>)</li>
<li>Fix crash on <code>await ...</code> (where <code>...</code> is a
literal <code>Ellipsis</code>) (<a
href="https://redirect.github.com/psf/black/issues/4676">#4676</a>)</li>
<li>Fix crash on parenthesized expression inside a type parameter bound
(<a
href="https://redirect.github.com/psf/black/issues/4684">#4684</a>)</li>
<li>Fix crash when using line ranges excluding indented single line
decorated items
(<a
href="https://redirect.github.com/psf/black/issues/4670">#4670</a>)</li>
</ul>
<h3>Preview style</h3>
<ul>
<li>Fix a bug where one-liner functions/conditionals marked with <code>#
fmt: skip</code> would still
be formatted (<a
href="https://redirect.github.com/psf/black/issues/4552">#4552</a>)</li>
<li>Improve <code>multiline_string_handling</code> with ternaries and
dictionaries (<a
href="https://redirect.github.com/psf/black/issues/4657">#4657</a>)</li>
<li>Fix a bug where <code>string_processing</code> would not split
f-strings directly after
expressions (<a
href="https://redirect.github.com/psf/black/issues/4680">#4680</a>)</li>
<li>Wrap the <code>in</code> clause of comprehensions across lines if
necessary (<a
href="https://redirect.github.com/psf/black/issues/4699">#4699</a>)</li>
<li>Remove parentheses around multiple exception types in
<code>except</code> and <code>except*</code> without
<code>as</code>. (<a
href="https://redirect.github.com/psf/black/issues/4720">#4720</a>)</li>
<li>Add <code>\r</code> style newlines to the potential newlines to
normalize file newlines both from
and to (<a
href="https://redirect.github.com/psf/black/issues/4710">#4710</a>)</li>
</ul>
<h3>Parser</h3>
<ul>
<li>Rewrite tokenizer to improve performance and compliance (<a
href="https://redirect.github.com/psf/black/issues/4536">#4536</a>)</li>
<li>Fix bug where certain unusual expressions (e.g., lambdas) were not
accepted in type
parameter bounds and defaults. (<a
href="https://redirect.github.com/psf/black/issues/4602">#4602</a>)</li>
</ul>
<h3>Performance</h3>
<ul>
<li>Avoid using an extra process when running with only one worker (<a
href="https://redirect.github.com/psf/black/issues/4734">#4734</a>)</li>
</ul>
<h3>Integrations</h3>
<ul>
<li>Fix the version check in the vim file to reject Python 3.8 (<a
href="https://redirect.github.com/psf/black/issues/4567">#4567</a>)</li>
<li>Enhance GitHub Action <code>psf/black</code> to read Black version
from an additional section in
pyproject.toml: <code>[project.dependency-groups]</code> (<a
href="https://redirect.github.com/psf/black/issues/4606">#4606</a>)</li>
<li>Build gallery docker image with python3-slim and reduce image size
(<a
href="https://redirect.github.com/psf/black/issues/4686">#4686</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/psf/black/blob/main/CHANGES.md">black's
changelog</a>.</em></p>
<blockquote>
<h2>25.9.0</h2>
<h3>Highlights</h3>
<ul>
<li>Remove support for pre-python 3.7 <code>await/async</code> as soft
keywords/variable names
(<a
href="https://redirect.github.com/psf/black/issues/4676">#4676</a>)</li>
</ul>
<h3>Stable style</h3>
<ul>
<li>Fix crash while formatting a long <code>del</code> statement
containing tuples (<a
href="https://redirect.github.com/psf/black/issues/4628">#4628</a>)</li>
<li>Fix crash while formatting expressions using the walrus operator in
complex <code>with</code>
statements (<a
href="https://redirect.github.com/psf/black/issues/4630">#4630</a>)</li>
<li>Handle <code># fmt: skip</code> followed by a comment at the end of
file (<a
href="https://redirect.github.com/psf/black/issues/4635">#4635</a>)</li>
<li>Fix crash when a tuple appears in the <code>as</code> clause of a
<code>with</code> statement (<a
href="https://redirect.github.com/psf/black/issues/4634">#4634</a>)</li>
<li>Fix crash when tuple is used as a context manager inside a
<code>with</code> statement (<a
href="https://redirect.github.com/psf/black/issues/4646">#4646</a>)</li>
<li>Fix crash when formatting a <code>\</code> followed by a
<code>\r</code> followed by a comment (<a
href="https://redirect.github.com/psf/black/issues/4663">#4663</a>)</li>
<li>Fix crash on a <code>\\r\n</code> (<a
href="https://redirect.github.com/psf/black/issues/4673">#4673</a>)</li>
<li>Fix crash on <code>await ...</code> (where <code>...</code> is a
literal <code>Ellipsis</code>) (<a
href="https://redirect.github.com/psf/black/issues/4676">#4676</a>)</li>
<li>Fix crash on parenthesized expression inside a type parameter bound
(<a
href="https://redirect.github.com/psf/black/issues/4684">#4684</a>)</li>
<li>Fix crash when using line ranges excluding indented single line
decorated items
(<a
href="https://redirect.github.com/psf/black/issues/4670">#4670</a>)</li>
</ul>
<h3>Preview style</h3>
<ul>
<li>Fix a bug where one-liner functions/conditionals marked with <code>#
fmt: skip</code> would still
be formatted (<a
href="https://redirect.github.com/psf/black/issues/4552">#4552</a>)</li>
<li>Improve <code>multiline_string_handling</code> with ternaries and
dictionaries (<a
href="https://redirect.github.com/psf/black/issues/4657">#4657</a>)</li>
<li>Fix a bug where <code>string_processing</code> would not split
f-strings directly after
expressions (<a
href="https://redirect.github.com/psf/black/issues/4680">#4680</a>)</li>
<li>Wrap the <code>in</code> clause of comprehensions across lines if
necessary (<a
href="https://redirect.github.com/psf/black/issues/4699">#4699</a>)</li>
<li>Remove parentheses around multiple exception types in
<code>except</code> and <code>except*</code> without
<code>as</code>. (<a
href="https://redirect.github.com/psf/black/issues/4720">#4720</a>)</li>
<li>Add <code>\r</code> style newlines to the potential newlines to
normalize file newlines both from
and to (<a
href="https://redirect.github.com/psf/black/issues/4710">#4710</a>)</li>
</ul>
<h3>Parser</h3>
<ul>
<li>Rewrite tokenizer to improve performance and compliance (<a
href="https://redirect.github.com/psf/black/issues/4536">#4536</a>)</li>
<li>Fix bug where certain unusual expressions (e.g., lambdas) were not
accepted in type
parameter bounds and defaults. (<a
href="https://redirect.github.com/psf/black/issues/4602">#4602</a>)</li>
</ul>
<h3>Performance</h3>
<ul>
<li>Avoid using an extra process when running with only one worker (<a
href="https://redirect.github.com/psf/black/issues/4734">#4734</a>)</li>
</ul>
<h3>Integrations</h3>
<ul>
<li>Fix the version check in the vim file to reject Python 3.8 (<a
href="https://redirect.github.com/psf/black/issues/4567">#4567</a>)</li>
<li>Enhance GitHub Action <code>psf/black</code> to read Black version
from an additional section in
pyproject.toml: <code>[project.dependency-groups]</code> (<a
href="https://redirect.github.com/psf/black/issues/4606">#4606</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="af0ba72a73"><code>af0ba72</code></a>
Prepare docs for release 25.9.0 (<a
href="https://redirect.github.com/psf/black/issues/4751">#4751</a>)</li>
<li><a
href="ffc01a0275"><code>ffc01a0</code></a>
Fix schema generation error caused by new click version (<a
href="https://redirect.github.com/psf/black/issues/4750">#4750</a>)</li>
<li><a
href="626b32fe2b"><code>626b32f</code></a>
Add normalizing for <code>\r</code> style newlines (<a
href="https://redirect.github.com/psf/black/issues/4710">#4710</a>)</li>
<li><a
href="57a461258f"><code>57a4612</code></a>
Fix mypy type issue (<a
href="https://redirect.github.com/psf/black/issues/4745">#4745</a>)</li>
<li><a
href="4f6ad7cf8c"><code>4f6ad7c</code></a>
Wrap the <code>in</code> clause of comprehensions across lines if
necessary (<a
href="https://redirect.github.com/psf/black/issues/4699">#4699</a>)</li>
<li><a
href="24f5169617"><code>24f5169</code></a>
ci: Run diff-shades on unstable instead of preview (<a
href="https://redirect.github.com/psf/black/issues/4741">#4741</a>)</li>
<li><a
href="4d55e60179"><code>4d55e60</code></a>
Bump actions/setup-python from 5 to 6 (<a
href="https://redirect.github.com/psf/black/issues/4744">#4744</a>)</li>
<li><a
href="0cf39efdbc"><code>0cf39ef</code></a>
Improve the performance of get_string_prefix (<a
href="https://redirect.github.com/psf/black/issues/4742">#4742</a>)</li>
<li><a
href="1f779dec01"><code>1f779de</code></a>
Fix line ranges decorator edge case (<a
href="https://redirect.github.com/psf/black/issues/4670">#4670</a>)</li>
<li><a
href="203fd6b5cd"><code>203fd6b</code></a>
Optimize Line string method (<a
href="https://redirect.github.com/psf/black/issues/4739">#4739</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/psf/black/compare/25.1.0...25.9.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=black&package-manager=uv&previous-version=25.1.0&new-version=25.9.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 16:48:53 -07:00
dependabot[bot]
055a7664f0
chore(python-deps): bump blobfile from 3.0.0 to 3.1.0 (#3784)
Bumps [blobfile](https://github.com/christopher-hesse/blobfile) from
3.0.0 to 3.1.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/blobfile/blobfile/blob/master/CHANGES.md">blobfile's
changelog</a>.</em></p>
<blockquote>
<h2>3.1.0</h2>
<ul>
<li>Improve <code>bf.join</code></li>
<li>Add option to support blind writes</li>
<li>Treat <code>EAI_NODATA</code> similarly to <code>EAI_NONAME</code>
in DNS retry logic</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ff0cd5d8ce"><code>ff0cd5d</code></a>
Release 3.1 (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/259">#259</a>)</li>
<li><a
href="395973ae2d"><code>395973a</code></a>
Handle EAI_NODATA in _bad_hostname_check (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/258">#258</a>)</li>
<li><a
href="cdc6e6a5a4"><code>cdc6e6a</code></a>
Improve bf.join (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/255">#255</a>)</li>
<li><a
href="90cb2436a7"><code>90cb243</code></a>
Add option to support blind writes (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/254">#254</a>)</li>
<li><a
href="4a2d011363"><code>4a2d011</code></a>
Add .git-blame-ignore-revs (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/253">#253</a>)</li>
<li><a
href="ab888d0679"><code>ab888d0</code></a>
Replace all CRLF with LF (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/252">#252</a>)</li>
<li><a
href="7eeb2aea87"><code>7eeb2ae</code></a>
Do not ignore warnings in tests (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/250">#250</a>)</li>
<li><a
href="0717345283"><code>0717345</code></a>
Run isort (<a
href="https://redirect.github.com/christopher-hesse/blobfile/issues/249">#249</a>)</li>
<li>See full diff in <a
href="https://github.com/christopher-hesse/blobfile/compare/v3.0.0...v3.1.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=blobfile&package-manager=uv&previous-version=3.0.0&new-version=3.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 16:48:47 -07:00
dependabot[bot]
13518e7562
chore(python-deps): bump ollama from 0.5.1 to 0.6.0 (#3786)
Bumps [ollama](https://github.com/ollama/ollama-python) from 0.5.1 to
0.6.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/ollama/ollama-python/releases">ollama's
releases</a>.</em></p>
<blockquote>
<h2>v0.6.0</h2>
<h2>What's Changed</h2>
<ul>
<li>
<p>client: add web search and web crawl capabilities by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/578">ollama/ollama-python#578</a></p>
</li>
<li>
<p>client: load OLLAMA_API_KEY on init by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/583">ollama/ollama-python#583</a></p>
</li>
<li>
<p>client/types: update web search and fetch API by <a
href="https://github.com/npardal"><code>@​npardal</code></a> in <a
href="https://redirect.github.com/ollama/ollama-python/pull/584">ollama/ollama-python#584</a></p>
</li>
<li>
<p>examples: add mcp server for web_search web_crawl by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/585">ollama/ollama-python#585</a></p>
</li>
<li>
<p>examples: gpt oss browser tool by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/588">ollama/ollama-python#588</a></p>
</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/npardal"><code>@​npardal</code></a> made
their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/584">ollama/ollama-python#584</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/ollama/ollama-python/compare/v0.5.4...v0.6.0">https://github.com/ollama/ollama-python/compare/v0.5.4...v0.6.0</a></p>
<h2>v0.5.4</h2>
<h2>What's Changed</h2>
<ul>
<li>examples: add gpt-oss browser example by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/558">ollama/ollama-python#558</a></li>
<li>build(deps): bump actions/checkout from 4 to 5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/ollama/ollama-python/pull/559">ollama/ollama-python#559</a></li>
<li>examples/gpt-oss: fix examples by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/566">ollama/ollama-python#566</a></li>
<li>Fix link for thinking-levels.py in documentation by <a
href="https://github.com/btjanaka"><code>@​btjanaka</code></a> in <a
href="https://redirect.github.com/ollama/ollama-python/pull/567">ollama/ollama-python#567</a></li>
<li>examples: fix gpt-oss-tools-stream for adding tool calls by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/568">ollama/ollama-python#568</a></li>
<li>examples: resolve invalid tool usage status code 400 if llm makes a
mistake gpt-oss by <a
href="https://github.com/MarkWard0110"><code>@​MarkWard0110</code></a>
in <a
href="https://redirect.github.com/ollama/ollama-python/pull/569">ollama/ollama-python#569</a></li>
<li>build(deps): bump actions/setup-python from 5 to 6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/ollama/ollama-python/pull/571">ollama/ollama-python#571</a></li>
<li>feat: add dimensions to embed request by <a
href="https://github.com/mxyng"><code>@​mxyng</code></a> in <a
href="https://redirect.github.com/ollama/ollama-python/pull/574">ollama/ollama-python#574</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/btjanaka"><code>@​btjanaka</code></a>
made their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/567">ollama/ollama-python#567</a></li>
<li><a
href="https://github.com/MarkWard0110"><code>@​MarkWard0110</code></a>
made their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/569">ollama/ollama-python#569</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/ollama/ollama-python/compare/v0.5.3...v0.5.4">https://github.com/ollama/ollama-python/compare/v0.5.3...v0.5.4</a></p>
<h2>v0.5.3</h2>
<h2>What's Changed</h2>
<ul>
<li>add support for 'high'/'medium'/'low' think values by <a
href="https://github.com/drifkin"><code>@​drifkin</code></a> in <a
href="https://redirect.github.com/ollama/ollama-python/pull/553">ollama/ollama-python#553</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/ollama/ollama-python/compare/v0.5.2...v0.5.3">https://github.com/ollama/ollama-python/compare/v0.5.2...v0.5.3</a></p>
<h2>v0.5.2</h2>
<h2>What's Changed</h2>
<ul>
<li>
<p>types/examples: add tool_name to message and examples by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/537">ollama/ollama-python#537</a></p>
</li>
<li>
<p>types: add <code>context_length</code> to ProcessResponse by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/538">ollama/ollama-python#538</a></p>
</li>
<li>
<p>types: relax type for tools by <a
href="https://github.com/ParthSareen"><code>@​ParthSareen</code></a> in
<a
href="https://redirect.github.com/ollama/ollama-python/pull/550">ollama/ollama-python#550</a></p>
</li>
<li>
<p>add license metadata to package by <a
href="https://github.com/ViViDboarder"><code>@​ViViDboarder</code></a>
in <a
href="https://redirect.github.com/ollama/ollama-python/pull/526">ollama/ollama-python#526</a></p>
</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/hwittenborn"><code>@​hwittenborn</code></a>
made their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/525">ollama/ollama-python#525</a></li>
<li><a
href="https://github.com/ViViDboarder"><code>@​ViViDboarder</code></a>
made their first contribution in <a
href="https://redirect.github.com/ollama/ollama-python/pull/526">ollama/ollama-python#526</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d967f048d9"><code>d967f04</code></a>
examples: gpt oss browser tool (<a
href="https://redirect.github.com/ollama/ollama-python/issues/588">#588</a>)</li>
<li><a
href="ab49a669cd"><code>ab49a66</code></a>
examples: add mcp server for web_search web_crawl (<a
href="https://redirect.github.com/ollama/ollama-python/issues/585">#585</a>)</li>
<li><a
href="16f344f635"><code>16f344f</code></a>
client/types: update web search and fetch API (<a
href="https://redirect.github.com/ollama/ollama-python/issues/584">#584</a>)</li>
<li><a
href="d0f71bc8b8"><code>d0f71bc</code></a>
client: load OLLAMA_API_KEY on init (<a
href="https://redirect.github.com/ollama/ollama-python/issues/583">#583</a>)</li>
<li><a
href="b22c5fdabb"><code>b22c5fd</code></a>
init: fix export for web_search (<a
href="https://redirect.github.com/ollama/ollama-python/issues/581">#581</a>)</li>
<li><a
href="4d0b81b37a"><code>4d0b81b</code></a>
client: add web search and web crawl capabilities (<a
href="https://redirect.github.com/ollama/ollama-python/issues/578">#578</a>)</li>
<li><a
href="a1d04f04f2"><code>a1d04f0</code></a>
feat: add dimensions to embed request (<a
href="https://redirect.github.com/ollama/ollama-python/issues/574">#574</a>)</li>
<li><a
href="8af6cac86b"><code>8af6cac</code></a>
build(deps): bump actions/setup-python from 5 to 6 (<a
href="https://redirect.github.com/ollama/ollama-python/issues/571">#571</a>)</li>
<li><a
href="9f41447f20"><code>9f41447</code></a>
examples: make gpt-oss resilient for failed tool calls (<a
href="https://redirect.github.com/ollama/ollama-python/issues/569">#569</a>)</li>
<li><a
href="da79e987f0"><code>da79e98</code></a>
examples: fix gpt-oss-tools-stream for adding toolcalls (<a
href="https://redirect.github.com/ollama/ollama-python/issues/568">#568</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/ollama/ollama-python/compare/v0.5.1...v0.6.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ollama&package-manager=uv&previous-version=0.5.1&new-version=0.6.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 16:48:42 -07:00
Ashwin Bharambe
e6378872c7 fix(misc): pre-commit fix for server.py 2025-10-11 16:47:59 -07:00
Ashwin Bharambe
7c63aebd64
feat(responses)!: add reasoning and annotation added events (#3793)
Implements missing streaming events from OpenAI Responses API spec: 
 - reasoning text/summary events for o1/o3 models, 
 - refusal events for safety moderation
 - annotation events for citations, 
 - and file search streaming events. 
 
Added optional reasoning_content field to chat completion chunks to
support non-standard provider extensions.

**NOTE:** OpenAI does _not_ fill reasoning_content when users use the
chat_completion APIs. This means there is no way for us to implement
Responses (with reasoning) by using OpenAI chat completions! We'd need
to transparently punt to OpenAI's responses endpoints if we wish to do
that. For others though (vLLM, etc.) we can use it.

## Test Plan

File search streaming test passes:
```
./scripts/integration-tests.sh --stack-config server:ci-tests \
   --suite responses --setup gpt --inference-mode replay --pattern test_response_file_search_streaming_events
```

Need more complex setup and validation for reasoning tests (need a vLLM
powered OSS model maybe gpt-oss which can return reasoning_content). I
will do that in a followup PR.
2025-10-11 16:47:14 -07:00
Ashwin Bharambe
f365961731 fix(tests): handle TEST_CONTEXT not being set 2025-10-11 15:31:08 -07:00
dependabot[bot]
dac1d7be1c
chore(python-deps): bump fire from 0.7.0 to 0.7.1 (#3787)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 19s
Python Package Build Test / build (3.13) (push) Failing after 38s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 42s
Unit Tests / unit-tests (3.12) (push) Failing after 39s
API Conformance Tests / check-schema-compatibility (push) Successful in 51s
UI Tests / ui-tests (22) (push) Successful in 54s
Pre-commit / pre-commit (push) Successful in 1m24s
Bumps [fire](https://github.com/google/python-fire) from 0.7.0 to 0.7.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/google/python-fire/releases">fire's
releases</a>.</em></p>
<blockquote>
<h2>Python Fire v0.7.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Use Neutral theme for IPython Inspector, supporting newer IPython
versions in <a
href="https://redirect.github.com/google/python-fire/pull/588">google/python-fire#588</a></li>
<li>Call inspectutils.GetClassAttrsDict on component, not None in <a
href="https://redirect.github.com/google/python-fire/pull/606">google/python-fire#606</a></li>
<li>Move to pyproject.toml, adding wheel support in pypi</li>
<li>Use ty in place of pytype</li>
<li>Update requirements <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]</li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/google/python-fire/compare/v0.7.0...v0.7.1">https://github.com/google/python-fire/compare/v0.7.0...v0.7.1</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8ea2f631e6"><code>8ea2f63</code></a>
Update email address</li>
<li><a
href="ea8c7f5e74"><code>ea8c7f5</code></a>
Remove unused MANIFEST</li>
<li><a
href="86bf4ca693"><code>86bf4ca</code></a>
Update pylint requirement from &lt;3.3.7 to &lt;3.3.8 (<a
href="https://redirect.github.com/google/python-fire/issues/614">#614</a>)</li>
<li><a
href="8c62e05569"><code>8c62e05</code></a>
Update pytest requirement from &lt;=8.3.5 to &lt;=8.4.1 (<a
href="https://redirect.github.com/google/python-fire/issues/615">#615</a>)</li>
<li><a
href="cec0119b10"><code>cec0119</code></a>
Update hypothesis requirement from &lt;6.133.0 to &lt;6.136.0 (<a
href="https://redirect.github.com/google/python-fire/issues/616">#616</a>)</li>
<li><a
href="8449619604"><code>8449619</code></a>
Use ty in place of pytype (<a
href="https://redirect.github.com/google/python-fire/issues/617">#617</a>)</li>
<li><a
href="d33056cb32"><code>d33056c</code></a>
Move to pyproject.toml (<a
href="https://redirect.github.com/google/python-fire/issues/613">#613</a>)</li>
<li><a
href="2e6f8d2b24"><code>2e6f8d2</code></a>
Bump version to 0.7.1 (<a
href="https://redirect.github.com/google/python-fire/issues/609">#609</a>)</li>
<li><a
href="dba7e1d0da"><code>dba7e1d</code></a>
Update hypothesis requirement in /.github/scripts (<a
href="https://redirect.github.com/google/python-fire/issues/608">#608</a>)</li>
<li><a
href="51974c67bf"><code>51974c6</code></a>
Update pylint requirement from &lt;3.3.5 to &lt;3.3.7 in
/.github/scripts (<a
href="https://redirect.github.com/google/python-fire/issues/591">#591</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/google/python-fire/compare/v0.7.0...v0.7.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=fire&package-manager=uv&previous-version=0.7.0&new-version=0.7.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 14:15:23 -07:00
dependabot[bot]
2cb1b19efe
chore(python-deps): bump psycopg2-binary from 2.9.10 to 2.9.11 (#3785)
Bumps [psycopg2-binary](https://github.com/psycopg/psycopg2) from 2.9.10
to 2.9.11.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/psycopg/psycopg2/blob/master/NEWS">psycopg2-binary's
changelog</a>.</em></p>
<blockquote>
<h2>Current release</h2>
<p>What's new in psycopg 2.9.11
^^^^^^^^^^^^^^^^^^^^^^^^^^^^</p>
<ul>
<li>Add support for Python 3.14.</li>
<li>Avoid a segfault passing more arguments than placeholders if Python
is built
with assertions enabled
(🎫<code>[#1791](https://github.com/psycopg/psycopg2/issues/1791)</code>).</li>
<li><code>~psycopg2.errorcodes</code> map and
<code>~psycopg2.errors</code> classes updated to
PostgreSQL 18.</li>
<li>Drop support for Python 3.8.</li>
</ul>
<p>What's new in psycopg 2.9.10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^</p>
<ul>
<li>Add support for Python 3.13.</li>
<li>Receive notifications on commit
(🎫<code>[#1728](https://github.com/psycopg/psycopg2/issues/1728)</code>).</li>
<li><code>~psycopg2.errorcodes</code> map and
<code>~psycopg2.errors</code> classes updated to
PostgreSQL 17.</li>
<li>Drop support for Python 3.7.</li>
</ul>
<p>What's new in psycopg 2.9.9
^^^^^^^^^^^^^^^^^^^^^^^^^^^</p>
<ul>
<li>Add support for Python 3.12.</li>
<li>Drop support for Python 3.6.</li>
</ul>
<p>What's new in psycopg 2.9.8
^^^^^^^^^^^^^^^^^^^^^^^^^^^</p>
<ul>
<li>Wheel package bundled with PostgreSQL 16 libpq in order to add
support for
recent features, such as <code>sslcertmode</code>.</li>
</ul>
<p>What's new in psycopg 2.9.7
^^^^^^^^^^^^^^^^^^^^^^^^^^^</p>
<ul>
<li>Fix propagation of exceptions raised during module initialization

(🎫<code>[#1598](https://github.com/psycopg/psycopg2/issues/1598)</code>).</li>
<li>Fix building when pg_config returns an empty string
(🎫<code>[#1599](https://github.com/psycopg/psycopg2/issues/1599)</code>).</li>
<li>Wheel package bundled with OpenSSL 1.1.1v.</li>
</ul>
<p>What's new in psycopg 2.9.6
^^^^^^^^^^^^^^^^^^^^^^^^^^^</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="fd9ae8cad2"><code>fd9ae8c</code></a>
chore: bump to version 2.9.11</li>
<li><a
href="d923840546"><code>d923840</code></a>
chore: update docs requirements</li>
<li><a
href="d42dc7169d"><code>d42dc71</code></a>
Merge branch 'fix-1791'</li>
<li><a
href="4fde6560c3"><code>4fde656</code></a>
fix: avoid failed assert passing more arguments than placeholders</li>
<li><a
href="8308c19d6a"><code>8308c19</code></a>
fix: drop warning about the use of deprecated PyWeakref_GetObject
function</li>
<li><a
href="1a1eabf098"><code>1a1eabf</code></a>
build(deps): bump actions/github-script from 7 to 8</li>
<li><a
href="897af8b38b"><code>897af8b</code></a>
build(deps): bump peter-evans/repository-dispatch from 3 to 4</li>
<li><a
href="ceefd30511"><code>ceefd30</code></a>
build(deps): bump actions/checkout from 4 to 5</li>
<li><a
href="4dc585430c"><code>4dc5854</code></a>
build(deps): bump actions/setup-python from 5 to 6</li>
<li><a
href="1945788dcf"><code>1945788</code></a>
Merge pull request <a
href="https://redirect.github.com/psycopg/psycopg2/issues/1802">#1802</a>
from edgarrmondragon/cp314-wheels</li>
<li>Additional commits viewable in <a
href="https://github.com/psycopg/psycopg2/compare/2.9.10...2.9.11">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=psycopg2-binary&package-manager=uv&previous-version=2.9.10&new-version=2.9.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 14:15:17 -07:00
dependabot[bot]
f15d865a3e
chore(github-deps): bump astral-sh/setup-uv from 6.8.0 to 7.0.0 (#3782)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.8.0 to 7.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's
releases</a>.</em></p>
<blockquote>
<h2>v7.0.0 🌈 node24 and a lot of bugfixes</h2>
<h2>Changes</h2>
<p>This release comes with a load of bug fixes and a speed up. Because
of switching from node20 to node24 it is also a breaking change. If you
are running on GitHub hosted runners this will just work, if you are
using self-hosted runners make sure, that your runners are up to date.
If you followed the normal installation instructions your self-hosted
runner will keep itself updated.</p>
<p>This release also removes the deprecated input
<code>server-url</code> which was used to download uv releases from a
different server.
The <a
href="https://github.com/astral-sh/setup-uv?tab=readme-ov-file#manifest-file">manifest-file</a>
input supersedes that functionality by adding a flexible way to define
available versions and where they should be downloaded from.</p>
<h3>Fixes</h3>
<ul>
<li>The action now respects when the environment variable
<code>UV_CACHE_DIR</code> is already set and does not overwrite it. It
now also finds <a
href="https://docs.astral.sh/uv/reference/settings/#cache-dir">cache-dir</a>
settings in config files if you set them.</li>
<li>Some users encountered problems that <a
href="https://github.com/astral-sh/setup-uv?tab=readme-ov-file#disable-cache-pruning">cache
pruning</a> took forever because they had some <code>uv</code> processes
running in the background. Starting with uv version <code>0.8.24</code>
this action uses <code>uv cache prune --ci --force</code> to ignore the
running processes</li>
<li>If you just want to install uv but not have it available in path,
this action now respects <code>UV_NO_MODIFY_PATH</code></li>
<li>Some other actions also set the env var <code>UV_CACHE_DIR</code>.
This action can now deal with that but as this could lead to unwanted
behavior in some edgecases a warning is now displayed.</li>
</ul>
<h3>Improvements</h3>
<p>If you are using minimum version specifiers for the version of uv to
install for example</p>
<pre lang="toml"><code>[tool.uv]
required-version = &quot;&gt;=0.8.17&quot;
</code></pre>
<p>This action now detects that and directly uses the latest version.
Previously it would download all available releases from the uv repo
to determine the highest matching candidate for the version specifier,
which took much more time.</p>
<p>If you are using other specifiers like <code>0.8.x</code> this action
still needs to download all available releases because the specifier
defines an upper bound (not 0.9.0 or later) and &quot;latest&quot; would
possibly not satisfy that.</p>
<h2>🚨 Breaking changes</h2>
<ul>
<li>Use node24 instead of node20 <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/608">#608</a>)</li>
<li>Remove deprecated input server-url <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/607">#607</a>)</li>
</ul>
<h2>🐛 Bug fixes</h2>
<ul>
<li>Respect UV_CACHE_DIR and cache-dir <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/612">#612</a>)</li>
<li>Use --force when pruning cache <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/611">#611</a>)</li>
<li>Respect UV_NO_MODIFY_PATH <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/603">#603</a>)</li>
<li>Warn when <code>UV_CACHE_DIR</code> has changed <a
href="https://github.com/jamesbraza"><code>@​jamesbraza</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/601">#601</a>)</li>
</ul>
<h2>🚀 Enhancements</h2>
<ul>
<li>Shortcut to latest version for minimum version specifier <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/598">#598</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>Bump dependencies <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/613">#613</a>)</li>
<li>Fix test-uv-no-modify-path <a
href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/604">#604</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="eb1897b8dc"><code>eb1897b</code></a>
Bump dependencies (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/613">#613</a>)</li>
<li><a
href="d78d791822"><code>d78d791</code></a>
Bump github/codeql-action from 3.30.5 to 3.30.6 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/605">#605</a>)</li>
<li><a
href="535dc2664c"><code>535dc26</code></a>
Respect UV_CACHE_DIR and cache-dir (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/612">#612</a>)</li>
<li><a
href="f610be5ff9"><code>f610be5</code></a>
Use --force when pruning cache (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/611">#611</a>)</li>
<li><a
href="3deccc0075"><code>3deccc0</code></a>
Use node24 instead of node20 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/608">#608</a>)</li>
<li><a
href="d9ee7e2f26"><code>d9ee7e2</code></a>
Remove deprecated input server-url (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/607">#607</a>)</li>
<li><a
href="59a0868fea"><code>59a0868</code></a>
Bump github/codeql-action from 3.30.3 to 3.30.5 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/594">#594</a>)</li>
<li><a
href="c952556164"><code>c952556</code></a>
Bump <code>@​renovatebot/pep440</code> from 4.2.0 to 4.2.1 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/581">#581</a>)</li>
<li><a
href="51c3328db2"><code>51c3328</code></a>
Fix test-uv-no-modify-path (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/604">#604</a>)</li>
<li><a
href="f2859da213"><code>f2859da</code></a>
Respect UV_NO_MODIFY_PATH (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/603">#603</a>)</li>
<li>Additional commits viewable in <a
href="d0cc045d04...eb1897b8dc">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.8.0&new-version=7.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-11 14:14:43 -07:00
Francisco Arceo
a165b8b5bb
chore!: BREAKING CHANGE removing VectorDB APIs (#3774)
# What does this PR do?
Removes VectorDBs from API surface and our tests.

Moves tests to Vector Stores.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-11 14:07:08 -07:00
ehhuang
06e4cd8e02
feat(api)!: BREAKING CHANGE: support passing extra_body through to providers (#3777)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 38s
Pre-commit / pre-commit (push) Successful in 1m27s
# What does this PR do?
Allows passing through extra_body parameters to inference providers.

With this, we removed the 2 vllm-specific parameters from completions
API into `extra_body`.
Before/After
<img width="1883" height="324" alt="image"
src="https://github.com/user-attachments/assets/acb27c08-c748-46c9-b1da-0de64e9908a1"
/>



closes #2720

## Test Plan
CI and added new test
```
❯ uv run pytest -s -v tests/integration/ --stack-config=server:starter --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag ) and test_openai_completion_guided_choice' --setup=vllm --suite=base --color=yes
Uninstalled 3 packages in 125ms
Installed 3 packages in 19ms
INFO     2025-10-10 14:29:54,317 tests.integration.conftest:118 tests: Applying setup 'vllm' for suite base
INFO     2025-10-10 14:29:54,331 tests.integration.conftest:47 tests: Test stack config type: server
         (stack_config=server:starter)
============================================================================================================== test session starts ==============================================================================================================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/erichuang/projects/llama-stack-1/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/erichuang/projects/llama-stack-1
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 285 items / 284 deselected / 1 selected

tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
instantiating llama_stack_client
Starting llama stack server with config 'starter' on port 8321...
Waiting for server at http://localhost:8321... (0.0s elapsed)
Waiting for server at http://localhost:8321... (0.5s elapsed)
Waiting for server at http://localhost:8321... (5.1s elapsed)
Waiting for server at http://localhost:8321... (5.6s elapsed)
Waiting for server at http://localhost:8321... (10.1s elapsed)
Waiting for server at http://localhost:8321... (10.6s elapsed)
Server is ready at http://localhost:8321
llama_stack_client instantiated in 11.773s
PASSEDTerminating llama stack server process...
Terminating process 98444 and its group...
Server process and children terminated gracefully


============================================================================================================= slowest 10 durations ==============================================================================================================
11.88s setup    tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
3.02s call     tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
0.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
================================================================================================ 1 passed, 284 deselected, 3 warnings in 16.21s =================================================================================================
```
2025-10-10 16:21:44 -07:00
ehhuang
80d58ab519
chore: refactor (chat)completions endpoints to use shared params struct (#3761)
# What does this PR do?

Converts openai(_chat)_completions params to pydantic BaseModel to
reduce code duplication across all providers.

## Test Plan
CI









---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3761).
* #3777
* __->__ #3761
2025-10-10 15:46:34 -07:00
Derek Higgins
6954fe2274
fix(auth): allow unauthenticated access to health and version endpoints (#3736)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test Llama Stack Build / build (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 37s
Pre-commit / pre-commit (push) Successful in 2m1s
The AuthenticationMiddleware was blocking all requests without an
Authorization header, including health and version endpoints that are
needed by monitoring tools, load balancers, and Kubernetes probes.

This commit allows endpoints ending in /health or /version to bypass
authentication, enabling operational tooling to function properly
without requiring credentials.

Closes: #3735

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-10 13:41:43 -07:00
Varsha
32fde8d9a8
feat: Add /v1/embeddings endpoint to batches API (#3384)
# What does this PR do?
This PR extends the Llama Stack Batches API to support the
/v1/embeddings endpoint, enabling efficient batch processing of
embedding requests alongside the existing /v1/chat/completions and
/v1/completions support.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes: https://github.com/llamastack/llama-stack/issues/3145

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
```
(stack-client) ➜  llama-stack git:(support/embeddings-api) conda activate stack-client && python -m pytest tests/unit/providers/batches/test_reference.py -v                             
============================================================================================================================================ test session starts =============================================================================================================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'xdist': '3.8.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, xdist-3.8.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0
asyncio: mode=Mode.AUTO
collected 46 items                                                                                                                                                                                                                                                                                           

tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_and_retrieve_batch_success PASSED                                                                                                                                                                                [  2%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_without_metadata PASSED                                                                                                                                                                                    [  4%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_completion_window PASSED                                                                                                                                                                                   [  6%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[/v1/invalid/endpoint] PASSED                                                                                                                                                             [  8%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[] PASSED                                                                                                                                                                                 [ 10%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_metadata PASSED                                                                                                                                                                                    [ 13%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_retrieve_batch_not_found PASSED                                                                                                                                                                                         [ 15%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_success PASSED                                                                                                                                                                                             [ 17%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[failed] PASSED                                                                                                                                                                            [ 19%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[expired] PASSED                                                                                                                                                                           [ 21%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[completed] PASSED                                                                                                                                                                         [ 23%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_not_found PASSED                                                                                                                                                                                           [ 26%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_empty PASSED                                                                                                                                                                                               [ 28%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_single_batch PASSED                                                                                                                                                                                        [ 30%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_multiple_batches PASSED                                                                                                                                                                                    [ 32%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_limit PASSED                                                                                                                                                                                          [ 34%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_pagination PASSED                                                                                                                                                                                     [ 36%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_invalid_after PASSED                                                                                                                                                                                       [ 39%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_kvstore_persistence PASSED                                                                                                                                                                                              [ 41%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_not_found PASSED                                                                                                                                                                                    [ 43%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_exists_empty_content PASSED                                                                                                                                                                         [ 45%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_mixed_valid_invalid_json PASSED                                                                                                                                                                     [ 47%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_model PASSED                                                                                                                                                                                     [ 50%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED                                                                         [ 52%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED                                                                                  [ 54%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED                                                                                           [ 56%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED                                                                                        [ 58%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[model-body.model-invalid_request-Model parameter is required] PASSED                                                                                                 [ 60%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[messages-body.messages-invalid_request-Messages parameter is required] PASSED                                                                                        [ 63%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED                                                                              [ 65%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED                                                                                       [ 67%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED                                                                                                [ 69%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED                                                                                             [ 71%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[model-body.model-invalid_request-Model parameter is required] PASSED                                                                                                      [ 73%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[prompt-body.prompt-invalid_request-Prompt parameter is required] PASSED                                                                                                   [ 76%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_url_mismatch PASSED                                                                                                                                                                                      [ 78%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_multiple_errors_per_request PASSED                                                                                                                                                                       [ 80%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_request_format PASSED                                                                                                                                                                            [ 82%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[custom_id-custom_id-12345-Custom_id must be a string] PASSED                                                                                                                     [ 84%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[url-url-123-URL must be a string] PASSED                                                                                                                                         [ 86%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[method-method-invalid_value2-Method must be a string] PASSED                                                                                                                     [ 89%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[body-body-invalid_value3-Body must be a JSON dictionary object] PASSED                                                                                                           [ 91%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[model-body.model-123-Model must be a string] PASSED                                                                                                                              [ 93%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[messages-body.messages-invalid messages format-Messages must be an array] PASSED                                                                                                 [ 95%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_max_concurrent_batches PASSED                                                                                                                                                                                           [ 97%]
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_embeddings_endpoint PASSED                                                                                                                                                                                 [100%]

```

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 13:25:58 -07:00
Ashwin Bharambe
1394403360
feat(responses): implement usage tracking in streaming responses (#3771)
Implementats usage accumulation to StreamingResponseOrchestrator. 

The most important part was to pass `stream_options = { "include_usage":
true }` to the chat_completion call. This means I will have to record
all responses tests again because request hash will change :)

Test changes:
- Add usage assertions to streaming and non-streaming tests
- Update test recordings with actual usage data from OpenAI
2025-10-10 12:27:03 -07:00
Francisco Arceo
e7d21e1ee3
feat: Add support for Conversations in Responses API (#3743)
# What does this PR do?
This PR adds support for Conversations in Responses.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
Unit tests
Integration tests

<Details>
<Summary>Manual testing with this script: (click to expand)</Summary>

```python
from openai import OpenAI

client = OpenAI()
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")

def test_conversation_create():
    print("Testing conversation create...")
    conversation = client.conversations.create(
        metadata={"topic": "demo"},
        items=[
            {"type": "message", "role": "user", "content": "Hello!"}
        ]
    )
    print(f"Created: {conversation}")
    return conversation

def test_conversation_retrieve(conv_id):
    print(f"Testing conversation retrieve for {conv_id}...")
    retrieved = client.conversations.retrieve(conv_id)
    print(f"Retrieved: {retrieved}")
    return retrieved

def test_conversation_update(conv_id):
    print(f"Testing conversation update for {conv_id}...")
    updated = client.conversations.update(
        conv_id,
        metadata={"topic": "project-x"}
    )
    print(f"Updated: {updated}")
    return updated

def test_conversation_delete(conv_id):
    print(f"Testing conversation delete for {conv_id}...")
    deleted = client.conversations.delete(conv_id)
    print(f"Deleted: {deleted}")
    return deleted

def test_conversation_items_create(conv_id):
    print(f"Testing conversation items create for {conv_id}...")
    items = client.conversations.items.create(
        conv_id,
        items=[
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hello!"}]
            },
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "How are you?"}]
            }
        ]
    )
    print(f"Items created: {items}")
    return items

def test_conversation_items_list(conv_id):
    print(f"Testing conversation items list for {conv_id}...")
    items = client.conversations.items.list(conv_id, limit=10)
    print(f"Items list: {items}")
    return items

def test_conversation_item_retrieve(conv_id, item_id):
    print(f"Testing conversation item retrieve for {conv_id}/{item_id}...")
    item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id)
    print(f"Item retrieved: {item}")
    return item

def test_conversation_item_delete(conv_id, item_id):
    print(f"Testing conversation item delete for {conv_id}/{item_id}...")
    deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id)
    print(f"Item deleted: {deleted}")
    return deleted

def test_conversation_responses_create():
    print("\nTesting conversation create for a responses example...")
    conversation = client.conversations.create()
    print(f"Created: {conversation}")

    response = client.responses.create(
      model="gpt-4.1",
      input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}],
      conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")

    return response, conversation

def test_conversations_responses_create_followup(
        conversation,
        content="Repeat what you just said but add 'this is my second time saying this'",
    ):
    print(f"Using: {conversation.id}")

    response = client.responses.create(
      model="gpt-4.1",
      input=[{"role": "user", "content": content}],
      conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")

    conv_items = client.conversations.items.list(conversation.id)
    print(f"\nRetrieving list of items for conversation {conversation.id}:")
    print(conv_items.model_dump_json(indent=2))

def test_response_with_fake_conv_id():
    fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44"
    print(f"Using {fake_conv_id}")
    try:
        response = client.responses.create(
          model="gpt-4.1",
          input=[{"role": "user", "content": "say hello"}],
          conversation=fake_conv_id,
        )
        print(f"Created response: {response} for conversation {fake_conv_id}")
    except Exception as e:
        print(f"failed to create response for conversation {fake_conv_id} with error {e}")


def main():
    print("Testing OpenAI Conversations API...")

    # Create conversation
    conversation = test_conversation_create()
    conv_id = conversation.id

    # Retrieve conversation
    test_conversation_retrieve(conv_id)

    # Update conversation
    test_conversation_update(conv_id)

    # Create items
    items = test_conversation_items_create(conv_id)

    # List items
    items_list = test_conversation_items_list(conv_id)

    # Retrieve specific item
    if items_list.data:
        item_id = items_list.data[0].id
        test_conversation_item_retrieve(conv_id, item_id)

        # Delete item
        test_conversation_item_delete(conv_id, item_id)

    # Delete conversation
    test_conversation_delete(conv_id)

    response, conversation2 = test_conversation_responses_create()
    print('\ntesting reseponse retrieval')
    test_conversation_retrieve(conversation2.id)

    print('\ntesting responses follow up')
    test_conversations_responses_create_followup(conversation2)

    print('\ntesting responses follow up x2!')

    test_conversations_responses_create_followup(
        conversation2,
        content="Repeat what you just said but add 'this is my third time saying this'",
    )

    test_response_with_fake_conv_id()

    print("All tests completed!")


if __name__ == "__main__":
    main()
```
</Details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 11:57:40 -07:00
Ashwin Bharambe
932fea813a
fix(ci): remove responses from CI for now (#3773)
There are many changes to responses which are landing. They are
introducing fundamental new types. This means re-recordings even from
the inference calls. Let's avoid that for now.

Once everything lands I will re-record everything, make things pass and
re-enable.
2025-10-10 11:52:17 -07:00
Ashwin Bharambe
548ccff368
fix(mypy): fix wrong attribute access (#3770) 2025-10-10 09:30:43 -07:00
grs
8bf07f91cb
feat: reuse previous mcp tool listings where possible (#3710)
# What does this PR do?
This PR checks whether, if a previous response is linked, there are
mcp_list_tools objects that can be reused instead of listing the tools
explicitly every time.

 Closes #3106 

## Test Plan
Tested manually.
Added unit tests to cover new behaviour.

---------

Signed-off-by: Gordon Sim <gsim@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 09:28:25 -07:00
Matthew Farrellee
0066d986c5
feat: use SecretStr for inference provider auth credentials (#3724)
# What does this PR do?

use SecretStr for OpenAIMixin providers

- RemoteInferenceProviderConfig now has auth_credential: SecretStr
- the default alias is api_key (most common name)
- some providers override to use api_token (RunPod, vLLM, Databricks)
- some providers exclude it (Ollama, TGI, Vertex AI)

addresses #3517 

## Test Plan

ci w/ new tests
2025-10-10 07:32:50 -07:00
Derek Higgins
6d8f61206e
fix: update normalize to search all recordings dirs (#3767)
Updated scripts/normalize_recordings.py to dynamically find and process
all 'recordings' directories under tests/ using pathlib.rglob() instead
of hardcoding a single path.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-10-10 07:32:14 -07:00
Ashwin Bharambe
e039b61d26
feat(responses)!: add in_progress, failed, content part events (#3765)
## Summary
- add schema + runtime support for response.in_progress /
response.failed / response.incomplete
- stream content parts with proper indexes and reasoning slots
- align tests + docs with the richer event payloads

## Testing
- uv run pytest
tests/unit/providers/agents/meta_reference/test_openai_responses.py::test_create_openai_response_with_string_input
- uv run pytest
tests/unit/providers/agents/meta_reference/test_response_conversion_utils.py
2025-10-10 07:27:34 -07:00
Akram Ben Aissi
a548169b99
fix: allow skipping model availability check for vLLM (#3739)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Allows model check to fail gracefully instead of crashing on startup.


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

set VLLM_URL to your VLLM server 

```
(base) akram@Mac llama-stack % LAMA_STACK_LOGGING="all=debug" VLLM_ENABLE_MODEL_DISCOVERY=false  MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter  --image-type venv --run
```



```

INFO     2025-10-08 20:11:24,637 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-08 20:11:24,866 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-08 20:11:26,160 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a
         href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&amp;client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&amp;redirect_uri=ht
         tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&amp;response_type=code&amp;scope=user%3Ainfo+user%3Acheck-access&amp;state=9fba207425
         5851c718aca717a5887d76%3A%2Fmodels">Found</a>.
         
[...]
INFO     2025-10-08 20:11:26,295 uvicorn.error:84 uncategorized: Started server process [83144]
INFO     2025-10-08 20:11:26,296 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-10-08 20:11:26,297 llama_stack.core.server.server:170 core::server: Starting up
INFO     2025-10-08 20:11:26,297 llama_stack.core.stack:399 core: starting registry refresh task
INFO     2025-10-08 20:11:26,311 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-10-08 20:11:26,312 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
ERROR    2025-10-08 20:11:26,791 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a
         href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&amp;client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&amp;redirect_uri=ht
         tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&amp;response_type=code&amp;scope=user%3Ainfo+user%3Acheck-access&amp;state=8ef0cba3e1
         71a4f8b04cb445cfb91a4c%3A%2Fmodels">Found</a>.

```
2025-10-10 07:23:13 -07:00
Ashwin Bharambe
aaf5036235
feat(responses): add usage types to inference and responses APIs (#3764)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s
API Conformance Tests / check-schema-compatibility (push) Successful in 36s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 2m7s
## Summary
Adds OpenAI-compatible usage tracking types to enable reporting token
consumption for both streaming and non-streaming responses.

## Type Definitions
**Chat Completion Usage** (inference API):
```python
class OpenAIChatCompletionUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None
    completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None
```

**Response Usage** (responses API):
```python
class OpenAIResponseUsage(BaseModel):
    input_tokens: int
    output_tokens: int
    total_tokens: int
    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None
    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None
```

This matches OpenAI's usage reporting format and enables PR #3766 to
implement usage tracking in streaming responses.

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-10 09:22:59 -04:00
Ashwin Bharambe
ebae0385bb
fix: update dangling references to llama download command (#3763)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 2m14s
## Summary
After removing model management CLI in #3700, this PR updates remaining
references to the old `llama download` command to use `huggingface-cli
download` instead.

## Changes
- Updated error messages in `meta_reference/common.py` to recommend
`huggingface-cli download`
- Updated error messages in
`torchtune/recipes/lora_finetuning_single_device.py` to use
`huggingface-cli download`
- Updated post-training notebook to use `huggingface-cli download`
instead of `llama download`
- Fixed typo: "you model" -> "your model"

## Test Plan
- Verified error messages provide correct guidance for users
- Checked that notebook instructions are up-to-date with current tooling
2025-10-09 18:35:02 -07:00
Ashwin Bharambe
8fe4a216b5
fix(inference): propagate 401/403 errors from remote providers (#3762)
## Summary
Fixes #2990

Remote provider authentication errors (401/403) were being converted to
500 Internal Server Error, preventing users from understanding why their
requests failed.

## The Problem
When a request with an invalid API key was sent to a remote provider:
- Provider correctly returns 401 with error details
- Llama Stack's `translate_exception()` didn't recognize provider SDK
exceptions
- Fell through to generic 500 error handler
- User received: "Internal server error: An unexpected error occurred."

## The Fix
Added handler in `translate_exception()` that checks for exceptions with
a `status_code` attribute and preserves the original HTTP status code
and error message.

**Before:**
```json
HTTP 500
{"detail": "Internal server error: An unexpected error occurred."}
```

**After:**
```json
HTTP 401
{"detail": "Error code: 401 - {'error': {'message': 'Invalid API Key', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}"}
```

## Tested With
-  groq: 401 "Invalid API Key"  
-  openai: 401 "Incorrect API key provided"
-  together: 401 "Invalid API key provided"
-  fireworks: 403 "unauthorized"

## Test Plan

**Automated test script:**
https://gist.github.com/ashwinb/1199dd7585ffa3f4be67b111cc65f2f3

The test script:
1. Builds separate stacks for each provider
2. Registers models (with validation temporarily disabled for testing)
3. Sends requests with invalid API keys via `x-llamastack-provider-data`
header
4. Verifies HTTP status codes are 401/403 (not 500)

**Results before fix:** All providers returned 500  
**Results after fix:** All providers correctly return 401/403

**Manual verification:**
```bash
# 1. Build stack
llama stack build --image-type venv --providers inference=remote::groq

# 2. Start stack
llama stack run

# 3. Send request with invalid API key
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H 'x-llamastack-provider-data: {"groq_api_key": "invalid-key"}' \
  -d '{"model": "groq/llama3-70b-8192", "messages": [{"role": "user", "content": "test"}]}'

# Expected: HTTP 401 with provider error message (not 500)
```

## Impact
- Works with all remote providers using OpenAI SDK (groq, openai,
together, fireworks, etc.)
- Works with any provider SDK that follows the pattern of exceptions
with `status_code` attribute
- No breaking changes - only affects error responses
2025-10-09 18:34:39 -07:00
Matthew Farrellee
145b2bcf25
feat: make object registration idempotent (#3752)
# What does this PR do?

objects (vector dbs, models, scoring functions, etc) have an identifier
and associated object values.

we allow exact duplicate registrations.

we reject registrations when the identifier exists and the associated
object values differ.

note: model are namespaced, i.e. {provider_id}/{identifier}, while other
object types are not

## Test Plan

ci w/ new tests
2025-10-09 17:04:28 -07:00
Sébastien Han
7ee0ee7843
chore!: remove model mgmt from CLI for Hugging Face CLI (#3700)
This change removes the `llama model` and `llama download` subcommands
from the CLI, replacing them with recommendations to use the Hugging
Face CLI instead.

Rationale for this change:
- The model management functionality was largely duplicating what
Hugging Face CLI already provides, leading to unnecessary maintenance
overhead (except the download source from Meta?)
- Maintaining our own implementation required fixing bugs and keeping up
with changes in model repositories and download mechanisms
- The Hugging Face CLI is more mature, widely adopted, and better
maintained
- This allows us to focus on the core Llama Stack functionality rather
than reimplementing model management tools

Changes made:
- Removed all model-related CLI commands and their implementations
- Updated documentation to recommend using `huggingface-cli` for model
downloads
- Removed Meta-specific download logic and statements
- Simplified the CLI to focus solely on stack management operations

Users should now use:
- `huggingface-cli download` for downloading models
- `huggingface-cli scan-cache` for listing downloaded models

This is a breaking change as it removes previously available CLI
commands.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-09 16:50:33 -07:00
Ashwin Bharambe
841d0c3583
fix(testing): improve api_recorder error messages for missing recordings (#3760)
Replaces opaque error messages when recordings are not found with
somewhat better guidance

Before:
```
No recorded response found for request hash: abc123...
To record this response, run with LLAMA_STACK_TEST_INFERENCE_MODE=record
```

After:
```
Recording not found for request hash: abc123
Model: gpt-4 | Request: POST https://api.openai.com/v1/chat/completions

Run './scripts/integration-tests.sh --inference-mode record-if-missing' with required API keys to generate.
```
2025-10-09 15:04:16 -07:00
Ashwin Bharambe
a055a32ee4
fix(tests): remove chroma and qdrant from vector io unit tests (#3759)
These vector databases are already thoroughly tested in integration
tests.
Unit tests now focus on sqlite_vec, faiss, and pgvector with mocked
dependencies, removing the need for external service dependencies.

## Changes:
- Deleted test_qdrant.py unit test file
- Removed chroma/qdrant fixtures and parametrization from conftest.py
- Fixed SqliteKVStoreConfig import to use correct location
- Removed chromadb, qdrant-client, pymilvus, milvus-lite, and
  weaviate-client from unit test dependencies in pyproject.toml
2025-10-09 14:36:34 -07:00
Ashwin Bharambe
f50ce11a3b
feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403)
Renames `inference_recorder.py` to `api_recorder.py` and extends it to
support recording/replaying tool invocations in addition to inference
calls.

This allows us to record web-search, etc. tool calls and thereafter
apply recordings for `tests/integration/responses`

## Test Plan

```
export OPENAI_API_KEY=...
export TAVILY_SEARCH_API_KEY=...

./scripts/integration-tests.sh --stack-config ci-tests \
   --suite responses --inference-mode record-if-missing
```
2025-10-09 14:27:51 -07:00
grs
26fd5dbd34
fix: add traces for tool calls and mcp tool listing (#3722)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
Adds traces around tool execution and mcp tool listing for better
observability.

Closes #3108 

## Test Plan
Manually examined traces in jaeger to verify the added information was
available.

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-10-09 09:59:09 -07:00
Sébastien Han
4b9ebbf6a2
chore: revert "fix: Raising an error message to the user when registering an existing provider." (#3750)
Reverts llamastack/llama-stack#3624
Causing https://github.com/llamastack/llama-stack/issues/3749
2025-10-09 09:17:37 -04:00
ehhuang
05a62a6ffb
chore: print integration tests command (#3747)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m23s
# What does this PR do?


## Test Plan

<img width="1104" height="60" alt="image"
src="https://github.com/user-attachments/assets/d4691a2e-c5ec-4df5-a15a-f86e667fdf8c"
/>
2025-10-08 15:12:13 -07:00
Ashwin Bharambe
16db42e7e5
feat(tests): add --collect-only option to integration test script (#3745)
Adds --collect-only flag to scripts/integration-tests.sh that skips
server startup and passes the flag to pytest for test collection only.
When specified, minimal flags are required (no --stack-config or --setup
needed).

## Changes
- Added `--collect-only` flag that skips server startup
- Made `--stack-config` and `--setup` optional when using
`--collect-only`
- Skip `llama` command check when collecting tests only

## Usage
```bash
# Collect tests without starting server
./scripts/integration-tests.sh --subdirs inference --collect-only
```
2025-10-08 14:20:34 -07:00
Francisco Arceo
b96640eca3
chore: Removing Weaviate, PGVector, and Milvus from unit tests (#3742)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
UI Tests / ui-tests (22) (push) Successful in 48s
Pre-commit / pre-commit (push) Successful in 1m27s
# What does this PR do?
Removing Weaviate, PostGres, and Milvus unit tests

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-08 12:25:51 -07:00
Ashwin Bharambe
79bed44b04
fix(tests): ensure test isolation in server mode (#3737)
Propagate test IDs from client to server via HTTP headers to maintain
proper test isolation when running with server-based stack configs.
Without
this, recorded/replayed inference requests in server mode would leak
across
tests.

Changes:
- Patch client _prepare_request to inject test ID into provider data
header
- Sync test context from provider data on server side before storage
operations
- Set LLAMA_STACK_TEST_STACK_CONFIG_TYPE env var based on stack config
- Configure console width for cleaner log output in CI
- Add SQLITE_STORE_DIR temp directory for test data isolation
2025-10-08 12:03:36 -07:00
grs
96886afaca
fix(responses): fix regression in support for mcp tool require_approval argument (#3731)
# What does this PR do?

It prevents a tool call message being added to the chat completions
message without a corresponding tool call result, which is needed in the
case that an approval is required first or if the approval request is
denied. In both these cases the tool call messages is popped of the next
turn messages.

Closes #3728

## Test Plan
Ran the integration tests
Manual check of both approval and denial against gpt-4o

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-10-08 10:47:17 -04:00
Bill Murdock
5d711d4bcb
fix: Update watsonx.ai provider to use LiteLLM mixin and list all models (#3674)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 32s
Pre-commit / pre-commit (push) Successful in 1m29s
# What does this PR do?

- The watsonx.ai provider now uses the LiteLLM mixin instead of using
IBM's library, which does not seem to be working (see #3165 for
context).
- The watsonx.ai provider now lists all the models available by calling
the watsonx.ai server instead of having a hard coded list of known
models. (That list gets out of date quickly)
- An edge case in
[llama_stack/core/routers/inference.py](https://github.com/llamastack/llama-stack/pull/3674/files#diff-a34bc966ed9befd9f13d4883c23705dff49be0ad6211c850438cdda6113f3455)
is addressed that was causing my manual tests to fail.
- Fixes `b64_encode_openai_embeddings_response` which was trying to
enumerate over a dictionary and then reference elements of the
dictionary using .field instead of ["field"]. That method is called by
the LiteLLM mixin for embedding models, so it is needed to get the
watsonx.ai embedding models to work.
- A unit test along the lines of the one in #3348 is added. A more
comprehensive plan for automatically testing the end-to-end
functionality for inference providers would be a good idea, but is out
of scope for this PR.
- Updates to the watsonx distribution. Some were in response to the
switch to LiteLLM (e.g., updating the Python packages needed). Others
seem to be things that were already broken that I found along the way
(e.g., a reference to a watsonx specific doc template that doesn't seem
to exist).

Closes #3165

Also it is related to a line-item in #3387 but doesn't really address
that goal (because it uses the LiteLLM mixin, not the OpenAI one). I
tried the OpenAI one and it doesn't work with watsonx.ai, presumably
because the watsonx.ai service is not OpenAI compatible. It works with
LiteLLM because LiteLLM has a provider implementation for watsonx.ai.

## Test Plan

The test script below goes back and forth between the OpenAI and watsonx
providers. The idea is that the OpenAI provider shows how it should work
and then the watsonx provider output shows that it is also working with
watsonx. Note that the result from the MCP test is not as good (the
Llama 3.3 70b model does not choose tools as wisely as gpt-4o), but it
is still working and providing a valid response. For more details on
setup and the MCP server being used for testing, see [the AI Alliance
sample
notebook](https://github.com/The-AI-Alliance/llama-stack-examples/blob/main/notebooks/01-responses/)
that these examples are drawn from.

```python
#!/usr/bin/env python3

import json
from llama_stack_client import LlamaStackClient
from litellm import completion
import http.client


def print_response(response):
    """Print response in a nicely formatted way"""
    print(f"ID: {response.id}")
    print(f"Status: {response.status}")
    print(f"Model: {response.model}")
    print(f"Created at: {response.created_at}")
    print(f"Output items: {len(response.output)}")
    
    for i, output_item in enumerate(response.output):
        if len(response.output) > 1:
            print(f"\n--- Output Item {i+1} ---")
        print(f"Output type: {output_item.type}")
        
        if output_item.type in ("text", "message"):
            print(f"Response content: {output_item.content[0].text}")
        elif output_item.type == "file_search_call":
            print(f"  Tool Call ID: {output_item.id}")
            print(f"  Tool Status: {output_item.status}")
            # 'queries' is a list, so we join it for clean printing
            print(f"  Queries: {', '.join(output_item.queries)}")
            # Display results if they exist, otherwise note they are empty
            print(f"  Results: {output_item.results if output_item.results else 'None'}")
        elif output_item.type == "mcp_list_tools":
            print_mcp_list_tools(output_item)
        elif output_item.type == "mcp_call":
            print_mcp_call(output_item)
        else:
            print(f"Response content: {output_item.content}")


def print_mcp_call(mcp_call):
    """Print MCP call in a nicely formatted way"""
    print(f"\n🛠️  MCP Tool Call: {mcp_call.name}")
    print(f"   Server: {mcp_call.server_label}")
    print(f"   ID: {mcp_call.id}")
    print(f"   Arguments: {mcp_call.arguments}")
    
    if mcp_call.error:
        print("Error: {mcp_call.error}")
    elif mcp_call.output:
        print("Output:")
        # Try to format JSON output nicely
        try:
            parsed_output = json.loads(mcp_call.output)
            print(json.dumps(parsed_output, indent=4))
        except:
            # If not valid JSON, print as-is
            print(f"   {mcp_call.output}")
    else:
        print("    No output yet")


def print_mcp_list_tools(mcp_list_tools):
    """Print MCP list tools in a nicely formatted way"""
    print(f"\n🔧 MCP Server: {mcp_list_tools.server_label}")
    print(f"   ID: {mcp_list_tools.id}")
    print(f"   Available Tools: {len(mcp_list_tools.tools)}")
    print("=" * 80)
    
    for i, tool in enumerate(mcp_list_tools.tools, 1):
        print(f"\n{i}. {tool.name}")
        print(f"   Description: {tool.description}")
        
        # Parse and display input schema
        schema = tool.input_schema
        if schema and 'properties' in schema:
            properties = schema['properties']
            required = schema.get('required', [])
            
            print("   Parameters:")
            for param_name, param_info in properties.items():
                param_type = param_info.get('type', 'unknown')
                param_desc = param_info.get('description', 'No description')
                required_marker = " (required)" if param_name in required else " (optional)"
                print(f"     • {param_name} ({param_type}){required_marker}")
                if param_desc:
                    print(f"       {param_desc}")
        
        if i < len(mcp_list_tools.tools):
            print("-" * 40)


def main():
    """Main function to run all the tests"""
    
    # Configuration
    LLAMA_STACK_URL = "http://localhost:8321/"
    LLAMA_STACK_MODEL_IDS = [
        "openai/gpt-3.5-turbo",
        "openai/gpt-4o",
        "llama-openai-compat/Llama-3.3-70B-Instruct",
        "watsonx/meta-llama/llama-3-3-70b-instruct"
    ]
    
    # Using gpt-4o for this demo, but feel free to try one of the others or add more to run.yaml.
    OPENAI_MODEL_ID = LLAMA_STACK_MODEL_IDS[1]
    WATSONX_MODEL_ID = LLAMA_STACK_MODEL_IDS[-1]
    NPS_MCP_URL = "http://localhost:3005/sse/"
    
    print("=== Llama Stack Testing Script ===")
    print(f"Using OpenAI model: {OPENAI_MODEL_ID}")
    print(f"Using WatsonX model: {WATSONX_MODEL_ID}")
    print(f"MCP URL: {NPS_MCP_URL}")
    print()
    
    # Initialize client
    print("Initializing LlamaStackClient...")
    client = LlamaStackClient(base_url="http://localhost:8321")
    
    # Test 1: List models
    print("\n=== Test 1: List Models ===")
    try:
        models = client.models.list()
        print(f"Found {len(models)} models")
    except Exception as e:
        print(f"Error listing models: {e}")
        raise e
    
    # Test 2: Basic chat completion with OpenAI
    print("\n=== Test 2: Basic Chat Completion (OpenAI) ===")
    try:
        chat_completion_response = client.chat.completions.create(
            model=OPENAI_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}]
        )
        
        print("OpenAI Response:")
        for chunk in chat_completion_response.choices[0].message.content:
            print(chunk, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with OpenAI chat completion: {e}")
        raise e
    
    # Test 3: Basic chat completion with WatsonX
    print("\n=== Test 3: Basic Chat Completion (WatsonX) ===")
    try:
        chat_completion_response_wxai = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
        )
        
        print("WatsonX Response:")
        for chunk in chat_completion_response_wxai.choices[0].message.content:
            print(chunk, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with WatsonX chat completion: {e}")
        raise e
    
    # Test 4: Tool calling with OpenAI
    print("\n=== Test 4: Tool Calling (OpenAI) ===")
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather for a specific location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g., San Francisco, CA",
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        },
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    
    messages = [
        {"role": "user", "content": "What's the weather like in Boston, MA?"}
    ]
    
    try:
        print("--- Initial API Call ---")
        response = client.chat.completions.create(
            model=OPENAI_MODEL_ID,
            messages=messages,
            tools=tools,
            tool_choice="auto",  # "auto" is the default
        )
        print("OpenAI tool calling response received")
    except Exception as e:
        print(f"Error with OpenAI tool calling: {e}")
        raise e
    
    # Test 5: Tool calling with WatsonX
    print("\n=== Test 5: Tool Calling (WatsonX) ===")
    try:
        wxai_response = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=messages,
            tools=tools,
            tool_choice="auto",  # "auto" is the default
        )
        print("WatsonX tool calling response received")
    except Exception as e:
        print(f"Error with WatsonX tool calling: {e}")
        raise e
    
    # Test 6: Streaming with WatsonX
    print("\n=== Test 6: Streaming Response (WatsonX) ===")
    try:
        chat_completion_response_wxai_stream = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
            stream=True
        )
        print("Model response: ", end="")
        for chunk in chat_completion_response_wxai_stream:
            # Each 'chunk' is a ChatCompletionChunk object.
            # We want the content from the 'delta' attribute.
            if hasattr(chunk, 'choices') and chunk.choices is not None:
                content = chunk.choices[0].delta.content
                # The first few chunks might have None content, so we check for it.
                if content is not None:
                    print(content, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with streaming: {e}")
        raise e
    
    # Test 7: MCP with OpenAI
    print("\n=== Test 7: MCP Integration (OpenAI) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=OPENAI_MODEL_ID,
            input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
            tools=[
                {
                    "type": "mcp",
                    "server_url": NPS_MCP_URL,
                    "server_label": "National Parks Service tools",
                    "allowed_tools": ["search_parks", "get_park_events"],
                }
            ]
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (OpenAI): {e}")
        raise e
    
    # Test 8: MCP with WatsonX
    print("\n=== Test 8: MCP Integration (WatsonX) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=WATSONX_MODEL_ID,
            input="What is the capital of France?"
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (WatsonX): {e}")
        raise e
    
    # Test 9: MCP with Llama 3.3
    print("\n=== Test 9: MCP Integration (Llama 3.3) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=WATSONX_MODEL_ID,
            input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
            tools=[
                {
                    "type": "mcp",
                    "server_url": NPS_MCP_URL,
                    "server_label": "National Parks Service tools",
                    "allowed_tools": ["search_parks", "get_park_events"],
                }
            ]
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (Llama 3.3): {e}")
        raise e
    
    # Test 10: Embeddings
    print("\n=== Test 10: Embeddings ===")
    try:
        conn = http.client.HTTPConnection("localhost:8321")
        payload = json.dumps({
            "model": "watsonx/ibm/granite-embedding-278m-multilingual",
            "input": "Hello, world!",
        })
        headers = {
            'Content-Type': 'application/json',
            'Accept': 'application/json'
        }
        conn.request("POST", "/v1/openai/v1/embeddings", payload, headers)
        res = conn.getresponse()
        data = res.read()
        print(data.decode("utf-8"))
    except Exception as e:
        print(f"Error with Embeddings: {e}")
        raise e

    print("\n=== Testing Complete ===")


if __name__ == "__main__":
    main()
```

---------

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-08 07:29:43 -04:00
dependabot[bot]
62bac0aad4
chore(github-deps): bump actions/stale from 10.0.0 to 10.1.0 (#3684)
Bumps [actions/stale](https://github.com/actions/stale) from 10.0.0 to
10.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/stale/releases">actions/stale's
releases</a>.</em></p>
<blockquote>
<h2>v10.1.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add <code>only-issue-types</code> option to filter issues by type by
<a href="https://github.com/Bibo-Joshi"><code>@​Bibo-Joshi</code></a> in
<a
href="https://redirect.github.com/actions/stale/pull/1255">actions/stale#1255</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/Bibo-Joshi"><code>@​Bibo-Joshi</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1255">actions/stale#1255</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/stale/compare/v10...v10.1.0">https://github.com/actions/stale/compare/v10...v10.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5f858e3efb"><code>5f858e3</code></a>
Add <code>only-issue-types</code> option to filter issues by type (<a
href="https://redirect.github.com/actions/stale/issues/1255">#1255</a>)</li>
<li>See full diff in <a
href="3a9db7e6a4...5f858e3efb">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/stale&package-manager=github_actions&previous-version=10.0.0&new-version=10.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-08 12:16:54 +02:00
Omar Abdelwahab
702fcd1abf
fix: Raising an error message to the user when registering an existing provider. (#3624)
When the user wants to change the attributes (which could include model
name, dimensions,...etc) of an already registered provider, they will
get an error message asking that they first unregister the provider
before registering a new one.

# What does this PR do?
This PR updated the register function to raise an error to the user when
they attempt to register a provider that was already registered asking
them to un-register the existing provider first.

<!-- If resolving an issue, uncomment and update the line below -->
#2313

## Test Plan
Tested the change with /tests/unit/registry/test_registry.py

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
2025-10-08 12:09:23 +02:00
ehhuang
0cde3d956d
chore: require valid logging category (#3712)
# What does this PR do?
grep'd and audited all usage of 'get_logger' with help of Claude.

## Test Plan
CI
2025-10-08 11:10:33 +02:00
ehhuang
a3f5072776
chore!: remove --env from llama stack run (#3711)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Installer CI / lint (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Installer CI / smoke-test-on-dev (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m18s
# What does this PR do?
user can simply set env vars in the beginning of the command.`FOO=BAR
llama stack run ...`

## Test Plan
Run
TELEMETRY_SINKS=coneol uv run --with llama-stack llama stack build
--distro=starter --image-type=venv --run




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3711).
* #3714
* __->__ #3711
2025-10-07 20:58:15 -07:00
slekkala1
1ac320b7e6
chore: remove dead code (#3729)
# What does this PR do?
Removing some dead code, found by vulture and checked by claude that
there are no references or imports for these


## Test Plan
CI
2025-10-07 20:26:02 -07:00
ehhuang
b6e9f41041
chore: Revert "fix: fix nvidia provider (#3716)" (#3730)
This reverts commit c940fe7938.

@wukaixingxp I stamped to fast. Let's wait for @mattf's review.
2025-10-07 19:16:51 -07:00
Kai Wu
c940fe7938
fix: fix nvidia provider (#3716)
# What does this PR do?
(Used claude to solve #3715, coded with claude but tested by me)
## From claude summary:
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
**Problem**: The `NVIDIAInferenceAdapter` class was missing the
`alias_to_provider_id_map` attribute, which caused the error:

`ERROR 'NVIDIAInferenceAdapter' object has no attribute
'alias_to_provider_id_map'`

**Root Cause**: The `NVIDIAInferenceAdapter` only inherited from
`OpenAIMixin`, but some parts of the system expected it to have the
`alias_to_provider_id_map` attribute, which is provided by the
`ModelRegistryHelper` class.

**Solution**:

1. **Added ModelRegistryHelper import**: Imported the
`ModelRegistryHelper` class from
`llama_stack.providers.utils.inference.model_registry`
2. **Updated inheritance**: Changed the class declaration to inherit
from both `OpenAIMixin` and `ModelRegistryHelper`
3. **Added proper initialization**: Added an `__init__` method that
properly initializes the `ModelRegistryHelper` with empty model entries
(since NVIDIA uses dynamic model discovery) and the allowed models from
the configuration

**Key Changes**:

* Added `from llama_stack.providers.utils.inference.model_registry
import ModelRegistryHelper`
* Changed class declaration from `class
NVIDIAInferenceAdapter(OpenAIMixin):` to `class
NVIDIAInferenceAdapter(OpenAIMixin, ModelRegistryHelper):`
* Added `__init__` method that calls `ModelRegistryHelper.__init__(self,
model_entries=[], allowed_models=config.allowed_models)`

The inheritance order is important - `OpenAIMixin` comes first to ensure
its `check_model_availability()` method takes precedence over the
`ModelRegistryHelper` version, as mentioned in the class documentation.

This fix ensures that the `NVIDIAInferenceAdapter` has the required
`alias_to_provider_id_map` attribute while maintaining all existing
functionality.<!-- If resolving an issue, uncomment and update the line
below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Launching llama-stack server successfully, see logs:
```
NVIDIA_API_KEY=dummy NVIDIA_BASE_URL=http://localhost:8912 llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --image-type venv &
[2] 3753042
(venv) nvidia@nv-meta-H100-testing-gpu01:~/kai/llama-stack$ WARNING  2025-10-07 00:29:09,848 root:266 uncategorized: Unknown logging category:
         openai::conversations. Falling back to default 'root' level: 20
WARNING  2025-10-07 00:29:09,932 root:266 uncategorized: Unknown logging category: cli.
         Falling back to default 'root' level: 20
INFO     2025-10-07 00:29:09,937 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:09,937 llama_stack.cli.stack.run:136 cli: Using run
         configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml
Using virtual environment: /home/nvidia/kai/venv
Virtual environment already activated
+ '[' -n /home/nvidia/.llama/distributions/starter/starter-run.yaml ']'
+ yaml_config_arg=/home/nvidia/.llama/distributions/starter/starter-run.yaml
+ llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --port 8321
WARNING  2025-10-07 00:29:11,432 root:266 uncategorized: Unknown logging category:
         openai::conversations. Falling back to default 'root' level: 20
WARNING  2025-10-07 00:29:11,593 root:266 uncategorized: Unknown logging category: cli.
         Falling back to default 'root' level: 20
INFO     2025-10-07 00:29:11,603 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,604 llama_stack.cli.stack.run:136 cli: Using run
         configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,624 llama_stack.cli.stack.run:155 cli: No image type or
         image name provided. Assuming environment packages.
INFO     2025-10-07 00:29:11,625 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,644 llama_stack.cli.stack.run:230 cli: HTTPS enabled with
         certificates:
           Key: None
           Cert: None
INFO     2025-10-07 00:29:11,645 llama_stack.cli.stack.run:232 cli: Listening on ['::',
         '0.0.0.0']:8321
INFO     2025-10-07 00:29:11,816 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,836 llama_stack.core.server.server:480 core::server: Run
         configuration:
INFO     2025-10-07 00:29:11,845 llama_stack.core.server.server:483 core::server: apis:
         - agents
         - batches
         - datasetio
         - eval
         - files
         - inference
         - post_training
         - safety
         - scoring
         - telemetry
         - tool_runtime
         - vector_io
         benchmarks: []
         datasets: []
         image_name: starter
         inference_store:
           db_path: /home/nvidia/.llama/distributions/starter/inference_store.db
           type: sqlite
         metadata_store:
           db_path: /home/nvidia/.llama/distributions/starter/registry.db
           type: sqlite
         models: []
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /home/nvidia/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /home/nvidia/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           batches:
           - config:
               kvstore:
                 db_path: /home/nvidia/.llama/distributions/starter/batches.db
                 type: sqlite
             provider_id: reference
             provider_type: inline::reference
           datasetio:
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /home/nvidia/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /home/nvidia/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               api_key: '********'
               url: https://api.fireworks.ai/inference/v1
             provider_id: fireworks
             provider_type: remote::fireworks
           - config:
               api_key: '********'
               url: https://api.together.xyz/v1
             provider_id: together
             provider_type: remote::together
           - config: {}
             provider_id: bedrock
             provider_type: remote::bedrock
           - config:
               api_key: '********'
               append_api_version: true
               url: http://localhost:8912
             provider_id: nvidia
             provider_type: remote::nvidia
           - config:
               api_key: '********'
               base_url: https://api.openai.com/v1
             provider_id: openai
             provider_type: remote::openai
           - config:
               api_key: '********'
             provider_id: anthropic
             provider_type: remote::anthropic
           - config:
               api_key: '********'
             provider_id: gemini
             provider_type: remote::gemini
           - config:
               api_key: '********'
               url: https://api.groq.com
             provider_id: groq
             provider_type: remote::groq
           - config:
               api_key: '********'
               url: https://api.sambanova.ai/v1
             provider_id: sambanova
             provider_type: remote::sambanova
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: meta
             provider_id: torchtune-cpu
             provider_type: inline::torchtune-cpu
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           - config: {}
             provider_id: code-scanner
             provider_type: inline::code-scanner
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               service_name: "\u200B"
               sinks: sqlite
               sqlite_db_path: /home/nvidia/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /home/nvidia/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
           - config:
               db_path: /home/nvidia/.llama/distributions/starter/sqlite_vec.db
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/sqlite_vec_registry.db
                 type: sqlite
             provider_id: sqlite-vec
             provider_type: inline::sqlite-vec
         scoring_fns: []
         server:
           port: 8321
         shields: []
         tool_groups:
         - provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2
INFO     2025-10-07 00:29:12,138
         llama_stack.providers.remote.inference.nvidia.nvidia:49 inference::nvidia:
         Initializing NVIDIAInferenceAdapter(http://localhost:8912)...
INFO     2025-10-07 00:29:12,921
         llama_stack.providers.utils.inference.inference_store:74 inference: Write
         queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-07 00:29:13,524
         llama_stack.providers.utils.responses.responses_store:96 openai_responses:
         Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-07 00:29:13,679 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:13,681 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider fireworks: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:13,682 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed
         with: Pass Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
WARNING  2025-10-07 00:29:13,684 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider together: Pass
         Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
Handling connection for 8912
INFO     2025-10-07 00:29:14,047 llama_stack.providers.utils.inference.openai_mixin:448
         providers::utils: NVIDIAInferenceAdapter.list_provider_model_ids() returned 3
         models
ERROR    2025-10-07 00:29:14,062 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,063 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider openai: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,099 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed
         with: "Could not resolve authentication method. Expected either api_key or
         auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers
         to be explicitly omitted"
WARNING  2025-10-07 00:29:14,100 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider anthropic: "Could not
         resolve authentication method. Expected either api_key or auth_token to be
         set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly
         omitted"
ERROR    2025-10-07 00:29:14,102 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,103 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider gemini: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,105 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with:
         API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in
         the provider config.
WARNING  2025-10-07 00:29:14,106 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider groq: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider
         config.
ERROR    2025-10-07 00:29:14,107 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,109 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider sambanova: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,454 uvicorn.error:84 uncategorized: Started server process
         [3753046]
INFO     2025-10-07 00:29:14,455 uvicorn.error:48 uncategorized: Waiting for
         application startup.
INFO     2025-10-07 00:29:14,457 llama_stack.core.server.server:170 core::server:
         Starting up
INFO     2025-10-07 00:29:14,458 llama_stack.core.stack:415 core: starting registry
         refresh task
ERROR    2025-10-07 00:29:14,459 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,461 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider fireworks: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,462 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed
         with: Pass Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
WARNING  2025-10-07 00:29:14,463 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider together: Pass
         Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
ERROR    2025-10-07 00:29:14,465 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,466 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider openai: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,500 uvicorn.error:62 uncategorized: Application startup
         complete.
ERROR    2025-10-07 00:29:14,502 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed
         with: "Could not resolve authentication method. Expected either api_key or
         auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers
         to be explicitly omitted"
WARNING  2025-10-07 00:29:14,503 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider anthropic: "Could not
         resolve authentication method. Expected either api_key or auth_token to be
         set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly
         omitted"
ERROR    2025-10-07 00:29:14,504 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,506 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider gemini: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,507 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with:
         API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in
         the provider config.
WARNING  2025-10-07 00:29:14,508 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider groq: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider
         config.
ERROR    2025-10-07 00:29:14,510 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,511 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider sambanova: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,513 uvicorn.error:216 uncategorized: Uvicorn running on
         http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

tested with curl model, it also works:
```
curl http://localhost:8321/v1/models
{"data":[{"identifier":"bedrock/meta.llama3-1-8b-instruct-v1:0","provider_resource_id":"meta.llama3-1-8b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-70b-instruct-v1:0","provider_resource_id":"meta.llama3-1-70b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-405b-instruct-v1:0","provider_resource_id":"meta.llama3-1-405b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/bigcode/starcoder2-7b","provider_resource_id":"bigcode/starcoder2-7b","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/meta/llama-3.3-70b-instruct","provider_resource_id":"meta/llama-3.3-70b-instruct","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/nvidia/llama-3.2-nv-embedqa-1b-v2","provider_resource_id":"nvidia/llama-3.2-nv-embedqa-1b-v2","provider_id":"nvidia","type":"model","metadata":{"embedding_dimension":2048,"context_length":8192},"model_type":"embedding"},{"identifier":"sentence-transformers/all-MiniLM-L6-v2","provider_resource_id":"all-MiniLM-L6-v2","provider_id":"sentence-transformers","type":"model","metadata":{"embedding_dimension":384},"model_type":"embedding"}]}%
```

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-07 18:23:12 -07:00
Emilio Garcia
bc7d4b423b
fix(scripts): select container runtime for telemetry (#3727)
# What does this PR do?
script runs with either docker or podman

## Test Plan
passes when run
2025-10-07 14:59:53 -07:00
slekkala1
c2d97a9db9
chore: fix flaky unit test and add proper shutdown for file batches (#3725)
# What does this PR do?
Have been running into flaky unit test failures:
5217035494
Fixing below
1. Shutting down properly by cancelling any stale file batches tasks
running in background.
2. Also, use unique_kvstore_config, so the test dont use same db path
and maintain test isolation
## Test Plan
Ran unit test locally and CI
2025-10-07 14:23:14 -07:00
Akram Ben Aissi
1970b4aa4b
fix: improve model availability checks: Allows use of unavailable models on startup (#3717)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m28s
- Allows use of unavailable models on startup
- Add has_model method to ModelsRoutingTable for checking pre-registered
models
- Update check_model_availability to check model_store before provider
APIs

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->


Start llama stack and point unavailable vLLM

```
VLLM_URL=https://my-unavailable-vllm/v1 MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```

llama stack will start without crashing but only notifying error. 

```


         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2

INFO     2025-10-07 06:40:41,804 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-07 06:40:42,066 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-07 06:40:58,882 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING  2025-10-07 06:40:58,883 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
[...]
INFO     2025-10-07 06:40:59,036 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO     2025-10-07 06:41:04,064 openai._base_client:1618 uncategorized: Retrying request to /models in 0.398814 seconds
INFO     2025-10-07 06:41:09,497 openai._base_client:1618 uncategorized: Retrying request to /models in 0.781908 seconds
ERROR    2025-10-07 06:41:15,282 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING  2025-10-07 06:41:15,283 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
```
2025-10-07 14:27:24 -04:00
Francisco Arceo
d5b136ac66
feat: Enabling Annotations in Responses (#3698)
# What does this PR do?
Implements annotations for `file_search` tool.

Also adds some logs and tests.

## How does this work? 
1. **Citation Markers**: Models insert `<|file-id|>` tokens during
generation with instructions from search results
2. **Post-Processing**: Extract markers using regex to calculate
character positions and create `AnnotationFileCitation` objects
3. **File Mapping**: Store filename metadata during vector store
operations for proper citation display

## Example 
This is the updated `quickstart.py` script, which uses the `extra_body`
to register the embedding model.

```python
import io, requests
from openai import OpenAI

url="https://www.paulgraham.com/greatwork.html"
model = "gpt-4o-mini"
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

vs = client.vector_stores.create(
    name="my_citations_db",
    extra_body={
        "embedding_model": "ollama/nomic-embed-text:latest",
        "embedding_dimension": 768,
    }
)
response = requests.get(url)
pseudo_file = io.BytesIO(str(response.content).encode('utf-8'))
file_id = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants").id
client.vector_stores.files.create(vector_store_id=vs.id, file_id=file_id)

resp = client.responses.create(
    model=model,
    input="How do you do great work? Use our existing knowledge_search tool.",
    tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
    include=["file_search_call.results"],
)

print(resp)
```

<details>
<summary> Example of the full response </summary>

```python
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/files "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores/vs_0f6f7e35-f48b-4850-8604-8117d9a50e0a/files "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"
Response(id='resp-28f5793d-3272-4de3-81f6-8cbf107d5bcd', created_at=1759797954.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-4o-mini', object='response', output=[ResponseFileSearchToolCall(id='call_xWtvEQETN5GNiRLLiBIDKntg', queries=['how to do great work tips'], status='completed', type='file_search_call', results=[Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.3722624322210302, text='\\\'re looking where few have looked before.<br /><br />One sign that you\\\'re suited for some kind of work is when you like\\neven the parts that other people find tedious or frightening.<br /><br />But fields aren\\\'t people; you don\\\'t owe them any loyalty. If in the\\ncourse of working on one thing you discover another that\\\'s more\\nexciting, don\\\'t be afraid to switch.<br /><br />If you\\\'re making something for people, make sure it\\\'s something\\nthey actually want. The best way to do this is to make something\\nyou yourself want. Write the story you want to read; build the tool\\nyou want to use. Since your friends probably have similar interests,\\nthis will also get you your initial audience.<br /><br />This <i>should</i> follow from the excitingness rule. Obviously the most\\nexciting story to write will be the one you want to read. The reason\\nI mention this case explicitly is that so many people get it wrong.\\nInstead of making what they want, they try to make what some\\nimaginary, more sophisticated audience wants. And once you go down\\nthat route, you\\\'re lost.\\n<font color=#dddddd>[<a href="#f6n"><font color=#dddddd>6</font></a>]</font><br /><br />There are a lot of forces that will lead you astray when you\\\'re\\ntrying to figure out what to work on. Pretentiousness, fashion,\\nfear, money, politics, other people\\\'s wishes, eminent frauds. But\\nif you stick to what you find genuinely interesting, you\\\'ll be proof\\nagainst all of them. If you\\\'re interested, you\\\'re not astray.<br /><br /><br /><br /><br /><br />\\nFollowing your interests may sound like a rather passive strategy,\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.2532794607643494, text=' with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good colleagues\\noffer <i>surprising</i> insights. They can see and do things that you\\ncan\\\'t. So if you have a handful of colleagues good enough to keep\\nyou on your toes in this sense, you\\\'re probably over the threshold.<br /><br />Most of us can benefit from collaborating with colleagues, but some\\nprojects require people on a larger scale, and starting one of those\\nis not for everyone. If you want to run a project like that, you\\\'ll\\nhave to become a manager, and managing well takes aptitude and\\ninterest like any other kind of work. If you don\\\'t have them, there\\nis no middle path: you must either force yourself to learn management\\nas a second language, or avoid such projects.\\n<font color=#dddddd>[<a href="#f27n"><font color=#dddddd>27</font></a>]</font><br /><br /><br /><br /><br /><br />\\nHusband your morale. It\\\'s the basis of everything when you\\\'re working\\non ambitious projects. You have to nurture and protect it like a\\nliving organism.<br /><br />Morale starts with your view of life. You\\\'re more likely to do great\\nwork if you\\\'re an optimist, and more likely to if you think of\\nyourself as lucky than if you think of yourself as a victim.<br /><br />Indeed, work can to some extent protect you from your problems. If\\nyou choose work that\\\'s pure, its very difficulties will serve as a\\nrefuge from the difficulties of everyday life. If this is escapism,\\nit\\\'s a very productive form of it, and one that has been used by\\nsome of the greatest minds in history.<br /><br />Morale compounds via work: high morale helps you do good work, which\\nincreases your morale and helps you do even'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1973485818164222, text=' your\\nability and interest can take you. And you can only answer that by\\ntrying.<br /><br />Many more people could try to do great work than do. What holds\\nthem back is a combination of modesty and fear. It seems presumptuous\\nto try to be Newton or Shakespeare. It also seems hard; surely if\\nyou tried something like that, you\\\'d fail. Presumably the calculation\\nis rarely explicit. Few people consciously decide not to try to do\\ngreat work. But that\\\'s what\\\'s going on subconsciously; they shy\\naway from the question.<br /><br />So I\\\'m going to pull a sneaky trick on you. Do you want to do great\\nwork, or not? Now you have to decide consciously. Sorry about that.\\nI wouldn\\\'t have done it to a general audience. But we already know\\nyou\\\'re interested.<br /><br />Don\\\'t worry about being presumptuous. You don\\\'t have to tell anyone.\\nAnd if it\\\'s too hard and you fail, so what? Lots of people have\\nworse problems than that. In fact you\\\'ll be lucky if it\\\'s the worst\\nproblem you have.<br /><br />Yes, you\\\'ll have to work hard. But again, lots of people have to\\nwork hard. And if you\\\'re working on something you find very\\ninteresting, which you necessarily will if you\\\'re on the right path,\\nthe work will probably feel less burdensome than a lot of your\\npeers\\\'.<br /><br />The discoveries are out there, waiting to be made. Why not by you?<br /><br /><br /><br /><br /><br /><br /><br /><br /><br />\\n<b>Notes</b><br /><br />[<a name="f1n"><font color=#000000>1</font></a>]\\nI don\\\'t think you could give a precise definition of what\\ncounts as great work. Doing great work means doing something important\\nso well that you expand people\\\'s ideas of what\\\'s possible. But\\nthere\\\'s no threshold for importance. It\\\'s a matter of degree, and\\noften hard to judge at the time anyway. So I\\\'d rather people focused\\non developing their interests rather than worrying about whether\\nthey\\\'re important or not. Just try to do something amazing, and\\nleave it to future generations to say if you succeeded.<br /><br />[<a name="f2n"><font color=#000000>2</font></a>]\\nA lot of standup comedy is based on noticing anomalies in\\neveryday life. "Did you ever notice...?" New ideas come from doing\\nthis about nontrivial things. Which may help explain why people\\\'s\\nreaction to a new idea is often the first half of laughing: Ha!<br /><br />[<a name="f3n"><font color=#000000>3</font></a>]\\nThat second qualifier is critical. If you\\\'re excited about\\nsomething most authorities discount, but you can\\\'t give a more\\nprecise explanation than "they don\\\'t get it," then you\\\'re starting\\nto drift into the territory of cranks.<br /><br />[<a name="f4n"><font color=#000000>4</font></a>]\\nFinding something to work on is not simply a matter of finding\\na match between the current version of you and a list of known\\nproblems. You\\\'ll often have to coevolve with the problem. That\\\'s\\nwhy it can sometimes be so hard to figure out what to work on. The\\nsearch space is huge. It\\\'s the cartesian product of all possible\\nt'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1764591706535943, text='\\noptimistic, and even though one of the sources of their optimism\\nis ignorance, in this case ignorance can sometimes beat knowledge.<br /><br />Try to finish what you start, though, even if it turns out to be\\nmore work than you expected. Finishing things is not just an exercise\\nin tidiness or self-discipline. In many projects a lot of the best\\nwork happens in what was meant to be the final stage.<br /><br />Another permissible lie is to exaggerate the importance of what\\nyou\\\'re working on, at least in your own mind. If that helps you\\ndiscover something new, it may turn out not to have been a lie after\\nall.\\n<font color=#dddddd>[<a href="#f7n"><font color=#dddddd>7</font></a>]</font><br /><br /><br /><br /><br /><br />\\nSince there are two senses of starting work &mdash; per day and per\\nproject &mdash; there are also two forms of procrastination. Per-project\\nprocrastination is far the more dangerous. You put off starting\\nthat ambitious project from year to year because the time isn\\\'t\\nquite right. When you\\\'re procrastinating in units of years, you can\\nget a lot not done.\\n<font color=#dddddd>[<a href="#f8n"><font color=#dddddd>8</font></a>]</font><br /><br />One reason per-project procrastination is so dangerous is that it\\nusually camouflages itself as work. You\\\'re not just sitting around\\ndoing nothing; you\\\'re working industriously on something else. So\\nper-project procrastination doesn\\\'t set off the alarms that per-day\\nprocrastination does. You\\\'re too busy to notice it.<br /><br />The way to beat it is to stop occasionally and ask yourself: Am I\\nworking on what I most want to work on? When you\\\'re young it\\\'s ok\\nif the answer is sometimes no, but this gets increasingly dangerous\\nas you get older.\\n<font color=#dddddd>[<a href="#f9n"><font color=#dddddd>9</font></a>]</font><br /><br /><br /><br /><br /><br />\\nGreat work usually entails spending what would seem to most people\\nan unreasonable amount of time on a problem. You can\\\'t think of\\nthis time as a cost, or it will seem too high. You have to find the\\nwork sufficiently engaging as it\\\'s happening.<br /><br />There may be some jobs where you have to work diligently for years\\nat things you hate before you get to the good part, but this is not\\nhow great work happens. Great work happens by focusing consistently\\non something you\\\'re genuinely interested in. When you pause to take\\nstock, you\\\'re surprised how far you\\\'ve come.<br /><br />The reason we\\\'re surprised is that we underestimate the cumulative\\neffect of work. Writing a page a day doesn\\\'t sound like much, but\\nif you do it every day you\\\'ll write a book a year. That\\\'s the key:\\nconsistency. People who do great things don\\\'t get a lot done every\\nday. They get something done, rather than nothing.<br /><br />If you do work that compounds, you\\\'ll get exponential growth. Most\\npeople who do this do it unconsciously, but it\\\'s worth stopping to\\nthink about. Learning, for example, is an instance of this phenomenon:\\nthe more you learn about something, the easier it is to learn more.\\nGrowing an audience is another: the more fans you have, the more\\nnew fans they\\\'ll bring you.<br /><br />'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.174069664815369, text='\\ninside.<br /><br /><br /><br /><br /><br />Let\\\'s talk a little more about the complicated business of figuring\\nout what to work on. The main reason it\\\'s hard is that you can\\\'t\\ntell what most kinds of work are like except by doing them. Which\\nmeans the four steps overlap: you may have to work at something for\\nyears before you know how much you like it or how good you are at\\nit. And in the meantime you\\\'re not doing, and thus not learning\\nabout, most other kinds of work. So in the worst case you choose\\nlate based on very incomplete information.\\n<font color=#dddddd>[<a href="#f4n"><font color=#dddddd>4</font></a>]</font><br /><br />The nature of ambition exacerbates this problem. Ambition comes in\\ntwo forms, one that precedes interest in the subject and one that\\ngrows out of it. Most people who do great work have a mix, and the\\nmore you have of the former, the harder it will be to decide what\\nto do.<br /><br />The educational systems in most countries pretend it\\\'s easy. They\\nexpect you to commit to a field long before you could know what\\nit\\\'s really like. And as a result an ambitious person on an optimal\\ntrajectory will often read to the system as an instance of breakage.<br /><br />It would be better if they at least admitted it &mdash; if they admitted\\nthat the system not only can\\\'t do much to help you figure out what\\nto work on, but is designed on the assumption that you\\\'ll somehow\\nmagically guess as a teenager. They don\\\'t tell you, but I will:\\nwhen it comes to figuring out what to work on, you\\\'re on your own.\\nSome people get lucky and do guess correctly, but the rest will\\nfind themselves scrambling diagonally across tracks laid down on\\nthe assumption that everyone does.<br /><br />What should you do if you\\\'re young and ambitious but don\\\'t know\\nwhat to work on? What you should <i>not</i> do is drift along passively,\\nassuming the problem will solve itself. You need to take action.\\nBut there is no systematic procedure you can follow. When you read\\nbiographies of people who\\\'ve done great work, it\\\'s remarkable how\\nmuch luck is involved. They discover what to work on as a result\\nof a chance meeting, or by reading a book they happen to pick up.\\nSo you need to make yourself a big target for luck, and the way to\\ndo that is to be curious. Try lots of things, meet lots of people,\\nread lots of books, ask lots of questions.\\n<font color=#dddddd>[<a href="#f5n"><font color=#dddddd>5</font></a>]</font><br /><br />When in doubt, optimize for interestingness. Fields change as you\\nlearn more about them. What mathematicians do, for example, is very\\ndifferent from what you do in high school math classes. So you need\\nto give different types of work a chance to show you what they\\\'re\\nlike. But a field should become <i>increasingly</i> interesting as you\\nlearn more about it. If it doesn\\\'t, it\\\'s probably not for you.<br /><br />Don\\\'t worry if you find you\\\'re interested in different things than\\nother people. The stranger your tastes in interestingness, the\\nbetter. Strange tastes are often strong ones, and a strong taste\\nfor work means you\\\'ll be productive. And you\\\'re more likely to find\\nnew things if you'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.158095578895721, text='. Don\\\'t copy the manner of\\nan eminent 50 year old professor if you\\\'re 18, for example, or the\\nidiom of a Renaissance poem hundreds of years later.<br /><br />Some of the features of things you admire are flaws they succeeded\\ndespite. Indeed, the features that are easiest to imitate are the\\nmost likely to be the flaws.<br /><br />This is particularly true for behavior. Some talented people are\\njerks, and this sometimes makes it seem to the inexperienced that\\nbeing a jerk is part of being talented. It isn\\\'t; being talented\\nis merely how they get away with it.<br /><br />One of the most powerful kinds of copying is to copy something from\\none field into another. History is so full of chance discoveries\\nof this type that it\\\'s probably worth giving chance a hand by\\ndeliberately learning about other kinds of work. You can take ideas\\nfrom quite distant fields if you let them be metaphors.<br /><br />Negative examples can be as inspiring as positive ones. In fact you\\ncan sometimes learn more from things done badly than from things\\ndone well; sometimes it only becomes clear what\\\'s needed when it\\\'s\\nmissing.<br /><br /><br /><br /><br /><br />\\nIf a lot of the best people in your field are collected in one\\nplace, it\\\'s usually a good idea to visit for a while. It will\\nincrease your ambition, and also, by showing you that these people\\nare human, increase your self-confidence.\\n<font color=#dddddd>[<a href="#f26n"><font color=#dddddd>26</font></a>]</font><br /><br />If you\\\'re earnest you\\\'ll probably get a warmer welcome than you\\nmight expect. Most people who are very good at something are happy\\nto talk about it with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1566747762241967, text=',\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if you do that you\\\'ll find you get diminishing returns:\\nfatigue will make you stupid, and eventually even damage your health.\\nThe point at which work yields diminishing returns depends on the\\ntype. Some of the hardest types you might only be able to do for\\nfour or five hours a day.<br /><br />Ideally those hours will be contiguous. To the extent you can, try\\nto arrange your life so you have big blocks of time to work in.\\nYou\\\'ll shy away from hard tasks if you know you might be interrupted.<br /><br />It will probably be harder to start working than to keep working.\\nYou\\\'ll often have to trick yourself to get over that initial\\nthreshold. Don\\\'t worry about this; it\\\'s the nature of work, not a\\nflaw in your character. Work has a sort of activation energy, both\\nper day and per project. And since this threshold is fake in the\\nsense that it\\\'s higher than the energy required to keep going, it\\\'s\\nok to tell yourself a lie of corresponding magnitude to get over\\nit.<br /><br />It\\\'s usually a mistake to lie to yourself if you want to do great\\nwork, but this is one of the rare cases where it isn\\\'t. When I\\\'m\\nreluctant to start work in the morning, I often trick myself by\\nsaying "I\\\'ll just read over what I\\\'ve got so far." Five minutes\\nlater I\\\'ve found something that seems mistaken or incomplete, and\\nI\\\'m off.<br /><br />Similar techniques work for starting new projects. It\\\'s ok to lie\\nto yourself about how much work a project will entail, for example.\\nLots of great things began with someone saying "How hard could it\\nbe?"<br /><br />This is one case where the young have an advantage. They\\\'re more'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1349744395573516, text=' audience\\nin the traditional sense. Either way it doesn\\\'t need to be big.\\nThe value of an audience doesn\\\'t grow anything like linearly with\\nits size. Which is bad news if you\\\'re famous, but good news if\\nyou\\\'re just starting out, because it means a small but dedicated\\naudience can be enough to sustain you. If a handful of people\\ngenuinely love what you\\\'re doing, that\\\'s enough.<br /><br />To the extent you can, avoid letting intermediaries come between\\nyou and your audience. In some types of work this is inevitable,\\nbut it\\\'s so liberating to escape it that you might be better off\\nswitching to an adjacent type if that will let you go direct.\\n<font color=#dddddd>[<a href="#f28n"><font color=#dddddd>28</font></a>]</font><br /><br />The people you spend time with will also have a big effect on your\\nmorale. You\\\'ll find there are some who increase your energy and\\nothers who decrease it, and the effect someone has is not always\\nwhat you\\\'d expect. Seek out the people who increase your energy and\\navoid those who decrease it. Though of course if there\\\'s someone\\nyou need to take care of, that takes precedence.<br /><br />Don\\\'t marry someone who doesn\\\'t understand that you need to work,\\nor sees your work as competition for your attention. If you\\\'re\\nambitious, you need to work; it\\\'s almost like a medical condition;\\nso someone who won\\\'t let you work either doesn\\\'t understand you,\\nor does and doesn\\\'t care.<br /><br />Ultimately morale is physical. You think with your body, so it\\\'s\\nimportant to take care of it. That means exercising regularly,\\neating and sleeping well, and avoiding the more dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.123214818076958, text='b\'<html><head><meta name="Keywords" content="" /><title>How to Do Great Work</title><!-- <META NAME="ROBOTS" CONTENT="NOODP"> -->\\n<link rel="shortcut icon" href="http://ycombinator.com/arc/arc.png">\\n</head><body bgcolor="#ffffff" background="https://s.turbifycdn.com/aah/paulgraham/bel-6.gif" text="#000000" link="#000099" vlink="#464646"><table border="0" cellspacing="0" cellpadding="0"><tr valign="top"><td><map name=118ab66adb24b4f><area shape=rect coords="0,0,67,21" href="index.html"><area shape=rect coords="0,21,67,42" href="articles.html"><area shape=rect coords="0,42,67,63" href="http://www.amazon.com/gp/product/0596006624"><area shape=rect coords="0,63,67,84" href="books.html"><area shape=rect coords="0,84,67,105" href="http://ycombinator.com"><area shape=rect coords="0,105,67,126" href="arc.html"><area shape=rect coords="0,126,67,147" href="bel.html"><area shape=rect coords="0,147,67,168" href="lisp.html"><area shape=rect coords="0,168,67,189" href="antispam.html"><area shape=rect coords="0,189,67,210" href="kedrosky.html"><area shape=rect coords="0,210,67,231" href="faq.html"><area shape=rect coords="0,231,67,252" href="raq.html"><area shape=rect coords="0,252,67,273" href="quo.html"><area shape=rect coords="0,273,67,294" href="rss.html"><area shape=rect coords="0,294,67,315" href="bio.html"><area shape=rect coords="0,315,67,336" href="https://twitter.com/paulg"><area shape=rect coords="0,336,67,357" href="https://mas.to/@paulg"></map><img src="https://s.turbifycdn.com/aah/paulgraham/bel-7.gif" width="69" height="357" usemap=#118ab66adb24b4f border="0" hspace="0" vspace="0" ismap /></td><td><img src="https://sep.turbifycdn.com/ca/Img/trans_1x1.gif" height="1" width="26" border="0" /></td><td><a href="index.html"><img src="https://s.turbifycdn.com/aah/paulgraham/bel-8.gif" width="410" height="45" border="0" hspace="0" vspace="0" /></a><br /><br /><table border="0" cellspacing="0" cellpadding="0" width="435"><tr valign="top"><td width="435"><img src="https://s.turbifycdn.com/aah/paulgraham/how-to-do-great-work-2.gif" width="185" height="18" border="0" hspace="0" vspace="0" alt="How to Do Great Work" /><br /><br /><font size="2" face="verdana">July 2023<br /><br />If you collected lists of techniques for doing great work in a lot\\nof different fields, what would the intersection look like? I decided\\nto find out'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1193194369249235, text=' dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied with a single\\nword, my bet would be on "curiosity."<br /><br />That doesn\\\'t translate directly to advice. It\\\'s not enough just to\\nbe curious, and you can\\\'t command curiosity anyway. But you can\\nnurture it and let it drive you.<br /><br />Curiosity is the key to all four steps in doing great work: it will\\nchoose the field for you, get you to the frontier, cause you to\\nnotice the gaps in it, and drive you to explore them. The whole\\nprocess is a kind of dance with curiosity.<br /><br /><br /><br /><br /><br />\\nBelieve it or not, I tried to make this essay as short as I could.\\nBut its length at least means it acts as a filter. If you made it\\nthis far, you must be interested in doing great work. And if so\\nyou\\\'re already further along than you might realize, because the\\nset of people willing to want to is small.<br /><br />The factors in doing great work are factors in the literal,\\nmathematical sense, and they are: ability, interest, effort, and\\nluck. Luck by definition you can\\\'t do anything about, so we can\\nignore that. And we can assume effort, if you do in fact want to\\ndo great work. So the problem boils down to ability and interest.\\nCan you find a kind of work where your ability and interest will\\ncombine to yield an explosion of new ideas?<br /><br />Here there are grounds for optimism. There are so many different\\nways to do great work, and even more that are still undiscovered.\\nOut of all those different types of work, the one you\\\'re most suited\\nfor is probably a pretty close match. Probably a comically close\\nmatch. It\\\'s just a question of finding it, and how far into it')]), ResponseOutputMessage(id='msg_3591ea71-8b35-4efd-a5ad-c1c250801971', content=[ResponseOutputText(annotations=[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')], text='To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message')], parallel_tool_calls=False, temperature=None, tool_choice=None, tools=None, top_p=None, background=None, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier=None, status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity=None), top_logprobs=None, truncation=None, usage=None, user=None)

In [34]: resp.output[1].content[0].text
Out[34]: 'To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.'
```
</details>

The relevant output looks like this:

```python
>resp.output[1].content[0].annotations
[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')]```
And
```python
In [144]: print(resp.output[1].content[0].text)
To do great work, consider the following principles:

1. **Follow Your Interests**: Engage in work that genuinely excites you.
If you find an area intriguing, pursue it without being overly concerned
about external pressures or norms. You should create things that you
would want for yourself, as this often aligns with what others in your
circle might want too.

2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should
be tempered by genuine interest. Instead of detailed planning for the
future, focus on exciting projects that keep your options open. This
approach, known as "staying upwind," allows for adaptability and can
lead to unforeseen achievements.

3. **Choose Quality Colleagues**: Collaborating with talented colleagues
can significantly affect your own work. Seek out individuals who offer
surprising insights and whom you admire. The presence of good colleagues
can elevate the quality of your work and inspire you.

4. **Maintain High Morale**: Your attitude towards work and life affects
your performance. Cultivating optimism and viewing yourself as lucky
rather than victimized can boost your productivity. It’s essential to
care for your physical health as well since it directly impacts your
mental faculties and morale.

5. **Be Consistent**: Great work often comes from cumulative effort.
Daily progress, even in small amounts, can result in substantial
achievements over time. Emphasize consistency and make the work
engaging, as this reduces the perceived burden of hard labor.

6. **Embrace Curiosity**: Curiosity is a driving force that can guide
you in selecting fields of interest, pushing you to explore uncharted
territories. Allow it to shape your work and continually seek knowledge
and insights.

By focusing on these aspects, you can create an environment conducive to
great work and personal fulfillment.
```

And the code below outputs only periods highlighting that the position/index behaves as expected—i.e., the annotation happens at the end of the sentence.

```python
print([resp.output[1].content[0].text[j.index] for j in
resp.output[1].content[0].annotations])
Out[41]: ['.', '.', '.', '.', '.', '.']
```

## Test Plan
Unit tests added.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-07 14:00:56 -04:00
Charlie Doern
6389bf5ffb
fix: make telemetry optional for agents (#3705)
# What does this PR do?

there is a lot of code in the agents API using the telemetry API and its
helpers without checking if that API is even enabled.

This is the only API besides inference actively using telemetry code, so
after this telemetry can be optional for the entire stack


resolves #3665


## Test Plan

existing agent tests.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-07 16:09:03 +02:00
Matthew Farrellee
e892a3f7f4
feat: add refresh_models support to inference adapters (default: false) (#3719)
# What does this PR do?

inference adapters can now configure `refresh_models: bool` to control
periodic model listing from their providers

BREAKING CHANGE: together inference adapter default changed. previously
always refreshed, now follows config.

addresses "models: refresh" on #3517

## Test Plan

ci w/ new tests
2025-10-07 15:19:56 +02:00
Charlie Doern
8b9af03a1b
fix: refresh log should be debug (#3720)
# What does this PR do?

when using a distro like starter where a bunch of providers are disabled
I should not see logs like:

```
         in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:38:52,117 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,123 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: Pass Fireworks API Key in the header
         X-LlamaStack-Provider-Data as { "fireworks_api_key": <your api key>}
WARNING  2025-10-07 08:43:52,126 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider together: Pass Together API Key in the header
         X-LlamaStack-Provider-Data as { "together_api_key": <your api key>}
WARNING  2025-10-07 08:43:52,129 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider openai: API key is not set. Please provide a valid API
         key in the provider data header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,132 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"anthropic_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,136 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is not set. Please provide a valid API
         key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,139 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is not set. Please provide a valid API key
         in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,142 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config.
^CINFO     2025-10-07 08:46:11,996 llama_stack.core.utils.exec:75 core:
```

as WARNING. Switch to Debug.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-07 09:04:07 -04:00
Sumanth Kamenani
1fcde5fc2f
fix: update pyproject.toml dependencies for vector processing (#3555)
What does this PR do?

Updates pyproject.toml dependencies to fix vector processing
compatibility issues.

closes: #3495 

  Test Plan

  Tested llama stack server with faiss vector database:

1. Built and ran server: llama stack build --distro starter --image-type
venv --image-name llamastack-faiss
3. Tested file upload: Successfully uploaded PDF via /v1/openai/v1/files
  4. Tested vector operations:
    - Created vector store with faiss backend
    - Added PDF to vector store
    - Performed semantic search queries
2025-10-07 15:01:36 +02:00
Justin
509ac4a659
feat: enable Runpod inference adapter (#3707)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
UI Tests / ui-tests (22) (push) Successful in 30s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
Sorry to @mattf I thought I could close the other PR and reopen it.. But
I didn't have the option to reopen it now. I just didn't want it to keep
notifying maintainers if I would make other commits for testing.

Continuation of: https://github.com/llamastack/llama-stack/pull/3641

PR fixes Runpod Adapter
https://github.com/llamastack/llama-stack/issues/3517

## What I fixed from before:
Continuation of: https://github.com/llamastack/llama-stack/pull/3641

1. Made it all OpenAI
2. Fixed the class up since the OpenAIMixin had a couple changes with
the pydantic base model stuff.
3. Test to make sure that we could dynamically find models and use the
resulting identifier to make requests
```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

```
# RunPod Provider Quick Start

## Prerequisites
- Python 3.10+
- Git
- RunPod API token

## Setup for Development

```bash
# 1. Clone and enter the repository
cd (into the repo)

# 2. Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Remove any existing llama-stack installation
pip uninstall llama-stack llama-stack-client -y

# 4. Install llama-stack in development mode
pip install -e .

# 5. Build using local development code
(Found this through the Discord)
LLAMA_STACK_DIR=. llama stack build

# When prompted during build:
# - Name: runpod-dev
# - Image type: venv
# - Inference provider: remote::runpod
# - Safety provider: "llama-guard" 
# - Other providers: first defaults
```

## Configure the Stack

The RunPod adapter automatically discovers models from your endpoint via the `/v1/models` API.
No manual model configuration is required - just set your environment variables.

## Run the Server

### Important: Use the Build-Created Virtual Environment

```bash
# Exit the development venv if you're in it
deactivate

# Activate the build-created venv (NOT .venv)
cd (lama-stack folder github repo)
source llamastack-runpod-dev/bin/activate
```

### For Qwen3-32B-AWQ Public Endpoint (Recommended)

```bash
# Set environment variables
export RUNPOD_URL="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1"
export RUNPOD_API_TOKEN="your_runpod_api_key"

# Start server
llama stack run
~/.llama/distributions/llamastack-runpod-dev/llamastack-runpod-dev-run.yaml
```

## Quick Test

### 1. List Available Models (Dynamic Discovery)

First, check which models are available on your RunPod endpoint:

```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

**Example Response:**
```json
{
  "data": [
    {
      "identifier": "qwen3-32b-awq",
      "provider_resource_id": "Qwen/Qwen3-32B-AWQ",
      "provider_id": "runpod",
      "type": "model",
      "metadata": {},
      "model_type": "llm"
    }
  ]
}
```

**Note:** Use the `identifier` value from the response above in your requests below.

### 2. Chat Completion (Non-streaming)

Replace `qwen3-32b-awq` with your model identifier from step 1:

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Hello, count to 3"}],
    "stream": false
  }'
```

### 3. Chat Completion (Streaming)

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

**Clean streaming output:**
```bash
curl -N -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
-d '{"model": "qwen3-32b-awq", "messages": [{"role": "user", "content":
"Count to 5"}], "stream": true}' \
  2>/dev/null | while read -r line; do
echo "$line" | grep "^data: " | sed 's/^data: //' | jq -r
'.choices[0].delta.content // empty' 2>/dev/null
  done
```

**Expected Output:**
```
1
2
3
4
5
```
2025-10-07 12:24:50 +02:00
ehhuang
50f9ca3541
chore: remove dead code (#3713)
# What does this PR do?


## Test Plan
2025-10-07 12:13:11 +02:00
slekkala1
bba9957edd
feat(api): Add vector store file batches api (#3642)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do?

Add Open AI Compatible vector store file batches api. This functionality
is needed to attach many files to a vector store as a batch.
https://github.com/llamastack/llama-stack/issues/3533

API Stubs have been merged
https://github.com/llamastack/llama-stack/pull/3615
Adds persistence for file batches as discussed in diff
https://github.com/llamastack/llama-stack/pull/3544
(Used claude code for generation and reviewed by me)


## Test Plan
1. Unit tests pass
2. Also verified the cc-vec integration with LLamaStackClient works with
the file batches api. https://github.com/raghotham/cc-vec
2. Integration tests pass
2025-10-06 16:58:22 -07:00
ehhuang
597d405e13
chore: fix closing error (#3709)
# What does this PR do?
Gets rid of this error message below (disclaimer: not sure why, but it
does).

ERROR 2025-10-06 12:04:22,837 asyncio:118 uncategorized: Task exception
was never retrieved
future: <Task finished name='Task-36' coro=<AsyncClient.aclose() done,
defined at

/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_client.py:1978>
exception=RuntimeError('unable to perform operation on <TCPTransport
closed=True reading=False 0x122dc7ad0>; the handler is closed')>
╭───────────────────────────────────────────────────────────────────
Traceback (most recent call last)
───────────────────────────────────────────────────────────────────╮
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_client.py:1985
in aclose │
│ │
│ 1982 │ │ if self._state != ClientState.CLOSED: │
│ 1983 │ │ │ self._state = ClientState.CLOSED │
│ 1984 │ │ │ │
│ ❱ 1985 │ │ │ await self._transport.aclose() │
│ 1986 │ │ │ for proxy in self._mounts.values(): │
│ 1987 │ │ │ │ if proxy is not None: │
│ 1988 │ │ │ │ │ await proxy.aclose() │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_transports/default.py:406
in aclose │
│ │
│ 403 │ │ ) │
│ 404 │ │
│ 405 │ async def aclose(self) -> None: │
│ ❱ 406 │ │ await self._pool.aclose() │
│ 407 │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py:353
in aclose │
│ │
│ 350 │ │ with self._optional_thread_lock: │
│ 351 │ │ │ closing_connections = list(self._connections) │
│ 352 │ │ │ self._connections = [] │
│ ❱ 353 │ │ await self._close_connections(closing_connections) │
│ 354 │ │
│ 355 │ async def __aenter__(self) -> AsyncConnectionPool: │
│ 356 │ │ return self │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py:345
in _close_connections │
│ │
│ 342 │ │ # Close connections which have been removed from the pool. │
│ 343 │ │ with AsyncShieldCancellation(): │
│ 344 │ │ │ for connection in closing: │
│ ❱ 345 │ │ │ │ await connection.aclose() │
│ 346 │ │
│ 347 │ async def aclose(self) -> None: │
│ 348 │ │ # Explicitly close the connection pool. │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection.py:173
in aclose │
│ │
│ 170 │ async def aclose(self) -> None: │
│ 171 │ │ if self._connection is not None: │
│ 172 │ │ │ async with Trace("close", logger, None, {}): │
│ ❱ 173 │ │ │ │ await self._connection.aclose() │
│ 174 │ │
│ 175 │ def is_available(self) -> bool: │
│ 176 │ │ if self._connection is None: │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py:258
in aclose │
│ │
│ 255 │ │ # Note that this method unilaterally closes the connection,
and does │
│ 256 │ │ # not have any kind of locking in place around it. │
│ 257 │ │ self._state = HTTPConnectionState.CLOSED │
│ ❱ 258 │ │ await self._network_stream.aclose() │
│ 259 │ │
│ 260 │ # The AsyncConnectionInterface methods provide information about
the state of │
│ 261 │ # the connection, allowing for a connection pooling
implementation to │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_backends/anyio.py:53
in aclose │
│ │
│ 50 │ │ │ │ await self._stream.send(item=buffer) │
│ 51 │ │
│ 52 │ async def aclose(self) -> None: │
│ ❱ 53 │ │ await self._stream.aclose() │
│ 54 │ │
│ 55 │ async def start_tls( │
│ 56 │ │ self, │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/anyio/streams/tls.py:216
in aclose │
│ │
│ 213 │ │ │ │ await aclose_forcefully(self.transport_stream) │
│ 214 │ │ │ │ raise │
│ 215 │ │ │
│ ❱ 216 │ │ await self.transport_stream.aclose() │
│ 217 │ │
│ 218 │ async def receive(self, max_bytes: int = 65536) -> bytes: │
│ 219 │ │ data = await
self._call_sslobject_method(self._ssl_object.read, max_bytes) │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py:1310
in aclose │
│ │
│ 1307 │ │ if not self._transport.is_closing(): │
│ 1308 │ │ │ self._closed = True │
│ 1309 │ │ │ try: │
│ ❱ 1310 │ │ │ │ self._transport.write_eof() │
│ 1311 │ │ │ except OSError: │
│ 1312 │ │ │ │ pass │
│ 1313 │
│ │
│ in uvloop.loop.UVStream.write_eof:703 │
│ │
│ in uvloop.loop.UVHandle._ensure_alive:159 │

╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: unable to perform operation on <TCPTransport closed=True
reading=False 0x122dc7ad0>; the handler is closed

## Test Plan
Run
uv run --with llama-stack llama stack build --distro=starter
--image-type=venv --run

No more error
2025-10-06 14:44:01 -07:00
ehhuang
696fefbf17
chore: logger category fix (#3706)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Test Llama Stack Build / build (push) Failing after 2s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m22s
# What does this PR do?
WARNING 2025-10-06 12:01:45,137 root:266 uncategorized: Unknown logging
category: tokenizer_utils. Falling back to default 'root' level: 20

## Test Plan
2025-10-06 12:16:26 -07:00
Alexey Rybak
a8da6ba3a7
docs: API docstrings cleanup for better documentation rendering (#3661)
# What does this PR do?
* Cleans up API docstrings for better documentation rendering

<img width="2346" height="1126" alt="image"
src="https://github.com/user-attachments/assets/516b09a1-2d5b-4614-a3a9-13431fc21fc1"
/>

## Test Plan
* Manual testing

---------

Signed-off-by: Doug Edgar <dedgar@redhat.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Christian Zaccaria <73656840+ChristianZaccaria@users.noreply.github.com>
Co-authored-by: Anastas Stoyanovsky <contact@anastas.eu>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Young Han <110819238+seyeong-han@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 10:46:33 -07:00
Matthew Farrellee
892ea759fa
chore: remove together inference adapter's custom check_model_availability (#3702)
# What does this PR do?

remove Together inference adapter's check_model_availability impl, rely
on standard impl instead


## Test Plan

ci
2025-10-06 13:28:36 -04:00
Matthew Farrellee
de9940c697
chore: disable openai_embeddings on inference=remote::llama-openai-compat (#3704)
# What does this PR do?

api.llama.com does not provide embedding models, this makes that clear


## Test Plan

ci
2025-10-06 13:27:40 -04:00
Matthew Farrellee
ae74b31ae3
chore: remove vLLM inference adapter's custom list_models (#3703)
# What does this PR do?

remove vLLM inference adapter's custom list_models impl, rely on
standard impl instead

## Test Plan

ci
2025-10-06 13:27:30 -04:00
Matthew Farrellee
d23ed26238
chore: turn OpenAIMixin into a pydantic.BaseModel (#3671)
# What does this PR do?

- implement get_api_key instead of relying on
LiteLLMOpenAIMixin.get_api_key
 - remove use of LiteLLMOpenAIMixin
 - add default initialize/shutdown methods to OpenAIMixin
 - remove __init__s to allow proper pydantic construction
- remove dead code from vllm adapter and associated / duplicate unit
tests
 - update vllm adapter to use openaimixin for model registration
 - remove ModelRegistryHelper from fireworks & together adapters
 - remove Inference from nvidia adapter
 - complete type hints on embedding_model_metadata
- allow extra fields on OpenAIMixin, for model_store, __provider_id__,
etc
 - new recordings for ollama
 - enhance the list models error handling
- update cerebras (remove cerebras-cloud-sdk) and anthropic (custom
model listing) inference adapters
 - parametrized test_inference_client_caching
- remove cerebras, databricks, fireworks, together from blanket mypy
exclude
 - removed unnecessary litellm deps

## Test Plan

ci
2025-10-06 11:33:19 -04:00
Matthew Farrellee
724dac498c
chore: give OpenAIMixin subcalsses a change to list models without leaking _model_cache details (#3682)
# What does this PR do?

close the _model_cache abstraction leak

## Test Plan

ci w/ new tests
2025-10-06 09:44:33 -04:00
Charlie Doern
f00bcd9561
feat: allow for multiple external provider specs (#3341)
# What does this PR do?

when using the providers.d method of installation users could hand craft
their AdapterSpec's to use overlapping code meaning one repo could
contain an inline and remote impl. Currently installing a provider via
module does not allow for that as each repo is only allowed to have one
`get_provider_spec` method with one Spec returned

add an optional way for `get_provider_spec` to return a list of
`ProviderSpec` where each can be either an inline or remote impl.

Note: the `adapter_type` in `get_provider_spec` MUST match the
`provider_type` in the build/run yaml for this to work.

resolves #3226

## Test Plan

once this merges we need to re-enable the external provider test and
account for this functionality. Work needs to be done in the external
provider repos to support this functionality.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-06 15:26:38 +02:00
ehhuang
426cac078b
chore: use uvicorn to start llama stack server everywhere (#3625)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
https://github.com/llamastack/llama-stack/pull/3462 allows using uvicorn
to start llama stack server which supports spawning multiple workers.

This PR enables us to launch >1 workers from `llama stack run` (will add
the parameter in a follow-up PR, keeping this PR on simplifying) by
removing the old way of launching stack server and consolidates
launching via uvicorn.run only.


## Test Plan
ran `llama stack run starter`
CI
2025-10-06 14:27:40 +02:00
dependabot[bot]
92219fd8fb
chore(python-deps): bump pandas from 2.3.1 to 2.3.3 (#3689)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External API and Providers / test-external (venv) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m26s
Bumps [pandas](https://github.com/pandas-dev/pandas) from 2.3.1 to
2.3.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pandas-dev/pandas/releases">pandas's
releases</a>.</em></p>
<blockquote>
<h2>Pandas 2.3.3</h2>
<p>We are pleased to announce the release of pandas 2.3.3.
This release includes some improvements and fixes to the future string
data type (preview feature for the upcoming pandas 3.0). We recommend
that all users upgrade to this version.</p>
<p>See the <a
href="https://pandas.pydata.org/pandas-docs/version/2.3/whatsnew/v2.3.3.html">full
whatsnew</a> for a list of all the changes.
Pandas 2.3.3 supports Python 3.9 and higher, and is the first release to
support Python 3.14.</p>
<p>The release will be available on the conda-forge channel:</p>
<pre><code>conda install pandas --channel conda-forge
</code></pre>
<p>Or via PyPI:</p>
<pre><code>python3 -m pip install --upgrade pandas
</code></pre>
<p>Please report any issues with the release on the <a
href="https://github.com/pandas-dev/pandas/issues">pandas issue
tracker</a>.</p>
<p>Thanks to all the contributors who made this release possible.</p>
<h2>Pandas 2.3.2</h2>
<p>We are pleased to announce the release of pandas 2.3.2.
This release includes some improvements and fixes to the future string
data type (preview feature for the upcoming pandas 3.0). We recommend
that all users upgrade to this version.</p>
<p>See the <a
href="https://pandas.pydata.org/pandas-docs/version/2.3/whatsnew/v2.3.2.html">full
whatsnew</a> for a list of all the changes.
Pandas 2.3.2 supports Python 3.9 and higher.</p>
<p>The release will be available on the conda-forge channel:</p>
<pre><code>conda install pandas --channel conda-forge
</code></pre>
<p>Or via PyPI:</p>
<pre><code>python3 -m pip install --upgrade pandas
</code></pre>
<p>Please report any issues with the release on the <a
href="https://github.com/pandas-dev/pandas/issues">pandas issue
tracker</a>.</p>
<p>Thanks to all the contributors who made this release possible.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9c8bc3e551"><code>9c8bc3e</code></a>
RLS: 2.3.3</li>
<li><a
href="6aa788a00b"><code>6aa788a</code></a>
[backport 2.3.x] DOC: prepare 2.3.3 whatsnew notes for release (<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62499">#62499</a>)
(<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62508">#62508</a>)</li>
<li><a
href="b64f0df403"><code>b64f0df</code></a>
[backport 2.3.x] BUG: avoid validation error for ufunc with
string[python] ar...</li>
<li><a
href="058eb2b0ed"><code>058eb2b</code></a>
[backport 2.3.x] BUG: String[pyarrow] comparison with mixed object (<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62424">#62424</a>)
(...</li>
<li><a
href="2ca088daef"><code>2ca088d</code></a>
[backport 2.3.x] DEPR: remove the Period resampling deprecation (<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62480">#62480</a>)
(<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62">#62</a>...</li>
<li><a
href="92bf98f623"><code>92bf98f</code></a>
[backport 2.3.x] BUG: fix .str.isdigit to honor unicode superscript for
older...</li>
<li><a
href="e57c7d6a22"><code>e57c7d6</code></a>
Backport PR <a
href="https://redirect.github.com/pandas-dev/pandas/issues/62452">#62452</a>
on branch 2.3.x (TST: Adjust tests for numexpr 2.13) (<a
href="https://redirect.github.com/pandas-dev/pandas/issues/62454">#62454</a>)</li>
<li><a
href="e0fe9a03c9"><code>e0fe9a0</code></a>
Backport to 2.3.x: REGR: from_records not initializing subclasses
properly (#...</li>
<li><a
href="23a1085e64"><code>23a1085</code></a>
BUG: improve future warning for boolean operations with missaligned
indexes (...</li>
<li><a
href="61136969fb"><code>6113696</code></a>
Backport PR <a
href="https://redirect.github.com/pandas-dev/pandas/issues/62396">#62396</a>
on branch 2.3.x (PKG/DOC: indicate Python 3.14 support in ...</li>
<li>Additional commits viewable in <a
href="https://github.com/pandas-dev/pandas/compare/v2.3.1...v2.3.3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pandas&package-manager=uv&previous-version=2.3.1&new-version=2.3.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-05 21:20:29 -07:00
dependabot[bot]
198536f136
chore(github-deps): bump actions/github-script from 7.0.1 to 8.0.0 (#3685)
Bumps [actions/github-script](https://github.com/actions/github-script)
from 7.0.1 to 8.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/github-script/releases">actions/github-script's
releases</a>.</em></p>
<blockquote>
<h2>v8.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update Node.js version support to 24.x by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/637">actions/github-script#637</a></li>
<li>README for updating actions/github-script from v7 to v8 by <a
href="https://github.com/sneha-krip"><code>@​sneha-krip</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/653">actions/github-script#653</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/637">actions/github-script#637</a></li>
<li><a
href="https://github.com/sneha-krip"><code>@​sneha-krip</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/653">actions/github-script#653</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/github-script/compare/v7.1.0...v8.0.0">https://github.com/actions/github-script/compare/v7.1.0...v8.0.0</a></p>
<h2>v7.1.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Upgrade husky to v9 by <a
href="https://github.com/benelan"><code>@​benelan</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/482">actions/github-script#482</a></li>
<li>Add workflow file for publishing releases to immutable action
package by <a
href="https://github.com/Jcambass"><code>@​Jcambass</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/485">actions/github-script#485</a></li>
<li>Upgrade IA Publish by <a
href="https://github.com/Jcambass"><code>@​Jcambass</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/486">actions/github-script#486</a></li>
<li>Fix workflow status badges by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/497">actions/github-script#497</a></li>
<li>Update usage of <code>actions/upload-artifact</code> by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/512">actions/github-script#512</a></li>
<li>Clear up package name confusion by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/514">actions/github-script#514</a></li>
<li>Update dependencies with <code>npm audit fix</code> by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/515">actions/github-script#515</a></li>
<li>Specify that the used script is JavaScript by <a
href="https://github.com/timotk"><code>@​timotk</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/478">actions/github-script#478</a></li>
<li>chore: Add Dependabot for NPM and Actions by <a
href="https://github.com/nschonni"><code>@​nschonni</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/472">actions/github-script#472</a></li>
<li>Define <code>permissions</code> in workflows and update actions by
<a href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in
<a
href="https://redirect.github.com/actions/github-script/pull/531">actions/github-script#531</a></li>
<li>chore: Add Dependabot for .github/actions/install-dependencies by <a
href="https://github.com/nschonni"><code>@​nschonni</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/532">actions/github-script#532</a></li>
<li>chore: Remove .vscode settings by <a
href="https://github.com/nschonni"><code>@​nschonni</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/533">actions/github-script#533</a></li>
<li>ci: Use github/setup-licensed by <a
href="https://github.com/nschonni"><code>@​nschonni</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/473">actions/github-script#473</a></li>
<li>make octokit instance available as octokit on top of github, to make
it easier to seamlessly copy examples from GitHub rest api or octokit
documentations by <a
href="https://github.com/iamstarkov"><code>@​iamstarkov</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/508">actions/github-script#508</a></li>
<li>Remove <code>octokit</code> README updates for v7 by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/557">actions/github-script#557</a></li>
<li>docs: add &quot;exec&quot; usage examples by <a
href="https://github.com/neilime"><code>@​neilime</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/546">actions/github-script#546</a></li>
<li>Bump ruby/setup-ruby from 1.213.0 to 1.222.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/github-script/pull/563">actions/github-script#563</a></li>
<li>Bump ruby/setup-ruby from 1.222.0 to 1.229.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/github-script/pull/575">actions/github-script#575</a></li>
<li>Clearly document passing inputs to the <code>script</code> by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/603">actions/github-script#603</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/github-script/pull/610">actions/github-script#610</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/benelan"><code>@​benelan</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/482">actions/github-script#482</a></li>
<li><a href="https://github.com/Jcambass"><code>@​Jcambass</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/485">actions/github-script#485</a></li>
<li><a href="https://github.com/timotk"><code>@​timotk</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/478">actions/github-script#478</a></li>
<li><a
href="https://github.com/iamstarkov"><code>@​iamstarkov</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/508">actions/github-script#508</a></li>
<li><a href="https://github.com/neilime"><code>@​neilime</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/546">actions/github-script#546</a></li>
<li><a href="https://github.com/nebuk89"><code>@​nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/github-script/pull/610">actions/github-script#610</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/github-script/compare/v7...v7.1.0">https://github.com/actions/github-script/compare/v7...v7.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ed597411d8"><code>ed59741</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/github-script/issues/653">#653</a>
from actions/sneha-krip/readme-for-v8</li>
<li><a
href="2dc352e4ba"><code>2dc352e</code></a>
Bold minimum Actions Runner version in README</li>
<li><a
href="01e118c8d0"><code>01e118c</code></a>
Update README for Node 24 runtime requirements</li>
<li><a
href="8b222ac82e"><code>8b222ac</code></a>
Apply suggestion from <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a></li>
<li><a
href="adc0eeac99"><code>adc0eea</code></a>
README for updating actions/github-script from v7 to v8</li>
<li><a
href="20fe497b3f"><code>20fe497</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/github-script/issues/637">#637</a>
from actions/node24</li>
<li><a
href="e7b7f222b1"><code>e7b7f22</code></a>
update licenses</li>
<li><a
href="2c81ba05f3"><code>2c81ba0</code></a>
Update Node.js version support to 24.x</li>
<li><a
href="f28e40c7f3"><code>f28e40c</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/github-script/issues/610">#610</a>
from actions/nebuk89-patch-1</li>
<li><a
href="1ae9958572"><code>1ae9958</code></a>
Update README.md</li>
<li>Additional commits viewable in <a
href="60a0d83039...ed597411d8">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/github-script&package-manager=github_actions&previous-version=7.0.1&new-version=8.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-05 21:20:00 -07:00
dependabot[bot]
59e5bde991
chore(github-deps): bump astral-sh/setup-uv from 6.7.0 to 6.8.0 (#3686)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from
6.7.0 to 6.8.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d0cc045d04"><code>d0cc045</code></a>
Always show prune cache output (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/597">#597</a>)</li>
<li><a
href="2841f9f5c1"><code>2841f9f</code></a>
Bump zizmorcore/zizmor-action from 0.1.2 to 0.2.0 (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/571">#571</a>)</li>
<li><a
href="e554b93b80"><code>e554b93</code></a>
Add **/*.py.lock to cache-dependency-glob (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/590">#590</a>)</li>
<li><a
href="c7d85d9988"><code>c7d85d9</code></a>
chore: update known versions for 0.8.20</li>
<li><a
href="07f2cb5db9"><code>07f2cb5</code></a>
persist credentials for version update (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/584">#584</a>)</li>
<li><a
href="208b0c0ee4"><code>208b0c0</code></a>
README.md: Fix Python versions and update checkout action (<a
href="https://redirect.github.com/astral-sh/setup-uv/issues/572">#572</a>)</li>
<li>See full diff in <a
href="b75a909f75...d0cc045d04">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.7.0&new-version=6.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-05 21:19:50 -07:00
dependabot[bot]
45cf74db33
chore(python-deps): bump requests from 2.32.4 to 2.32.5 (#3691)
Bumps [requests](https://github.com/psf/requests) from 2.32.4 to 2.32.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/psf/requests/releases">requests's
releases</a>.</em></p>
<blockquote>
<h2>v2.32.5</h2>
<h2>2.32.5 (2025-08-18)</h2>
<p><strong>Bugfixes</strong></p>
<ul>
<li>The SSLContext caching feature originally introduced in 2.32.0 has
created
a new class of issues in Requests that have had negative impact across a
number
of use cases. The Requests team has decided to revert this feature as
long term
maintenance of it is proving to be unsustainable in its current
iteration.</li>
</ul>
<p><strong>Deprecations</strong></p>
<ul>
<li>Added support for Python 3.14.</li>
<li>Dropped support for Python 3.8 following its end of support.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/psf/requests/blob/main/HISTORY.md">requests's
changelog</a>.</em></p>
<blockquote>
<h2>2.32.5 (2025-08-18)</h2>
<p><strong>Bugfixes</strong></p>
<ul>
<li>The SSLContext caching feature originally introduced in 2.32.0 has
created
a new class of issues in Requests that have had negative impact across a
number
of use cases. The Requests team has decided to revert this feature as
long term
maintenance of it is proving to be unsustainable in its current
iteration.</li>
</ul>
<p><strong>Deprecations</strong></p>
<ul>
<li>Added support for Python 3.14.</li>
<li>Dropped support for Python 3.8 following its end of support.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b25c87d7cb"><code>b25c87d</code></a>
v2.32.5</li>
<li><a
href="131e506079"><code>131e506</code></a>
Merge pull request <a
href="https://redirect.github.com/psf/requests/issues/7010">#7010</a>
from psf/dependabot/github_actions/actions/checkout-...</li>
<li><a
href="b336cb2bc6"><code>b336cb2</code></a>
Bump actions/checkout from 4.2.0 to 5.0.0</li>
<li><a
href="46e939b552"><code>46e939b</code></a>
Update publish workflow to use <code>artifact-id</code> instead of
<code>name</code></li>
<li><a
href="4b9c546aa3"><code>4b9c546</code></a>
Merge pull request <a
href="https://redirect.github.com/psf/requests/issues/6999">#6999</a>
from psf/dependabot/github_actions/step-security/har...</li>
<li><a
href="7618dbef01"><code>7618dbe</code></a>
Bump step-security/harden-runner from 2.12.0 to 2.13.0</li>
<li><a
href="2edca11103"><code>2edca11</code></a>
Add support for Python 3.14 and drop support for Python 3.8 (<a
href="https://redirect.github.com/psf/requests/issues/6993">#6993</a>)</li>
<li><a
href="fec96cd597"><code>fec96cd</code></a>
Update Makefile rules (<a
href="https://redirect.github.com/psf/requests/issues/6996">#6996</a>)</li>
<li><a
href="d58d8aa2f4"><code>d58d8aa</code></a>
docs: clarify timeout parameter uses seconds in Session.request (<a
href="https://redirect.github.com/psf/requests/issues/6994">#6994</a>)</li>
<li><a
href="91a3eabd3d"><code>91a3eab</code></a>
Bump github/codeql-action from 3.28.5 to 3.29.0</li>
<li>Additional commits viewable in <a
href="https://github.com/psf/requests/compare/v2.32.4...v2.32.5">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=requests&package-manager=uv&previous-version=2.32.4&new-version=2.32.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-05 21:19:19 -07:00
dependabot[bot]
c0f0a03529
chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui (#3693)
Bumps
[react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom)
and
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom).
These dependencies needed to be updated together.
Updates `react-dom` from 19.1.1 to 19.2.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/releases">react-dom's
releases</a>.</em></p>
<blockquote>
<h2>19.2.0 (Oct 1, 2025)</h2>
<p>Below is a list of all new features, APIs, and bug fixes.</p>
<p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React
19.2 release post</a> for more information.</p>
<h2>New React Features</h2>
<ul>
<li><a
href="https://react.dev/reference/react/Activity"><code>&lt;Activity&gt;</code></a>:
A new API to hide and restore the UI and internal state of its
children.</li>
<li><a
href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a>
is a React Hook that lets you extract non-reactive logic into an <a
href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect
Event</a>.</li>
<li><a
href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a>
(for RSCs) lets your know when the <code>cache()</code> lifetime is
over.</li>
<li><a
href="https://react.dev/reference/developer-tooling/react-performance-tracks">React
Performance tracks</a> appear on the Performance panel’s timeline in
your browser developer tools</li>
</ul>
<h2>New React DOM Features</h2>
<ul>
<li>Added resume APIs for partial pre-rendering with Web Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Added resume APIs for partial pre-rendering with Node Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Updated <a
href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a>
APIs to return a <code>postponed</code> state that can be passed to the
<code>resume</code> APIs.</li>
</ul>
<h2>Notable changes</h2>
<ul>
<li>React DOM now batches suspense boundary reveals, matching the
behavior of client side rendering. This change is especially noticeable
when animating the reveal of Suspense boundaries e.g. with the upcoming
<code>&lt;ViewTransition&gt;</code> Component. React will batch as much
reveals as possible before the first paint while trying to hit popular
first-contentful paint metrics.</li>
<li>Add Node Web Streams (<code>prerender</code>,
<code>renderToReadableStream</code>) to server-side-rendering APIs for
Node.js</li>
<li>Use underscore instead of <code>:</code> IDs generated by useId</li>
</ul>
<h2>All Changes</h2>
<h3>React</h3>
<ul>
<li><code>&lt;Activity /&gt;</code> was developed over many years,
starting before <code>ClassComponent.setState</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> and
many others)</li>
<li>Stringify context as &quot;SomeContext&quot; instead of
&quot;SomeContext.Provider&quot; (<a
href="https://github.com/kassens"><code>@​kassens</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li>
<li>Include stack of cause of React instrumentation errors with
<code>%o</code> placeholder (<a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li>
<li>Fix infinite <code>useDeferredValue</code> loop in popstate event
(<a href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li>
<li>Fix a bug when an initial value was passed to
<code>useDeferredValue</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li>
<li>Fix a crash when submitting forms with Client Actions (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li>
<li>Hide/unhide the content of dehydrated suspense boundaries if they
resuspend (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li>
<li>Avoid stack overflow on wide trees during Hot Reload (<a
href="https://github.com/sophiebits"><code>@​sophiebits</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li>
<li>Improve Owner and Component stacks in various places (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>,
<a
href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li>
<li>Add <code>cacheSignal</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li>
</ul>
<h3>React DOM</h3>
<ul>
<li>Block on Suspensey Fonts during reveal of server-side-rendered
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li>
<li>Use underscore instead of <code>:</code> for IDs generated by
<code>useId</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a><a
href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li>
<li>Stop warning when ARIA 1.3 attributes are used (<a
href="https://github.com/Abdul-Omira"><code>@​Abdul-Omira</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li>
<li>Allow <code>nonce</code> to be used on hoistable styles (<a
href="https://github.com/Andarist"><code>@​Andarist</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li>
<li>Warn for using a React owned node as a Container if it also has text
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32774">#32774</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/blob/main/CHANGELOG.md">react-dom's
changelog</a>.</em></p>
<blockquote>
<h2>19.2.0 (October 1st, 2025)</h2>
<p>Below is a list of all new features, APIs, and bug fixes.</p>
<p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React
19.2 release post</a> for more information.</p>
<h3>New React Features</h3>
<ul>
<li><a
href="https://react.dev/reference/react/Activity"><code>&lt;Activity&gt;</code></a>:
A new API to hide and restore the UI and internal state of its
children.</li>
<li><a
href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a>
is a React Hook that lets you extract non-reactive logic into an <a
href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect
Event</a>.</li>
<li><a
href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a>
(for RSCs) lets your know when the <code>cache()</code> lifetime is
over.</li>
<li><a
href="https://react.dev/reference/developer-tooling/react-performance-tracks">React
Performance tracks</a> appear on the Performance panel’s timeline in
your browser developer tools</li>
</ul>
<h3>New React DOM Features</h3>
<ul>
<li>Added resume APIs for partial pre-rendering with Web Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Added resume APIs for partial pre-rendering with Node Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Updated <a
href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a>
APIs to return a <code>postponed</code> state that can be passed to the
<code>resume</code> APIs.</li>
</ul>
<h3>Notable changes</h3>
<ul>
<li>React DOM now batches suspense boundary reveals, matching the
behavior of client side rendering. This change is especially noticeable
when animating the reveal of Suspense boundaries e.g. with the upcoming
<code>&lt;ViewTransition&gt;</code> Component. React will batch as much
reveals as possible before the first paint while trying to hit popular
first-contentful paint metrics.</li>
<li>Add Node Web Streams (<code>prerender</code>,
<code>renderToReadableStream</code>) to server-side-rendering APIs for
Node.js</li>
<li>Use underscore instead of <code>:</code> IDs generated by useId</li>
</ul>
<h3>All Changes</h3>
<h4>React</h4>
<ul>
<li><code>&lt;Activity /&gt;</code> was developed over many years,
starting before <code>ClassComponent.setState</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> and
many others)</li>
<li>Stringify context as &quot;SomeContext&quot; instead of
&quot;SomeContext.Provider&quot; (<a
href="https://github.com/kassens"><code>@​kassens</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li>
<li>Include stack of cause of React instrumentation errors with
<code>%o</code> placeholder (<a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li>
<li>Fix infinite <code>useDeferredValue</code> loop in popstate event
(<a href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li>
<li>Fix a bug when an initial value was passed to
<code>useDeferredValue</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li>
<li>Fix a crash when submitting forms with Client Actions (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li>
<li>Hide/unhide the content of dehydrated suspense boundaries if they
resuspend (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li>
<li>Avoid stack overflow on wide trees during Hot Reload (<a
href="https://github.com/sophiebits"><code>@​sophiebits</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li>
<li>Improve Owner and Component stacks in various places (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>,
<a
href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li>
<li>Add <code>cacheSignal</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li>
</ul>
<h4>React DOM</h4>
<ul>
<li>Block on Suspensey Fonts during reveal of server-side-rendered
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li>
<li>Use underscore instead of <code>:</code> for IDs generated by
<code>useId</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a><a
href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li>
<li>Stop warning when ARIA 1.3 attributes are used (<a
href="https://github.com/Abdul-Omira"><code>@​Abdul-Omira</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li>
<li>Allow <code>nonce</code> to be used on hoistable styles (<a
href="https://github.com/Andarist"><code>@​Andarist</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="861811347b"><code>8618113</code></a>
Bump scheduler version (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34671">#34671</a>)</li>
<li><a
href="1bd1f01f2a"><code>1bd1f01</code></a>
Ship partial-prerendering APIs to Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34633">#34633</a>)</li>
<li><a
href="2f0649a0b2"><code>2f0649a</code></a>
[Fizz] Remove <code>nonce</code> option from resume-and-prerender APIs
(<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34664">#34664</a>)</li>
<li><a
href="5667a41fe4"><code>5667a41</code></a>
Bump next prerelease version numbers (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34639">#34639</a>)</li>
<li><a
href="e08f53b182"><code>e08f53b</code></a>
Match <code>react-dom/static</code> test entrypoints and published
entrypoints (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34599">#34599</a>)</li>
<li><a
href="8bb7241f4c"><code>8bb7241</code></a>
Bump useEffectEvent to Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34610">#34610</a>)</li>
<li><a
href="83c88ad470"><code>83c88ad</code></a>
Handle fabric root level fragment with compareDocumentPosition (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34533">#34533</a>)</li>
<li><a
href="68f00c901c"><code>68f00c9</code></a>
Release Activity in Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34374">#34374</a>)</li>
<li><a
href="3168e08f83"><code>3168e08</code></a>
[flags] enable opt-in for enableDefaultTransitionIndicator (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34373">#34373</a>)</li>
<li><a
href="3434ff4f4b"><code>3434ff4</code></a>
Add scrollIntoView to fragment instances (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/32814">#32814</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/facebook/react/commits/v19.2.0/packages/react-dom">compare
view</a></li>
</ul>
</details>
<br />

Updates `@types/react-dom` from 19.1.9 to 19.2.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 00:02:31 -04:00
dependabot[bot]
91c6a8a3a3
chore(ui-deps): bump next from 15.5.3 to 15.5.4 in /llama_stack/ui (#3694)
Bumps [next](https://github.com/vercel/next.js) from 15.5.3 to 15.5.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.4</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: ensure onRequestError is invoked when otel enabled (<a
href="https://redirect.github.com/vercel/next.js/issues/83343">#83343</a>)</li>
<li>fix: devtools initial position should be from next config (<a
href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)</li>
<li>[devtool] fix overlay styles are missing (<a
href="https://redirect.github.com/vercel/next.js/issues/83721">#83721</a>)</li>
<li>Turbopack: don't match dynamic pattern for node_modules packages (<a
href="https://redirect.github.com/vercel/next.js/issues/83176">#83176</a>)</li>
<li>Turbopack: don't treat metadata routes as RSC (<a
href="https://redirect.github.com/vercel/next.js/issues/82911">#82911</a>)</li>
<li>[turbopack] Improve handling of symlink resolution errors in
track_glob and read_glob (<a
href="https://redirect.github.com/vercel/next.js/issues/83357">#83357</a>)</li>
<li>Turbopack: throw large static metadata error earlier (<a
href="https://redirect.github.com/vercel/next.js/issues/82939">#82939</a>)</li>
<li>fix: error overlay not closing when backdrop clicked (<a
href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)</li>
<li>Turbopack: flush Node.js worker IPC on error (<a
href="https://redirect.github.com/vercel/next.js/issues/84077">#84077</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>[CNA] use linter preference (<a
href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)</li>
<li>CI: use KV for test timing data (<a
href="https://redirect.github.com/vercel/next.js/issues/83745">#83745</a>)</li>
<li>docs: september improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/yiminghe"><code>@​yiminghe</code></a>, <a
href="https://github.com/huozhi"><code>@​huozhi</code></a>, <a
href="https://github.com/devjiwonchoi"><code>@​devjiwonchoi</code></a>,
<a href="https://github.com/mischnic"><code>@​mischnic</code></a>, <a
href="https://github.com/lukesandberg"><code>@​lukesandberg</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, <a
href="https://github.com/icyJoseph"><code>@​icyJoseph</code></a>, <a
href="https://github.com/leerob"><code>@​leerob</code></a>, <a
href="https://github.com/fufuShih"><code>@​fufuShih</code></a>, <a
href="https://github.com/dwrth"><code>@​dwrth</code></a>, <a
href="https://github.com/aymericzip"><code>@​aymericzip</code></a>, <a
href="https://github.com/obendev"><code>@​obendev</code></a>, <a
href="https://github.com/molebox"><code>@​molebox</code></a>, <a
href="https://github.com/OoMNoO"><code>@​OoMNoO</code></a>, <a
href="https://github.com/pontasan"><code>@​pontasan</code></a>, <a
href="https://github.com/styfle"><code>@​styfle</code></a>, <a
href="https://github.com/HondaYt"><code>@​HondaYt</code></a>, <a
href="https://github.com/ryuapp"><code>@​ryuapp</code></a>, <a
href="https://github.com/lpalmes"><code>@​lpalmes</code></a>, and <a
href="https://github.com/ijjk"><code>@​ijjk</code></a> for helping!</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="40f1d7814d"><code>40f1d78</code></a>
v15.5.4</li>
<li><a
href="cb30f0a176"><code>cb30f0a</code></a>
[backport] docs: september improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li>
<li><a
href="b6a32bb579"><code>b6a32bb</code></a>
[backport] [CNA] use linter preference (<a
href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/84087">#84087</a>)</li>
<li><a
href="26d61f1e9a"><code>26d61f1</code></a>
[backport] Turbopack: flush Node.js worker IPC on error (<a
href="https://redirect.github.com/vercel/next.js/issues/84079">#84079</a>)</li>
<li><a
href="e11e87a547"><code>e11e87a</code></a>
[backport] fix: error overlay not closing when backdrop clicked (<a
href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83">#83</a>...</li>
<li><a
href="0a29888575"><code>0a29888</code></a>
[backport] fix: devtools initial position should be from next config (<a
href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)...</li>
<li><a
href="7a53950c13"><code>7a53950</code></a>
[backport] Turbopack: don't treat metadata routes as RSC (<a
href="https://redirect.github.com/vercel/next.js/issues/83804">#83804</a>)</li>
<li><a
href="050bdf1ae7"><code>050bdf1</code></a>
[backport] Turbopack: throw large static metadata error earlier (<a
href="https://redirect.github.com/vercel/next.js/issues/83816">#83816</a>)</li>
<li><a
href="1f6ea09f85"><code>1f6ea09</code></a>
[backport] Turbopack: Improve handling of symlink resolution errors (<a
href="https://redirect.github.com/vercel/next.js/issues/83805">#83805</a>)</li>
<li><a
href="c7d1855499"><code>c7d1855</code></a>
[backport] CI: use KV for test timing data (<a
href="https://redirect.github.com/vercel/next.js/issues/83860">#83860</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v15.5.3...v15.5.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.5.3&new-version=15.5.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 00:01:38 -04:00
Matthew Farrellee
351c4b98e4
chore: inference=remote::llama-openai-compat does not support /v1/completion (#3683)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 16s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 20s
Unit Tests / unit-tests (3.12) (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m22s
## What does this PR do?

skip completion tests for inference=remote::llama-openai-compat

## Test Plan

ci
2025-10-04 11:36:48 -07:00
Ashwin Bharambe
045a0c1d57
feat(tests): implement test isolation for inference recordings (#3681)
Uses test_id in request hashes and test-scoped subdirectories to prevent
cross-test contamination. Model list endpoints exclude test_id to enable
merging recordings from different servers.

Additionally, this PR adds a `record-if-missing` mode (which we will use
instead of `record` which records everything) which is very useful.

🤖 Co-authored with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-04 11:34:18 -07:00
Young Han
f176196fba
docs: Update links in README for quick start and documentation (#3678)
Some checks failed
Test Llama Stack Build / generate-matrix (push) Successful in 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Test Llama Stack Build / build (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m59s
Previous quick start and documentation links linked to `Page Not Found`.

# What does this PR do?
<img width="900" height="316" alt="image"
src="https://github.com/user-attachments/assets/60ceac27-18db-4a3b-852f-8d139309f4cb"
/>
2025-10-03 20:51:46 -07:00
ehhuang
c21bb0e837
chore: fix setup_telemetry script (#3680)
# What does this PR do?
Added missing configuration files

## Test Plan
run ./scripts/telemetry/setup_telemetry.sh
```
OTEL_SERVICE_NAME=llama_stack OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 TELEMETRY_SINKS=otel_trace,otel_metric uv run --with llama-stack llama stack build --distro=starter --image-type=venv --run
```
Navigate to grafana localhost:3000, query metrics and traces
2025-10-03 17:36:35 -07:00
Ashwin Bharambe
3f36bfaeaa
chore(tests): normalize recording IDs and timestamps to reduce git diff noise (#3676)
IDs are now deterministic hashes based on request content, and
timestamps are normalized to constants, eliminating spurious changes
when re-recording tests.

## Changes
- Updated `inference_recorder.py` to normalize IDs and timestamps during
recording
- Added `scripts/normalize_recordings.py` utility to re-normalize
existing recordings
- Created documentation in `tests/integration/recordings/README.md`
- Normalized 350 existing recording files
2025-10-03 17:26:11 -07:00
Alexey Rybak
6bcd3e25f2
chore: update CODEOWNERS (#3613)
# What does this PR do?

Update CODEOWNERS file 

## Test Plan
N/A
2025-10-03 17:12:34 -07:00
Francisco Arceo
7ec7e0c1ac
chore: Add weaviate client to unit group in pyproject.toml and uv.lock (#3675)
# What does this PR do?
`uv add "weaviate-client>=4.16.4" --group unit`

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-03 14:02:20 -07:00
Ashwin Bharambe
61b4238912
feat(api): add extra_body parameter support with shields example (#3670)
## Summary
Introduce `ExtraBodyField` annotation to enable parameters that arrive
via extra_body in client SDKs but are accessible server-side with full
typing.

These parameters are documented in OpenAPI specs under
**`x-llama-stack-extra-body-params`** but excluded from generated SDK
signatures.

Add `shields` parameter to `create_openai_response` as the first
implementation using this pattern.

## Test Plan
- added an integration test which checks that shields parameter passed
via extra_body reaches server implementation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-03 13:25:09 -07:00
Ashwin Bharambe
188a56af5c fix: merge workflows to avoid GITHUB_TOKEN limitation
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m16s
2025-10-03 12:04:02 -07:00
Ashwin Bharambe
f232b78ad6 fix(ci): update hashes 2025-10-03 11:58:49 -07:00
Ashwin Bharambe
5a44b9ff82
feat: add comment-triggered pre-commit bot for PRs (#3672)
## Summary

This PR adds a comment-triggered GitHub Actions workflow that allows
running pre-commit hooks on-demand for any pull request. When someone
comments `@github-actions run precommit` on a PR, the bot automatically
runs all pre-commit hooks and commits any formatting or linting fixes
directly to the PR branch.

The implementation uses a secure two-workflow approach: a trigger
workflow validates permissions and dispatches to an execution workflow
that runs pre-commit in a privileged context. This works safely for both
same-repo and fork PRs, with permission checks ensuring only PR authors
or repository collaborators can trigger the bot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-03 11:51:40 -07:00
Alexey Rybak
9f6c658f2a
docs: update OG image (#3669)
# What does this PR do?
* Updates OG image for docs preview

## Test Plan
* Manual testing
2025-10-03 10:22:54 -07:00
Matthew Farrellee
ce77c27ff8
chore: use remoteinferenceproviderconfig for remote inference providers (#3668)
# What does this PR do?

on the path to maintainable impls of inference providers. make all
configs instances of RemoteInferenceProviderConfig.

## Test Plan

ci
2025-10-03 08:48:42 -07:00
Francisco Arceo
a20e8eac8c
feat: Add OpenAI Conversations API (#3429)
# What does this PR do?

Initial implementation for `Conversations` and `ConversationItems` using
`AuthorizedSqlStore` with endpoints to:
- CREATE
- UPDATE
- GET/RETRIEVE/LIST
- DELETE

Set `level=LLAMA_STACK_API_V1`.

NOTE: This does not currently incorporate changes for Responses, that'll
be done in a subsequent PR.

Closes https://github.com/llamastack/llama-stack/issues/3235

## Test Plan
- Unit tests
- Integration tests

Also comparison of [OpenAPI spec for OpenAI
API](https://github.com/openai/openai-openapi/tree/manual_spec)
```bash
oasdiff breaking --fail-on ERR docs/static/llama-stack-spec.yaml https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/manual_spec/openapi.yaml --strip-prefix-base "/v1/openai/v1" \
--match-path '(^/v1/openai/v1/conversations.*|^/conversations.*)'
```

Note I still have some uncertainty about this, I borrowed this info from
@cdoern on https://github.com/llamastack/llama-stack/pull/3514 but need
to spend more time to confirm it's working, at the moment it suggests it
does.

UPDATE on `oasdiff`, I investigated the OpenAI spec further and it looks
like currently the spec does not list Conversations, so that analysis is
useless. Noting for future reference.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-03 08:47:18 -07:00
Charlie Doern
a09e30bd87
docs!: adjust external provider docs (#3484)
# What does this PR do?

now that we consolidated the providerspec types and got rid of
`AdapterSpec`, adjust external.md

BREAKING CHANGE: external providers must update their
`get_provider_spec` function to use `RemoteProviderSpec` properly

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-03 15:48:41 +02:00
Matthew Farrellee
d266c59c2a
chore: remove deprecated inference.chat_completion implementations (#3654)
# What does this PR do?

remove unused chat_completion implementations

vllm features ported -
 - requires max_tokens be set, use config value
 - set tool_choice to none if no tools provided


## Test Plan

ci
2025-10-03 07:55:34 -04:00
Anastas Stoyanovsky
4dfbe46954
fix(docs): Correct indentation in documented example for access_policy (#3652)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 18s
Python Package Build Test / build (3.13) (push) Failing after 15s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 17s
Python Package Build Test / build (3.12) (push) Failing after 17s
Unit Tests / unit-tests (3.13) (push) Failing after 16s
Unit Tests / unit-tests (3.12) (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m21s
`access_policy` needs to be inside the `auth` section in config; this PR
corrects indentation in a documented example of configuring that
section.
2025-10-03 12:19:52 +02:00
Christian Zaccaria
bcdbb53be3
feat: implement keyword and hybrid search for Weaviate provider (#3264)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- This PR implements keyword and hybrid search for Weaviate DB based on
its inbuilt functions.
- Added fixtures to conftest.py for Weaviate.
- Enabled integration tests for remote Weaviate on all 3 search modes.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3010 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Unit tests and integration tests should pass on this PR.
2025-10-03 10:22:30 +02:00
Doug Edgar
52c8df2322
feat: auto-detect Console width (#3327)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Addresses Issue #3271 - "Starting LLS server locally on a terminal with
120 chars width results in an output with empty lines".

This removes the specific 150-character width limit specified for the
Console, and will now auto-detect the terminal width instead. Now the
formatting of Console output is consistent across different sizes of
terminal windows.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3271

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Launching the server with several different sizes of terminal windows
results in Console output without unexpected spacing. e.g. `python -m
llama_stack.core.server.server /tmp/run.yaml --port 8321`

---------

Signed-off-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-10-03 10:19:31 +02:00
Matthew Farrellee
0a41c4ead0
chore: OpenAIMixin implements ModelsProtocolPrivate (#3662)
# What does this PR do?

add ModelsProtocolPrivate methods to OpenAIMixin

this will allow providers using OpenAIMixin to use a common interface


## Test Plan

ci w/ new tests
2025-10-02 21:32:02 -07:00
ehhuang
14a94e9894
fix: responses <> chat completion input conversion (#3645)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
UI Tests / ui-tests (22) (push) Successful in 33s
Pre-commit / pre-commit (push) Successful in 1m27s
# What does this PR do?

closes #3268
closes #3498

When resuming from previous response ID, currently we attempt to convert
from the stored responses input to chat completion messages, which is
not always possible, e.g. for tool calls where some data is lost once
converted from chat completion message to repsonses input format.

This PR stores the chat completion messages that correspond to the
_last_ call to chat completion, which is sufficient to be resumed from
in the next responses API call, where we load these saved messages and
skip conversion entirely.

Separate issue to optimize storage:
https://github.com/llamastack/llama-stack/issues/3646

## Test Plan
existing CI tests
2025-10-02 16:01:08 -07:00
Ashwin Bharambe
ef0736527d
feat(tools)!: substantial clean up of "Tool" related datatypes (#3627)
This is a sweeping change to clean up some gunk around our "Tool"
definitions.

First, we had two types `Tool` and `ToolDef`. The first of these was a
"Resource" type for the registry but we had stopped registering tools
inside the Registry long back (and only registered ToolGroups.) The
latter was for specifying tools for the Agents API. This PR removes the
former and adds an optional `toolgroup_id` field to the latter.

Secondly, as pointed out by @bbrowning in
https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132,
we were doing a lossy conversion from a full JSON schema from the MCP
tool specification into our ToolDefinition to send it to the model.
There is no necessity to do this -- we ourselves aren't doing any
execution at all but merely passing it to the chat completions API which
supports this. By doing this (and by doing it poorly), we encountered
limitations like not supporting array items, or not resolving $refs,
etc.

To fix this, we replaced the `parameters` field by `{ input_schema,
output_schema }` which can be full blown JSON schemas.

Finally, there were some types in our llama-related chat format
conversion which needed some cleanup. We are taking this opportunity to
clean those up.

This PR is a substantial breaking change to the API. However, given our
window for introducing breaking changes, this suits us just fine. I will
be landing a concurrent `llama-stack-client` change as well since API
shapes are changing.
2025-10-02 15:12:03 -07:00
ehhuang
1f5003d50e
chore: fix precommit (#3663)
# What does this PR do?


## Test Plan
2025-10-02 14:51:41 -07:00
ehhuang
ceca3c056f
chore: fix/add logging categories (#3658)
# What does this PR do?
These aren't controllable by LLAMA_STACK_LOGGING

```

tests/integration/agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This ...) [  3%]
tests/integration/agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This t...) [  7%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True] 
instantiating llama_stack_client
WARNING  2025-10-02 13:14:33,472 root:258 uncategorized: Unknown logging category: testing. Falling back to default 'root' level: 20                  
WARNING  2025-10-02 13:14:33,477 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,960 root:258 uncategorized: Unknown logging category: tokenizer_utils. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:33,962 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,963 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,968 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,974 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,978 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,350 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,366 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,489 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,490 root:258 uncategorized: Unknown logging category: inference_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:35,697 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,918 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
INFO     2025-10-02 13:14:35,945 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,172 root:258 uncategorized: Unknown logging category: files. Falling back to default 'root' level: 20                    
WARNING  2025-10-02 13:14:36,218 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,219 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,231 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,255 root:258 uncategorized: Unknown logging category: tool_runtime. Falling back to default 'root' level: 20             
WARNING  2025-10-02 13:14:36,486 root:258 uncategorized: Unknown logging category: responses_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:36,503 root:258 uncategorized: Unknown logging category: openai::responses. Falling back to default 'root' level: 20        
INFO     2025-10-02 13:14:36,524 llama_stack.providers.utils.responses.responses_store:80 responses_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,528 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,703 root:258 uncategorized: Unknown logging category: uncategorized. Falling back to default 'root' level: 20 
```

## Test Plan
2025-10-02 13:10:13 -07:00
Ashwin Bharambe
6afa96b0b9 fix(api): fix a mistake from #3636 which overwrote POST /responses 2025-10-02 13:03:17 -07:00
Matthew Farrellee
0e13512dd7
chore: fix agents tests for non-ollama providers, provide max_tokens (#3657)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
UI Tests / ui-tests (22) (push) Successful in 29s
Pre-commit / pre-commit (push) Successful in 1m14s
# What does this PR do?

closes #3656 


## Test Plan

openai is not enabled in ci, so manual testing with:

```
$ ./scripts/integration-tests.sh --stack-config ci-tests --suite base --setup gpt --subdirs agents --inference-mode live                            
=== Llama Stack Integration Test Runner ===
Stack Config: ci-tests
Setup: gpt
Inference Mode: live
Test Suite: base
Test Subdirs: agents
Test Pattern: 

Checking llama packages
llama-stack                              0.2.23          .../llama-stack
llama-stack-client                       0.3.0a3
ollama                                   0.5.1
=== System Resources Before Tests ===
...
=== Applying Setup Environment Variables ===
Setting up environment variables:
=== Running Integration Tests ===
Test subdirs to run: agents
Added test files from agents: 3 files

=== Running all collected tests in a single pytest command ===
Total test files: 3
+ pytest -s -v tests/integration/agents/test_persistence.py tests/integration/agents/test_openai_responses.py tests/integration/agents/test_agents.py --stack-config=ci-tests --inference-mode=live -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag )' --setup=gpt --color=yes --capture=tee-sys
WARNING  2025-10-02 13:14:32,653 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,043 root:258 uncategorized: Unknown logging category: tests. Falling back to default 'root' level: 20                    
INFO     2025-10-02 13:14:33,063 tests.integration.conftest:86 tests: Applying setup 'gpt'                                                            
========================================= test session starts ==========================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- .../.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'Linux-6.16.7-200.fc42.x86_64-x86_64-with-glibc2.41', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'html': '4.1.1', 'anyio': '4.9.0', 'timeout': '2.4.0', 'cov': '6.2.1', 'asyncio': '1.1.0', 'nbval': '0.11.0', 'socket': '0.7.0', 'json-report': '1.5.0', 'metadata': '3.1.1'}}
rootdir: ...
configfile: pyproject.toml
plugins: html-4.1.1, anyio-4.9.0, timeout-2.4.0, cov-6.2.1, asyncio-1.1.0, nbval-0.11.0, socket-0.7.0, json-report-1.5.0, metadata-3.1.1
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 32 items / 6 deselected / 26 selected                                                        

tests/integration/agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This ...) [  3%]
tests/integration/agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This t...) [  7%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True] 
instantiating llama_stack_client
WARNING  2025-10-02 13:14:33,472 root:258 uncategorized: Unknown logging category: testing. Falling back to default 'root' level: 20                  
WARNING  2025-10-02 13:14:33,477 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,960 root:258 uncategorized: Unknown logging category: tokenizer_utils. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:33,962 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,963 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,968 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,974 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,978 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,350 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,366 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,489 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,490 root:258 uncategorized: Unknown logging category: inference_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:35,697 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,918 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
INFO     2025-10-02 13:14:35,945 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,172 root:258 uncategorized: Unknown logging category: files. Falling back to default 'root' level: 20                    
WARNING  2025-10-02 13:14:36,218 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,219 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,231 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,255 root:258 uncategorized: Unknown logging category: tool_runtime. Falling back to default 'root' level: 20             
WARNING  2025-10-02 13:14:36,486 root:258 uncategorized: Unknown logging category: responses_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:36,503 root:258 uncategorized: Unknown logging category: openai::responses. Falling back to default 'root' level: 20        
INFO     2025-10-02 13:14:36,524 llama_stack.providers.utils.responses.responses_store:80 responses_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,528 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,703 root:258 uncategorized: Unknown logging category: uncategorized. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:36,726 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: Pass    
         Fireworks API Key in the header X-LlamaStack-Provider-Data as { "fireworks_api_key": <your api key>}                                         
WARNING  2025-10-02 13:14:36,727 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider together: Pass     
         Together API Key in the header X-LlamaStack-Provider-Data as { "together_api_key": <your api key>}                                           
WARNING  2025-10-02 13:14:38,404 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: API key 
         is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"anthropic_api_key": "<API_KEY>"}, 
         or in the provider config.                                                                                                                   
WARNING  2025-10-02 13:14:38,406 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is 
         not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in 
         the provider config.                                                                                                                         
WARNING  2025-10-02 13:14:38,408 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is   
         not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in   
         the provider config.                                                                                                                         
WARNING  2025-10-02 13:14:38,411 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key 
         is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, 
         or in the provider config.                                                                                                                   
llama_stack_client instantiated in 5.237s
SKIPPED [ 11%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items[openai_client-txt=openai/gpt-4o] SKIPPED [ 15%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items_with_limit_and_order[txt=openai/gpt-4o] SKIPPED [ 19%]
tests/integration/agents/test_openai_responses.py::test_function_call_output_response[txt=openai/gpt-4o] SKIPPED [ 23%]
tests/integration/agents/test_openai_responses.py::test_function_call_output_response_with_none_arguments[txt=openai/gpt-4o] SKIPPED [ 26%]
tests/integration/agents/test_agents.py::test_agent_simple[openai/gpt-4o] PASSED                 [ 30%]
tests/integration/agents/test_agents.py::test_agent_name[txt=openai/gpt-4o] SKIPPED (this te...) [ 34%]
tests/integration/agents/test_agents.py::test_tool_config[openai/gpt-4o] PASSED                  [ 38%]
tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o] FAILED                  [ 42%]
tests/integration/agents/test_agents.py::test_custom_tool_infinite_loop[openai/gpt-4o] PASSED    [ 46%]
tests/integration/agents/test_agents.py::test_tool_choice_required[openai/gpt-4o] INFO     2025-10-02 13:14:51,559 llama_stack.providers.inline.agents.meta_reference.agent_instance:691 agents::meta_reference: done with MAX          
         iterations (2), exiting.                                                                                                                     
PASSED         [ 50%]
tests/integration/agents/test_agents.py::test_tool_choice_none[openai/gpt-4o] PASSED             [ 53%]
tests/integration/agents/test_agents.py::test_tool_choice_get_boiling_point[openai/gpt-4o] XFAIL [ 57%]
tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools0] PASSED [ 61%]
tests/integration/agents/test_agents.py::test_multi_tool_calls[openai/gpt-4o] PASSED             [ 65%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-False] SKIPPED [ 69%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items[client_with_models-txt=openai/gpt-4o] PASSED [ 73%]
tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools1] PASSED [ 76%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools1-True] SKIPPED [ 80%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools1-False] SKIPPED [ 84%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools0-True] SKIPPED [ 88%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools0-False] SKIPPED [ 92%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools1-True] SKIPPED [ 96%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools1-False] SKIPPED [100%]

=============================================== FAILURES ===============================================
___________________________________ test_custom_tool[openai/gpt-4o] ____________________________________
tests/integration/agents/test_agents.py:370: in test_custom_tool
    assert "-100" in logs_str
E   assert '-100' in "inference> Polyjuice Potion is a fictional substance from the Harry Potter series, and it doesn't have a scientifically defined boiling point. If you have any other real liquid in mind, feel free to ask!"
========================================= slowest 10 durations =========================================
5.47s setup    tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True]
4.78s call     tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o]
3.01s call     tests/integration/agents/test_agents.py::test_tool_choice_required[openai/gpt-4o]
2.97s call     tests/integration/agents/test_agents.py::test_agent_simple[openai/gpt-4o]
2.85s call     tests/integration/agents/test_agents.py::test_tool_choice_none[openai/gpt-4o]
2.06s call     tests/integration/agents/test_agents.py::test_multi_tool_calls[openai/gpt-4o]
1.83s call     tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools0]
1.83s call     tests/integration/agents/test_agents.py::test_custom_tool_infinite_loop[openai/gpt-4o]
1.29s call     tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools1]
0.57s call     tests/integration/agents/test_openai_responses.py::test_list_response_input_items[client_with_models-txt=openai/gpt-4o]
======================================= short test summary info ========================================
FAILED tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o] - assert '-100' in "inference> Polyjuice Potion is a fictional substance from the Harry Potter series...
=========== 1 failed, 9 passed, 15 skipped, 6 deselected, 1 xfailed, 139 warnings in 27.18s ============
```
note: the failure is separate from the issue being fixed
2025-10-02 14:30:13 -04:00
Alexey Rybak
24ee577cb0
docs: API spec generation for Stainless (#3655)
# What does this PR do?
* Adds stainless-llama-stack-spec.yaml for Stainless client generation,
which comprises stable + experimental APIs

## Test Plan
* Manual generation
2025-10-02 09:25:09 -07:00
Kelly Brown
1d02385e48
docs: Update docs navbar config (#3653)
## Description

Currently, the docs page has the home book opened by default. This PR
updates the .ts so that the sidebar books are collapsed when you first
open the webpage
2025-10-02 16:48:38 +02:00
Sébastien Han
4161102100
chore!: add double routes for v1/openai/v1 (#3636)
So that users get a warning in 0.3.0 and we remove them in 0.4.0.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-02 16:11:05 +02:00
Charlie Doern
f1748e2f92
fix: re-enable conformance skipping ability (#3651)
# What does this PR do?

this was broken by #3631, re-enable this ability by only using oasdiff
when .skip != 'true'

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-02 15:04:26 +02:00
Aakanksha Duggal
7e48cc48bc
refactor(agents): migrate to OpenAI chat completions API (#3323)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / build-single-provider (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 14s
Test Llama Stack Build / generate-matrix (push) Successful in 18s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m16s
2025-10-02 06:50:32 -04:00
Chacksu
426dc54883
docs: Fix Dell distro documentation code snippets (#3640)
# What does this PR do?
* Updates code snippets for Dell distribution, fixing specific user home
directory in code (replacing with $HOME) and updates docker instructions
to use `docker` instead of `podman`.

## Test Plan
N.A.

Co-authored-by: Connor Hack <connorhack@fb.com>
2025-10-02 11:11:30 +02:00
Alexey Rybak
382eb25398
docs: fix more broken links (#3649)
# What does this PR do?
* Fixes some more documentation links

## Test Plan
* Manual testing
2025-10-02 10:43:49 +02:00
Alexey Rybak
cb36b3bab1
docs: add favicon and mobile styling (#3650)
# What does this PR do?
* Adds favicon
* Replaces old llama-stack theme image 
* Adds some mobile styling

## Test Plan
* Manual testing
2025-10-02 10:42:54 +02:00
2211 changed files with 971257 additions and 110402 deletions

19
.dockerignore Normal file
View file

@ -0,0 +1,19 @@
.venv
__pycache__
*.pyc
*.pyo
*.pyd
*.so
.git
.gitignore
htmlcov*
.coverage
coverage*
.cache
.mypy_cache
.pytest_cache
.ruff_cache
uv.lock
node_modules
build
/tmp

1
.gitattributes vendored Normal file
View file

@ -0,0 +1 @@
tests/**/recordings/** linguist-generated=true

2
.github/CODEOWNERS vendored
View file

@ -2,4 +2,4 @@
# These owners will be the default owners for everything in
# the repo. Unless a later match takes precedence,
* @ashwinb @yanxi0830 @hardikjshah @raghotham @ehhuang @terrytangyuan @leseb @bbrowning @reluctantfuturist @mattf @slekkala1
* @ashwinb @yanxi0830 @hardikjshah @raghotham @ehhuang @terrytangyuan @leseb @bbrowning @reluctantfuturist @mattf @slekkala1 @franciscojavierarceo

1
.github/TRIAGERS.md vendored
View file

@ -1,2 +1 @@
# This file documents Triage members in the Llama Stack community
@franciscojavierarceo

View file

@ -54,6 +54,10 @@ runs:
SCRIPT_ARGS="$SCRIPT_ARGS --pattern ${{ inputs.pattern }}"
fi
echo "=== Running command ==="
echo "uv run --no-sync ./scripts/integration-tests.sh $SCRIPT_ARGS"
echo ""
uv run --no-sync ./scripts/integration-tests.sh $SCRIPT_ARGS | tee pytest-${{ inputs.inference-mode }}.log
@ -62,11 +66,11 @@ runs:
shell: bash
run: |
echo "Checking for recording changes"
git status --porcelain tests/integration/recordings/
git status --porcelain tests/integration/
if [[ -n $(git status --porcelain tests/integration/recordings/) ]]; then
if [[ -n $(git status --porcelain tests/integration/) ]]; then
echo "New recordings detected, committing and pushing"
git add tests/integration/recordings/
git add tests/integration/
git commit -m "Recordings update from CI (suite: ${{ inputs.suite }})"
git fetch origin ${{ github.ref_name }}
@ -78,11 +82,13 @@ runs:
echo "No recording changes"
fi
- name: Write inference logs to file
- name: Write docker logs to file
if: ${{ always() }}
shell: bash
run: |
sudo docker logs ollama > ollama-${{ inputs.inference-mode }}.log || true
# Ollama logs (if ollama container exists)
sudo docker logs ollama > ollama-${{ inputs.inference-mode }}.log 2>&1 || true
# Note: distro container logs are now dumped in integration-tests.sh before container is removed
- name: Upload logs
if: ${{ always() }}

View file

@ -57,7 +57,7 @@ runs:
echo "Building Llama Stack"
LLAMA_STACK_DIR=. \
uv run --no-sync llama stack build --template ci-tests --image-type venv
uv run --no-sync llama stack list-deps ci-tests | xargs -L1 uv pip install
- name: Configure git for commits
shell: bash

View file

@ -12,7 +12,9 @@ Llama Stack uses GitHub Actions for Continuous Integration (CI). Below is a tabl
| Integration Tests (Replay) | [integration-tests.yml](integration-tests.yml) | Run the integration test suites from tests/integration in replay mode |
| Vector IO Integration Tests | [integration-vector-io-tests.yml](integration-vector-io-tests.yml) | Run the integration test suite with various VectorIO providers |
| Pre-commit | [pre-commit.yml](pre-commit.yml) | Run pre-commit checks |
| Pre-commit Bot | [precommit-trigger.yml](precommit-trigger.yml) | Pre-commit bot for PR |
| Test Llama Stack Build | [providers-build.yml](providers-build.yml) | Test llama stack build |
| Test llama stack list-deps | [providers-list-deps.yml](providers-list-deps.yml) | Test llama stack list-deps |
| Python Package Build Test | [python-build-test.yml](python-build-test.yml) | Test building the llama-stack PyPI project |
| Integration Tests (Record) | [record-integration-tests.yml](record-integration-tests.yml) | Run the integration test suite from tests/integration |
| Check semantic PR titles | [semantic-pr.yml](semantic-pr.yml) | Ensure that PR titles follow the conventional commit spec |

View file

@ -43,9 +43,9 @@ jobs:
# Check if we should skip conformance testing due to breaking changes
- name: Check if conformance test should be skipped
id: skip-check
env:
PR_TITLE: ${{ github.event.pull_request.title }}
run: |
PR_TITLE="${{ github.event.pull_request.title }}"
# Skip if title contains "!:" indicating breaking change (like "feat!:")
if [[ "$PR_TITLE" == *"!:"* ]]; then
echo "skip=true" >> $GITHUB_OUTPUT
@ -96,6 +96,7 @@ jobs:
# Verify API specs exist for conformance testing
- name: Check API Specs
if: steps.skip-check.outputs.skip != 'true'
run: |
echo "Checking for API specification files..."
@ -134,10 +135,10 @@ jobs:
- name: Run OpenAPI Breaking Change Diff
if: steps.skip-check.outputs.skip != 'true'
run: |
oasdiff breaking --fail-on ERR base/docs/static/llama-stack-spec.yaml docs/static/llama-stack-spec.yaml --match-path '^/v1/'
oasdiff breaking --fail-on ERR $BASE_SPEC $CURRENT_SPEC --match-path '^/v1/'
# Report when test is skipped
- name: Report skip reason
if: steps.skip-check.outputs.skip == 'true'
run: |
oasdiff breaking --fail-on ERR $BASE_SPEC $CURRENT_SPEC --match-path '^/v1/'
echo "Conformance test skipped due to breaking change indicator"

View file

@ -30,8 +30,11 @@ jobs:
- name: Build a single provider
run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync \
llama stack build --template starter --image-type container --image-name test
docker build . \
-f containers/Containerfile \
--build-arg INSTALL_MODE=editable \
--build-arg DISTRO_NAME=starter \
--tag llama-stack:starter-ci
- name: Run installer end-to-end
run: |

View file

@ -73,6 +73,24 @@ jobs:
image_name: kube
apis: []
providers: {}
storage:
backends:
kv_default:
type: kv_sqlite
db_path: $run_dir/kvstore.db
sql_default:
type: sql_sqlite
db_path: $run_dir/sql_store.db
stores:
metadata:
namespace: registry
backend: kv_default
inference:
table_name: inference_store
backend: sql_default
conversations:
table_name: openai_conversations
backend: sql_default
server:
port: 8321
EOF
@ -84,13 +102,16 @@ jobs:
yq eval '.server.auth.provider_config.jwks.token = "${{ env.TOKEN }}"' -i $run_dir/run.yaml
cat $run_dir/run.yaml
nohup uv run llama stack run $run_dir/run.yaml --image-type venv > server.log 2>&1 &
# avoid line breaks in the server log, especially because we grep it below.
export LLAMA_STACK_LOG_WIDTH=200
nohup uv run llama stack run $run_dir/run.yaml > server.log 2>&1 &
- name: Wait for Llama Stack server to be ready
run: |
echo "Waiting for Llama Stack server..."
for i in {1..30}; do
if curl -s -L -H "Authorization: Bearer $(cat llama-stack-auth-token)" http://localhost:8321/v1/health | grep -q "OK"; then
# Note: /v1/health does not require authentication
if curl -s -L http://localhost:8321/v1/health | grep -q "OK"; then
echo "Llama Stack server is up!"
if grep -q "Enabling authentication with provider: ${{ matrix.auth-provider }}" server.log; then
echo "Llama Stack server is configured to use ${{ matrix.auth-provider }} auth"
@ -109,4 +130,27 @@ jobs:
- name: Test auth
run: |
curl -s -L -H "Authorization: Bearer $(cat llama-stack-auth-token)" http://127.0.0.1:8321/v1/providers|jq
echo "Testing /v1/version without token (should succeed)..."
if curl -s -L -o /dev/null -w "%{http_code}" http://127.0.0.1:8321/v1/version | grep -q "200"; then
echo "/v1/version accessible without token (200)"
else
echo "/v1/version returned non-200 status without token"
exit 1
fi
echo "Testing /v1/providers without token (should fail with 401)..."
if curl -s -L -o /dev/null -w "%{http_code}" http://127.0.0.1:8321/v1/providers | grep -q "401"; then
echo "/v1/providers blocked without token (401)"
else
echo "/v1/providers did not return 401 without token"
exit 1
fi
echo "Testing /v1/providers with valid token (should succeed)..."
curl -s -L -H "Authorization: Bearer $(cat llama-stack-auth-token)" http://127.0.0.1:8321/v1/providers | jq
if [ $? -eq 0 ]; then
echo "/v1/providers accessible with valid token"
else
echo "/v1/providers failed with valid token"
exit 1
fi

View file

@ -42,18 +42,27 @@ jobs:
run-replay-mode-tests:
runs-on: ubuntu-latest
name: ${{ format('Integration Tests ({0}, {1}, {2}, client={3}, {4})', matrix.client-type, matrix.setup, matrix.python-version, matrix.client-version, matrix.suite) }}
name: ${{ format('Integration Tests ({0}, {1}, {2}, client={3}, {4})', matrix.client-type, matrix.config.setup, matrix.python-version, matrix.client-version, matrix.config.suite) }}
strategy:
fail-fast: false
matrix:
client-type: [library, server]
# Use vllm on weekly schedule, otherwise use test-setup input (defaults to ollama)
setup: ${{ (github.event.schedule == '1 0 * * 0') && fromJSON('["vllm"]') || fromJSON(format('["{0}"]', github.event.inputs.test-setup || 'ollama')) }}
client-type: [library, server, docker]
# Use Python 3.13 only on nightly schedule (daily latest client test), otherwise use 3.12
python-version: ${{ github.event.schedule == '0 0 * * *' && fromJSON('["3.12", "3.13"]') || fromJSON('["3.12"]') }}
client-version: ${{ (github.event.schedule == '0 0 * * *' || github.event.inputs.test-all-client-versions == 'true') && fromJSON('["published", "latest"]') || fromJSON('["latest"]') }}
suite: [base, vision]
# Define (setup, suite) pairs - they are always matched and cannot be independent
# Weekly schedule (Sun 1 AM): vllm+base
# Input test-setup=ollama-vision: ollama-vision+vision
# Default (including test-setup=ollama): ollama+base, ollama-vision+vision, gpt+responses
config: >-
${{
github.event.schedule == '1 0 * * 0'
&& fromJSON('[{"setup": "vllm", "suite": "base"}]')
|| github.event.inputs.test-setup == 'ollama-vision'
&& fromJSON('[{"setup": "ollama-vision", "suite": "vision"}]')
|| fromJSON('[{"setup": "ollama", "suite": "base"}, {"setup": "ollama-vision", "suite": "vision"}]')
}}
steps:
- name: Checkout repository
@ -64,14 +73,16 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
client-version: ${{ matrix.client-version }}
setup: ${{ matrix.setup }}
suite: ${{ matrix.suite }}
setup: ${{ matrix.config.setup }}
suite: ${{ matrix.config.suite }}
inference-mode: 'replay'
- name: Run tests
uses: ./.github/actions/run-and-record-tests
env:
OPENAI_API_KEY: dummy
with:
stack-config: ${{ matrix.client-type == 'library' && 'ci-tests' || 'server:ci-tests' }}
setup: ${{ matrix.setup }}
stack-config: ${{ matrix.client-type == 'library' && 'ci-tests' || matrix.client-type == 'server' && 'server:ci-tests' || 'docker:ci-tests' }}
setup: ${{ matrix.config.setup }}
inference-mode: 'replay'
suite: ${{ matrix.suite }}
suite: ${{ matrix.config.suite }}

View file

@ -144,7 +144,7 @@ jobs:
- name: Build Llama Stack
run: |
uv run --no-sync llama stack build --template ci-tests --image-type venv
uv run --no-sync llama stack list-deps ci-tests | xargs -L1 uv pip install
- name: Check Storage and Memory Available Before Tests
if: ${{ always() }}
@ -169,8 +169,7 @@ jobs:
run: |
uv run --no-sync \
pytest -sv --stack-config="files=inline::localfs,inference=inline::sentence-transformers,vector_io=${{ matrix.vector-io-provider }}" \
tests/integration/vector_io \
--embedding-model inline::sentence-transformers/all-MiniLM-L6-v2
tests/integration/vector_io
- name: Check Storage and Memory Available After Tests
if: ${{ always() }}

View file

@ -37,7 +37,7 @@ jobs:
.pre-commit-config.yaml
- name: Set up Node.js
uses: actions/setup-node@a0853c24544627f65ddf259abe73b1d18a591444 # v5.0.0
uses: actions/setup-node@2028fbc5c25fe9cf00d9f06a71cc4710d4507903 # v6.0.0
with:
node-version: '20'
cache: 'npm'

227
.github/workflows/precommit-trigger.yml vendored Normal file
View file

@ -0,0 +1,227 @@
name: Pre-commit Bot
run-name: Pre-commit bot for PR #${{ github.event.issue.number }}
on:
issue_comment:
types: [created]
jobs:
pre-commit:
# Only run on pull request comments
if: github.event.issue.pull_request && contains(github.event.comment.body, '@github-actions run precommit')
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write
steps:
- name: Check comment author and get PR details
id: check_author
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
// Get PR details
const pr = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: context.issue.number
});
// Check if commenter has write access or is the PR author
const commenter = context.payload.comment.user.login;
const prAuthor = pr.data.user.login;
let hasPermission = false;
// Check if commenter is PR author
if (commenter === prAuthor) {
hasPermission = true;
console.log(`Comment author ${commenter} is the PR author`);
} else {
// Check if commenter has write/admin access
try {
const permission = await github.rest.repos.getCollaboratorPermissionLevel({
owner: context.repo.owner,
repo: context.repo.repo,
username: commenter
});
const level = permission.data.permission;
hasPermission = ['write', 'admin', 'maintain'].includes(level);
console.log(`Comment author ${commenter} has permission: ${level}`);
} catch (error) {
console.log(`Could not check permissions for ${commenter}: ${error.message}`);
}
}
if (!hasPermission) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
body: `❌ @${commenter} You don't have permission to trigger pre-commit. Only PR authors or repository collaborators can run this command.`
});
core.setFailed(`User ${commenter} does not have permission`);
return;
}
// Save PR info for later steps
core.setOutput('pr_number', context.issue.number);
core.setOutput('pr_head_ref', pr.data.head.ref);
core.setOutput('pr_head_sha', pr.data.head.sha);
core.setOutput('pr_head_repo', pr.data.head.repo.full_name);
core.setOutput('pr_base_ref', pr.data.base.ref);
core.setOutput('is_fork', pr.data.head.repo.full_name !== context.payload.repository.full_name);
core.setOutput('authorized', 'true');
- name: React to comment
if: steps.check_author.outputs.authorized == 'true'
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
await github.rest.reactions.createForIssueComment({
owner: context.repo.owner,
repo: context.repo.repo,
comment_id: context.payload.comment.id,
content: 'rocket'
});
- name: Comment starting
if: steps.check_author.outputs.authorized == 'true'
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: ${{ steps.check_author.outputs.pr_number }},
body: `⏳ Running [pre-commit hooks](https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}) on PR #${{ steps.check_author.outputs.pr_number }}...`
});
- name: Checkout PR branch (same-repo)
if: steps.check_author.outputs.authorized == 'true' && steps.check_author.outputs.is_fork == 'false'
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
ref: ${{ steps.check_author.outputs.pr_head_ref }}
fetch-depth: 0
token: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PR branch (fork)
if: steps.check_author.outputs.authorized == 'true' && steps.check_author.outputs.is_fork == 'true'
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
repository: ${{ steps.check_author.outputs.pr_head_repo }}
ref: ${{ steps.check_author.outputs.pr_head_ref }}
fetch-depth: 0
token: ${{ secrets.GITHUB_TOKEN }}
- name: Verify checkout
if: steps.check_author.outputs.authorized == 'true'
run: |
echo "Current SHA: $(git rev-parse HEAD)"
echo "Expected SHA: ${{ steps.check_author.outputs.pr_head_sha }}"
if [[ "$(git rev-parse HEAD)" != "${{ steps.check_author.outputs.pr_head_sha }}" ]]; then
echo "::error::Checked out SHA does not match expected SHA"
exit 1
fi
- name: Set up Python
if: steps.check_author.outputs.authorized == 'true'
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: '3.12'
cache: pip
cache-dependency-path: |
**/requirements*.txt
.pre-commit-config.yaml
- name: Set up Node.js
if: steps.check_author.outputs.authorized == 'true'
uses: actions/setup-node@2028fbc5c25fe9cf00d9f06a71cc4710d4507903 # v6.0.0
with:
node-version: '20'
cache: 'npm'
cache-dependency-path: 'llama_stack/ui/'
- name: Install npm dependencies
if: steps.check_author.outputs.authorized == 'true'
run: npm ci
working-directory: llama_stack/ui
- name: Run pre-commit
if: steps.check_author.outputs.authorized == 'true'
id: precommit
uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1
continue-on-error: true
env:
SKIP: no-commit-to-branch
RUFF_OUTPUT_FORMAT: github
- name: Check for changes
if: steps.check_author.outputs.authorized == 'true'
id: changes
run: |
if ! git diff --exit-code || [ -n "$(git ls-files --others --exclude-standard)" ]; then
echo "has_changes=true" >> $GITHUB_OUTPUT
echo "Changes detected after pre-commit"
else
echo "has_changes=false" >> $GITHUB_OUTPUT
echo "No changes after pre-commit"
fi
- name: Commit and push changes
if: steps.check_author.outputs.authorized == 'true' && steps.changes.outputs.has_changes == 'true'
run: |
git config --local user.email "github-actions[bot]@users.noreply.github.com"
git config --local user.name "github-actions[bot]"
git add -A
git commit -m "style: apply pre-commit fixes
🤖 Applied by @github-actions bot via pre-commit workflow"
# Push changes
git push origin HEAD:${{ steps.check_author.outputs.pr_head_ref }}
- name: Comment success with changes
if: steps.check_author.outputs.authorized == 'true' && steps.changes.outputs.has_changes == 'true'
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: ${{ steps.check_author.outputs.pr_number }},
body: `✅ Pre-commit hooks completed successfully!\n\n🔧 Changes have been committed and pushed to the PR branch.`
});
- name: Comment success without changes
if: steps.check_author.outputs.authorized == 'true' && steps.changes.outputs.has_changes == 'false' && steps.precommit.outcome == 'success'
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: ${{ steps.check_author.outputs.pr_number }},
body: `✅ Pre-commit hooks passed!\n\n✨ No changes needed - your code is already formatted correctly.`
});
- name: Comment failure
if: failure()
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: ${{ steps.check_author.outputs.pr_number }},
body: `❌ Pre-commit workflow failed!\n\nPlease check the [workflow logs](https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}) for details.`
});

View file

@ -14,6 +14,8 @@ on:
- '.github/workflows/providers-build.yml'
- 'llama_stack/distributions/**'
- 'pyproject.toml'
- 'containers/Containerfile'
- '.dockerignore'
pull_request:
paths:
@ -24,6 +26,8 @@ on:
- '.github/workflows/providers-build.yml'
- 'llama_stack/distributions/**'
- 'pyproject.toml'
- 'containers/Containerfile'
- '.dockerignore'
concurrency:
group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.ref }}
@ -60,15 +64,19 @@ jobs:
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Print build dependencies
- name: Install distribution into venv
if: matrix.image-type == 'venv'
run: |
uv run llama stack build --distro ${{ matrix.distro }} --image-type ${{ matrix.image-type }} --image-name test --print-deps-only
uv run llama stack list-deps ${{ matrix.distro }} | xargs -L1 uv pip install
- name: Run Llama Stack Build
- name: Build container image
if: matrix.image-type == 'container'
run: |
# USE_COPY_NOT_MOUNT is set to true since mounting is not supported by docker buildx, we use COPY instead
# LLAMA_STACK_DIR is set to the current directory so we are building from the source
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --distro ${{ matrix.distro }} --image-type ${{ matrix.image-type }} --image-name test
docker build . \
-f containers/Containerfile \
--build-arg INSTALL_MODE=editable \
--build-arg DISTRO_NAME=${{ matrix.distro }} \
--tag llama-stack:${{ matrix.distro }}-ci
- name: Print dependencies in the image
if: matrix.image-type == 'venv'
@ -86,8 +94,8 @@ jobs:
- name: Build a single provider
run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --image-type venv --image-name test --providers inference=remote::ollama
uv pip install -e .
uv run --no-sync llama stack list-deps --providers inference=remote::ollama | xargs -L1 uv pip install
build-custom-container-distribution:
runs-on: ubuntu-latest
steps:
@ -97,11 +105,16 @@ jobs:
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Build a single provider
- name: Build container image
run: |
yq -i '.image_type = "container"' llama_stack/distributions/ci-tests/build.yaml
yq -i '.image_name = "test"' llama_stack/distributions/ci-tests/build.yaml
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --config llama_stack/distributions/ci-tests/build.yaml
BASE_IMAGE=$(yq -r '.distribution_spec.container_image // "python:3.12-slim"' llama_stack/distributions/ci-tests/build.yaml)
docker build . \
-f containers/Containerfile \
--build-arg INSTALL_MODE=editable \
--build-arg DISTRO_NAME=ci-tests \
--build-arg BASE_IMAGE="$BASE_IMAGE" \
--build-arg RUN_CONFIG_PATH=/workspace/llama_stack/distributions/ci-tests/run.yaml \
-t llama-stack:ci-tests
- name: Inspect the container image entrypoint
run: |
@ -112,7 +125,7 @@ jobs:
fi
entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID)
echo "Entrypoint: $entrypoint"
if [ "$entrypoint" != "[python -m llama_stack.core.server.server /app/run.yaml]" ]; then
if [ "$entrypoint" != "[/usr/local/bin/llama-stack-entrypoint.sh]" ]; then
echo "Entrypoint is not correct"
exit 1
fi
@ -129,17 +142,19 @@ jobs:
- name: Pin distribution to UBI9 base
run: |
yq -i '
.image_type = "container" |
.image_name = "ubi9-test" |
.distribution_spec.container_image = "registry.access.redhat.com/ubi9:latest"
' llama_stack/distributions/ci-tests/build.yaml
- name: Build dev container (UBI9)
env:
USE_COPY_NOT_MOUNT: "true"
LLAMA_STACK_DIR: "."
- name: Build UBI9 container image
run: |
uv run llama stack build --config llama_stack/distributions/ci-tests/build.yaml
BASE_IMAGE=$(yq -r '.distribution_spec.container_image // "registry.access.redhat.com/ubi9:latest"' llama_stack/distributions/ci-tests/build.yaml)
docker build . \
-f containers/Containerfile \
--build-arg INSTALL_MODE=editable \
--build-arg DISTRO_NAME=ci-tests \
--build-arg BASE_IMAGE="$BASE_IMAGE" \
--build-arg RUN_CONFIG_PATH=/workspace/llama_stack/distributions/ci-tests/run.yaml \
-t llama-stack:ci-tests-ubi9
- name: Inspect UBI9 image
run: |
@ -150,7 +165,7 @@ jobs:
fi
entrypoint=$(docker inspect --format '{{ .Config.Entrypoint }}' $IMAGE_ID)
echo "Entrypoint: $entrypoint"
if [ "$entrypoint" != "[python -m llama_stack.core.server.server /app/run.yaml]" ]; then
if [ "$entrypoint" != "[/usr/local/bin/llama-stack-entrypoint.sh]" ]; then
echo "Entrypoint is not correct"
exit 1
fi

View file

@ -0,0 +1,105 @@
name: Test llama stack list-deps
run-name: Test llama stack list-deps
on:
push:
branches:
- main
paths:
- 'llama_stack/cli/stack/list_deps.py'
- 'llama_stack/cli/stack/_list_deps.py'
- 'llama_stack/core/build.*'
- 'llama_stack/core/*.sh'
- '.github/workflows/providers-list-deps.yml'
- 'llama_stack/templates/**'
- 'pyproject.toml'
pull_request:
paths:
- 'llama_stack/cli/stack/list_deps.py'
- 'llama_stack/cli/stack/_list_deps.py'
- 'llama_stack/core/build.*'
- 'llama_stack/core/*.sh'
- '.github/workflows/providers-list-deps.yml'
- 'llama_stack/templates/**'
- 'pyproject.toml'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
generate-matrix:
runs-on: ubuntu-latest
outputs:
distros: ${{ steps.set-matrix.outputs.distros }}
steps:
- name: Checkout repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Generate Distribution List
id: set-matrix
run: |
distros=$(ls llama_stack/distributions/*/*build.yaml | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]')
echo "distros=$distros" >> "$GITHUB_OUTPUT"
list-deps:
needs: generate-matrix
runs-on: ubuntu-latest
strategy:
matrix:
distro: ${{ fromJson(needs.generate-matrix.outputs.distros) }}
image-type: [venv, container]
fail-fast: false # We want to run all jobs even if some fail
steps:
- name: Checkout repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Print dependencies
run: |
uv run llama stack list-deps ${{ matrix.distro }}
- name: Install Distro using llama stack list-deps
run: |
# USE_COPY_NOT_MOUNT is set to true since mounting is not supported by docker buildx, we use COPY instead
# LLAMA_STACK_DIR is set to the current directory so we are building from the source
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack list-deps ${{ matrix.distro }} | xargs -L1 uv pip install
- name: Print dependencies in the image
if: matrix.image-type == 'venv'
run: |
uv pip list
show-single-provider:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: Show a single provider
run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack list-deps --providers inference=remote::ollama
list-deps-from-config:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Install dependencies
uses: ./.github/actions/setup-runner
- name: list-des from Config
env:
USE_COPY_NOT_MOUNT: "true"
LLAMA_STACK_DIR: "."
run: |
uv run llama stack list-deps llama_stack/distributions/ci-tests/build.yaml

View file

@ -24,7 +24,7 @@ jobs:
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Install uv
uses: astral-sh/setup-uv@b75a909f75acd358c2196fb9a5f1299a9a8868a4 # v6.7.0
uses: astral-sh/setup-uv@3259c6206f993105e3a61b142c2d97bf4b9ef83d # v7.1.0
with:
python-version: ${{ matrix.python-version }}
activate-environment: true
@ -43,7 +43,5 @@ jobs:
uv pip list
uv pip show llama-stack
command -v llama
llama model prompt-format -m Llama3.2-90B-Vision-Instruct
llama model list
llama stack list-apis
llama stack list-providers inference

View file

@ -61,6 +61,9 @@ jobs:
- name: Run and record tests
uses: ./.github/actions/run-and-record-tests
env:
# Set OPENAI_API_KEY if using gpt setup
OPENAI_API_KEY: ${{ inputs.test-setup == 'gpt' && secrets.OPENAI_API_KEY || '' }}
with:
stack-config: 'server:ci-tests' # recording must be done with server since more tests are run
setup: ${{ inputs.test-setup || 'ollama' }}

View file

@ -24,7 +24,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Stale Action
uses: actions/stale@3a9db7e6a41a89f618792c92c0e97cc736e1b13f # v10.0.0
uses: actions/stale@5f858e3efba33a5ca4407a664cc011ad407f2008 # v10.1.0
with:
stale-issue-label: 'stale'
stale-issue-message: >

View file

@ -46,9 +46,9 @@ jobs:
yq -i '.image_type = "${{ matrix.image-type }}"' tests/external/ramalama-stack/run.yaml
cat tests/external/ramalama-stack/run.yaml
- name: Build distro from config file
- name: Install distribution dependencies
run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --config tests/external/ramalama-stack/build.yaml
uv run llama stack list-deps tests/external/ramalama-stack/build.yaml | xargs -L1 uv pip install
- name: Start Llama Stack server in background
if: ${{ matrix.image-type }} == 'venv'
@ -59,7 +59,7 @@ jobs:
# Use the virtual environment created by the build step (name comes from build config)
source ramalama-stack-test/bin/activate
uv pip list
nohup llama stack run tests/external/ramalama-stack/run.yaml --image-type ${{ matrix.image-type }} > server.log 2>&1 &
nohup llama stack run tests/external/ramalama-stack/run.yaml > server.log 2>&1 &
- name: Wait for Llama Stack server to be ready
run: |

View file

@ -44,11 +44,14 @@ jobs:
- name: Print distro dependencies
run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync llama stack build --config tests/external/build.yaml --print-deps-only
uv run --no-sync llama stack list-deps tests/external/build.yaml
- name: Build distro from config file
run: |
USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --no-sync llama stack build --config tests/external/build.yaml
uv venv ci-test
source ci-test/bin/activate
uv pip install -e .
LLAMA_STACK_LOGGING=all=CRITICAL llama stack list-deps tests/external/build.yaml | xargs -L1 uv pip install
- name: Start Llama Stack server in background
if: ${{ matrix.image-type }} == 'venv'
@ -59,7 +62,7 @@ jobs:
# Use the virtual environment created by the build step (name comes from build config)
source ci-test/bin/activate
uv pip list
nohup llama stack run tests/external/run-byoa.yaml --image-type ${{ matrix.image-type }} > server.log 2>&1 &
nohup llama stack run tests/external/run-byoa.yaml > server.log 2>&1 &
- name: Wait for Llama Stack server to be ready
run: |

View file

@ -29,7 +29,7 @@ jobs:
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- name: Setup Node.js
uses: actions/setup-node@a0853c24544627f65ddf259abe73b1d18a591444 # v5.0.0
uses: actions/setup-node@2028fbc5c25fe9cf00d9f06a71cc4710d4507903 # v6.0.0
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'

View file

@ -11,14 +11,17 @@ You can install the dependencies by running:
```bash
cd llama-stack
uv venv --python 3.12
uv sync --group dev
uv pip install -e .
source .venv/bin/activate
```
```{note}
You can use a specific version of Python with `uv` by adding the `--python <version>` flag (e.g. `--python 3.12`).
Otherwise, `uv` will automatically select a Python version according to the `requires-python` section of the `pyproject.toml`.
If you are making changes to Llama Stack, it is essential that you use Python 3.12 as shown above.
Llama Stack can work with Python 3.13 but the pre-commit hooks used to validate code changes only work with Python 3.12.
If you don't specify a Python version, `uv` will automatically select a Python version according to the `requires-python`
section of the `pyproject.toml`, which is fine for running Llama Stack but not for committing changes.
For more info, see the [uv docs around Python versions](https://docs.astral.sh/uv/concepts/python-versions/).
```
@ -42,17 +45,22 @@ uv run --env-file .env -- pytest -v tests/integration/inference/test_text_infere
We use [pre-commit](https://pre-commit.com/) to run linting and formatting checks on your code. You can install the pre-commit hooks by running:
```bash
uv pip install pre-commit==4.3.0
uv run pre-commit install
```
After that, pre-commit hooks will run automatically before each commit.
Note that the only version of pre-commit that works with the Llama Stack continuous integration is `4.3.0` so it is essential that you pull
that specific version as shown above. Once you have run these commands, pre-commit hooks will run automatically before each commit.
Alternatively, if you don't want to install the pre-commit hooks, you can run the checks manually by running:
Alternatively, if you don't want to install the pre-commit hooks (or if you want to check if your changes are ready before committing),
you can run the checks manually by running:
```bash
uv run pre-commit run --all-files
uv run pre-commit run --all-files -v
```
The `-v` (verbose) parameter is optional but often helpful for getting more information about any issues with that the pre-commit checks identify.
```{caution}
Before pushing your changes, make sure that the pre-commit hooks have passed successfully.
```
@ -61,7 +69,7 @@ Before pushing your changes, make sure that the pre-commit hooks have passed suc
We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).
If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.
If in doubt, please open a [discussion](https://github.com/llamastack/llama-stack/discussions); we can always convert that to an issue later.
### Issues
We use GitHub issues to track public bugs. Please ensure your description is
@ -83,6 +91,7 @@ If you are new to the project, start by looking at the issues tagged with "good
leave a comment on the issue and a triager will assign it to you.
Please avoid picking up too many issues at once. This helps you stay focused and ensures that others in the community also have opportunities to contribute.
- Try to work on only 12 issues at a time, especially if youre still getting familiar with the codebase.
- Before taking an issue, check if its already assigned or being actively discussed.
- If youre blocked or cant continue with an issue, feel free to unassign yourself or leave a comment so others can step in.
@ -158,17 +167,22 @@ under the LICENSE file in the root directory of this source tree.
Some tips about common tasks you work on while contributing to Llama Stack:
### Using `llama stack build`
### Installing dependencies of distributions
Building a stack image will use the production version of the `llama-stack` and `llama-stack-client` packages. If you are developing with a llama-stack repository checked out and need your code to be reflected in the stack image, set `LLAMA_STACK_DIR` and `LLAMA_STACK_CLIENT_DIR` to the appropriate checked out directories when running any of the `llama` CLI commands.
When installing dependencies for a distribution, you can use `llama stack list-deps` to view and install the required packages.
Example:
```bash
cd work/
git clone https://github.com/meta-llama/llama-stack.git
git clone https://github.com/meta-llama/llama-stack-client-python.git
git clone https://github.com/llamastack/llama-stack.git
git clone https://github.com/llamastack/llama-stack-client-python.git
cd llama-stack
LLAMA_STACK_DIR=$(pwd) LLAMA_STACK_CLIENT_DIR=../llama-stack-client-python llama stack build --distro <...>
# Show dependencies for a distribution
llama stack list-deps <distro-name>
# Install dependencies
llama stack list-deps <distro-name> | xargs -L1 uv pip install
```
### Updating distribution configurations
@ -191,6 +205,7 @@ If you are making changes to the documentation at [https://llamastack.github.io/
```bash
# This rebuilds the documentation pages and the OpenAPI spec.
cd docs/
npm install
npm run gen-api-docs all
npm run build

View file

@ -7,7 +7,7 @@
[![Unit Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain)
[![Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain)
[**Quick Start**](https://llamastack.github.io/latest/getting_started/index.html) | [**Documentation**](https://llamastack.github.io/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)
[**Quick Start**](https://llamastack.github.io/docs/getting_started/quickstart) | [**Documentation**](https://llamastack.github.io/docs) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)
### ✨🎉 Llama 4 Support 🎉✨
@ -25,10 +25,13 @@ pip install -U llama_stack
MODEL="Llama-4-Scout-17B-16E-Instruct"
# get meta url from llama.com
llama model download --source meta --model-id $MODEL --meta-url <META_URL>
huggingface-cli download meta-llama/$MODEL --local-dir ~/.llama/$MODEL
# install dependencies for the distribution
llama stack list-deps meta-reference-gpu | xargs -L1 uv pip install
# start a llama stack server
INFERENCE_MODEL=meta-llama/$MODEL llama stack build --run --template meta-reference-gpu
INFERENCE_MODEL=meta-llama/$MODEL llama stack run meta-reference-gpu
# install client to interact with the server
pip install llama-stack-client
@ -89,7 +92,7 @@ As more providers start supporting Llama 4, you can use them in Llama Stack as w
To try Llama Stack locally, run:
```bash
curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/scripts/install.sh | bash
curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash
```
### Overview
@ -120,7 +123,7 @@ By reducing friction and complexity, Llama Stack empowers developers to focus on
### API Providers
Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.
Please checkout for [full list](https://llamastack.github.io/providers/index.html)
Please checkout for [full list](https://llamastack.github.io/docs/providers)
| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO |
|:--------------------:|:------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:--------:|
@ -151,7 +154,7 @@ Please checkout for [full list](https://llamastack.github.io/providers/index.htm
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ |
| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ |
> **Note**: Additional providers are available through external packages. See [External Providers](https://llamastack.github.io/providers/external/index.html) documentation.
> **Note**: Additional providers are available through external packages. See [External Providers](https://llamastack.github.io/docs/providers/external) documentation.
### Distributions

View file

@ -98,25 +98,34 @@ data:
- provider_id: model-context-protocol
provider_type: remote::model-context-protocol
config: {}
metadata_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
table_name: llamastack_kvstore
inference_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
storage:
backends:
kv_default:
type: kv_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
table_name: ${env.POSTGRES_TABLE_NAME:=llamastack_kvstore}
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
references:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
models:
- metadata:
embedding_dimension: 384
model_id: all-MiniLM-L6-v2
embedding_dimension: 768
model_id: nomic-embed-text-v1.5
provider_id: sentence-transformers
model_type: embedding
- model_id: ${env.INFERENCE_MODEL}
@ -137,5 +146,4 @@ data:
port: 8323
kind: ConfigMap
metadata:
creationTimestamp: null
name: llama-stack-config

View file

@ -95,25 +95,34 @@ providers:
- provider_id: model-context-protocol
provider_type: remote::model-context-protocol
config: {}
metadata_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
table_name: llamastack_kvstore
inference_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
storage:
backends:
kv_default:
type: kv_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
table_name: ${env.POSTGRES_TABLE_NAME:=llamastack_kvstore}
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
references:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
models:
- metadata:
embedding_dimension: 384
model_id: all-MiniLM-L6-v2
embedding_dimension: 768
model_id: nomic-embed-text-v1.5
provider_id: sentence-transformers
model_type: embedding
- model_id: ${env.INFERENCE_MODEL}

View file

@ -0,0 +1,8 @@
These are the source-of-truth configuration files used to generate the Stainless client SDKs via Stainless.
- `openapi.yml`: this is the OpenAPI specification for the Llama Stack API.
- `openapi.stainless.yml`: this is the Stainless _configuration_ which instructs Stainless how to generate the client SDKs.
A small side note: notice the `.yml` suffixes since Stainless uses that suffix typically for its configuration files.
These files go hand-in-hand. As of now, only the `openapi.yml` file is automatically generated using the `run_openapi_generator.sh` script.

View file

@ -0,0 +1,610 @@
# yaml-language-server: $schema=https://app.stainlessapi.com/config-internal.schema.json
organization:
# Name of your organization or company, used to determine the name of the client
# and headings.
name: llama-stack-client
docs: https://llama-stack.readthedocs.io/en/latest/
contact: llamastack@meta.com
security:
- {}
- BearerAuth: []
security_schemes:
BearerAuth:
type: http
scheme: bearer
# `targets` define the output targets and their customization options, such as
# whether to emit the Node SDK and what it's package name should be.
targets:
node:
package_name: llama-stack-client
production_repo: llamastack/llama-stack-client-typescript
publish:
npm: false
python:
package_name: llama_stack_client
production_repo: llamastack/llama-stack-client-python
options:
use_uv: true
publish:
pypi: true
project_name: llama_stack_client
kotlin:
reverse_domain: com.llama_stack_client.api
production_repo: null
publish:
maven: false
go:
package_name: llama-stack-client
production_repo: llamastack/llama-stack-client-go
options:
enable_v2: true
back_compat_use_shared_package: false
# `client_settings` define settings for the API client, such as extra constructor
# arguments (used for authentication), retry behavior, idempotency, etc.
client_settings:
default_env_prefix: LLAMA_STACK_CLIENT
opts:
api_key:
type: string
read_env: LLAMA_STACK_CLIENT_API_KEY
auth: { security_scheme: BearerAuth }
nullable: true
# `environments` are a map of the name of the environment (e.g. "sandbox",
# "production") to the corresponding url to use.
environments:
production: http://any-hosted-llama-stack.com
# `pagination` defines [pagination schemes] which provides a template to match
# endpoints and generate next-page and auto-pagination helpers in the SDKs.
pagination:
- name: datasets_iterrows
type: offset
request:
dataset_id:
type: string
start_index:
type: integer
x-stainless-pagination-property:
purpose: offset_count_param
limit:
type: integer
response:
data:
type: array
items:
type: object
next_index:
type: integer
x-stainless-pagination-property:
purpose: offset_count_start_field
- name: openai_cursor_page
type: cursor
request:
limit:
type: integer
after:
type: string
x-stainless-pagination-property:
purpose: next_cursor_param
response:
data:
type: array
items: {}
has_more:
type: boolean
last_id:
type: string
x-stainless-pagination-property:
purpose: next_cursor_field
# `resources` define the structure and organziation for your API, such as how
# methods and models are grouped together and accessed. See the [configuration
# guide] for more information.
#
# [configuration guide]:
# https://app.stainlessapi.com/docs/guides/configure#resources
resources:
$shared:
models:
agent_config: AgentConfig
interleaved_content_item: InterleavedContentItem
interleaved_content: InterleavedContent
param_type: ParamType
safety_violation: SafetyViolation
sampling_params: SamplingParams
scoring_result: ScoringResult
message: Message
user_message: UserMessage
completion_message: CompletionMessage
tool_response_message: ToolResponseMessage
system_message: SystemMessage
tool_call: ToolCall
query_result: RAGQueryResult
document: RAGDocument
query_config: RAGQueryConfig
response_format: ResponseFormat
toolgroups:
models:
tool_group: ToolGroup
list_tool_groups_response: ListToolGroupsResponse
methods:
register: post /v1/toolgroups
get: get /v1/toolgroups/{toolgroup_id}
list: get /v1/toolgroups
unregister: delete /v1/toolgroups/{toolgroup_id}
tools:
methods:
get: get /v1/tools/{tool_name}
list:
endpoint: get /v1/tools
paginated: false
tool_runtime:
models:
tool_def: ToolDef
tool_invocation_result: ToolInvocationResult
methods:
list_tools:
endpoint: get /v1/tool-runtime/list-tools
paginated: false
invoke_tool: post /v1/tool-runtime/invoke
subresources:
rag_tool:
methods:
insert: post /v1/tool-runtime/rag-tool/insert
query: post /v1/tool-runtime/rag-tool/query
responses:
models:
response_object_stream: OpenAIResponseObjectStream
response_object: OpenAIResponseObject
methods:
create:
type: http
endpoint: post /v1/responses
streaming:
stream_event_model: responses.response_object_stream
param_discriminator: stream
retrieve: get /v1/responses/{response_id}
list:
type: http
endpoint: get /v1/responses
delete:
type: http
endpoint: delete /v1/responses/{response_id}
subresources:
input_items:
methods:
list:
type: http
endpoint: get /v1/responses/{response_id}/input_items
conversations:
models:
conversation_object: Conversation
methods:
create:
type: http
endpoint: post /v1/conversations
retrieve: get /v1/conversations/{conversation_id}
update:
type: http
endpoint: post /v1/conversations/{conversation_id}
delete:
type: http
endpoint: delete /v1/conversations/{conversation_id}
subresources:
items:
methods:
get:
type: http
endpoint: get /v1/conversations/{conversation_id}/items/{item_id}
list:
type: http
endpoint: get /v1/conversations/{conversation_id}/items
create:
type: http
endpoint: post /v1/conversations/{conversation_id}/items
inspect:
models:
healthInfo: HealthInfo
providerInfo: ProviderInfo
routeInfo: RouteInfo
versionInfo: VersionInfo
methods:
health: get /v1/health
version: get /v1/version
embeddings:
models:
create_embeddings_response: OpenAIEmbeddingsResponse
methods:
create: post /v1/embeddings
chat:
models:
chat_completion_chunk: OpenAIChatCompletionChunk
subresources:
completions:
methods:
create:
type: http
endpoint: post /v1/chat/completions
streaming:
stream_event_model: chat.chat_completion_chunk
param_discriminator: stream
list:
type: http
endpoint: get /v1/chat/completions
retrieve:
type: http
endpoint: get /v1/chat/completions/{completion_id}
completions:
methods:
create:
type: http
endpoint: post /v1/completions
streaming:
param_discriminator: stream
vector_io:
models:
queryChunksResponse: QueryChunksResponse
methods:
insert: post /v1/vector-io/insert
query: post /v1/vector-io/query
vector_stores:
models:
vector_store: VectorStoreObject
list_vector_stores_response: VectorStoreListResponse
vector_store_delete_response: VectorStoreDeleteResponse
vector_store_search_response: VectorStoreSearchResponsePage
methods:
create: post /v1/vector_stores
list:
endpoint: get /v1/vector_stores
retrieve: get /v1/vector_stores/{vector_store_id}
update: post /v1/vector_stores/{vector_store_id}
delete: delete /v1/vector_stores/{vector_store_id}
search: post /v1/vector_stores/{vector_store_id}/search
subresources:
files:
models:
vector_store_file: VectorStoreFileObject
methods:
list: get /v1/vector_stores/{vector_store_id}/files
retrieve: get /v1/vector_stores/{vector_store_id}/files/{file_id}
update: post /v1/vector_stores/{vector_store_id}/files/{file_id}
delete: delete /v1/vector_stores/{vector_store_id}/files/{file_id}
create: post /v1/vector_stores/{vector_store_id}/files
content: get /v1/vector_stores/{vector_store_id}/files/{file_id}/content
file_batches:
models:
vector_store_file_batches: VectorStoreFileBatchObject
list_vector_store_files_in_batch_response: VectorStoreFilesListInBatchResponse
methods:
create: post /v1/vector_stores/{vector_store_id}/file_batches
retrieve: get /v1/vector_stores/{vector_store_id}/file_batches/{batch_id}
list_files: get /v1/vector_stores/{vector_store_id}/file_batches/{batch_id}/files
cancel: post /v1/vector_stores/{vector_store_id}/file_batches/{batch_id}/cancel
models:
models:
model: Model
list_models_response: ListModelsResponse
methods:
retrieve: get /v1/models/{model_id}
list:
endpoint: get /v1/models
paginated: false
register: post /v1/models
unregister: delete /v1/models/{model_id}
subresources:
openai:
methods:
list:
endpoint: get /v1/models
paginated: false
providers:
models:
list_providers_response: ListProvidersResponse
methods:
list:
endpoint: get /v1/providers
paginated: false
retrieve: get /v1/providers/{provider_id}
routes:
models:
list_routes_response: ListRoutesResponse
methods:
list:
endpoint: get /v1/inspect/routes
paginated: false
moderations:
models:
create_response: ModerationObject
methods:
create: post /v1/moderations
safety:
models:
run_shield_response: RunShieldResponse
methods:
run_shield: post /v1/safety/run-shield
shields:
models:
shield: Shield
list_shields_response: ListShieldsResponse
methods:
retrieve: get /v1/shields/{identifier}
list:
endpoint: get /v1/shields
paginated: false
register: post /v1/shields
delete: delete /v1/shields/{identifier}
synthetic_data_generation:
models:
syntheticDataGenerationResponse: SyntheticDataGenerationResponse
methods:
generate: post /v1/synthetic-data-generation/generate
telemetry:
models:
span_with_status: SpanWithStatus
trace: Trace
query_spans_response: QuerySpansResponse
event: Event
query_condition: QueryCondition
methods:
query_traces:
endpoint: post /v1alpha/telemetry/traces
skip_test_reason: 'unsupported query params in java / kotlin'
get_span_tree: post /v1alpha/telemetry/spans/{span_id}/tree
query_spans:
endpoint: post /v1alpha/telemetry/spans
skip_test_reason: 'unsupported query params in java / kotlin'
query_metrics:
endpoint: post /v1alpha/telemetry/metrics/{metric_name}
skip_test_reason: 'unsupported query params in java / kotlin'
# log_event: post /v1alpha/telemetry/events
save_spans_to_dataset: post /v1alpha/telemetry/spans/export
get_span: get /v1alpha/telemetry/traces/{trace_id}/spans/{span_id}
get_trace: get /v1alpha/telemetry/traces/{trace_id}
scoring:
methods:
score: post /v1/scoring/score
score_batch: post /v1/scoring/score-batch
scoring_functions:
methods:
retrieve: get /v1/scoring-functions/{scoring_fn_id}
list:
endpoint: get /v1/scoring-functions
paginated: false
register: post /v1/scoring-functions
models:
scoring_fn: ScoringFn
scoring_fn_params: ScoringFnParams
list_scoring_functions_response: ListScoringFunctionsResponse
benchmarks:
methods:
retrieve: get /v1alpha/eval/benchmarks/{benchmark_id}
list:
endpoint: get /v1alpha/eval/benchmarks
paginated: false
register: post /v1alpha/eval/benchmarks
models:
benchmark: Benchmark
list_benchmarks_response: ListBenchmarksResponse
files:
methods:
create: post /v1/files
list: get /v1/files
retrieve: get /v1/files/{file_id}
delete: delete /v1/files/{file_id}
content: get /v1/files/{file_id}/content
models:
file: OpenAIFileObject
list_files_response: ListOpenAIFileResponse
delete_file_response: OpenAIFileDeleteResponse
alpha:
subresources:
inference:
methods:
rerank: post /v1alpha/inference/rerank
post_training:
models:
algorithm_config: AlgorithmConfig
post_training_job: PostTrainingJob
list_post_training_jobs_response: ListPostTrainingJobsResponse
methods:
preference_optimize: post /v1alpha/post-training/preference-optimize
supervised_fine_tune: post /v1alpha/post-training/supervised-fine-tune
subresources:
job:
methods:
artifacts: get /v1alpha/post-training/job/artifacts
cancel: post /v1alpha/post-training/job/cancel
status: get /v1alpha/post-training/job/status
list:
endpoint: get /v1alpha/post-training/jobs
paginated: false
eval:
methods:
evaluate_rows: post /v1alpha/eval/benchmarks/{benchmark_id}/evaluations
run_eval: post /v1alpha/eval/benchmarks/{benchmark_id}/jobs
evaluate_rows_alpha: post /v1alpha/eval/benchmarks/{benchmark_id}/evaluations
run_eval_alpha: post /v1alpha/eval/benchmarks/{benchmark_id}/jobs
subresources:
jobs:
methods:
cancel: delete /v1alpha/eval/benchmarks/{benchmark_id}/jobs/{job_id}
status: get /v1alpha/eval/benchmarks/{benchmark_id}/jobs/{job_id}
retrieve: get /v1alpha/eval/benchmarks/{benchmark_id}/jobs/{job_id}/result
models:
evaluate_response: EvaluateResponse
benchmark_config: BenchmarkConfig
job: Job
agents:
methods:
create: post /v1alpha/agents
list: get /v1alpha/agents
retrieve: get /v1alpha/agents/{agent_id}
delete: delete /v1alpha/agents/{agent_id}
models:
inference_step: InferenceStep
tool_execution_step: ToolExecutionStep
tool_response: ToolResponse
shield_call_step: ShieldCallStep
memory_retrieval_step: MemoryRetrievalStep
subresources:
session:
models:
session: Session
methods:
list: get /v1alpha/agents/{agent_id}/sessions
create: post /v1alpha/agents/{agent_id}/session
delete: delete /v1alpha/agents/{agent_id}/session/{session_id}
retrieve: get /v1alpha/agents/{agent_id}/session/{session_id}
steps:
methods:
retrieve: get /v1alpha/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
turn:
models:
turn: Turn
turn_response_event: AgentTurnResponseEvent
agent_turn_response_stream_chunk: AgentTurnResponseStreamChunk
methods:
create:
type: http
endpoint: post /v1alpha/agents/{agent_id}/session/{session_id}/turn
streaming:
stream_event_model: alpha.agents.turn.agent_turn_response_stream_chunk
param_discriminator: stream
retrieve: get /v1alpha/agents/{agent_id}/session/{session_id}/turn/{turn_id}
resume:
type: http
endpoint: post /v1alpha/agents/{agent_id}/session/{session_id}/turn/{turn_id}/resume
streaming:
stream_event_model: alpha.agents.turn.agent_turn_response_stream_chunk
param_discriminator: stream
beta:
subresources:
datasets:
models:
list_datasets_response: ListDatasetsResponse
methods:
register: post /v1beta/datasets
retrieve: get /v1beta/datasets/{dataset_id}
list:
endpoint: get /v1beta/datasets
paginated: false
unregister: delete /v1beta/datasets/{dataset_id}
iterrows: get /v1beta/datasetio/iterrows/{dataset_id}
appendrows: post /v1beta/datasetio/append-rows/{dataset_id}
settings:
license: MIT
unwrap_response_fields: [ data ]
openapi:
transformations:
- command: renameValue
reason: pydantic reserved name
args:
filter:
only:
- '$.components.schemas.InferenceStep.properties.model_response'
rename:
python:
property_name: 'inference_model_response'
# - command: renameValue
# reason: pydantic reserved name
# args:
# filter:
# only:
# - '$.components.schemas.Model.properties.model_type'
# rename:
# python:
# property_name: 'type'
- command: mergeObject
reason: Better return_type using enum
args:
target:
- '$.components.schemas'
object:
ReturnType:
additionalProperties: false
properties:
type:
enum:
- string
- number
- boolean
- array
- object
- json
- union
- chat_completion_input
- completion_input
- agent_turn_input
required:
- type
type: object
- command: replaceProperties
reason: Replace return type properties with better model (see above)
args:
filter:
only:
- '$.components.schemas.ScoringFn.properties.return_type'
- '$.components.schemas.RegisterScoringFunctionRequest.properties.return_type'
value:
$ref: '#/components/schemas/ReturnType'
- command: oneOfToAnyOf
reason: Prism (mock server) doesn't like one of our requests as it technically matches multiple variants
- reason: For better names
command: extractToRefs
args:
ref:
target: '$.components.schemas.ToolCallDelta.properties.tool_call'
name: '#/components/schemas/ToolCallOrString'
# `readme` is used to configure the code snippets that will be rendered in the
# README.md of various SDKs. In particular, you can change the `headline`
# snippet's endpoint and the arguments to call it with.
readme:
example_requests:
default:
type: request
endpoint: post /v1/chat/completions
params: &ref_0 {}
headline:
type: request
endpoint: post /v1/models
params: *ref_0
pagination:
type: request
endpoint: post /v1/chat/completions
params: {}

File diff suppressed because it is too large Load diff

137
containers/Containerfile Normal file
View file

@ -0,0 +1,137 @@
# syntax=docker/dockerfile:1.6
#
# This Dockerfile is used to build the Llama Stack container image.
# Example:
# docker build \
# -f containers/Containerfile \
# --build-arg DISTRO_NAME=starter \
# --tag llama-stack:starter .
ARG BASE_IMAGE=python:3.12-slim
FROM ${BASE_IMAGE}
ARG INSTALL_MODE="pypi"
ARG LLAMA_STACK_DIR="/workspace"
ARG LLAMA_STACK_CLIENT_DIR=""
ARG PYPI_VERSION=""
ARG TEST_PYPI_VERSION=""
ARG KEEP_WORKSPACE=""
ARG DISTRO_NAME="starter"
ARG RUN_CONFIG_PATH=""
ARG UV_HTTP_TIMEOUT=500
ENV UV_HTTP_TIMEOUT=${UV_HTTP_TIMEOUT}
ENV PYTHONDONTWRITEBYTECODE=1
ENV PIP_DISABLE_PIP_VERSION_CHECK=1
WORKDIR /app
RUN set -eux; \
if command -v dnf >/dev/null 2>&1; then \
dnf -y update && \
dnf install -y iputils git net-tools wget \
vim-minimal python3.12 python3.12-pip python3.12-wheel \
python3.12-setuptools python3.12-devel gcc gcc-c++ make && \
ln -sf /usr/bin/pip3.12 /usr/local/bin/pip && \
ln -sf /usr/bin/python3.12 /usr/local/bin/python && \
dnf clean all; \
elif command -v apt-get >/dev/null 2>&1; then \
apt-get update && \
apt-get install -y --no-install-recommends \
iputils-ping net-tools iproute2 dnsutils telnet \
curl wget git procps psmisc lsof traceroute bubblewrap \
gcc g++ && \
rm -rf /var/lib/apt/lists/*; \
else \
echo "Unsupported base image: expected dnf or apt-get" >&2; \
exit 1; \
fi
RUN pip install --no-cache-dir uv
ENV UV_SYSTEM_PYTHON=1
ENV INSTALL_MODE=${INSTALL_MODE}
ENV LLAMA_STACK_DIR=${LLAMA_STACK_DIR}
ENV LLAMA_STACK_CLIENT_DIR=${LLAMA_STACK_CLIENT_DIR}
ENV PYPI_VERSION=${PYPI_VERSION}
ENV TEST_PYPI_VERSION=${TEST_PYPI_VERSION}
ENV KEEP_WORKSPACE=${KEEP_WORKSPACE}
ENV DISTRO_NAME=${DISTRO_NAME}
ENV RUN_CONFIG_PATH=${RUN_CONFIG_PATH}
# Copy the repository so editable installs and run configurations are available.
COPY . /workspace
# Install the client package if it is provided
# NOTE: this is installed before llama-stack since llama-stack depends on llama-stack-client-python
RUN set -eux; \
if [ -n "$LLAMA_STACK_CLIENT_DIR" ]; then \
if [ ! -d "$LLAMA_STACK_CLIENT_DIR" ]; then \
echo "LLAMA_STACK_CLIENT_DIR is set but $LLAMA_STACK_CLIENT_DIR does not exist" >&2; \
exit 1; \
fi; \
uv pip install --no-cache-dir -e "$LLAMA_STACK_CLIENT_DIR"; \
fi;
# Install llama-stack
RUN set -eux; \
if [ "$INSTALL_MODE" = "editable" ]; then \
if [ ! -d "$LLAMA_STACK_DIR" ]; then \
echo "INSTALL_MODE=editable requires LLAMA_STACK_DIR to point to a directory inside the build context" >&2; \
exit 1; \
fi; \
uv pip install --no-cache-dir -e "$LLAMA_STACK_DIR"; \
elif [ "$INSTALL_MODE" = "test-pypi" ]; then \
uv pip install --no-cache-dir fastapi libcst; \
if [ -n "$TEST_PYPI_VERSION" ]; then \
uv pip install --no-cache-dir --extra-index-url https://test.pypi.org/simple/ --index-strategy unsafe-best-match "llama-stack==$TEST_PYPI_VERSION"; \
else \
uv pip install --no-cache-dir --extra-index-url https://test.pypi.org/simple/ --index-strategy unsafe-best-match llama-stack; \
fi; \
else \
if [ -n "$PYPI_VERSION" ]; then \
uv pip install --no-cache-dir "llama-stack==$PYPI_VERSION"; \
else \
uv pip install --no-cache-dir llama-stack; \
fi; \
fi;
# Install the dependencies for the distribution
RUN set -eux; \
if [ -z "$DISTRO_NAME" ]; then \
echo "DISTRO_NAME must be provided" >&2; \
exit 1; \
fi; \
deps="$(llama stack list-deps "$DISTRO_NAME")"; \
if [ -n "$deps" ]; then \
printf '%s\n' "$deps" | xargs -L1 uv pip install --no-cache-dir; \
fi
# Cleanup
RUN set -eux; \
pip uninstall -y uv; \
should_remove=1; \
if [ -n "$KEEP_WORKSPACE" ]; then should_remove=0; fi; \
if [ "$INSTALL_MODE" = "editable" ]; then should_remove=0; fi; \
case "$RUN_CONFIG_PATH" in \
/workspace*) should_remove=0 ;; \
esac; \
if [ "$should_remove" -eq 1 ] && [ -d /workspace ]; then rm -rf /workspace; fi
RUN cat <<'EOF' >/usr/local/bin/llama-stack-entrypoint.sh
#!/bin/sh
set -e
if [ -n "$RUN_CONFIG_PATH" ] && [ -f "$RUN_CONFIG_PATH" ]; then
exec llama stack run "$RUN_CONFIG_PATH" "$@"
fi
if [ -n "$DISTRO_NAME" ]; then
exec llama stack run "$DISTRO_NAME" "$@"
fi
exec llama stack run "$@"
EOF
RUN chmod +x /usr/local/bin/llama-stack-entrypoint.sh
RUN mkdir -p /.llama /.cache && chmod -R g+rw /app /.llama /.cache
ENTRYPOINT ["/usr/local/bin/llama-stack-entrypoint.sh"]

View file

@ -51,8 +51,8 @@ device: cpu
You can access the HuggingFace trainer via the `starter` distribution:
```bash
llama stack build --distro starter --image-type venv
llama stack run --image-type venv ~/.llama/distributions/starter/starter-run.yaml
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```
### Usage Example

View file

@ -175,8 +175,7 @@ llama-stack-client benchmarks register \
**1. Start the Llama Stack API Server**
```bash
# Build and run a distribution (example: together)
llama stack build --distro together --image-type venv
llama stack list-deps together | xargs -L1 uv pip install
llama stack run together
```
@ -209,7 +208,7 @@ The playground works with any Llama Stack distribution. Popular options include:
<TabItem value="together" label="Together AI">
```bash
llama stack build --distro together --image-type venv
llama stack list-deps together | xargs -L1 uv pip install
llama stack run together
```
@ -222,7 +221,7 @@ llama stack run together
<TabItem value="ollama" label="Ollama (Local)">
```bash
llama stack build --distro ollama --image-type venv
llama stack list-deps ollama | xargs -L1 uv pip install
llama stack run ollama
```
@ -235,7 +234,7 @@ llama stack run ollama
<TabItem value="meta-reference" label="Meta Reference">
```bash
llama stack build --distro meta-reference --image-type venv
llama stack list-deps meta-reference | xargs -L1 uv pip install
llama stack run meta-reference
```

View file

@ -10,358 +10,114 @@ import TabItem from '@theme/TabItem';
# Retrieval Augmented Generation (RAG)
RAG enables your applications to reference and recall information from previous interactions or external documents.
RAG enables your applications to reference and recall information from external documents. Llama Stack makes Agentic RAG available through OpenAI's Responses API.
## Quick Start
### 1. Start the Server
In one terminal, start the Llama Stack server:
```bash
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```
### 2. Connect with OpenAI Client
In another terminal, use the standard OpenAI client with the Responses API:
```python
import io, requests
from openai import OpenAI
url = "https://www.paulgraham.com/greatwork.html"
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
# Create vector store - auto-detects default embedding model
vs = client.vector_stores.create()
response = requests.get(url)
pseudo_file = io.BytesIO(str(response.content).encode('utf-8'))
file_id = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants").id
client.vector_stores.files.create(vector_store_id=vs.id, file_id=file_id)
resp = client.responses.create(
model="gpt-4o",
input="How do you do great work? Use the existing knowledge_search tool.",
tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
include=["file_search_call.results"],
)
print(resp.output[-1].content[-1].text)
```
Which should give output like:
```
Doing great work is about more than just hard work and ambition; it involves combining several elements:
1. **Pursue What Excites You**: Engage in projects that are both ambitious and exciting to you. It's important to work on something you have a natural aptitude for and a deep interest in.
2. **Explore and Discover**: Great work often feels like a blend of discovery and creation. Focus on seeing possibilities and let ideas take their natural shape, rather than just executing a plan.
3. **Be Bold Yet Flexible**: Take bold steps in your work without over-planning. An adaptable approach that evolves with new ideas can often lead to breakthroughs.
4. **Work on Your Own Projects**: Develop a habit of working on projects of your own choosing, as these often lead to great achievements. These should be projects you find exciting and that challenge you intellectually.
5. **Be Earnest and Authentic**: Approach your work with earnestness and authenticity. Trying to impress others with affectation can be counterproductive, as genuine effort and intellectual honesty lead to better work outcomes.
6. **Build a Supportive Environment**: Work alongside great colleagues who inspire you and enhance your work. Surrounding yourself with motivating individuals creates a fertile environment for great work.
7. **Maintain High Morale**: High morale significantly impacts your ability to do great work. Stay optimistic and protect your mental well-being to maintain progress and momentum.
8. **Balance**: While hard work is essential, overworking can lead to diminishing returns. Balance periods of intensive work with rest to sustain productivity over time.
This approach shows that great work is less about following a strict formula and more about aligning your interests, ambition, and environment to foster creativity and innovation.
```
## Architecture Overview
Llama Stack organizes the APIs that enable RAG into three layers:
Llama Stack provides OpenAI-compatible RAG capabilities through:
1. **Lower-Level APIs**: Deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon)
2. **RAG Tool**: A first-class tool as part of the [Tools API](./tools) that allows you to ingest documents (from URLs, files, etc) with various chunking strategies and query them smartly
3. **Agents API**: The top-level [Agents API](./agent) that allows you to create agents that can use the tools to answer questions, perform tasks, and more
- **Vector Stores API**: OpenAI-compatible vector storage with automatic embedding model detection
- **Files API**: Document upload and processing using OpenAI's file format
- **Responses API**: Enhanced chat completions with agentic tool calling via file search
![RAG System Architecture](/img/rag.png)
## Configuring Default Embedding Models
The RAG system uses lower-level storage for different types of data:
- **Vector IO**: For semantic search and retrieval
- **Key-Value and Relational IO**: For structured data storage
To enable automatic vector store creation without specifying embedding models, configure a default embedding model in your run.yaml like so:
:::info[Future Storage Types]
We may add more storage types like Graph IO in the future.
:::
## Setting up Vector Databases
For this guide, we will use [Ollama](https://ollama.com/) as the inference provider. Ollama is an LLM runtime that allows you to run Llama models locally.
Here's how to set up a vector database for RAG:
```python
# Create HTTP client
import os
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")
# Register a vector database
vector_db_id = "my_documents"
response = client.vector_dbs.register(
vector_db_id=vector_db_id,
embedding_model="all-MiniLM-L6-v2",
embedding_dimension=384,
provider_id="faiss",
)
```yaml
vector_stores:
default_provider_id: faiss
default_embedding_model:
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5
```
## Document Ingestion
With this configuration:
- `client.vector_stores.create()` works without requiring embedding model or provider parameters
- The system automatically uses the default vector store provider (`faiss`) when multiple providers are available
- The system automatically uses the default embedding model (`sentence-transformers/nomic-ai/nomic-embed-text-v1.5`) for any newly created vector store
- The `default_provider_id` specifies which vector storage backend to use
- The `default_embedding_model` specifies both the inference provider and model for embeddings
You can ingest documents into the vector database using two methods: directly inserting pre-chunked documents or using the RAG Tool.
## Vector Store Operations
### Direct Document Insertion
### Creating Vector Stores
<Tabs>
<TabItem value="basic" label="Basic Insertion">
You can create vector stores with automatic or explicit embedding model selection:
```python
# You can insert a pre-chunked document directly into the vector db
chunks = [
{
"content": "Your document text here",
"mime_type": "text/plain",
"metadata": {
"document_id": "doc1",
"author": "Jane Doe",
},
},
]
client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks)
```
# Automatic - uses default configured embedding model and vector store provider
vs = client.vector_stores.create()
</TabItem>
<TabItem value="embeddings" label="With Precomputed Embeddings">
If you decide to precompute embeddings for your documents, you can insert them directly into the vector database by including the embedding vectors in the chunk data. This is useful if you have a separate embedding service or if you want to customize the ingestion process.
```python
chunks_with_embeddings = [
{
"content": "First chunk of text",
"mime_type": "text/plain",
"embedding": [0.1, 0.2, 0.3, ...], # Your precomputed embedding vector
"metadata": {"document_id": "doc1", "section": "introduction"},
},
{
"content": "Second chunk of text",
"mime_type": "text/plain",
"embedding": [0.2, 0.3, 0.4, ...], # Your precomputed embedding vector
"metadata": {"document_id": "doc1", "section": "methodology"},
},
]
client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks_with_embeddings)
```
:::warning[Embedding Dimensions]
When providing precomputed embeddings, ensure the embedding dimension matches the `embedding_dimension` specified when registering the vector database.
:::
</TabItem>
</Tabs>
### Document Retrieval
You can query the vector database to retrieve documents based on their embeddings.
```python
# You can then query for these chunks
chunks_response = client.vector_io.query(
vector_db_id=vector_db_id,
query="What do you know about..."
# Explicit - specify embedding model and/or provider when you need specific ones
vs = client.vector_stores.create(
extra_body={
"provider_id": "faiss", # Optional: specify vector store provider
"embedding_model": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
"embedding_dimension": 768 # Optional: will be auto-detected if not provided
}
)
```
## Using the RAG Tool
:::danger[Deprecation Notice]
The RAG Tool is being deprecated in favor of directly using the OpenAI-compatible Search API. We recommend migrating to the OpenAI APIs for better compatibility and future support.
:::
A better way to ingest documents is to use the RAG Tool. This tool allows you to ingest documents from URLs, files, etc. and automatically chunks them into smaller pieces. More examples for how to format a RAGDocument can be found in the [appendix](#more-ragdocument-examples).
### OpenAI API Integration & Migration
The RAG tool has been updated to use OpenAI-compatible APIs. This provides several benefits:
- **Files API Integration**: Documents are now uploaded using OpenAI's file upload endpoints
- **Vector Stores API**: Vector storage operations use OpenAI's vector store format with configurable chunking strategies
- **Error Resilience**: When processing multiple documents, individual failures are logged but don't crash the operation. Failed documents are skipped while successful ones continue processing.
### Migration Path
We recommend migrating to the OpenAI-compatible Search API for:
1. **Better OpenAI Ecosystem Integration**: Direct compatibility with OpenAI tools and workflows including the Responses API
2. **Future-Proof**: Continued support and feature development
3. **Full OpenAI Compatibility**: Vector Stores, Files, and Search APIs are fully compatible with OpenAI's Responses API
The OpenAI APIs are used under the hood, so you can continue to use your existing RAG Tool code with minimal changes. However, we recommend updating your code to use the new OpenAI-compatible APIs for better long-term support. If any documents fail to process, they will be logged in the response but will not cause the entire operation to fail.
### RAG Tool Example
```python
from llama_stack_client import RAGDocument
urls = ["memory_optimizations.rst", "chat.rst", "llama3.rst"]
documents = [
RAGDocument(
document_id=f"num-{i}",
content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
mime_type="text/plain",
metadata={},
)
for i, url in enumerate(urls)
]
client.tool_runtime.rag_tool.insert(
documents=documents,
vector_db_id=vector_db_id,
chunk_size_in_tokens=512,
)
# Query documents
results = client.tool_runtime.rag_tool.query(
vector_db_ids=[vector_db_id],
content="What do you know about...",
)
```
### Custom Context Configuration
You can configure how the RAG tool adds metadata to the context if you find it useful for your application:
```python
# Query documents with custom template
results = client.tool_runtime.rag_tool.query(
vector_db_ids=[vector_db_id],
content="What do you know about...",
query_config={
"chunk_template": "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n",
},
)
```
## Building RAG-Enhanced Agents
One of the most powerful patterns is combining agents with RAG capabilities. Here's a complete example:
### Agent with Knowledge Search
```python
from llama_stack_client import Agent
# Create agent with memory
agent = Agent(
client,
model="meta-llama/Llama-3.3-70B-Instruct",
instructions="You are a helpful assistant",
tools=[
{
"name": "builtin::rag/knowledge_search",
"args": {
"vector_db_ids": [vector_db_id],
# Defaults
"query_config": {
"chunk_size_in_tokens": 512,
"chunk_overlap_in_tokens": 0,
"chunk_template": "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n",
},
},
}
],
)
session_id = agent.create_session("rag_session")
# Ask questions about documents in the vector db, and the agent will query the db to answer the question.
response = agent.create_turn(
messages=[{"role": "user", "content": "How to optimize memory in PyTorch?"}],
session_id=session_id,
)
```
:::tip[Agent Instructions]
The `instructions` field in the `AgentConfig` can be used to guide the agent's behavior. It is important to experiment with different instructions to see what works best for your use case.
:::
### Document-Aware Conversations
You can also pass documents along with the user's message and ask questions about them:
```python
# Initial document ingestion
response = agent.create_turn(
messages=[
{"role": "user", "content": "I am providing some documents for reference."}
],
documents=[
{
"content": "https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/memory_optimizations.rst",
"mime_type": "text/plain",
}
],
session_id=session_id,
)
# Query with RAG
response = agent.create_turn(
messages=[{"role": "user", "content": "What are the key topics in the documents?"}],
session_id=session_id,
)
```
### Viewing Agent Responses
You can print the response with the following:
```python
from llama_stack_client import AgentEventLogger
for log in AgentEventLogger().log(response):
log.print()
```
## Vector Database Management
### Unregistering Vector DBs
If you need to clean up and unregister vector databases, you can do so as follows:
<Tabs>
<TabItem value="single" label="Single Database">
```python
# Unregister a specified vector database
vector_db_id = "my_vector_db_id"
print(f"Unregistering vector database: {vector_db_id}")
client.vector_dbs.unregister(vector_db_id=vector_db_id)
```
</TabItem>
<TabItem value="all" label="All Databases">
```python
# Unregister all vector databases
for vector_db_id in client.vector_dbs.list():
print(f"Unregistering vector database: {vector_db_id.identifier}")
client.vector_dbs.unregister(vector_db_id=vector_db_id.identifier)
```
</TabItem>
</Tabs>
## Best Practices
### 🎯 **Document Chunking**
- Use appropriate chunk sizes (512 tokens is often a good starting point)
- Consider overlap between chunks for better context preservation
- Experiment with different chunking strategies for your content type
### 🔍 **Embedding Strategy**
- Choose embedding models that match your domain
- Consider the trade-off between embedding dimension and performance
- Test different embedding models for your specific use case
### 📊 **Query Optimization**
- Use specific, well-formed queries for better retrieval
- Experiment with different search strategies
- Consider hybrid approaches (keyword + semantic search)
### 🛡️ **Error Handling**
- Implement proper error handling for failed document processing
- Monitor ingestion success rates
- Have fallback strategies for retrieval failures
## Appendix
### More RAGDocument Examples
Here are various ways to create RAGDocument objects for different content types:
```python
from llama_stack_client import RAGDocument
import base64
# File URI
RAGDocument(document_id="num-0", content={"uri": "file://path/to/file"})
# Plain text
RAGDocument(document_id="num-1", content="plain text")
# Explicit text input
RAGDocument(
document_id="num-2",
content={
"type": "text",
"text": "plain text input",
}, # for inputs that should be treated as text explicitly
)
# Image from URL
RAGDocument(
document_id="num-3",
content={
"type": "image",
"image": {"url": {"uri": "https://mywebsite.com/image.jpg"}},
},
)
# Base64 encoded image
B64_ENCODED_IMAGE = base64.b64encode(
requests.get(
"https://raw.githubusercontent.com/meta-llama/llama-stack/refs/heads/main/docs/_static/llama-stack.png"
).content
)
RAGDocument(
document_id="num-4",
content={"type": "image", "image": {"data": B64_ENCODED_IMAGE}},
)
```
For more strongly typed interaction use the typed dicts found [here](https://github.com/meta-llama/llama-stack-client-python/blob/38cd91c9e396f2be0bec1ee96a19771582ba6f17/src/llama_stack_client/types/shared_params/document.py).

View file

@ -10,58 +10,8 @@ import TabItem from '@theme/TabItem';
# Telemetry
The Llama Stack telemetry system provides comprehensive tracing, metrics, and logging capabilities. It supports multiple sink types including OpenTelemetry, SQLite, and Console output for complete observability of your AI applications.
The Llama Stack uses OpenTelemetry to provide comprehensive tracing, metrics, and logging capabilities.
## Event Types
The telemetry system supports three main types of events:
<Tabs>
<TabItem value="unstructured" label="Unstructured Logs">
Free-form log messages with severity levels for general application logging:
```python
unstructured_log_event = UnstructuredLogEvent(
message="This is a log message",
severity=LogSeverity.INFO
)
```
</TabItem>
<TabItem value="metrics" label="Metric Events">
Numerical measurements with units for tracking performance and usage:
```python
metric_event = MetricEvent(
metric="my_metric",
value=10,
unit="count"
)
```
</TabItem>
<TabItem value="structured" label="Structured Logs">
System events like span start/end that provide structured operation tracking:
```python
structured_log_event = SpanStartPayload(
name="my_span",
parent_span_id="parent_span_id"
)
```
</TabItem>
</Tabs>
## Spans and Traces
- **Spans**: Represent individual operations with timing information and hierarchical relationships
- **Traces**: Collections of related spans that form a complete request flow across your application
This hierarchical structure allows you to understand the complete execution path of requests through your Llama Stack application.
## Automatic Metrics Generation
@ -129,21 +79,6 @@ Send events to an OpenTelemetry Collector for integration with observability pla
- Compatible with all OpenTelemetry collectors
- Supports both traces and metrics
</TabItem>
<TabItem value="sqlite" label="SQLite">
Store events in a local SQLite database for direct querying:
**Use Cases:**
- Local development and debugging
- Custom analytics and reporting
- Offline analysis of application behavior
**Features:**
- Direct SQL querying capabilities
- Persistent local storage
- No external dependencies
</TabItem>
<TabItem value="console" label="Console">
@ -174,9 +109,8 @@ telemetry:
provider_type: inline::meta-reference
config:
service_name: "llama-stack-service"
sinks: ['console', 'sqlite', 'otel_trace', 'otel_metric']
sinks: ['console', 'otel_trace', 'otel_metric']
otel_exporter_otlp_endpoint: "http://localhost:4318"
sqlite_db_path: "/path/to/telemetry.db"
```
### Environment Variables
@ -185,23 +119,23 @@ Configure telemetry behavior using environment variables:
- **`OTEL_EXPORTER_OTLP_ENDPOINT`**: OpenTelemetry Collector endpoint (default: `http://localhost:4318`)
- **`OTEL_SERVICE_NAME`**: Service name for telemetry (default: empty string)
- **`TELEMETRY_SINKS`**: Comma-separated list of sinks (default: `console,sqlite`)
- **`TELEMETRY_SINKS`**: Comma-separated list of sinks (default: `[]`)
## Visualization with Jaeger
### Quick Setup: Complete Telemetry Stack
The `otel_trace` sink works with any service compatible with the OpenTelemetry collector. Traces and metrics use separate endpoints but can share the same collector.
### Starting Jaeger
Start a Jaeger instance with OTLP HTTP endpoint at 4318 and the Jaeger UI at 16686:
Use the automated setup script to launch the complete telemetry stack (Jaeger, OpenTelemetry Collector, Prometheus, and Grafana):
```bash
docker run --pull always --rm --name jaeger \
-p 16686:16686 -p 4318:4318 \
jaegertracing/jaeger:2.1.0
./scripts/telemetry/setup_telemetry.sh
```
Once running, you can visualize traces by navigating to [http://localhost:16686/](http://localhost:16686/).
This sets up:
- **Jaeger UI**: http://localhost:16686 (traces visualization)
- **Prometheus**: http://localhost:9090 (metrics)
- **Grafana**: http://localhost:3000 (dashboards with auto-configured data sources)
- **OTEL Collector**: http://localhost:4318 (OTLP endpoint)
Once running, you can visualize traces by navigating to [Grafana](http://localhost:3000/) and login with login `admin` and password `admin`.
## Querying Metrics
@ -248,37 +182,10 @@ Forward metrics to other observability systems:
</TabItem>
</Tabs>
## SQLite Querying
The `sqlite` sink allows you to query traces without an external system. This is particularly useful for development and custom analytics.
### Example Queries
```sql
-- Query recent traces
SELECT * FROM traces WHERE timestamp > datetime('now', '-1 hour');
-- Analyze span durations
SELECT name, AVG(duration_ms) as avg_duration
FROM spans
GROUP BY name
ORDER BY avg_duration DESC;
-- Find slow operations
SELECT * FROM spans
WHERE duration_ms > 1000
ORDER BY duration_ms DESC;
```
:::tip[Advanced Analytics]
Refer to the [Getting Started notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) for more examples on querying traces and spans programmatically.
:::
## Best Practices
### 🔍 **Monitoring Strategy**
- Use OpenTelemetry for production environments
- Combine multiple sinks for development (console + SQLite)
- Set up alerts on key metrics like token usage and error rates
### 📊 **Metrics Analysis**
@ -293,45 +200,8 @@ Refer to the [Getting Started notebook](https://github.com/meta-llama/llama-stac
### 🔧 **Configuration Management**
- Use environment variables for flexible deployment
- Configure appropriate retention policies for SQLite
- Ensure proper network access to OpenTelemetry collectors
## Integration Examples
### Basic Telemetry Setup
```python
from llama_stack_client import LlamaStackClient
# Client with telemetry headers
client = LlamaStackClient(
base_url="http://localhost:8000",
extra_headers={
"X-Telemetry-Service": "my-ai-app",
"X-Telemetry-Version": "1.0.0"
}
)
# All API calls will be automatically traced
response = client.chat.completions.create(
model="meta-llama/Llama-3.2-3B-Instruct",
messages=[{"role": "user", "content": "Hello!"}]
)
```
### Custom Telemetry Context
```python
# Add custom span attributes for better tracking
with tracer.start_as_current_span("custom_operation") as span:
span.set_attribute("user_id", "user123")
span.set_attribute("operation_type", "chat_completion")
response = client.chat.completions.create(
model="meta-llama/Llama-3.2-3B-Instruct",
messages=[{"role": "user", "content": "Hello!"}]
)
```
## Related Resources

View file

@ -219,13 +219,10 @@ group_tools = client.tools.list_tools(toolgroup_id="search_tools")
<TabItem value="setup" label="Setup & Configuration">
1. Start by registering a Tavily API key at [Tavily](https://tavily.com/).
2. [Optional] Provide the API key directly to the Llama Stack server
2. [Optional] Set the API key in your environment before starting the Llama Stack server
```bash
export TAVILY_SEARCH_API_KEY="your key"
```
```bash
--env TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY}
```
</TabItem>
<TabItem value="implementation" label="Implementation">
@ -273,9 +270,9 @@ for log in EventLogger().log(response):
<TabItem value="setup" label="Setup & Configuration">
1. Start by registering for a WolframAlpha API key at [WolframAlpha Developer Portal](https://developer.wolframalpha.com/access).
2. Provide the API key either when starting the Llama Stack server:
2. Provide the API key either by setting it in your environment before starting the Llama Stack server:
```bash
--env WOLFRAM_ALPHA_API_KEY=${WOLFRAM_ALPHA_API_KEY}
export WOLFRAM_ALPHA_API_KEY="your key"
```
or from the client side:
```python

View file

@ -62,6 +62,10 @@ The new `/v2` API must be introduced alongside the existing `/v1` API and run in
When a `/v2` API is introduced, a clear and generous deprecation policy for the `/v1` API must be published simultaneously. This policy must outline the timeline for the eventual removal of the `/v1` API, giving users ample time to migrate.
### Deprecated APIs
Deprecated APIs are those that are no longer actively maintained or supported. Depreated APIs are marked with the flag `deprecated = True` in the OpenAPI spec. These APIs will be removed in a future release.
### API Stability vs. Provider Stability
The leveling introduced in this document relates to the stability of the API and not specifically the providers within the API.

View file

@ -152,7 +152,6 @@ __all__ = ["WeatherAPI", "available_providers"]
from typing import Protocol
from llama_stack.providers.datatypes import (
AdapterSpec,
Api,
ProviderSpec,
RemoteProviderSpec,
@ -166,12 +165,10 @@ def available_providers() -> list[ProviderSpec]:
api=Api.weather,
provider_type="remote::kaze",
config_class="llama_stack_provider_kaze.KazeProviderConfig",
adapter=AdapterSpec(
adapter_type="kaze",
module="llama_stack_provider_kaze",
pip_packages=["llama_stack_provider_kaze"],
config_class="llama_stack_provider_kaze.KazeProviderConfig",
),
adapter_type="kaze",
module="llama_stack_provider_kaze",
pip_packages=["llama_stack_provider_kaze"],
config_class="llama_stack_provider_kaze.KazeProviderConfig",
),
]
@ -325,11 +322,10 @@ class WeatherKazeAdapter(WeatherProvider):
```yaml
# ~/.llama/providers.d/remote/weather/kaze.yaml
adapter:
adapter_type: kaze
pip_packages: ["llama_stack_provider_kaze"]
config_class: llama_stack_provider_kaze.config.KazeProviderConfig
module: llama_stack_provider_kaze
adapter_type: kaze
pip_packages: ["llama_stack_provider_kaze"]
config_class: llama_stack_provider_kaze.config.KazeProviderConfig
module: llama_stack_provider_kaze
optional_api_dependencies: []
```
@ -361,7 +357,7 @@ server:
8. Run the server:
```bash
python -m llama_stack.core.server.server --yaml-config ~/.llama/run-byoa.yaml
llama stack run ~/.llama/run-byoa.yaml
```
9. Test the API:

View file

@ -158,17 +158,16 @@ under the LICENSE file in the root directory of this source tree.
Some tips about common tasks you work on while contributing to Llama Stack:
### Using `llama stack build`
### Setup for development
Building a stack image will use the production version of the `llama-stack` and `llama-stack-client` packages. If you are developing with a llama-stack repository checked out and need your code to be reflected in the stack image, set `LLAMA_STACK_DIR` and `LLAMA_STACK_CLIENT_DIR` to the appropriate checked out directories when running any of the `llama` CLI commands.
Example:
```bash
cd work/
git clone https://github.com/meta-llama/llama-stack.git
git clone https://github.com/meta-llama/llama-stack-client-python.git
cd llama-stack
LLAMA_STACK_DIR=$(pwd) LLAMA_STACK_CLIENT_DIR=../llama-stack-client-python llama stack build --distro <...>
uv run llama stack list-deps <distro-name> | xargs -L1 uv pip install
# (Optional) If you are developing the llama-stack-client-python package, you can add it as an editable package.
git clone https://github.com/meta-llama/llama-stack-client-python.git
uv add --editable ../llama-stack-client-python
```
### Updating distribution configurations

View file

@ -67,7 +67,7 @@ def get_base_url(self) -> str:
## Testing the Provider
Before running tests, you must have required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, you should install dependencies via `llama stack build --distro together`.
Before running tests, you must have required dependencies installed. This depends on the providers or distributions you are testing. For example, if you are testing the `together` distribution, install its dependencies with `llama stack list-deps together | xargs -L1 uv pip install`.
### 1. Integration Testing
@ -76,7 +76,7 @@ Integration tests are located in [tests/integration](https://github.com/meta-lla
Consult [tests/integration/README.md](https://github.com/meta-llama/llama-stack/blob/main/tests/integration/README.md) for more details on how to run the tests.
Note that each provider's `sample_run_config()` method (in the configuration class for that provider)
typically references some environment variables for specifying API keys and the like. You can set these in the environment or pass these via the `--env` flag to the test command.
typically references some environment variables for specifying API keys and the like. You can set these in the environment before running the test command.
### 2. Unit Testing

View file

@ -68,7 +68,9 @@ recordings/
Direct API calls with no recording or replay:
```python
with inference_recording(mode=InferenceMode.LIVE):
from llama_stack.testing.api_recorder import api_recording, APIRecordingMode
with api_recording(mode=APIRecordingMode.LIVE):
response = await client.chat.completions.create(...)
```
@ -79,7 +81,7 @@ Use for initial development and debugging against real APIs.
Captures API interactions while passing through real responses:
```python
with inference_recording(mode=InferenceMode.RECORD, storage_dir="./recordings"):
with api_recording(mode=APIRecordingMode.RECORD, storage_dir="./recordings"):
response = await client.chat.completions.create(...)
# Real API call made, response captured AND returned
```
@ -96,7 +98,7 @@ The recording process:
Returns stored responses instead of making API calls:
```python
with inference_recording(mode=InferenceMode.REPLAY, storage_dir="./recordings"):
with api_recording(mode=APIRecordingMode.REPLAY, storage_dir="./recordings"):
response = await client.chat.completions.create(...)
# No API call made, cached response returned instantly
```

View file

@ -170,7 +170,7 @@ spec:
- name: llama-stack
image: localhost/llama-stack-run-k8s:latest
imagePullPolicy: IfNotPresent
command: ["python", "-m", "llama_stack.core.server.server", "--config", "/app/config.yaml"]
command: ["llama", "stack", "run", "/app/config.yaml"]
ports:
- containerPort: 5000
volumeMounts:

View file

@ -5,225 +5,79 @@ sidebar_label: Build your own Distribution
sidebar_position: 3
---
This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers.
This guide walks you through inspecting existing distributions, customising their configuration, and building runnable artefacts for your own deployment.
### Explore existing distributions
### Setting your log level
All first-party distributions live under `llama_stack/distributions/`. Each directory contains:
In order to specify the proper logging level users can apply the following environment variable `LLAMA_STACK_LOGGING` with the following format:
- `build.yaml` the distribution specification (providers, additional dependencies, optional external provider directories).
- `run.yaml` sample run configuration (when provided).
- Documentation fragments that power this site.
`LLAMA_STACK_LOGGING=server=debug;core=info`
Where each category in the following list:
- all
- core
- server
- router
- inference
- agents
- safety
- eval
- tools
- client
Can be set to any of the following log levels:
- debug
- info
- warning
- error
- critical
The default global log level is `info`. `all` sets the log level for all components.
A user can also set `LLAMA_STACK_LOG_FILE` which will pipe the logs to the specified path as well as to the terminal. An example would be: `export LLAMA_STACK_LOG_FILE=server.log`
### Llama Stack Build
In order to build your own distribution, we recommend you clone the `llama-stack` repository.
```
git clone git@github.com:meta-llama/llama-stack.git
cd llama-stack
pip install -e .
```
Use the CLI to build your distribution.
The main points to consider are:
1. **Image Type** - Do you want a venv environment or a Container (eg. Docker)
2. **Template** - Do you want to use a template to build your distribution? or start from scratch ?
3. **Config** - Do you want to use a pre-existing config file to build your distribution?
```
llama stack build -h
usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--distro DISTRIBUTION] [--list-distros] [--image-type {container,venv}] [--image-name IMAGE_NAME] [--print-deps-only]
[--run] [--providers PROVIDERS]
Build a Llama stack container
options:
-h, --help show this help message and exit
--config CONFIG Path to a config file to use for the build. You can find example configs in llama_stack.cores/**/build.yaml. If this argument is not provided, you will be prompted to
enter information interactively (default: None)
--template TEMPLATE (deprecated) Name of the example template config to use for build. You may use `llama stack build --list-distros` to check out the available distributions (default:
None)
--distro DISTRIBUTION, --distribution DISTRIBUTION
Name of the distribution to use for build. You may use `llama stack build --list-distros` to check out the available distributions (default: None)
--list-distros, --list-distributions
Show the available distributions for building a Llama Stack distribution (default: False)
--image-type {container,venv}
Image Type to use for the build. If not specified, will use the image type from the template config. (default: None)
--image-name IMAGE_NAME
[for image-type=container|venv] Name of the virtual environment to use for the build. If not specified, currently active environment will be used if found. (default:
None)
--print-deps-only Print the dependencies for the stack only, without building the stack (default: False)
--run Run the stack after building using the same image type, name, and other applicable arguments (default: False)
--providers PROVIDERS
Build a config for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple providers per
API. (default: None)
```
After this step is complete, a file named `<name>-build.yaml` and template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.
Browse that folder to understand available providers and copy a distribution to use as a starting point. When creating a new stack, duplicate an existing directory, rename it, and adjust the `build.yaml` file to match your requirements.
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<Tabs>
<TabItem value="template" label="Building from a template">
To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers.
<TabItem value="container" label="Building a container">
The following command will allow you to see the available templates and their corresponding providers.
```
llama stack build --list-templates
Use the Containerfile at `containers/Containerfile`, which installs `llama-stack`, resolves distribution dependencies via `llama stack list-deps`, and sets the entrypoint to `llama stack run`.
```bash
docker build . \
-f containers/Containerfile \
--build-arg DISTRO_NAME=starter \
--tag llama-stack:starter
```
```
------------------------------+-----------------------------------------------------------------------------+
| Template Name | Description |
+------------------------------+-----------------------------------------------------------------------------+
| watsonx | Use watsonx for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| vllm-gpu | Use a built-in vLLM engine for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| together | Use Together.AI for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| tgi | Use (an external) TGI server for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| starter | Quick start template for running Llama Stack with several popular providers |
+------------------------------+-----------------------------------------------------------------------------+
| sambanova | Use SambaNova for running LLM inference and safety |
+------------------------------+-----------------------------------------------------------------------------+
| remote-vllm | Use (an external) vLLM server for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| postgres-demo | Quick start template for running Llama Stack with several popular providers |
+------------------------------+-----------------------------------------------------------------------------+
| passthrough | Use Passthrough hosted llama-stack endpoint for LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| open-benchmark | Distribution for running open benchmarks |
+------------------------------+-----------------------------------------------------------------------------+
| ollama | Use (an external) Ollama server for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| nvidia | Use NVIDIA NIM for running LLM inference, evaluation and safety |
+------------------------------+-----------------------------------------------------------------------------+
| meta-reference-gpu | Use Meta Reference for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| llama_api | Distribution for running e2e tests in CI |
+------------------------------+-----------------------------------------------------------------------------+
| hf-serverless | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| hf-endpoint | Use (an external) Hugging Face Inference Endpoint for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| groq | Use Groq for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| fireworks | Use Fireworks.AI for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| experimental-post-training | Experimental template for post training |
+------------------------------+-----------------------------------------------------------------------------+
| dell | Dell's distribution of Llama Stack. TGI inference via Dell's custom |
| | container |
+------------------------------+-----------------------------------------------------------------------------+
| ci-tests | Distribution for running e2e tests in CI |
+------------------------------+-----------------------------------------------------------------------------+
| cerebras | Use Cerebras for running LLM inference |
+------------------------------+-----------------------------------------------------------------------------+
| bedrock | Use AWS Bedrock for running LLM inference and safety |
+------------------------------+-----------------------------------------------------------------------------+
```
Handy build arguments:
You may then pick a template to build your distribution with providers fitted to your liking.
- `DISTRO_NAME` distribution directory name (defaults to `starter`).
- `RUN_CONFIG_PATH` absolute path inside the build context for a run config that should be baked into the image (e.g. `/workspace/run.yaml`).
- `INSTALL_MODE=editable` install the repository copied into `/workspace` with `uv pip install -e`. Pair it with `--build-arg LLAMA_STACK_DIR=/workspace`.
- `LLAMA_STACK_CLIENT_DIR` optional editable install of the Python client.
- `PYPI_VERSION` / `TEST_PYPI_VERSION` pin specific releases when not using editable installs.
- `KEEP_WORKSPACE=1` retain `/workspace` in the final image if you need to access additional files (such as sample configs or provider bundles).
For example, to build a distribution with TGI as the inference provider, you can run:
```
$ llama stack build --distro starter
...
You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-starter/starter-run.yaml`
```
Make sure any custom `build.yaml`, run configs, or provider directories you reference are included in the Docker build context so the Containerfile can read them.
```{tip}
The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](customizing_run_yaml.md).
```
</TabItem>
<TabItem value="scratch" label="Building from Scratch">
<TabItem value="external" label="Building with external providers">
If the provided templates do not fit your use case, you could start off with running `llama stack build` which will allow you to a interactively enter wizard where you will be prompted to enter build configurations.
External providers live outside the main repository but can be bundled by pointing `external_providers_dir` to a directory that contains your provider packages.
It would be best to start with a template and understand the structure of the config file and the various concepts ( APIS, providers, resources, etc.) before starting from scratch.
```
llama stack build
1. Copy providers into the build context, for example `cp -R path/to/providers providers.d`.
2. Update `build.yaml` with the directory and provider entries.
3. Adjust run configs to use the in-container path (usually `/.llama/providers.d`). Pass `--build-arg RUN_CONFIG_PATH=/workspace/run.yaml` if you want to bake the config.
> Enter a name for your Llama Stack (e.g. my-local-stack): my-stack
> Enter the image type you want your Llama Stack to be built as (container or venv): venv
Llama Stack is composed of several APIs working together. Let's select
the provider types (implementations) you want to use for these APIs.
Tip: use <TAB> to see options for the providers.
> Enter provider for API inference: inline::meta-reference
> Enter provider for API safety: inline::llama-guard
> Enter provider for API agents: inline::meta-reference
> Enter provider for API memory: inline::faiss
> Enter provider for API datasetio: inline::meta-reference
> Enter provider for API scoring: inline::meta-reference
> Enter provider for API eval: inline::meta-reference
> Enter provider for API telemetry: inline::meta-reference
> (Optional) Enter a short description for your Llama Stack:
You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml`
```
</TabItem>
<TabItem value="config" label="Building from a pre-existing build config file">
- In addition to templates, you may customize the build to your liking through editing config files and build from config files with the following command.
- The config file will be of contents like the ones in `llama_stack/distributions/*build.yaml`.
```
llama stack build --config llama_stack/distributions/starter/build.yaml
```
</TabItem>
<TabItem value="external" label="Building with External Providers">
Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently or use community-provided providers.
To build a distribution with external providers, you need to:
1. Configure the `external_providers_dir` in your build configuration file:
Example `build.yaml` excerpt for a custom Ollama provider:
```yaml
# Example my-external-stack.yaml with external providers
version: '2'
distribution_spec:
description: Custom distro for CI tests
providers:
inference:
- remote::custom_ollama
# Add more providers as needed
image_type: container
image_name: ci-test
# Path to external provider implementations
external_providers_dir: ~/.llama/providers.d
- remote::custom_ollama
external_providers_dir: /workspace/providers.d
```
Inside `providers.d/custom_ollama/provider.py`, define `get_provider_spec()` so the CLI can discover dependencies:
```python
from llama_stack.providers.datatypes import ProviderSpec
def get_provider_spec() -> ProviderSpec:
return ProviderSpec(
provider_type="remote::custom_ollama",
module="llama_stack_ollama_provider",
config_class="llama_stack_ollama_provider.config.OllamaImplConfig",
pip_packages=[
"ollama",
"aiohttp",
"llama-stack-provider-ollama",
],
)
```
Here's an example for a custom Ollama provider:
@ -232,9 +86,9 @@ Here's an example for a custom Ollama provider:
adapter:
adapter_type: custom_ollama
pip_packages:
- ollama
- aiohttp
- llama-stack-provider-ollama # This is the provider package
- ollama
- aiohttp
- llama-stack-provider-ollama # This is the provider package
config_class: llama_stack_ollama_provider.config.OllamaImplConfig
module: llama_stack_ollama_provider
api_dependencies: []
@ -245,54 +99,23 @@ The `pip_packages` section lists the Python packages required by the provider, a
provider package itself. The package must be available on PyPI or can be provided from a local
directory or a git repository (git must be installed on the build environment).
2. Build your distribution using the config file:
For deeper guidance, see the [External Providers documentation](../providers/external/).
```
llama stack build --config my-external-stack.yaml
```
For more information on external providers, including directory structure, provider types, and implementation requirements, see the [External Providers documentation](../providers/external/).
</TabItem>
<TabItem value="container" label="Building Container">
</Tabs>
:::tip Podman Alternative
Podman is supported as an alternative to Docker. Set `CONTAINER_BINARY` to `podman` in your environment to use Podman.
:::
### Run your stack server
To build a container image, you may start off from a template and use the `--image-type container` flag to specify `container` as the build image type.
```
llama stack build --distro starter --image-type container
```
```
$ llama stack build --distro starter --image-type container
...
Containerfile created successfully in /tmp/tmp.viA3a3Rdsg/ContainerfileFROM python:3.10-slim
...
```
You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml`
```
Now set some environment variables for the inference model ID and Llama Stack Port and create a local directory to mount into the container's file system.
After building the image, launch it directly with Docker or Podman—the entrypoint calls `llama stack run` using the baked distribution or the bundled run config:
```bash
export INFERENCE_MODEL="llama3.2:3b"
export LLAMA_STACK_PORT=8321
mkdir -p ~/.llama
```
After this step is successful, you should be able to find the built container image and test it with the below Docker command:
```
docker run -d \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
localhost/distribution-ollama:dev \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env OLLAMA_URL=http://host.docker.internal:11434
-e INFERENCE_MODEL=$INFERENCE_MODEL \
-e OLLAMA_URL=http://host.docker.internal:11434 \
llama-stack:starter \
--port $LLAMA_STACK_PORT
```
Here are the docker flags and their uses:
@ -305,141 +128,20 @@ Here are the docker flags and their uses:
* `localhost/distribution-ollama:dev`: The name and tag of the container image to run
* `-e INFERENCE_MODEL=$INFERENCE_MODEL`: Sets the INFERENCE_MODEL environment variable in the container
* `-e OLLAMA_URL=http://host.docker.internal:11434`: Sets the OLLAMA_URL environment variable in the container
* `--port $LLAMA_STACK_PORT`: Port number for the server to listen on
* `--env INFERENCE_MODEL=$INFERENCE_MODEL`: Sets the model to use for inference
* `--env OLLAMA_URL=http://host.docker.internal:11434`: Configures the URL for the Ollama service
</TabItem>
</Tabs>
### Running your Stack server
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.
If you prepared a custom run config, mount it into the container and reference it explicitly:
```bash
docker run \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v $(pwd)/run.yaml:/app/run.yaml \
llama-stack:starter \
/app/run.yaml
```
llama stack run -h
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
[--image-type {venv}] [--enable-ui]
[config | template]
Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
positional arguments:
config | template Path to config file to use for the run or name of known template (`llama stack list` for a list). (default: None)
options:
-h, --help show this help message and exit
--port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
--image-name IMAGE_NAME
Name of the image to run. Defaults to the current environment (default: None)
--env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
--image-type {venv}
Image Type used during the build. This should be venv. (default: None)
--enable-ui Start the UI server (default: False)
```
**Note:** Container images built with `llama stack build --image-type container` cannot be run using `llama stack run`. Instead, they must be run directly using Docker or Podman commands as shown in the container building section above.
```
# Start using template name
llama stack run tgi
# Start using config file
llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
# Start using a venv
llama stack run --image-type venv ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
```
```
$ llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
Serving API inspect
GET /health
GET /providers/list
GET /routes/list
Serving API inference
POST /inference/chat_completion
POST /inference/completion
POST /inference/embeddings
...
Serving API agents
POST /agents/create
POST /agents/session/create
POST /agents/turn/create
POST /agents/delete
POST /agents/session/delete
POST /agents/session/get
POST /agents/step/get
POST /agents/turn/get
Listening on ['::', '0.0.0.0']:8321
INFO: Started server process [2935911]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: 2401:db00:35c:2d2b:face:0:c9:0:54678 - "GET /models/list HTTP/1.1" 200 OK
```
### Listing Distributions
Using the list command, you can view all existing Llama Stack distributions, including stacks built from templates, from scratch, or using custom configuration files.
```
llama stack list -h
usage: llama stack list [-h]
list the build stacks
options:
-h, --help show this help message and exit
```
Example Usage
```
llama stack list
```
```
------------------------------+-----------------------------------------------------------------+--------------+------------+
| Stack Name | Path | Build Config | Run Config |
+------------------------------+-----------------------------------------------------------------------------+--------------+
| together | ~/.llama/distributions/together | Yes | No |
+------------------------------+-----------------------------------------------------------------------------+--------------+
| bedrock | ~/.llama/distributions/bedrock | Yes | No |
+------------------------------+-----------------------------------------------------------------------------+--------------+
| starter | ~/.llama/distributions/starter | Yes | Yes |
+------------------------------+-----------------------------------------------------------------------------+--------------+
| remote-vllm | ~/.llama/distributions/remote-vllm | Yes | Yes |
+------------------------------+-----------------------------------------------------------------------------+--------------+
```
### Removing a Distribution
Use the remove command to delete a distribution you've previously built.
```
llama stack rm -h
usage: llama stack rm [-h] [--all] [name]
Remove the build stack
positional arguments:
name Name of the stack to delete (default: None)
options:
-h, --help show this help message and exit
--all, -a Delete all stacks (use with caution) (default: False)
```
Example
```
llama stack rm llamastack-test
```
To keep your environment organized and avoid clutter, consider using `llama stack list` to review old or unused distributions and `llama stack rm <name>` to delete them when they're no longer needed.
### Troubleshooting
If you encounter any issues, ask questions in our discord or search through our [GitHub Issues](https://github.com/meta-llama/llama-stack/issues), or file an new issue.

View file

@ -44,18 +44,32 @@ providers:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence_store:
type: sqlite
namespace: null
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/agents_store.db
persistence:
agent_state:
backend: kv_default
namespace: agents
responses:
backend: sql_default
table_name: responses
telemetry:
- provider_id: meta-reference
provider_type: inline::meta-reference
config: {}
metadata_store:
namespace: null
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/registry.db
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/kvstore.db
sql_default:
type: sql_sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/sqlstore.db
references:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
models:
- metadata: {}
model_id: ${env.INFERENCE_MODEL}
@ -101,7 +115,7 @@ A few things to note:
- The id is a string you can choose freely.
- You can instantiate any number of provider instances of the same type.
- The configuration dictionary is provider-specific.
- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server (via docker or via `llama stack run`), you can specify `--env OLLAMA_URL=http://my-server:11434` to override the default value.
- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server, you can set environment variables in your shell before running `llama stack run` to override the default values.
### Environment Variable Substitution
@ -173,13 +187,10 @@ optional_token: ${env.OPTIONAL_TOKEN:+}
#### Runtime Override
You can override environment variables at runtime when starting the server:
You can override environment variables at runtime by setting them in your shell before starting the server:
```bash
# Override specific environment variables
llama stack run --config run.yaml --env API_KEY=sk-123 --env BASE_URL=https://custom-api.com
# Or set them in your shell
# Set environment variables in your shell
export API_KEY=sk-123
export BASE_URL=https://custom-api.com
llama stack run --config run.yaml
@ -509,16 +520,16 @@ server:
provider_config:
type: "github_token"
github_api_base_url: "https://api.github.com"
access_policy:
- permit:
principal: user-1
actions: [create, read, delete]
description: user-1 has full access to all resources
- permit:
principal: user-2
actions: [read]
resource: model::model-1
description: user-2 has read access to model-1 only
access_policy:
- permit:
principal: user-1
actions: [create, read, delete]
description: user-1 has full access to all resources
- permit:
principal: user-2
actions: [read]
resource: model::model-1
description: user-2 has read access to model-1 only
```
Similarly, the following restricts access to particular kubernetes

View file

@ -12,7 +12,7 @@ This avoids the overhead of setting up a server.
```bash
# setup
uv pip install llama-stack
llama stack build --distro starter --image-type venv
llama stack list-deps starter | xargs -L1 uv pip install
```
```python

View file

@ -1,56 +1,155 @@
apiVersion: v1
data:
stack_run_config.yaml: "version: '2'\nimage_name: kubernetes-demo\napis:\n- agents\n-
inference\n- files\n- safety\n- telemetry\n- tool_runtime\n- vector_io\nproviders:\n
\ inference:\n - provider_id: vllm-inference\n provider_type: remote::vllm\n
\ config:\n url: ${env.VLLM_URL:=http://localhost:8000/v1}\n max_tokens:
${env.VLLM_MAX_TOKENS:=4096}\n api_token: ${env.VLLM_API_TOKEN:=fake}\n tls_verify:
${env.VLLM_TLS_VERIFY:=true}\n - provider_id: vllm-safety\n provider_type:
remote::vllm\n config:\n url: ${env.VLLM_SAFETY_URL:=http://localhost:8000/v1}\n
\ max_tokens: ${env.VLLM_MAX_TOKENS:=4096}\n api_token: ${env.VLLM_API_TOKEN:=fake}\n
\ tls_verify: ${env.VLLM_TLS_VERIFY:=true}\n - provider_id: sentence-transformers\n
\ provider_type: inline::sentence-transformers\n config: {}\n vector_io:\n
\ - provider_id: ${env.ENABLE_CHROMADB:+chromadb}\n provider_type: remote::chromadb\n
\ config:\n url: ${env.CHROMADB_URL:=}\n kvstore:\n type: postgres\n
\ host: ${env.POSTGRES_HOST:=localhost}\n port: ${env.POSTGRES_PORT:=5432}\n
\ db: ${env.POSTGRES_DB:=llamastack}\n user: ${env.POSTGRES_USER:=llamastack}\n
\ password: ${env.POSTGRES_PASSWORD:=llamastack}\n files:\n - provider_id:
meta-reference-files\n provider_type: inline::localfs\n config:\n storage_dir:
${env.FILES_STORAGE_DIR:=~/.llama/distributions/starter/files}\n metadata_store:\n
\ type: sqlite\n db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/files_metadata.db
\ \n safety:\n - provider_id: llama-guard\n provider_type: inline::llama-guard\n
\ config:\n excluded_categories: []\n agents:\n - provider_id: meta-reference\n
\ provider_type: inline::meta-reference\n config:\n persistence_store:\n
\ type: postgres\n host: ${env.POSTGRES_HOST:=localhost}\n port:
${env.POSTGRES_PORT:=5432}\n db: ${env.POSTGRES_DB:=llamastack}\n user:
${env.POSTGRES_USER:=llamastack}\n password: ${env.POSTGRES_PASSWORD:=llamastack}\n
\ responses_store:\n type: postgres\n host: ${env.POSTGRES_HOST:=localhost}\n
\ port: ${env.POSTGRES_PORT:=5432}\n db: ${env.POSTGRES_DB:=llamastack}\n
\ user: ${env.POSTGRES_USER:=llamastack}\n password: ${env.POSTGRES_PASSWORD:=llamastack}\n
\ telemetry:\n - provider_id: meta-reference\n provider_type: inline::meta-reference\n
\ config:\n service_name: \"${env.OTEL_SERVICE_NAME:=\\u200B}\"\n sinks:
${env.TELEMETRY_SINKS:=console}\n tool_runtime:\n - provider_id: brave-search\n
\ provider_type: remote::brave-search\n config:\n api_key: ${env.BRAVE_SEARCH_API_KEY:+}\n
\ max_results: 3\n - provider_id: tavily-search\n provider_type: remote::tavily-search\n
\ config:\n api_key: ${env.TAVILY_SEARCH_API_KEY:+}\n max_results:
3\n - provider_id: rag-runtime\n provider_type: inline::rag-runtime\n config:
{}\n - provider_id: model-context-protocol\n provider_type: remote::model-context-protocol\n
\ config: {}\nmetadata_store:\n type: postgres\n host: ${env.POSTGRES_HOST:=localhost}\n
\ port: ${env.POSTGRES_PORT:=5432}\n db: ${env.POSTGRES_DB:=llamastack}\n user:
${env.POSTGRES_USER:=llamastack}\n password: ${env.POSTGRES_PASSWORD:=llamastack}\n
\ table_name: llamastack_kvstore\ninference_store:\n type: postgres\n host:
${env.POSTGRES_HOST:=localhost}\n port: ${env.POSTGRES_PORT:=5432}\n db: ${env.POSTGRES_DB:=llamastack}\n
\ user: ${env.POSTGRES_USER:=llamastack}\n password: ${env.POSTGRES_PASSWORD:=llamastack}\nmodels:\n-
metadata:\n embedding_dimension: 384\n model_id: all-MiniLM-L6-v2\n provider_id:
sentence-transformers\n model_type: embedding\n- metadata: {}\n model_id: ${env.INFERENCE_MODEL}\n
\ provider_id: vllm-inference\n model_type: llm\n- metadata: {}\n model_id:
${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}\n provider_id: vllm-safety\n
\ model_type: llm\nshields:\n- shield_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}\nvector_dbs:
[]\ndatasets: []\nscoring_fns: []\nbenchmarks: []\ntool_groups:\n- toolgroup_id:
builtin::websearch\n provider_id: tavily-search\n- toolgroup_id: builtin::rag\n
\ provider_id: rag-runtime\nserver:\n port: 8321\n auth:\n provider_config:\n
\ type: github_token\n"
stack_run_config.yaml: |
version: '2'
image_name: kubernetes-demo
apis:
- agents
- inference
- files
- safety
- telemetry
- tool_runtime
- vector_io
providers:
inference:
- provider_id: vllm-inference
provider_type: remote::vllm
config:
url: ${env.VLLM_URL:=http://localhost:8000/v1}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
- provider_id: vllm-safety
provider_type: remote::vllm
config:
url: ${env.VLLM_SAFETY_URL:=http://localhost:8000/v1}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
- provider_id: sentence-transformers
provider_type: inline::sentence-transformers
config: {}
vector_io:
- provider_id: ${env.ENABLE_CHROMADB:+chromadb}
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL:=}
kvstore:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
files:
- provider_id: meta-reference-files
provider_type: inline::localfs
config:
storage_dir: ${env.FILES_STORAGE_DIR:=~/.llama/distributions/starter/files}
metadata_store:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/files_metadata.db
safety:
- provider_id: llama-guard
provider_type: inline::llama-guard
config:
excluded_categories: []
agents:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
responses_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
telemetry:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
sinks: ${env.TELEMETRY_SINKS:=console}
tool_runtime:
- provider_id: brave-search
provider_type: remote::brave-search
config:
api_key: ${env.BRAVE_SEARCH_API_KEY:+}
max_results: 3
- provider_id: tavily-search
provider_type: remote::tavily-search
config:
api_key: ${env.TAVILY_SEARCH_API_KEY:+}
max_results: 3
- provider_id: rag-runtime
provider_type: inline::rag-runtime
config: {}
- provider_id: model-context-protocol
provider_type: remote::model-context-protocol
config: {}
storage:
backends:
kv_default:
type: kv_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
table_name: ${env.POSTGRES_TABLE_NAME:=llamastack_kvstore}
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
references:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
models:
- metadata:
embedding_dimension: 768
model_id: nomic-embed-text-v1.5
provider_id: sentence-transformers
model_type: embedding
- metadata: {}
model_id: ${env.INFERENCE_MODEL}
provider_id: vllm-inference
model_type: llm
- metadata: {}
model_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
provider_id: vllm-safety
model_type: llm
shields:
- shield_id: ${env.SAFETY_MODEL:=meta-llama/Llama-Guard-3-1B}
vector_dbs: []
datasets: []
scoring_fns: []
benchmarks: []
tool_groups:
- toolgroup_id: builtin::websearch
provider_id: tavily-search
- toolgroup_id: builtin::rag
provider_id: rag-runtime
server:
port: 8321
auth:
provider_config:
type: github_token
kind: ConfigMap
metadata:
creationTimestamp: null
name: llama-stack-config

View file

@ -52,7 +52,7 @@ spec:
value: "${SAFETY_MODEL}"
- name: TAVILY_SEARCH_API_KEY
value: "${TAVILY_SEARCH_API_KEY}"
command: ["python", "-m", "llama_stack.core.server.server", "/etc/config/stack_run_config.yaml", "--port", "8321"]
command: ["llama", "stack", "run", "/etc/config/stack_run_config.yaml", "--port", "8321"]
ports:
- containerPort: 8321
volumeMounts:

View file

@ -93,25 +93,34 @@ providers:
- provider_id: model-context-protocol
provider_type: remote::model-context-protocol
config: {}
metadata_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
table_name: llamastack_kvstore
inference_store:
type: postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
storage:
backends:
kv_default:
type: kv_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
table_name: ${env.POSTGRES_TABLE_NAME:=llamastack_kvstore}
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST:=localhost}
port: ${env.POSTGRES_PORT:=5432}
db: ${env.POSTGRES_DB:=llamastack}
user: ${env.POSTGRES_USER:=llamastack}
password: ${env.POSTGRES_PASSWORD:=llamastack}
references:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
models:
- metadata:
embedding_dimension: 384
model_id: all-MiniLM-L6-v2
embedding_dimension: 768
model_id: nomic-embed-text-v1.5
provider_id: sentence-transformers
model_type: embedding
- metadata: {}

View file

@ -59,7 +59,7 @@ Start a Llama Stack server on localhost. Here is an example of how you can do th
uv venv starter --python 3.12
source starter/bin/activate # On Windows: starter\Scripts\activate
pip install --no-cache llama-stack==0.2.2
llama stack build --distro starter --image-type venv
llama stack list-deps starter | xargs -L1 uv pip install
export FIREWORKS_API_KEY=<SOME_KEY>
llama stack run starter --port 5050
```

View file

@ -69,10 +69,10 @@ docker run \
-it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
-e WATSONX_API_KEY=$WATSONX_API_KEY \
-e WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
-e WATSONX_BASE_URL=$WATSONX_BASE_URL \
llamastack/distribution-watsonx \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env WATSONX_API_KEY=$WATSONX_API_KEY \
--env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
--env WATSONX_BASE_URL=$WATSONX_BASE_URL
--port $LLAMA_STACK_PORT
```

View file

@ -102,7 +102,7 @@ You can start a chroma-db easily using docker.
# This is where the indices are persisted
mkdir -p $HOME/chromadb
podman run --rm -it \
docker run --rm -it \
--network host \
--name chromadb \
-v $HOME/chromadb:/chroma/chroma \
@ -127,13 +127,13 @@ docker run -it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v $HOME/.llama:/root/.llama \
# NOTE: mount the llama-stack / llama-model directories if testing local changes else not needed
-v /home/hjshah/git/llama-stack:/app/llama-stack-source -v /home/hjshah/git/llama-models:/app/llama-models-source \
-v $HOME/git/llama-stack:/app/llama-stack-source -v $HOME/git/llama-models:/app/llama-models-source \
# localhost/distribution-dell:dev if building / testing locally
llamastack/distribution-dell\
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL
-e INFERENCE_MODEL=$INFERENCE_MODEL \
-e DEH_URL=$DEH_URL \
-e CHROMA_URL=$CHROMA_URL \
llamastack/distribution-dell \
--port $LLAMA_STACK_PORT
```
@ -154,37 +154,37 @@ docker run \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v $HOME/.llama:/root/.llama \
-v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-run.yaml \
-e INFERENCE_MODEL=$INFERENCE_MODEL \
-e DEH_URL=$DEH_URL \
-e SAFETY_MODEL=$SAFETY_MODEL \
-e DEH_SAFETY_URL=$DEH_SAFETY_URL \
-e CHROMA_URL=$CHROMA_URL \
llamastack/distribution-dell \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env SAFETY_MODEL=$SAFETY_MODEL \
--env DEH_SAFETY_URL=$DEH_SAFETY_URL \
--env CHROMA_URL=$CHROMA_URL
--port $LLAMA_STACK_PORT
```
### Via venv
Make sure you have done `pip install llama-stack` and have the Llama Stack CLI available.
Install the distribution dependencies before launching:
```bash
llama stack build --distro dell --image-type venv
llama stack run dell
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL
llama stack list-deps dell | xargs -L1 uv pip install
INFERENCE_MODEL=$INFERENCE_MODEL \
DEH_URL=$DEH_URL \
CHROMA_URL=$CHROMA_URL \
llama stack run dell \
--port $LLAMA_STACK_PORT
```
If you are using Llama Stack Safety / Shield APIs, use:
```bash
INFERENCE_MODEL=$INFERENCE_MODEL \
DEH_URL=$DEH_URL \
SAFETY_MODEL=$SAFETY_MODEL \
DEH_SAFETY_URL=$DEH_SAFETY_URL \
CHROMA_URL=$CHROMA_URL \
llama stack run ./run-with-safety.yaml \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env SAFETY_MODEL=$SAFETY_MODEL \
--env DEH_SAFETY_URL=$DEH_SAFETY_URL \
--env CHROMA_URL=$CHROMA_URL
--port $LLAMA_STACK_PORT
```

View file

@ -21,8 +21,7 @@ The `llamastack/distribution-meta-reference-gpu` distribution consists of the fo
| inference | `inline::meta-reference` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
@ -41,31 +40,7 @@ The following environment variables can be configured:
## Prerequisite: Downloading Models
Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](../../references/llama_cli_reference/download_models.md) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
```
$ llama model list --downloaded
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃ Size ┃ Modified Time ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │
├─────────────────────────────────────────┼──────────┼─────────────────────┤
│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │
└─────────────────────────────────────────┴──────────┴─────────────────────┘
Please check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](../../references/llama_cli_reference/download_models.md) here to download the models using the Hugging Face CLI.
```
## Running the Distribution
@ -84,9 +59,9 @@ docker run \
--gpu all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-e INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
--port $LLAMA_STACK_PORT
```
If you are using Llama Stack Safety / Shield APIs, use:
@ -98,28 +73,28 @@ docker run \
--gpu all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-e INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
-e SAFETY_MODEL=meta-llama/Llama-Guard-3-1B \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
--port $LLAMA_STACK_PORT
```
### Via venv
Make sure you have done `uv pip install llama-stack` and have the Llama Stack CLI available.
Make sure you have the Llama Stack CLI available.
```bash
llama stack build --distro meta-reference-gpu --image-type venv
llama stack list-deps meta-reference-gpu | xargs -L1 uv pip install
INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llama stack run distributions/meta-reference-gpu/run.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
--port 8321
```
If you are using Llama Stack Safety / Shield APIs, use:
```bash
INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
SAFETY_MODEL=meta-llama/Llama-Guard-3-1B \
llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
--port 8321
```

View file

@ -16,8 +16,7 @@ The `llamastack/distribution-nvidia` distribution consists of the following prov
| post_training | `remote::nvidia` |
| safety | `remote::nvidia` |
| scoring | `inline::basic` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `inline::rag-runtime` |
| tool_runtime | |
| vector_io | `inline::faiss` |
@ -129,23 +128,23 @@ docker run \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
-e NVIDIA_API_KEY=$NVIDIA_API_KEY \
llamastack/distribution-nvidia \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY
--port $LLAMA_STACK_PORT
```
### Via venv
If you've set up your local development environment, you can also build the image using your local virtual environment.
If you've set up your local development environment, you can also install the distribution dependencies using your local virtual environment.
```bash
INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
llama stack build --distro nvidia --image-type venv
llama stack list-deps nvidia | xargs -L1 uv pip install
NVIDIA_API_KEY=$NVIDIA_API_KEY \
INFERENCE_MODEL=$INFERENCE_MODEL \
llama stack run ./run.yaml \
--port 8321 \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY \
--env INFERENCE_MODEL=$INFERENCE_MODEL
--port 8321
```
## Example Notebooks

View file

@ -119,7 +119,7 @@ The following environment variables can be configured:
### Telemetry Configuration
- `OTEL_SERVICE_NAME`: OpenTelemetry service name
- `TELEMETRY_SINKS`: Telemetry sinks (default: `console,sqlite`)
- `TELEMETRY_SINKS`: Telemetry sinks (default: `[]`)
## Enabling Providers
@ -169,7 +169,11 @@ docker run \
Ensure you have configured the starter distribution using the environment variables explained above.
```bash
uv run --with llama-stack llama stack build --distro starter --image-type venv --run
# Install dependencies for the starter distribution
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install
# Run the server
uv run --with llama-stack llama stack run starter
```
## Example Usage
@ -216,7 +220,6 @@ The starter distribution uses SQLite for local storage of various components:
- **Files metadata**: `~/.llama/distributions/starter/files_metadata.db`
- **Agents store**: `~/.llama/distributions/starter/agents_store.db`
- **Responses store**: `~/.llama/distributions/starter/responses_store.db`
- **Trace store**: `~/.llama/distributions/starter/trace_store.db`
- **Evaluation store**: `~/.llama/distributions/starter/meta_reference_eval.db`
- **Dataset I/O stores**: Various HuggingFace and local filesystem stores

View file

@ -23,6 +23,17 @@ Another simple way to start interacting with Llama Stack is to just spin up a co
If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](../deploying/kubernetes_deployment) for more details.
## Configure logging
Control log output via environment variables before starting the server.
- `LLAMA_STACK_LOGGING` sets per-component levels, e.g. `LLAMA_STACK_LOGGING=server=debug;core=info`.
- Supported categories: `all`, `core`, `server`, `router`, `inference`, `agents`, `safety`, `eval`, `tools`, `client`.
- Levels: `debug`, `info`, `warning`, `error`, `critical` (default is `info`). Use `all=<level>` to apply globally.
- `LLAMA_STACK_LOG_FILE=/path/to/log` mirrors logs to a file while still printing to stdout.
Export these variables prior to running `llama stack run`, launching a container, or starting the server through any other pathway.
```{toctree}
:maxdepth: 1
:hidden:

View file

@ -58,15 +58,19 @@ Llama Stack is a server that exposes multiple APIs, you connect with it using th
<Tabs>
<TabItem value="venv" label="Using venv">
You can use Python to build and run the Llama Stack server, which is useful for testing and development.
You can use Python to install dependencies and run the Llama Stack server, which is useful for testing and development.
Llama Stack uses a [YAML configuration file](../distributions/configuration) to specify the stack setup,
which defines the providers and their settings. The generated configuration serves as a starting point that you can [customize for your specific needs](../distributions/customizing_run_yaml).
Now let's build and run the Llama Stack config for Ollama.
Now let's install dependencies and run the Llama Stack config for Ollama.
We use `starter` as template. By default all providers are disabled, this requires enable ollama by passing environment variables.
```bash
llama stack build --distro starter --image-type venv --run
# Install dependencies for the starter distribution
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install
# Run the server
llama stack run starter
```
</TabItem>
<TabItem value="container" label="Using a Container">
@ -86,9 +90,9 @@ docker run -it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-e OLLAMA_URL=http://host.docker.internal:11434 \
llamastack/distribution-starter \
--port $LLAMA_STACK_PORT \
--env OLLAMA_URL=http://host.docker.internal:11434
--port $LLAMA_STACK_PORT
```
Note to start the container with Podman, you can do the same but replace `docker` at the start of the command with
`podman`. If you are using `podman` older than `4.7.0`, please also replace `host.docker.internal` in the `OLLAMA_URL`
@ -106,9 +110,9 @@ docker run -it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
--network=host \
-e OLLAMA_URL=http://localhost:11434 \
llamastack/distribution-starter \
--port $LLAMA_STACK_PORT \
--env OLLAMA_URL=http://localhost:11434
--port $LLAMA_STACK_PORT
```
:::
You will see output like below:
@ -164,7 +168,7 @@ Available Models
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ model_type ┃ identifier ┃ provider_resource_id ┃ metadata ┃ provider_id ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ embedding │ ollama/all-minilm:l6-v2 │ all-minilm:l6-v2 │ {'embedding_dimension': 384.0} │ ollama │
│ embedding │ ollama/nomic-embed-text:v1.5 │ nomic-embed-text:v1.5 │ {'embedding_dimension': 768.0} │ ollama │
├─────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼───────────────────────────────────────────┼───────────────────────┤
│ ... │ ... │ ... │ │ ... │
├─────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼───────────────────────────────────────────┼───────────────────────┤
@ -304,7 +308,7 @@ stream = agent.create_turn(
for event in AgentEventLogger().log(stream):
event.print()
```
### ii. Run the Script
#### ii. Run the Script
Let's run the script using `uv`
```bash
uv run python agent.py

View file

@ -24,10 +24,13 @@ ollama run llama3.2:3b --keepalive 60m
#### Step 2: Run the Llama Stack server
We will use `uv` to run the Llama Stack server.
We will use `uv` to install dependencies and run the Llama Stack server.
```bash
OLLAMA_URL=http://localhost:11434 \
uv run --with llama-stack llama stack build --distro starter --image-type venv --run
# Install dependencies for the starter distribution
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install
# Run the server
OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter
```
#### Step 3: Run the demo
Now open up a new terminal and copy the following script into a file named `demo_script.py`.

View file

@ -14,13 +14,13 @@ Llama Stack is the open-source framework for building generative AI applications
:::tip Llama 4 is here!
Check out [Getting Started with Llama 4](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started_llama4.ipynb)
Check out [Getting Started with Llama 4](https://colab.research.google.com/github/llamastack/llama-stack/blob/main/docs/getting_started_llama4.ipynb)
:::
:::tip News
Llama Stack is now available! See the [release notes](https://github.com/meta-llama/llama-stack/releases) for more details.
Llama Stack is now available! See the [release notes](https://github.com/llamastack/llama-stack/releases) for more details.
:::

View file

@ -14,16 +14,18 @@ Meta's reference implementation of an agent system that can use tools, access ve
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `persistence_store` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
| `responses_store` | `utils.sqlstore.sqlstore.SqliteSqlStoreConfig \| utils.sqlstore.sqlstore.PostgresSqlStoreConfig` | No | sqlite | |
| `persistence` | `<class 'inline.agents.meta_reference.config.AgentPersistenceConfig'>` | No | | |
## Sample Configuration
```yaml
persistence_store:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/agents_store.db
responses_store:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/responses_store.db
persistence:
agent_state:
namespace: agents
backend: kv_default
responses:
table_name: responses
backend: sql_default
max_write_queue_size: 10000
num_writers: 4
```

View file

@ -14,7 +14,7 @@ Reference implementation of batches API with KVStore persistence.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Configuration for the key-value store backend. |
| `kvstore` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | Configuration for the key-value store backend. |
| `max_concurrent_batches` | `<class 'int'>` | No | 1 | Maximum number of concurrent batches to process simultaneously. |
| `max_concurrent_requests_per_batch` | `<class 'int'>` | No | 10 | Maximum number of concurrent requests to process per batch. |
@ -22,6 +22,6 @@ Reference implementation of batches API with KVStore persistence.
```yaml
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/batches.db
namespace: batches
backend: kv_default
```

View file

@ -14,12 +14,12 @@ Local filesystem-based dataset I/O provider for reading and writing datasets to
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
| `kvstore` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
## Sample Configuration
```yaml
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/localfs_datasetio.db
namespace: datasetio::localfs
backend: kv_default
```

View file

@ -14,12 +14,12 @@ HuggingFace datasets provider for accessing and managing datasets from the Huggi
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
| `kvstore` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
## Sample Configuration
```yaml
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/huggingface_datasetio.db
namespace: datasetio::huggingface
backend: kv_default
```

View file

@ -1,5 +1,7 @@
---
description: "Llama Stack Evaluation API for running evaluations on model and agent candidates."
description: "Evaluations
Llama Stack Evaluation API for running evaluations on model and agent candidates."
sidebar_label: Eval
title: Eval
---
@ -8,6 +10,8 @@ title: Eval
## Overview
Llama Stack Evaluation API for running evaluations on model and agent candidates.
Evaluations
Llama Stack Evaluation API for running evaluations on model and agent candidates.
This section contains documentation for all available providers for the **eval** API.

View file

@ -14,12 +14,12 @@ Meta's reference implementation of evaluation tasks with support for multiple la
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
| `kvstore` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
## Sample Configuration
```yaml
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/meta_reference_eval.db
namespace: eval
backend: kv_default
```

View file

@ -11,38 +11,6 @@ an example entry in your build.yaml should look like:
module: ramalama_stack
```
Additionally you can configure the `external_providers_dir` in your Llama Stack configuration. This method is in the process of being deprecated in favor of the `module` method. If using this method, the external provider directory should contain your external provider specifications:
```yaml
external_providers_dir: ~/.llama/providers.d/
```
## Directory Structure
The external providers directory should follow this structure:
```
providers.d/
remote/
inference/
custom_ollama.yaml
vllm.yaml
vector_io/
qdrant.yaml
safety/
llama-guard.yaml
inline/
inference/
custom_ollama.yaml
vllm.yaml
vector_io/
qdrant.yaml
safety/
llama-guard.yaml
```
Each YAML file in these directories defines a provider specification for that particular API.
## Provider Types
Llama Stack supports two types of external providers:
@ -50,30 +18,37 @@ Llama Stack supports two types of external providers:
1. **Remote Providers**: Providers that communicate with external services (e.g., cloud APIs)
2. **Inline Providers**: Providers that run locally within the Llama Stack process
### Provider Specification (Common between inline and remote providers)
- `provider_type`: The type of the provider to be installed (remote or inline). eg. `remote::ollama`
- `api`: The API for this provider, eg. `inference`
- `config_class`: The full path to the configuration class
- `module`: The Python module containing the provider implementation
- `optional_api_dependencies`: List of optional Llama Stack APIs that this provider can use
- `api_dependencies`: List of Llama Stack APIs that this provider depends on
- `provider_data_validator`: Optional validator for provider data.
- `pip_packages`: List of Python packages required by the provider
### Remote Provider Specification
Remote providers are used when you need to communicate with external services. Here's an example for a custom Ollama provider:
```yaml
adapter:
adapter_type: custom_ollama
pip_packages:
- ollama
- aiohttp
config_class: llama_stack_ollama_provider.config.OllamaImplConfig
module: llama_stack_ollama_provider
adapter_type: custom_ollama
provider_type: "remote::ollama"
pip_packages:
- ollama
- aiohttp
config_class: llama_stack_ollama_provider.config.OllamaImplConfig
module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```
#### Adapter Configuration
#### Remote Provider Configuration
The `adapter` section defines how to load and configure the provider:
- `adapter_type`: A unique identifier for this adapter
- `pip_packages`: List of Python packages required by the provider
- `config_class`: The full path to the configuration class
- `module`: The Python module containing the provider implementation
- `adapter_type`: A unique identifier for this adapter, eg. `ollama`
### Inline Provider Specification
@ -81,6 +56,7 @@ Inline providers run locally within the Llama Stack process. Here's an example f
```yaml
module: llama_stack_vector_provider
provider_type: inline::llama_stack_vector_provider
config_class: llama_stack_vector_provider.config.VectorStoreConfig
pip_packages:
- faiss-cpu
@ -95,12 +71,6 @@ container_image: custom-vector-store:latest # optional
#### Inline Provider Fields
- `module`: The Python module containing the provider implementation
- `config_class`: The full path to the configuration class
- `pip_packages`: List of Python packages required by the provider
- `api_dependencies`: List of Llama Stack APIs that this provider depends on
- `optional_api_dependencies`: List of optional Llama Stack APIs that this provider can use
- `provider_data_validator`: Optional validator for provider data
- `container_image`: Optional container image to use instead of pip packages
## Required Fields
@ -113,20 +83,17 @@ All providers must contain a `get_provider_spec` function in their `provider` mo
from llama_stack.providers.datatypes import (
ProviderSpec,
Api,
AdapterSpec,
remote_provider_spec,
RemoteProviderSpec,
)
def get_provider_spec() -> ProviderSpec:
return remote_provider_spec(
return RemoteProviderSpec(
api=Api.inference,
adapter=AdapterSpec(
adapter_type="ramalama",
pip_packages=["ramalama>=0.8.5", "pymilvus"],
config_class="ramalama_stack.config.RamalamaImplConfig",
module="ramalama_stack",
),
adapter_type="ramalama",
pip_packages=["ramalama>=0.8.5", "pymilvus"],
config_class="ramalama_stack.config.RamalamaImplConfig",
module="ramalama_stack",
)
```
@ -197,18 +164,16 @@ information. Execute the test for the Provider type you are developing.
If your external provider isn't being loaded:
1. Check that `module` points to a published pip package with a top level `provider` module including `get_provider_spec`.
1. Check that the `external_providers_dir` path is correct and accessible.
2. Verify that the YAML files are properly formatted.
3. Ensure all required Python packages are installed.
4. Check the Llama Stack server logs for any error messages - turn on debug logging to get more
information using `LLAMA_STACK_LOGGING=all=debug`.
5. Verify that the provider package is installed in your Python environment if using `external_providers_dir`.
## Examples
### Example using `external_providers_dir`: Custom Ollama Provider
### How to create an external provider module
Here's a complete example of creating and using a custom Ollama provider:
If you are creating a new external provider called `llama-stack-provider-ollama` here is how you would set up the package properly:
1. First, create the provider package:
@ -230,33 +195,28 @@ requires-python = ">=3.12"
dependencies = ["llama-stack", "pydantic", "ollama", "aiohttp"]
```
3. Create the provider specification:
```yaml
# ~/.llama/providers.d/remote/inference/custom_ollama.yaml
adapter:
adapter_type: custom_ollama
pip_packages: ["ollama", "aiohttp"]
config_class: llama_stack_provider_ollama.config.OllamaImplConfig
module: llama_stack_provider_ollama
api_dependencies: []
optional_api_dependencies: []
```
4. Install the provider:
3. Install the provider:
```bash
uv pip install -e .
```
5. Configure Llama Stack to use external providers:
4. Edit `provider.py`
```yaml
external_providers_dir: ~/.llama/providers.d/
provider.py must be updated to contain `get_provider_spec`. This is used by llama stack to install the provider.
```python
def get_provider_spec() -> ProviderSpec:
return RemoteProviderSpec(
api=Api.inference,
adapter_type="llama-stack-provider-ollama",
pip_packages=["ollama", "aiohttp"],
config_class="llama_stack_provider_ollama.config.OllamaImplConfig",
module="llama_stack_provider_ollama",
)
```
The provider will now be available in Llama Stack with the type `remote::custom_ollama`.
5. Implement the provider as outlined above with `get_provider_impl` or `get_adapter_impl`, etc.
### Example using `module`: ramalama-stack
@ -275,12 +235,11 @@ distribution_spec:
module: ramalama_stack==0.3.0a0
image_type: venv
image_name: null
external_providers_dir: null
additional_pip_packages:
- aiosqlite
- sqlalchemy[asyncio]
```
No other steps are required other than `llama stack build` and `llama stack run`. The build process will use `module` to install all of the provider dependencies, retrieve the spec, etc.
No other steps are required beyond installing dependencies with `llama stack list-deps <distro> | xargs -L1 uv pip install` and then running `llama stack run`. The CLI will use `module` to install the provider dependencies, retrieve the spec, etc.
The provider will now be available in Llama Stack with the type `remote::ramalama`.

View file

@ -1,4 +1,7 @@
---
description: "Files
This API is used to upload documents that can be used with other Llama Stack APIs."
sidebar_label: Files
title: Files
---
@ -7,4 +10,8 @@ title: Files
## Overview
Files
This API is used to upload documents that can be used with other Llama Stack APIs.
This section contains documentation for all available providers for the **files** API.

View file

@ -15,7 +15,7 @@ Local filesystem-based file storage provider for managing files and documents lo
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `storage_dir` | `<class 'str'>` | No | | Directory to store uploaded files |
| `metadata_store` | `utils.sqlstore.sqlstore.SqliteSqlStoreConfig \| utils.sqlstore.sqlstore.PostgresSqlStoreConfig` | No | sqlite | SQL store configuration for file metadata |
| `metadata_store` | `<class 'llama_stack.core.storage.datatypes.SqlStoreReference'>` | No | | SQL store configuration for file metadata |
| `ttl_secs` | `<class 'int'>` | No | 31536000 | |
## Sample Configuration
@ -23,6 +23,6 @@ Local filesystem-based file storage provider for managing files and documents lo
```yaml
storage_dir: ${env.FILES_STORAGE_DIR:=~/.llama/dummy/files}
metadata_store:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/files_metadata.db
table_name: files_metadata
backend: sql_default
```

View file

@ -20,7 +20,7 @@ AWS S3-based file storage provider for scalable cloud file management with metad
| `aws_secret_access_key` | `str \| None` | No | | AWS secret access key (optional if using IAM roles) |
| `endpoint_url` | `str \| None` | No | | Custom S3 endpoint URL (for MinIO, LocalStack, etc.) |
| `auto_create_bucket` | `<class 'bool'>` | No | False | Automatically create the S3 bucket if it doesn't exist |
| `metadata_store` | `utils.sqlstore.sqlstore.SqliteSqlStoreConfig \| utils.sqlstore.sqlstore.PostgresSqlStoreConfig` | No | sqlite | SQL store configuration for file metadata |
| `metadata_store` | `<class 'llama_stack.core.storage.datatypes.SqlStoreReference'>` | No | | SQL store configuration for file metadata |
## Sample Configuration
@ -32,6 +32,6 @@ aws_secret_access_key: ${env.AWS_SECRET_ACCESS_KEY:=}
endpoint_url: ${env.S3_ENDPOINT_URL:=}
auto_create_bucket: ${env.S3_AUTO_CREATE_BUCKET:=false}
metadata_store:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/s3_files_metadata.db
table_name: s3_files_metadata
backend: sql_default
```

View file

@ -22,7 +22,6 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro
## Provider Categories
- **[External Providers](external/index.mdx)** - Guide for building and using external providers
- **[OpenAI Compatibility](./openai.mdx)** - OpenAI API compatibility layer
- **[Inference](inference/index.mdx)** - LLM and embedding model providers
- **[Agents](agents/index.mdx)** - Agentic system providers
- **[DatasetIO](datasetio/index.mdx)** - Dataset and data loader providers
@ -31,3 +30,7 @@ Importantly, Llama Stack always strives to provide at least one fully inline pro
- **[Vector IO](vector_io/index.mdx)** - Vector database providers
- **[Tool Runtime](tool_runtime/index.mdx)** - Tool and protocol providers
- **[Files](files/index.mdx)** - File system and storage providers
## Other information about Providers
- **[OpenAI Compatibility](./openai.mdx)** - OpenAI API compatibility layer
- **[OpenAI-Compatible Responses Limitations](./openai_responses_limitations.mdx)** - Known limitations of the Responses API in Llama Stack

View file

@ -1,5 +1,7 @@
---
description: "Llama Stack Inference API for generating completions, chat completions, and embeddings.
description: "Inference
Llama Stack Inference API for generating completions, chat completions, and embeddings.
This API provides the raw interface to the underlying models. Two kinds of models are supported:
- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
@ -12,7 +14,9 @@ title: Inference
## Overview
Llama Stack Inference API for generating completions, chat completions, and embeddings.
Inference
Llama Stack Inference API for generating completions, chat completions, and embeddings.
This API provides the raw interface to the underlying models. Two kinds of models are supported:
- LLM models: these models generate "raw" and "chat" (conversational) completions.

View file

@ -14,7 +14,9 @@ Anthropic inference provider for accessing Claude models and Anthropic's AI serv
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for Anthropic models |
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
## Sample Configuration

View file

@ -21,7 +21,9 @@ https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `<class 'pydantic.types.SecretStr'>` | No | | Azure API key for Azure |
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `api_base` | `<class 'pydantic.networks.HttpUrl'>` | No | | Azure API base for Azure (e.g., https://your-resource-name.openai.azure.com) |
| `api_version` | `str \| None` | No | | Azure API version for Azure (e.g., 2024-12-01-preview) |
| `api_type` | `str \| None` | No | azure | Azure API type for Azure (e.g., azure) |

View file

@ -14,6 +14,8 @@ AWS Bedrock inference provider for accessing various AI models through AWS's man
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |

View file

@ -14,8 +14,10 @@ Cerebras inference provider for running models on Cerebras Cloud platform.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `<class 'str'>` | No | https://api.cerebras.ai | Base URL for the Cerebras API |
| `api_key` | `<class 'pydantic.types.SecretStr'>` | No | | Cerebras API Key |
## Sample Configuration

View file

@ -14,8 +14,10 @@ Databricks inference provider for running models on Databricks' unified analytic
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `<class 'str'>` | No | | The URL for the Databricks model serving endpoint |
| `api_token` | `<class 'pydantic.types.SecretStr'>` | No | | The Databricks API token |
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | The Databricks API token |
| `url` | `str \| None` | No | | The URL for the Databricks model serving endpoint |
## Sample Configuration

View file

@ -15,8 +15,9 @@ Fireworks AI inference provider for Llama models and other AI models on the Fire
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://api.fireworks.ai/inference/v1 | The URL for the Fireworks server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Fireworks.ai API Key |
## Sample Configuration

View file

@ -14,7 +14,9 @@ Google Gemini inference provider for accessing Gemini models and Google's AI ser
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for Gemini models |
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
## Sample Configuration

View file

@ -14,7 +14,9 @@ Groq inference provider for ultra-fast inference using Groq's LPU technology.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Groq API key |
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://api.groq.com | The URL for the Groq AI server |
## Sample Configuration

View file

@ -14,7 +14,9 @@ Llama OpenAI-compatible provider for using Llama models with OpenAI API format.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | The Llama API key |
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `openai_compat_api_base` | `<class 'str'>` | No | https://api.llama.com/compat/v1/ | The URL for the Llama API server |
## Sample Configuration

View file

@ -14,8 +14,10 @@ NVIDIA inference provider for accessing NVIDIA NIM models and AI services.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://integrate.api.nvidia.com | A base url for accessing the NVIDIA NIM |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The NVIDIA API key, only needed of using the hosted service |
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
| `append_api_version` | `<class 'bool'>` | No | True | When set to false, the API version will not be appended to the base_url. By default, it is true. |

View file

@ -14,8 +14,9 @@ Ollama inference provider for running local models through the Ollama runtime.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `url` | `<class 'str'>` | No | http://localhost:11434 | |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
## Sample Configuration

View file

@ -14,7 +14,9 @@ OpenAI inference provider for accessing GPT models and other OpenAI services.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `api_key` | `str \| None` | No | | API key for OpenAI models |
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `base_url` | `<class 'str'>` | No | https://api.openai.com/v1 | Base URL for OpenAI API |
## Sample Configuration

View file

@ -14,8 +14,10 @@ Passthrough inference provider for connecting to any external inference service
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | `<class 'str'>` | No | | The URL for the passthrough endpoint |
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | API Key for the passthrouth endpoint |
| `url` | `<class 'str'>` | No | | The URL for the passthrough endpoint |
## Sample Configuration

View file

@ -14,8 +14,10 @@ RunPod inference provider for running models on RunPod's cloud GPU platform.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | The API token |
| `url` | `str \| None` | No | | The URL for the Runpod model serving endpoint |
| `api_token` | `str \| None` | No | | The API token |
## Sample Configuration

View file

@ -14,8 +14,10 @@ SambaNova inference provider for running models on SambaNova's dataflow architec
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://api.sambanova.ai/v1 | The URL for the SambaNova AI server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The SambaNova cloud API Key |
## Sample Configuration

View file

@ -14,6 +14,8 @@ Text Generation Inference (TGI) provider for HuggingFace model serving.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `url` | `<class 'str'>` | No | | The URL for the TGI serving endpoint |
## Sample Configuration

View file

@ -15,8 +15,9 @@ Together AI inference provider for open-source models and collaborative AI devel
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://api.together.xyz/v1 | The URL for the Together AI server |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The Together AI API Key |
## Sample Configuration

View file

@ -53,6 +53,8 @@ Available Models:
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `project` | `<class 'str'>` | No | | Google Cloud project ID for Vertex AI |
| `location` | `<class 'str'>` | No | us-central1 | Google Cloud location for Vertex AI |

View file

@ -14,11 +14,12 @@ Remote vLLM inference provider for connecting to vLLM servers.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | The API token |
| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint |
| `max_tokens` | `<class 'int'>` | No | 4096 | Maximum number of tokens to generate. |
| `api_token` | `str \| None` | No | fake | The API token |
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
## Sample Configuration

View file

@ -14,9 +14,11 @@ IBM WatsonX inference provider for accessing AI models on IBM's WatsonX platform
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | https://us-south.ml.cloud.ibm.com | A base url for accessing the watsonx.ai |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | The watsonx API key |
| `project_id` | `str \| None` | No | | The Project ID key |
| `project_id` | `str \| None` | No | | The watsonx.ai project ID |
| `timeout` | `<class 'int'>` | No | 60 | Timeout for the HTTP requests |
## Sample Configuration

View file

@ -1,3 +1,4 @@
---
title: OpenAI Compatibility
description: OpenAI API Compatibility
sidebar_label: OpenAI Compatibility
@ -47,7 +48,7 @@ models = client.models.list()
#### Responses
> **Note:** The Responses API implementation is still in active development. While it is quite usable, there are still unimplemented parts of the API. We'd love feedback on any use-cases you try that do not work to help prioritize the pieces left to implement. Please open issues in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository with details of anything that does not work.
> **Note:** The Responses API implementation is still in active development. While it is quite usable, there are still unimplemented parts of the API. See [Known Limitations of the OpenAI-compatible Responses API in Llama Stack](./openai_responses_limitations.mdx) for more details.
##### Simple inference

View file

@ -0,0 +1,301 @@
---
title: Known Limitations of the OpenAI-compatible Responses API in Llama Stack
description: Limitations of Responses API
sidebar_label: Limitations of Responses API
sidebar_position: 1
---
## Unresolved Issues
This document outlines known limitations and inconsistencies between Llama Stack's Responses API and OpenAI's Responses API. This comparison is based on OpenAI's API and reflects a comparison with the OpenAI APIs as of October 6, 2025 (OpenAI's client version `openai==1.107`).
See the OpenAI [changelog](https://platform.openai.com/docs/changelog) for details of any new functionality that has been added since that date. Links to issues are included so readers can read about status, post comments, and/or subscribe for updates relating to any limitations that are of specific interest to them. We would also love any other feedback on any use-cases you try that do not work to help prioritize the pieces left to implement.
Please open new issues in the [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) GitHub repository with details of anything that does not work that does not already have an open issue.
### Instructions
**Status:** Partial Implementation + Work in Progress
**Issue:** [#3566](https://github.com/llamastack/llama-stack/issues/3566)
In Llama Stack, the instructions parameter is already implemented for creating a response, but it is not yet included in the output response object.
---
### Streaming
**Status:** Partial Implementation
**Issue:** [#2364](https://github.com/llamastack/llama-stack/issues/2364)
Streaming functionality for the Responses API is partially implemented and does work to some extent, but some streaming response objects that would be needed for full compatibility are still missing.
---
### Prompt Templates
**Status:** Partial Implementation
**Issue:** [#3321](https://github.com/llamastack/llama-stack/issues/3321)
OpenAI's platform supports [templated prompts using a structured language](https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts). These templates can be stored server-side for organizational sharing. This feature is under development for Llama Stack.
---
### Web-search tool compatibility
**Status:** Partial Implementation
Both OpenAI and Llama Stack support a web-search built-in tool. The [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create) for web search tool in a Responses tool list says:
> The type of the web search tool. One of `web_search` or `web_search_2025_08_26`.
In contrast, the [Llama Stack documentation](https://llamastack.github.io/docs/api/create-a-new-open-ai-response) says that the allowed values for `type` for web search are `MOD1`, `MOD2` and `MOD3`.
Is that correct? If so, what are the meanings of each of them? It might make sense for the allowed values for OpenAI map to some values for Llama Stack so that code written to the OpenAI specification
also work with Llama Stack.
The OpenAI web search tool also has fields for `filters` and `user_location` which are not documented as options for Llama Stack. If feasible, it would be good to support these too.
---
### Other built-in Tools
**Status:** Partial Implementation
OpenAI's Responses API includes an ecosystem of built-in tools (e.g., code interpreter) that lower the barrier to entry for agentic workflows. These tools are typically aligned with specific model training.
**Current Status in Llama Stack:**
- Some built-in tools exist (file search, web search)
- Missing tools include code interpreter, computer use, and image generation
- Some built-in tools may require additional APIs (e.g., [containers API](https://platform.openai.com/docs/api-reference/containers) for code interpreter)
It's unclear whether there is demand for additional built-in tools in Llama Stack. No upstream issues have been filed for adding more built-in tools.
---
### Response Branching
**Status:** Not Working
Response branching, as discussed in the [Agents vs OpenAI Responses API documentation](https://llamastack.github.io/docs/building_applications/responses_vs_agents), is not currently functional.
---
### Include
**Status:** Not Implemented
The `include` parameter allows you to provide a list of values that indicate additional information for the system to include in the model response. The [OpenAI API](https://platform.openai.com/docs/api-reference/responses/create) specifies the following allowed values for this parameter.
- `web_search_call.action.sources`
- `code_interpreter_call.outputs`
- `computer_call_output.output.image_url`
- `file_search_call.results`
- `message.input_image.image_url`
- `message.output_text.logprobs`
- `reasoning.encrypted_content`
Some of these are not relevant to Llama Stack in its current form. For example, code interpreter is not implemented (see "Built-in tools" below), so `code_interpreter_call.outputs` would not be a useful directive to Llama Stack.
However, others might be useful. For example, `message.output_text.logprobs` can be useful for assessing how confident a model is in each token of its output.
---
### Tool Choice
**Status:** Not Implemented
**Issue:** [#3548](https://github.com/llamastack/llama-stack/issues/3548)
In OpenAI's API, the `tool_choice` parameter allows you to set restrictions or requirements for which tools should be used when generating a response. This feature is not implemented in Llama Stack.
---
### Safety Identification and Tracking
**Status:** Not Implemented
OpenAI's platform allows users to track agentic users using a safety identifier passed with each response. When requests violate moderation or safety rules, account holders are alerted and automated actions can be taken. This capability is not currently available in Llama Stack.
---
### Connectors
**Status:** Not Implemented
Connectors are MCP servers maintained and managed by the Responses API provider. OpenAI has documented their connectors at [https://platform.openai.com/docs/guides/tools-connectors-mcp](https://platform.openai.com/docs/guides/tools-connectors-mcp).
**Open Questions:**
- Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
- Should there be a mechanism for administrators to add custom connectors via `run.yaml` or an API?
---
### Reasoning
**Status:** Partially Implemented
The `reasoning` object in the output of Responses works for inference providers such as vLLM that output reasoning traces in chat completions requests. It does not work for other providers such as OpenAI's hosted service. See [#3551](https://github.com/llamastack/llama-stack/issues/3551) for more details.
---
### Service Tier
**Status:** Not Implemented
**Issue:** [#3550](https://github.com/llamastack/llama-stack/issues/3550)
Responses has a field `service_tier` that can be used to prioritize access to inference resources. Not all inference providers have such a concept, but Llama Stack pass through this value for those providers that do. Currently it does not.
---
### Top Logprobs
**Status:** Not Implemented
**Issue:** [#3552](https://github.com/llamastack/llama-stack/issues/3552)
The `top_logprobs` parameter from OpenAI's Responses API extends the functionality obtained by including `message.output_text.logprobs` in the `include` parameter list (as discussed in the Include section above).
It enables users to also get logprobs for alternative tokens.
---
### Max Tool Calls
**Status:** Not Implemented
**Issue:** [#3563](https://github.com/llamastack/llama-stack/issues/3563)
The Responses API can accept a `max_tool_calls` parameter that limits the number of tool calls allowed to be executed for a given response. This feature needs full implementation and documentation.
---
### Max Output Tokens
**Status:** Not Implemented
**Issue:** [#3562](https://github.com/llamastack/llama-stack/issues/3562)
The `max_output_tokens` field limits how many tokens the model is allowed to generate (for both reasoning and output combined). It is not implemented in Llama Stack.
---
### Incomplete Details
**Status:** Not Implemented
**Issue:** [#3567](https://github.com/llamastack/llama-stack/issues/3567)
The return object from a call to Responses includes a field for indicating why a response is incomplete if it is. For example, if the model stops generating because it has reached the specified max output tokens (see above), this field should be set to "IncompleteDetails(reason='max_output_tokens')". This is not implemented in Llama Stack.
---
### Metadata
**Status:** Not Implemented
**Issue:** [#3564](https://github.com/llamastack/llama-stack/issues/3564)
Metadata allows you to attach additional information to a response for your own reference and tracking. It is not implemented in Llama Stack.
---
### Background
**Status:** Not Implemented
**Issue:** [#3568](https://github.com/llamastack/llama-stack/issues/3568)
[Background mode](https://platform.openai.com/docs/guides/background) in OpenAI Responses lets you start a response generation job and then check back in on it later. This is useful if you might lose a connection during a generation and want to reconnect later and get the response back (for example if the client is running in a mobile app). It is not implemented in Llama Stack.
---
### Global Guardrails
**Status:** Feature Request
When calling the OpenAI Responses API, model outputs go through safety models configured by OpenAI administrators. Perhaps Llama Stack should provide a mechanism to configure safety models (or non-model logic) for all Responses requests, either through `run.yaml` or an administrative API.
---
### User-Controlled Guardrails
**Status:** Feature Request
**Issue:** [#3325](https://github.com/llamastack/llama-stack/issues/3325)
OpenAI has not released a way for users to configure their own guardrails. However, Llama Stack users may want this capability to complement or replace global guardrails. This could be implemented as a non-breaking, additive difference from the OpenAI API.
---
### MCP Elicitations
**Status:** Unknown
Elicitations allow MCP servers to request additional information from users through the client during interactions (e.g., a tool requesting a username before proceeding).
See the [MCP specification](https://modelcontextprotocol.io/specification/draft/client/elicitation) for details.
**Open Questions:**
- Does this work in OpenAI's Responses API reference implementation?
- If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
- Does this work in Llama Stack?
---
### MCP Sampling
**Status:** Unknown
Sampling allows MCP tools to query the generative AI model. See the [MCP specification](https://modelcontextprotocol.io/specification/draft/client/sampling) for details.
**Open Questions:**
- Does this work in OpenAI's Responses API reference implementation?
- If not, is there a reasonable way to make that work within the API as is? Or would the API need to change?
- Does this work in Llama Stack?
### Prompt Caching
**Status:** Unknown
OpenAI provides a [prompt caching](https://platform.openai.com/docs/guides/prompt-caching) mechanism in Responses that is enabled for its most recent models.
**Open Questions:**
- Does this work in Llama Stack?
- If not, is there a reasonable way to make that work for those inference providers that have this capability by passing through the provided `prompt_cache_key` to the inference provider?
- Is there a reasonable way to make that work for inference providers that don't build in this capability by doing some sort of caching at the Llama Stack layer?
---
### Parallel Tool Calls
**Status:** Rumored Issue
There are reports that `parallel_tool_calls` may not work correctly. This needs verification and a ticket should be opened if confirmed.
---
## Resolved Issues
The following limitations have been addressed in recent releases:
### MCP and Function Tools with No Arguments
**Status:** ✅ Resolved
MCP and function tools now work correctly even when they have no arguments.
---
### `require_approval` Parameter for MCP Tools
**Status:** ✅ Resolved
The `require_approval` parameter for MCP tools in the Responses API now works correctly.
---
### MCP Tools with Array-Type Arguments
**Status:** ✅ Resolved
**Fixed in:** [#3003](https://github.com/llamastack/llama-stack/pull/3003) (Agent API), [#3602](https://github.com/llamastack/llama-stack/pull/3602) (Responses API)
MCP tools now correctly handle array-type arguments in both the Agent API and Responses API.

View file

@ -1,4 +1,7 @@
---
description: "Safety
OpenAI-compatible Moderations API."
sidebar_label: Safety
title: Safety
---
@ -7,4 +10,8 @@ title: Safety
## Overview
Safety
OpenAI-compatible Moderations API.
This section contains documentation for all available providers for the **safety** API.

View file

@ -14,6 +14,8 @@ AWS Bedrock safety provider for content moderation using AWS's safety services.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |

View file

@ -16,14 +16,12 @@ Meta's reference implementation of telemetry and observability using OpenTelemet
|-------|------|----------|---------|-------------|
| `otel_exporter_otlp_endpoint` | `str \| None` | No | | The OpenTelemetry collector endpoint URL (base URL for traces, metrics, and logs). If not set, the SDK will use OTEL_EXPORTER_OTLP_ENDPOINT environment variable. |
| `service_name` | `<class 'str'>` | No | | The service name to use for telemetry |
| `sinks` | `list[inline.telemetry.meta_reference.config.TelemetrySink` | No | [&lt;TelemetrySink.SQLITE: 'sqlite'&gt;] | List of telemetry sinks to enable (possible values: otel_trace, otel_metric, sqlite, console) |
| `sqlite_db_path` | `<class 'str'>` | No | ~/.llama/runtime/trace_store.db | The path to the SQLite database to use for storing traces |
| `sinks` | `list[inline.telemetry.meta_reference.config.TelemetrySink` | No | [] | List of telemetry sinks to enable (possible values: otel_trace, otel_metric, console) |
## Sample Configuration
```yaml
service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
sinks: ${env.TELEMETRY_SINKS:=sqlite}
sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/trace_store.db
sinks: ${env.TELEMETRY_SINKS:=}
otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
```

View file

@ -79,13 +79,13 @@ See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introducti
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
| `persistence` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | Config for KV store backend |
## Sample Configuration
```yaml
db_path: ${env.CHROMADB_PATH}
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/chroma_inline_registry.db
persistence:
namespace: vector_io::chroma
backend: kv_default
```

View file

@ -95,12 +95,12 @@ more details about Faiss in general.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
| `persistence` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
## Sample Configuration
```yaml
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/faiss_store.db
persistence:
namespace: vector_io::faiss
backend: kv_default
```

View file

@ -14,14 +14,14 @@ Meta's reference implementation of a vector database.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
| `persistence` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
## Sample Configuration
```yaml
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/faiss_store.db
persistence:
namespace: vector_io::faiss
backend: kv_default
```
## Deprecation Notice

View file

@ -17,14 +17,14 @@ Please refer to the remote provider documentation.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |
| `persistence` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | Config for KV store backend (SQLite only for now) |
| `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server |
## Sample Configuration
```yaml
db_path: ${env.MILVUS_DB_PATH:=~/.llama/dummy}/milvus.db
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/milvus_registry.db
persistence:
namespace: vector_io::milvus
backend: kv_default
```

View file

@ -98,13 +98,13 @@ See the [Qdrant documentation](https://qdrant.tech/documentation/) for more deta
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `path` | `<class 'str'>` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
| `persistence` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | |
## Sample Configuration
```yaml
path: ${env.QDRANT_PATH:=~/.llama/~/.llama/dummy}/qdrant.db
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/qdrant_registry.db
persistence:
namespace: vector_io::qdrant
backend: kv_default
```

View file

@ -28,7 +28,7 @@ description: |
#### Empirical Example
Consider the histogram below in which 10,000 randomly generated strings were inserted
in batches of 100 into both Faiss and sqlite-vec using `client.tool_runtime.rag_tool.insert()`.
in batches of 100 into both Faiss and sqlite-vec.
```{image} ../../../../_static/providers/vector_io/write_time_comparison_sqlite-vec-faiss.png
:alt: Comparison of SQLite-Vec and Faiss write times
@ -233,7 +233,7 @@ Datasets that can fit in memory, frequent reads | Faiss | Optimized for speed, i
#### Empirical Example
Consider the histogram below in which 10,000 randomly generated strings were inserted
in batches of 100 into both Faiss and sqlite-vec using `client.tool_runtime.rag_tool.insert()`.
in batches of 100 into both Faiss and sqlite-vec.
```{image} ../../../../_static/providers/vector_io/write_time_comparison_sqlite-vec-faiss.png
:alt: Comparison of SQLite-Vec and Faiss write times
@ -408,13 +408,13 @@ See [sqlite-vec's GitHub repo](https://github.com/asg017/sqlite-vec/tree/main) f
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | | Path to the SQLite database file |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |
| `persistence` | `<class 'llama_stack.core.storage.datatypes.KVStoreReference'>` | No | | Config for KV store backend (SQLite only for now) |
## Sample Configuration
```yaml
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/sqlite_vec.db
kvstore:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/sqlite_vec_registry.db
persistence:
namespace: vector_io::sqlite_vec
backend: kv_default
```

Some files were not shown because too many files have changed in this diff Show more