Commit graph

1704 commits

Author SHA1 Message Date
Swapna Lekkala
4e4db89df6 fix rate limit errors 2025-10-06 10:54:12 -07:00
Swapna Lekkala
19fdf052a5 improve cancel test 2025-10-06 10:54:12 -07:00
Swapna Lekkala
fbac7cf4df improve resume and dont attach duplicate file 2025-10-06 10:54:12 -07:00
Swapna Lekkala
e2af40b4a6 add __init__ to the mixin 2025-10-06 10:54:12 -07:00
Swapna Lekkala
37b220355b add concurrent file attaching 2025-10-06 10:54:12 -07:00
Swapna Lekkala
4c14cf0747 clean up clean up 2025-10-06 10:54:12 -07:00
Swapna Lekkala
a25fe110f3 remove unwanted comments 2025-10-06 10:54:12 -07:00
Swapna Lekkala
af4c5df185 feat(api): Add vector store file batches api 2025-10-06 10:54:12 -07:00
Alexey Rybak
a8da6ba3a7
docs: API docstrings cleanup for better documentation rendering (#3661)
# What does this PR do?
* Cleans up API docstrings for better documentation rendering

<img width="2346" height="1126" alt="image"
src="https://github.com/user-attachments/assets/516b09a1-2d5b-4614-a3a9-13431fc21fc1"
/>

## Test Plan
* Manual testing

---------

Signed-off-by: Doug Edgar <dedgar@redhat.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Christian Zaccaria <73656840+ChristianZaccaria@users.noreply.github.com>
Co-authored-by: Anastas Stoyanovsky <contact@anastas.eu>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Young Han <110819238+seyeong-han@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 10:46:33 -07:00
Matthew Farrellee
892ea759fa
chore: remove together inference adapter's custom check_model_availability (#3702)
# What does this PR do?

remove Together inference adapter's check_model_availability impl, rely
on standard impl instead


## Test Plan

ci
2025-10-06 13:28:36 -04:00
Matthew Farrellee
de9940c697
chore: disable openai_embeddings on inference=remote::llama-openai-compat (#3704)
# What does this PR do?

api.llama.com does not provide embedding models, this makes that clear


## Test Plan

ci
2025-10-06 13:27:40 -04:00
Matthew Farrellee
ae74b31ae3
chore: remove vLLM inference adapter's custom list_models (#3703)
# What does this PR do?

remove vLLM inference adapter's custom list_models impl, rely on
standard impl instead

## Test Plan

ci
2025-10-06 13:27:30 -04:00
Matthew Farrellee
d23ed26238
chore: turn OpenAIMixin into a pydantic.BaseModel (#3671)
# What does this PR do?

- implement get_api_key instead of relying on
LiteLLMOpenAIMixin.get_api_key
 - remove use of LiteLLMOpenAIMixin
 - add default initialize/shutdown methods to OpenAIMixin
 - remove __init__s to allow proper pydantic construction
- remove dead code from vllm adapter and associated / duplicate unit
tests
 - update vllm adapter to use openaimixin for model registration
 - remove ModelRegistryHelper from fireworks & together adapters
 - remove Inference from nvidia adapter
 - complete type hints on embedding_model_metadata
- allow extra fields on OpenAIMixin, for model_store, __provider_id__,
etc
 - new recordings for ollama
 - enhance the list models error handling
- update cerebras (remove cerebras-cloud-sdk) and anthropic (custom
model listing) inference adapters
 - parametrized test_inference_client_caching
- remove cerebras, databricks, fireworks, together from blanket mypy
exclude
 - removed unnecessary litellm deps

## Test Plan

ci
2025-10-06 11:33:19 -04:00
Matthew Farrellee
724dac498c
chore: give OpenAIMixin subcalsses a change to list models without leaking _model_cache details (#3682)
# What does this PR do?

close the _model_cache abstraction leak

## Test Plan

ci w/ new tests
2025-10-06 09:44:33 -04:00
Charlie Doern
f00bcd9561
feat: allow for multiple external provider specs (#3341)
# What does this PR do?

when using the providers.d method of installation users could hand craft
their AdapterSpec's to use overlapping code meaning one repo could
contain an inline and remote impl. Currently installing a provider via
module does not allow for that as each repo is only allowed to have one
`get_provider_spec` method with one Spec returned

add an optional way for `get_provider_spec` to return a list of
`ProviderSpec` where each can be either an inline or remote impl.

Note: the `adapter_type` in `get_provider_spec` MUST match the
`provider_type` in the build/run yaml for this to work.

resolves #3226

## Test Plan

once this merges we need to re-enable the external provider test and
account for this functionality. Work needs to be done in the external
provider repos to support this functionality.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-06 15:26:38 +02:00
ehhuang
426cac078b
chore: use uvicorn to start llama stack server everywhere (#3625)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
https://github.com/llamastack/llama-stack/pull/3462 allows using uvicorn
to start llama stack server which supports spawning multiple workers.

This PR enables us to launch >1 workers from `llama stack run` (will add
the parameter in a follow-up PR, keeping this PR on simplifying) by
removing the old way of launching stack server and consolidates
launching via uvicorn.run only.


## Test Plan
ran `llama stack run starter`
CI
2025-10-06 14:27:40 +02:00
dependabot[bot]
c0f0a03529
chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui (#3693)
Bumps
[react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom)
and
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom).
These dependencies needed to be updated together.
Updates `react-dom` from 19.1.1 to 19.2.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/releases">react-dom's
releases</a>.</em></p>
<blockquote>
<h2>19.2.0 (Oct 1, 2025)</h2>
<p>Below is a list of all new features, APIs, and bug fixes.</p>
<p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React
19.2 release post</a> for more information.</p>
<h2>New React Features</h2>
<ul>
<li><a
href="https://react.dev/reference/react/Activity"><code>&lt;Activity&gt;</code></a>:
A new API to hide and restore the UI and internal state of its
children.</li>
<li><a
href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a>
is a React Hook that lets you extract non-reactive logic into an <a
href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect
Event</a>.</li>
<li><a
href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a>
(for RSCs) lets your know when the <code>cache()</code> lifetime is
over.</li>
<li><a
href="https://react.dev/reference/developer-tooling/react-performance-tracks">React
Performance tracks</a> appear on the Performance panel’s timeline in
your browser developer tools</li>
</ul>
<h2>New React DOM Features</h2>
<ul>
<li>Added resume APIs for partial pre-rendering with Web Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Added resume APIs for partial pre-rendering with Node Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Updated <a
href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a>
APIs to return a <code>postponed</code> state that can be passed to the
<code>resume</code> APIs.</li>
</ul>
<h2>Notable changes</h2>
<ul>
<li>React DOM now batches suspense boundary reveals, matching the
behavior of client side rendering. This change is especially noticeable
when animating the reveal of Suspense boundaries e.g. with the upcoming
<code>&lt;ViewTransition&gt;</code> Component. React will batch as much
reveals as possible before the first paint while trying to hit popular
first-contentful paint metrics.</li>
<li>Add Node Web Streams (<code>prerender</code>,
<code>renderToReadableStream</code>) to server-side-rendering APIs for
Node.js</li>
<li>Use underscore instead of <code>:</code> IDs generated by useId</li>
</ul>
<h2>All Changes</h2>
<h3>React</h3>
<ul>
<li><code>&lt;Activity /&gt;</code> was developed over many years,
starting before <code>ClassComponent.setState</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> and
many others)</li>
<li>Stringify context as &quot;SomeContext&quot; instead of
&quot;SomeContext.Provider&quot; (<a
href="https://github.com/kassens"><code>@​kassens</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li>
<li>Include stack of cause of React instrumentation errors with
<code>%o</code> placeholder (<a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li>
<li>Fix infinite <code>useDeferredValue</code> loop in popstate event
(<a href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li>
<li>Fix a bug when an initial value was passed to
<code>useDeferredValue</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li>
<li>Fix a crash when submitting forms with Client Actions (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li>
<li>Hide/unhide the content of dehydrated suspense boundaries if they
resuspend (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li>
<li>Avoid stack overflow on wide trees during Hot Reload (<a
href="https://github.com/sophiebits"><code>@​sophiebits</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li>
<li>Improve Owner and Component stacks in various places (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>,
<a
href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li>
<li>Add <code>cacheSignal</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li>
</ul>
<h3>React DOM</h3>
<ul>
<li>Block on Suspensey Fonts during reveal of server-side-rendered
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li>
<li>Use underscore instead of <code>:</code> for IDs generated by
<code>useId</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a><a
href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li>
<li>Stop warning when ARIA 1.3 attributes are used (<a
href="https://github.com/Abdul-Omira"><code>@​Abdul-Omira</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li>
<li>Allow <code>nonce</code> to be used on hoistable styles (<a
href="https://github.com/Andarist"><code>@​Andarist</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li>
<li>Warn for using a React owned node as a Container if it also has text
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32774">#32774</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/blob/main/CHANGELOG.md">react-dom's
changelog</a>.</em></p>
<blockquote>
<h2>19.2.0 (October 1st, 2025)</h2>
<p>Below is a list of all new features, APIs, and bug fixes.</p>
<p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React
19.2 release post</a> for more information.</p>
<h3>New React Features</h3>
<ul>
<li><a
href="https://react.dev/reference/react/Activity"><code>&lt;Activity&gt;</code></a>:
A new API to hide and restore the UI and internal state of its
children.</li>
<li><a
href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a>
is a React Hook that lets you extract non-reactive logic into an <a
href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect
Event</a>.</li>
<li><a
href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a>
(for RSCs) lets your know when the <code>cache()</code> lifetime is
over.</li>
<li><a
href="https://react.dev/reference/developer-tooling/react-performance-tracks">React
Performance tracks</a> appear on the Performance panel’s timeline in
your browser developer tools</li>
</ul>
<h3>New React DOM Features</h3>
<ul>
<li>Added resume APIs for partial pre-rendering with Web Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Added resume APIs for partial pre-rendering with Node Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Updated <a
href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a>
APIs to return a <code>postponed</code> state that can be passed to the
<code>resume</code> APIs.</li>
</ul>
<h3>Notable changes</h3>
<ul>
<li>React DOM now batches suspense boundary reveals, matching the
behavior of client side rendering. This change is especially noticeable
when animating the reveal of Suspense boundaries e.g. with the upcoming
<code>&lt;ViewTransition&gt;</code> Component. React will batch as much
reveals as possible before the first paint while trying to hit popular
first-contentful paint metrics.</li>
<li>Add Node Web Streams (<code>prerender</code>,
<code>renderToReadableStream</code>) to server-side-rendering APIs for
Node.js</li>
<li>Use underscore instead of <code>:</code> IDs generated by useId</li>
</ul>
<h3>All Changes</h3>
<h4>React</h4>
<ul>
<li><code>&lt;Activity /&gt;</code> was developed over many years,
starting before <code>ClassComponent.setState</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> and
many others)</li>
<li>Stringify context as &quot;SomeContext&quot; instead of
&quot;SomeContext.Provider&quot; (<a
href="https://github.com/kassens"><code>@​kassens</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li>
<li>Include stack of cause of React instrumentation errors with
<code>%o</code> placeholder (<a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li>
<li>Fix infinite <code>useDeferredValue</code> loop in popstate event
(<a href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li>
<li>Fix a bug when an initial value was passed to
<code>useDeferredValue</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li>
<li>Fix a crash when submitting forms with Client Actions (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li>
<li>Hide/unhide the content of dehydrated suspense boundaries if they
resuspend (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li>
<li>Avoid stack overflow on wide trees during Hot Reload (<a
href="https://github.com/sophiebits"><code>@​sophiebits</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li>
<li>Improve Owner and Component stacks in various places (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>,
<a
href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li>
<li>Add <code>cacheSignal</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li>
</ul>
<h4>React DOM</h4>
<ul>
<li>Block on Suspensey Fonts during reveal of server-side-rendered
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li>
<li>Use underscore instead of <code>:</code> for IDs generated by
<code>useId</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a><a
href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li>
<li>Stop warning when ARIA 1.3 attributes are used (<a
href="https://github.com/Abdul-Omira"><code>@​Abdul-Omira</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li>
<li>Allow <code>nonce</code> to be used on hoistable styles (<a
href="https://github.com/Andarist"><code>@​Andarist</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="861811347b"><code>8618113</code></a>
Bump scheduler version (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34671">#34671</a>)</li>
<li><a
href="1bd1f01f2a"><code>1bd1f01</code></a>
Ship partial-prerendering APIs to Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34633">#34633</a>)</li>
<li><a
href="2f0649a0b2"><code>2f0649a</code></a>
[Fizz] Remove <code>nonce</code> option from resume-and-prerender APIs
(<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34664">#34664</a>)</li>
<li><a
href="5667a41fe4"><code>5667a41</code></a>
Bump next prerelease version numbers (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34639">#34639</a>)</li>
<li><a
href="e08f53b182"><code>e08f53b</code></a>
Match <code>react-dom/static</code> test entrypoints and published
entrypoints (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34599">#34599</a>)</li>
<li><a
href="8bb7241f4c"><code>8bb7241</code></a>
Bump useEffectEvent to Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34610">#34610</a>)</li>
<li><a
href="83c88ad470"><code>83c88ad</code></a>
Handle fabric root level fragment with compareDocumentPosition (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34533">#34533</a>)</li>
<li><a
href="68f00c901c"><code>68f00c9</code></a>
Release Activity in Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34374">#34374</a>)</li>
<li><a
href="3168e08f83"><code>3168e08</code></a>
[flags] enable opt-in for enableDefaultTransitionIndicator (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34373">#34373</a>)</li>
<li><a
href="3434ff4f4b"><code>3434ff4</code></a>
Add scrollIntoView to fragment instances (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/32814">#32814</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/facebook/react/commits/v19.2.0/packages/react-dom">compare
view</a></li>
</ul>
</details>
<br />

Updates `@types/react-dom` from 19.1.9 to 19.2.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 00:02:31 -04:00
dependabot[bot]
91c6a8a3a3
chore(ui-deps): bump next from 15.5.3 to 15.5.4 in /llama_stack/ui (#3694)
Bumps [next](https://github.com/vercel/next.js) from 15.5.3 to 15.5.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.4</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: ensure onRequestError is invoked when otel enabled (<a
href="https://redirect.github.com/vercel/next.js/issues/83343">#83343</a>)</li>
<li>fix: devtools initial position should be from next config (<a
href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)</li>
<li>[devtool] fix overlay styles are missing (<a
href="https://redirect.github.com/vercel/next.js/issues/83721">#83721</a>)</li>
<li>Turbopack: don't match dynamic pattern for node_modules packages (<a
href="https://redirect.github.com/vercel/next.js/issues/83176">#83176</a>)</li>
<li>Turbopack: don't treat metadata routes as RSC (<a
href="https://redirect.github.com/vercel/next.js/issues/82911">#82911</a>)</li>
<li>[turbopack] Improve handling of symlink resolution errors in
track_glob and read_glob (<a
href="https://redirect.github.com/vercel/next.js/issues/83357">#83357</a>)</li>
<li>Turbopack: throw large static metadata error earlier (<a
href="https://redirect.github.com/vercel/next.js/issues/82939">#82939</a>)</li>
<li>fix: error overlay not closing when backdrop clicked (<a
href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)</li>
<li>Turbopack: flush Node.js worker IPC on error (<a
href="https://redirect.github.com/vercel/next.js/issues/84077">#84077</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>[CNA] use linter preference (<a
href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)</li>
<li>CI: use KV for test timing data (<a
href="https://redirect.github.com/vercel/next.js/issues/83745">#83745</a>)</li>
<li>docs: september improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/yiminghe"><code>@​yiminghe</code></a>, <a
href="https://github.com/huozhi"><code>@​huozhi</code></a>, <a
href="https://github.com/devjiwonchoi"><code>@​devjiwonchoi</code></a>,
<a href="https://github.com/mischnic"><code>@​mischnic</code></a>, <a
href="https://github.com/lukesandberg"><code>@​lukesandberg</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, <a
href="https://github.com/icyJoseph"><code>@​icyJoseph</code></a>, <a
href="https://github.com/leerob"><code>@​leerob</code></a>, <a
href="https://github.com/fufuShih"><code>@​fufuShih</code></a>, <a
href="https://github.com/dwrth"><code>@​dwrth</code></a>, <a
href="https://github.com/aymericzip"><code>@​aymericzip</code></a>, <a
href="https://github.com/obendev"><code>@​obendev</code></a>, <a
href="https://github.com/molebox"><code>@​molebox</code></a>, <a
href="https://github.com/OoMNoO"><code>@​OoMNoO</code></a>, <a
href="https://github.com/pontasan"><code>@​pontasan</code></a>, <a
href="https://github.com/styfle"><code>@​styfle</code></a>, <a
href="https://github.com/HondaYt"><code>@​HondaYt</code></a>, <a
href="https://github.com/ryuapp"><code>@​ryuapp</code></a>, <a
href="https://github.com/lpalmes"><code>@​lpalmes</code></a>, and <a
href="https://github.com/ijjk"><code>@​ijjk</code></a> for helping!</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="40f1d7814d"><code>40f1d78</code></a>
v15.5.4</li>
<li><a
href="cb30f0a176"><code>cb30f0a</code></a>
[backport] docs: september improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li>
<li><a
href="b6a32bb579"><code>b6a32bb</code></a>
[backport] [CNA] use linter preference (<a
href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/84087">#84087</a>)</li>
<li><a
href="26d61f1e9a"><code>26d61f1</code></a>
[backport] Turbopack: flush Node.js worker IPC on error (<a
href="https://redirect.github.com/vercel/next.js/issues/84079">#84079</a>)</li>
<li><a
href="e11e87a547"><code>e11e87a</code></a>
[backport] fix: error overlay not closing when backdrop clicked (<a
href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83">#83</a>...</li>
<li><a
href="0a29888575"><code>0a29888</code></a>
[backport] fix: devtools initial position should be from next config (<a
href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)...</li>
<li><a
href="7a53950c13"><code>7a53950</code></a>
[backport] Turbopack: don't treat metadata routes as RSC (<a
href="https://redirect.github.com/vercel/next.js/issues/83804">#83804</a>)</li>
<li><a
href="050bdf1ae7"><code>050bdf1</code></a>
[backport] Turbopack: throw large static metadata error earlier (<a
href="https://redirect.github.com/vercel/next.js/issues/83816">#83816</a>)</li>
<li><a
href="1f6ea09f85"><code>1f6ea09</code></a>
[backport] Turbopack: Improve handling of symlink resolution errors (<a
href="https://redirect.github.com/vercel/next.js/issues/83805">#83805</a>)</li>
<li><a
href="c7d1855499"><code>c7d1855</code></a>
[backport] CI: use KV for test timing data (<a
href="https://redirect.github.com/vercel/next.js/issues/83860">#83860</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v15.5.3...v15.5.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.5.3&new-version=15.5.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 00:01:38 -04:00
Matthew Farrellee
351c4b98e4
chore: inference=remote::llama-openai-compat does not support /v1/completion (#3683)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 16s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 20s
Unit Tests / unit-tests (3.12) (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m22s
## What does this PR do?

skip completion tests for inference=remote::llama-openai-compat

## Test Plan

ci
2025-10-04 11:36:48 -07:00
Ashwin Bharambe
045a0c1d57
feat(tests): implement test isolation for inference recordings (#3681)
Uses test_id in request hashes and test-scoped subdirectories to prevent
cross-test contamination. Model list endpoints exclude test_id to enable
merging recordings from different servers.

Additionally, this PR adds a `record-if-missing` mode (which we will use
instead of `record` which records everything) which is very useful.

🤖 Co-authored with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-04 11:34:18 -07:00
Ashwin Bharambe
3f36bfaeaa
chore(tests): normalize recording IDs and timestamps to reduce git diff noise (#3676)
IDs are now deterministic hashes based on request content, and
timestamps are normalized to constants, eliminating spurious changes
when re-recording tests.

## Changes
- Updated `inference_recorder.py` to normalize IDs and timestamps during
recording
- Added `scripts/normalize_recordings.py` utility to re-normalize
existing recordings
- Created documentation in `tests/integration/recordings/README.md`
- Normalized 350 existing recording files
2025-10-03 17:26:11 -07:00
Ashwin Bharambe
61b4238912
feat(api): add extra_body parameter support with shields example (#3670)
## Summary
Introduce `ExtraBodyField` annotation to enable parameters that arrive
via extra_body in client SDKs but are accessible server-side with full
typing.

These parameters are documented in OpenAPI specs under
**`x-llama-stack-extra-body-params`** but excluded from generated SDK
signatures.

Add `shields` parameter to `create_openai_response` as the first
implementation using this pattern.

## Test Plan
- added an integration test which checks that shields parameter passed
via extra_body reaches server implementation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-03 13:25:09 -07:00
Matthew Farrellee
ce77c27ff8
chore: use remoteinferenceproviderconfig for remote inference providers (#3668)
# What does this PR do?

on the path to maintainable impls of inference providers. make all
configs instances of RemoteInferenceProviderConfig.

## Test Plan

ci
2025-10-03 08:48:42 -07:00
Francisco Arceo
a20e8eac8c
feat: Add OpenAI Conversations API (#3429)
# What does this PR do?

Initial implementation for `Conversations` and `ConversationItems` using
`AuthorizedSqlStore` with endpoints to:
- CREATE
- UPDATE
- GET/RETRIEVE/LIST
- DELETE

Set `level=LLAMA_STACK_API_V1`.

NOTE: This does not currently incorporate changes for Responses, that'll
be done in a subsequent PR.

Closes https://github.com/llamastack/llama-stack/issues/3235

## Test Plan
- Unit tests
- Integration tests

Also comparison of [OpenAPI spec for OpenAI
API](https://github.com/openai/openai-openapi/tree/manual_spec)
```bash
oasdiff breaking --fail-on ERR docs/static/llama-stack-spec.yaml https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/manual_spec/openapi.yaml --strip-prefix-base "/v1/openai/v1" \
--match-path '(^/v1/openai/v1/conversations.*|^/conversations.*)'
```

Note I still have some uncertainty about this, I borrowed this info from
@cdoern on https://github.com/llamastack/llama-stack/pull/3514 but need
to spend more time to confirm it's working, at the moment it suggests it
does.

UPDATE on `oasdiff`, I investigated the OpenAI spec further and it looks
like currently the spec does not list Conversations, so that analysis is
useless. Noting for future reference.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-03 08:47:18 -07:00
Matthew Farrellee
d266c59c2a
chore: remove deprecated inference.chat_completion implementations (#3654)
# What does this PR do?

remove unused chat_completion implementations

vllm features ported -
 - requires max_tokens be set, use config value
 - set tool_choice to none if no tools provided


## Test Plan

ci
2025-10-03 07:55:34 -04:00
Christian Zaccaria
bcdbb53be3
feat: implement keyword and hybrid search for Weaviate provider (#3264)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- This PR implements keyword and hybrid search for Weaviate DB based on
its inbuilt functions.
- Added fixtures to conftest.py for Weaviate.
- Enabled integration tests for remote Weaviate on all 3 search modes.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3010 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Unit tests and integration tests should pass on this PR.
2025-10-03 10:22:30 +02:00
Doug Edgar
52c8df2322
feat: auto-detect Console width (#3327)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Addresses Issue #3271 - "Starting LLS server locally on a terminal with
120 chars width results in an output with empty lines".

This removes the specific 150-character width limit specified for the
Console, and will now auto-detect the terminal width instead. Now the
formatting of Console output is consistent across different sizes of
terminal windows.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3271

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Launching the server with several different sizes of terminal windows
results in Console output without unexpected spacing. e.g. `python -m
llama_stack.core.server.server /tmp/run.yaml --port 8321`

---------

Signed-off-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-10-03 10:19:31 +02:00
Matthew Farrellee
0a41c4ead0
chore: OpenAIMixin implements ModelsProtocolPrivate (#3662)
# What does this PR do?

add ModelsProtocolPrivate methods to OpenAIMixin

this will allow providers using OpenAIMixin to use a common interface


## Test Plan

ci w/ new tests
2025-10-02 21:32:02 -07:00
ehhuang
14a94e9894
fix: responses <> chat completion input conversion (#3645)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
UI Tests / ui-tests (22) (push) Successful in 33s
Pre-commit / pre-commit (push) Successful in 1m27s
# What does this PR do?

closes #3268
closes #3498

When resuming from previous response ID, currently we attempt to convert
from the stored responses input to chat completion messages, which is
not always possible, e.g. for tool calls where some data is lost once
converted from chat completion message to repsonses input format.

This PR stores the chat completion messages that correspond to the
_last_ call to chat completion, which is sufficient to be resumed from
in the next responses API call, where we load these saved messages and
skip conversion entirely.

Separate issue to optimize storage:
https://github.com/llamastack/llama-stack/issues/3646

## Test Plan
existing CI tests
2025-10-02 16:01:08 -07:00
Ashwin Bharambe
ef0736527d
feat(tools)!: substantial clean up of "Tool" related datatypes (#3627)
This is a sweeping change to clean up some gunk around our "Tool"
definitions.

First, we had two types `Tool` and `ToolDef`. The first of these was a
"Resource" type for the registry but we had stopped registering tools
inside the Registry long back (and only registered ToolGroups.) The
latter was for specifying tools for the Agents API. This PR removes the
former and adds an optional `toolgroup_id` field to the latter.

Secondly, as pointed out by @bbrowning in
https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132,
we were doing a lossy conversion from a full JSON schema from the MCP
tool specification into our ToolDefinition to send it to the model.
There is no necessity to do this -- we ourselves aren't doing any
execution at all but merely passing it to the chat completions API which
supports this. By doing this (and by doing it poorly), we encountered
limitations like not supporting array items, or not resolving $refs,
etc.

To fix this, we replaced the `parameters` field by `{ input_schema,
output_schema }` which can be full blown JSON schemas.

Finally, there were some types in our llama-related chat format
conversion which needed some cleanup. We are taking this opportunity to
clean those up.

This PR is a substantial breaking change to the API. However, given our
window for introducing breaking changes, this suits us just fine. I will
be landing a concurrent `llama-stack-client` change as well since API
shapes are changing.
2025-10-02 15:12:03 -07:00
ehhuang
ceca3c056f
chore: fix/add logging categories (#3658)
# What does this PR do?
These aren't controllable by LLAMA_STACK_LOGGING

```

tests/integration/agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This ...) [  3%]
tests/integration/agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This t...) [  7%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True] 
instantiating llama_stack_client
WARNING  2025-10-02 13:14:33,472 root:258 uncategorized: Unknown logging category: testing. Falling back to default 'root' level: 20                  
WARNING  2025-10-02 13:14:33,477 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,960 root:258 uncategorized: Unknown logging category: tokenizer_utils. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:33,962 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,963 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,968 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,974 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,978 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,350 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,366 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,489 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,490 root:258 uncategorized: Unknown logging category: inference_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:35,697 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,918 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
INFO     2025-10-02 13:14:35,945 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,172 root:258 uncategorized: Unknown logging category: files. Falling back to default 'root' level: 20                    
WARNING  2025-10-02 13:14:36,218 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,219 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,231 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,255 root:258 uncategorized: Unknown logging category: tool_runtime. Falling back to default 'root' level: 20             
WARNING  2025-10-02 13:14:36,486 root:258 uncategorized: Unknown logging category: responses_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:36,503 root:258 uncategorized: Unknown logging category: openai::responses. Falling back to default 'root' level: 20        
INFO     2025-10-02 13:14:36,524 llama_stack.providers.utils.responses.responses_store:80 responses_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,528 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,703 root:258 uncategorized: Unknown logging category: uncategorized. Falling back to default 'root' level: 20 
```

## Test Plan
2025-10-02 13:10:13 -07:00
Ashwin Bharambe
6afa96b0b9 fix(api): fix a mistake from #3636 which overwrote POST /responses 2025-10-02 13:03:17 -07:00
Sébastien Han
4161102100
chore!: add double routes for v1/openai/v1 (#3636)
So that users get a warning in 0.3.0 and we remove them in 0.4.0.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-02 16:11:05 +02:00
Aakanksha Duggal
7e48cc48bc
refactor(agents): migrate to OpenAI chat completions API (#3323)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / build-single-provider (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 14s
Test Llama Stack Build / generate-matrix (push) Successful in 18s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m16s
2025-10-02 06:50:32 -04:00
Chacksu
426dc54883
docs: Fix Dell distro documentation code snippets (#3640)
# What does this PR do?
* Updates code snippets for Dell distribution, fixing specific user home
directory in code (replacing with $HOME) and updates docker instructions
to use `docker` instead of `podman`.

## Test Plan
N.A.

Co-authored-by: Connor Hack <connorhack@fb.com>
2025-10-02 11:11:30 +02:00
ehhuang
5adcf0e0cb
chore: Remove debug logging from telemetry adapter (#3643)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

Spammy

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
n/a
2025-10-01 15:16:23 -07:00
Matthew Farrellee
4dbe0593f9
chore: add provider-data-api-key support to openaimixin (#3639)
# What does this PR do?

the LiteLLMOpenAIMixin provides support for reading key from provider
data (headers users send).

this adds the same functionality to the OpenAIMixin.

this is infrastructure for migrating providers.


## Test Plan

ci w/ new tests
2025-10-01 13:44:59 -07:00
Alexey Rybak
b6a5bccadf
docs: api separation (#3630)
# What does this PR do?

First step towards cleaning up the API reference section of the docs.

- Separates API reference into 3 sections: stable (`v1`), experimental (`v1alpha` and `v1beta`), and deprecated (`deprecated=True`)
- Each section is accessible via the dropdown menu and `docs/api-overview`

<img width="1237" height="321" alt="Screenshot 2025-09-30 at 5 47 30 PM" src="https://github.com/user-attachments/assets/fe0e498c-b066-46ed-a48e-4739d3b6724c" />

<img width="860" height="510" alt="Screenshot 2025-09-30 at 5 47 49 PM" src="https://github.com/user-attachments/assets/a92a8d8c-94bf-42d5-9f5b-b47bb2b14f9c" />

- Deprecated APIs: Added styling to the sidebar, and a notice on the endpoint pages

<img width="867" height="428" alt="Screenshot 2025-09-30 at 5 47 43 PM" src="https://github.com/user-attachments/assets/9e6e050d-c782-461b-8084-5ff6496d7bd9" />

Closes #3628

TODO in follow-up PRs:

- Add the ability to annotate API groups with supplementary content  (so we can have longer descriptions of complex APIs like Responses)
- Clean up docstrings to show API endpoints (or short semantic titles) in the sidebar

## Test Plan

- Local testing
- Made sure API conformance test still passes
2025-10-01 10:13:31 -07:00
ehhuang
853e9b3b0a
fix: log level (#3637)
# What does this PR do?
- categories like "core::server" is not recognized so it's level is not
set by 'all=debug'
- removed spammy telemetry debug logging

## Test Plan
test server launched with LLAMA_STACK_LOGGING='all=debug'
2025-10-01 09:51:39 -07:00
Charlie Doern
d167101e70
feat(api): implement v1beta leveling, and additional alpha (#3594)
# What does this PR do?

level the following APIs, keeping their old routes around as well until
0.4.0

1. datasetio to v1beta: used primarily by eval and training. Given that
training is v1alpha, and eval is v1alpha, datasetio is likely to change
in structure as real usages of the API spin up. Register,unregister, and
iter dataset is sparsely implemented meaning the shape of that route is
likely to change.

2. telemetry to v1alpha: telemetry has been going through many changes.
for example query_metrics was not even implemented until recently and
had to change its shape to work. putting this in v1beta will allow us to
fix functionality like OTEL, sqlite, etc. The routes themselves are set,
but the structure might change a bit

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-01 09:18:11 -07:00
Matthew Farrellee
f7c5ef4ec0
chore: remove /v1/inference/completion and implementations (#3622)
# What does this PR do?

the /inference/completion route is gone. this removes the
implementations.

## Test Plan

ci
2025-10-01 11:36:53 -04:00
Matthew Farrellee
ea15f2a270
chore: use openai_chat_completion for llm as a judge scoring (#3635)
# What does this PR do?

update llm as a judge to use openai_chat_completion, instead of
deprecated chat_completion


## Test Plan

ci
2025-10-01 09:44:31 -04:00
Jaideep Rao
ca47d90926
fix: Ensure that tool calls with no arguments get handled correctly (#3560)
# What does this PR do?
When a model decides to use an MCP tool call that requires no arguments,
it sets the `arguments` field to `None`. This causes the user to see a
`400 bad requst error` due to validation errors down the stack because
this field gets removed when being parsed by an openai compatible
inference provider like vLLM
This PR ensures that, as soon as the tool call args are accumulated
while streaming, we check to ensure no tool call function arguments are
set to None - if they are we replace them with "{}"

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3456

## Test Plan
Added new unit test to verify that any tool calls with function
arguments set to `None` get handled correctly

---------

Signed-off-by: Jaideep Rao <jrao@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-01 08:36:57 -04:00
Ashwin Bharambe
42414a1a1b
fix(logging): disable console telemetry sink by default (#3623)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 21s
Test Llama Stack Build / build-single-provider (push) Failing after 25s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s
Unit Tests / unit-tests (3.12) (push) Failing after 22s
API Conformance Tests / check-schema-compatibility (push) Successful in 33s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m12s
The current span processing dumps so much junk on the console that it
makes actual understanding of what is going on in the server impossible.
I am killing the console sink as a default. If you want, you are always
free to change your run.yaml to add it.

Before: 
<img width="1877" height="1107" alt="image"
src="https://github.com/user-attachments/assets/3a7ad261-e2ba-4d40-9820-fcc282c8df37"
/>

After:
<img width="1919" height="470" alt="image"
src="https://github.com/user-attachments/assets/bc7cf763-fba9-4e95-a4b5-f65f6d1c5332"
/>
2025-09-30 14:58:05 -07:00
ehhuang
ac7c35fbe6
fix: don't pass default response format in Responses (#3614)
# What does this PR do?
Fireworks doesn't allow repsonse_format with tool use. The default
response format is 'text' anyway, so we can safely omit.


## Test Plan
Below script failed without the change, runs after.

```
#!/usr/bin/env python3
"""
Script to test Responses API with kubernetes-mcp-server.

This script:
1. Connects to the llama stack server
2. Uses the Responses API with MCP tools
3. Asks for the list of Kubernetes namespaces using the kubernetes-mcp-server
"""

import json

from openai import OpenAI

# Connect to the llama stack server
base_url = "http://localhost:8321/v1"
client = OpenAI(base_url=base_url, api_key="fake")

# Define the MCP tool pointing to the kubernetes-mcp-server
# The kubernetes-mcp-server is running on port 3000 with SSE endpoint at /sse
mcp_server_url = "http://localhost:3000/sse"

tools = [
    {
        "type": "mcp",
        "server_label": "k8s",
        "server_url": mcp_server_url,
    }
]

# Create a response request asking for k8s namespaces
print("Sending request to list Kubernetes namespaces...")
print(f"Using MCP server at: {mcp_server_url}")
print("Available tools will be listed automatically by the MCP server.")
print()

response = client.responses.create(
    # model="meta-llama/Llama-3.2-3B-Instruct",  # Using the vllm model
    model="fireworks/accounts/fireworks/models/llama4-scout-instruct-basic",
    # model="openai/gpt-4o",
    input="what are all the Kubernetes namespaces? Use tool call to `namespaces_list`. make sure to adhere to the tool calling format UNDER ALL CIRCUMSTANCES.",
    tools=tools,
    stream=False,
)

print("\n" + "=" * 80)
print("RESPONSE OUTPUT:")
print("=" * 80)

# Print the output
for i, output in enumerate(response.output):
    print(f"\n[Output {i + 1}] Type: {output.type}")
    if output.type == "mcp_list_tools":
        print(f"  Server: {output.server_label}")
        print(f"  Tools available: {[t.name for t in output.tools]}")
    elif output.type == "mcp_call":
        print(f"  Tool called: {output.name}")
        print(f"  Arguments: {output.arguments}")
        print(f"  Result: {output.output}")
        if output.error:
            print(f"  Error: {output.error}")
    elif output.type == "message":
        print(f"  Role: {output.role}")
        print(f"  Content: {output.content}")

print("\n" + "=" * 80)
print("FINAL RESPONSE TEXT:")
print("=" * 80)
print(response.output_text)
```
2025-09-30 14:52:24 -07:00
grs
d350e3662b
feat: add support for require_approval argument when creating response (#3608)
# What does this PR do?
This PR adds support for the require_approval on an mcp tool definition
passed to create response in the Responses API. This allows the caller
to indicate whether they want to approve calls to that server, or let
them be called without approval.

Closes #3443

## Test Plan
Tested both approval and denial.
Added automated integration test for both cases.

---------

Signed-off-by: Gordon Sim <gsim@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-09-30 14:18:34 -07:00
Ashwin Bharambe
606f4cf281
fix(expires_after): make sure multipart/form-data is properly parsed (#3612)
https://github.com/llamastack/llama-stack/pull/3604 broke multipart form
data field parsing for the Files API since it changed its shape -- so as
to match the API exactly to the OpenAI spec even in the generated client
code.

The underlying reason is that multipart/form-data cannot transport
structured nested fields. Each field must be str-serialized. The client
(specifically the OpenAI client whose behavior we must match),
transports sub-fields as `expires_after[anchor]` and
`expires_after[seconds]`, etc. We must be able to handle these fields
somehow on the server without compromising the shape of the YAML spec.

This PR "fixes" this by adding a dependency to convert the data. The
main trade-off here is that we must add this `Depends()` annotation on
every provider implementation for Files. This is a headache, but a much
more reasonable one (in my opinion) given the alternatives.

## Test Plan

Tests as shown in
https://github.com/llamastack/llama-stack/pull/3604#issuecomment-3351090653
pass.
2025-09-30 16:14:03 -04:00
slekkala1
cc64093ae4
feat(api): Add Vector Store File batches api stub (#3615)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 34s
Pre-commit / pre-commit (push) Successful in 1m14s
# What does this PR do?
Adding api stubs for vector store file batches apis
https://github.com/llamastack/llama-stack/issues/3533
API Ref:
https://platform.openai.com/docs/api-reference/vector-stores-file-batches

## Test Plan
CI
2025-09-30 12:07:33 -07:00
Charlie Doern
1e25a72ece
feat(api): level /agents as v1alpha (#3610)
# What does this PR do?

agents is likely to be deprecated in favor of responses. Lets level it
as alpha to indicate the lack of longterm support

keep v1 route for backwards compat.

Closes #3611

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-30 11:15:04 -07:00
Matthew Farrellee
2de4e6c900
feat: use /v1/chat/completions for safety model inference (#3591)
# What does this PR do?

migrate safety api implementation from /inference/chat-completion to
/v1/chat/completions

## Test Plan

ci w/ recordings

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-09-30 11:01:44 -07:00