llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-07 04:45:44 +00:00

Author	SHA1	Message	Date
Swapna Lekkala	4e4db89df6	fix rate limit errors	2025-10-06 10:54:12 -07:00
Swapna Lekkala	19fdf052a5	improve cancel test	2025-10-06 10:54:12 -07:00
Swapna Lekkala	fbac7cf4df	improve resume and dont attach duplicate file	2025-10-06 10:54:12 -07:00
Swapna Lekkala	e2af40b4a6	add __init__ to the mixin	2025-10-06 10:54:12 -07:00
Swapna Lekkala	37b220355b	add concurrent file attaching	2025-10-06 10:54:12 -07:00
Swapna Lekkala	4c14cf0747	clean up clean up	2025-10-06 10:54:12 -07:00
Swapna Lekkala	a25fe110f3	remove unwanted comments	2025-10-06 10:54:12 -07:00
Swapna Lekkala	af4c5df185	feat(api): Add vector store file batches api	2025-10-06 10:54:12 -07:00
Alexey Rybak	a8da6ba3a7	docs: API docstrings cleanup for better documentation rendering (#3661 ) # What does this PR do? * Cleans up API docstrings for better documentation rendering <img width="2346" height="1126" alt="image" src="https://github.com/user-attachments/assets/516b09a1-2d5b-4614-a3a9-13431fc21fc1" /> ## Test Plan * Manual testing --------- Signed-off-by: Doug Edgar <dedgar@redhat.com> Signed-off-by: Charlie Doern <cdoern@redhat.com> Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: ehhuang <ehhuang@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu> Co-authored-by: Doug Edgar <dedgar@redhat.com> Co-authored-by: Christian Zaccaria <73656840+ChristianZaccaria@users.noreply.github.com> Co-authored-by: Anastas Stoyanovsky <contact@anastas.eu> Co-authored-by: Charlie Doern <cdoern@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Young Han <110819238+seyeong-han@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-10-06 10:46:33 -07:00
Matthew Farrellee	892ea759fa	chore: remove together inference adapter's custom check_model_availability (#3702 ) # What does this PR do? remove Together inference adapter's check_model_availability impl, rely on standard impl instead ## Test Plan ci	2025-10-06 13:28:36 -04:00
Matthew Farrellee	de9940c697	chore: disable openai_embeddings on inference=remote::llama-openai-compat (#3704 ) # What does this PR do? api.llama.com does not provide embedding models, this makes that clear ## Test Plan ci	2025-10-06 13:27:40 -04:00
Matthew Farrellee	ae74b31ae3	chore: remove vLLM inference adapter's custom list_models (#3703 ) # What does this PR do? remove vLLM inference adapter's custom list_models impl, rely on standard impl instead ## Test Plan ci	2025-10-06 13:27:30 -04:00
Matthew Farrellee	d23ed26238	chore: turn OpenAIMixin into a pydantic.BaseModel (#3671 ) # What does this PR do? - implement get_api_key instead of relying on LiteLLMOpenAIMixin.get_api_key - remove use of LiteLLMOpenAIMixin - add default initialize/shutdown methods to OpenAIMixin - remove __init__s to allow proper pydantic construction - remove dead code from vllm adapter and associated / duplicate unit tests - update vllm adapter to use openaimixin for model registration - remove ModelRegistryHelper from fireworks & together adapters - remove Inference from nvidia adapter - complete type hints on embedding_model_metadata - allow extra fields on OpenAIMixin, for model_store, __provider_id__, etc - new recordings for ollama - enhance the list models error handling - update cerebras (remove cerebras-cloud-sdk) and anthropic (custom model listing) inference adapters - parametrized test_inference_client_caching - remove cerebras, databricks, fireworks, together from blanket mypy exclude - removed unnecessary litellm deps ## Test Plan ci	2025-10-06 11:33:19 -04:00
Matthew Farrellee	724dac498c	chore: give OpenAIMixin subcalsses a change to list models without leaking _model_cache details (#3682 ) # What does this PR do? close the _model_cache abstraction leak ## Test Plan ci w/ new tests	2025-10-06 09:44:33 -04:00
Charlie Doern	f00bcd9561	feat: allow for multiple external provider specs (#3341 ) # What does this PR do? when using the providers.d method of installation users could hand craft their AdapterSpec's to use overlapping code meaning one repo could contain an inline and remote impl. Currently installing a provider via module does not allow for that as each repo is only allowed to have one `get_provider_spec` method with one Spec returned add an optional way for `get_provider_spec` to return a list of `ProviderSpec` where each can be either an inline or remote impl. Note: the `adapter_type` in `get_provider_spec` MUST match the `provider_type` in the build/run yaml for this to work. resolves #3226 ## Test Plan once this merges we need to re-enable the external provider test and account for this functionality. Work needs to be done in the external provider repos to support this functionality. Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-06 15:26:38 +02:00
ehhuang	426cac078b	chore: use uvicorn to start llama stack server everywhere (#3625 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test Llama Stack Build / build-single-provider (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details API Conformance Tests / check-schema-compatibility (push) Successful in 11s Details Test Llama Stack Build / build (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 44s Details Pre-commit / pre-commit (push) Successful in 1m24s Details # What does this PR do? https://github.com/llamastack/llama-stack/pull/3462 allows using uvicorn to start llama stack server which supports spawning multiple workers. This PR enables us to launch >1 workers from `llama stack run` (will add the parameter in a follow-up PR, keeping this PR on simplifying) by removing the old way of launching stack server and consolidates launching via uvicorn.run only. ## Test Plan ran `llama stack run starter` CI	2025-10-06 14:27:40 +02:00
dependabot[bot]	c0f0a03529	chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui (#3693 ) Bumps [react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom) and [@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom). These dependencies needed to be updated together. Updates `react-dom` from 19.1.1 to 19.2.0 <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/facebook/react/releases">react-dom's releases</a>.</em></p> <blockquote> <h2>19.2.0 (Oct 1, 2025)</h2> <p>Below is a list of all new features, APIs, and bug fixes.</p> <p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React 19.2 release post</a> for more information.</p> <h2>New React Features</h2> <ul> <li><a href="https://react.dev/reference/react/Activity"><code><Activity></code></a>: A new API to hide and restore the UI and internal state of its children.</li> <li><a href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a> is a React Hook that lets you extract non-reactive logic into an <a href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect Event</a>.</li> <li><a href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a> (for RSCs) lets your know when the <code>cache()</code> lifetime is over.</li> <li><a href="https://react.dev/reference/developer-tooling/react-performance-tracks">React Performance tracks</a> appear on the Performance panel’s timeline in your browser developer tools</li> </ul> <h2>New React DOM Features</h2> <ul> <li>Added resume APIs for partial pre-rendering with Web Streams: <ul> <li><a href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>: to resume a prerender to a stream.</li> <li><a href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>: to resume a prerender to HTML.</li> </ul> </li> <li>Added resume APIs for partial pre-rendering with Node Streams: <ul> <li><a href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>: to resume a prerender to a stream.</li> <li><a href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>: to resume a prerender to HTML.</li> </ul> </li> <li>Updated <a href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a> APIs to return a <code>postponed</code> state that can be passed to the <code>resume</code> APIs.</li> </ul> <h2>Notable changes</h2> <ul> <li>React DOM now batches suspense boundary reveals, matching the behavior of client side rendering. This change is especially noticeable when animating the reveal of Suspense boundaries e.g. with the upcoming <code><ViewTransition></code> Component. React will batch as much reveals as possible before the first paint while trying to hit popular first-contentful paint metrics.</li> <li>Add Node Web Streams (<code>prerender</code>, <code>renderToReadableStream</code>) to server-side-rendering APIs for Node.js</li> <li>Use underscore instead of <code>:</code> IDs generated by useId</li> </ul> <h2>All Changes</h2> <h3>React</h3> <ul> <li><code><Activity /></code> was developed over many years, starting before <code>ClassComponent.setState</code> (<a href="https://github.com/acdlite"><code>@acdlite</code></a> <a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> and many others)</li> <li>Stringify context as "SomeContext" instead of "SomeContext.Provider" (<a href="https://github.com/kassens"><code>@kassens</code></a> <a href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li> <li>Include stack of cause of React instrumentation errors with <code>%o</code> placeholder (<a href="https://github.com/eps1lon"><code>@eps1lon</code></a> <a href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li> <li>Fix infinite <code>useDeferredValue</code> loop in popstate event (<a href="https://github.com/acdlite"><code>@acdlite</code></a> <a href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li> <li>Fix a bug when an initial value was passed to <code>useDeferredValue</code> (<a href="https://github.com/acdlite"><code>@acdlite</code></a> <a href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li> <li>Fix a crash when submitting forms with Client Actions (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li> <li>Hide/unhide the content of dehydrated suspense boundaries if they resuspend (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li> <li>Avoid stack overflow on wide trees during Hot Reload (<a href="https://github.com/sophiebits"><code>@sophiebits</code></a> <a href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li> <li>Improve Owner and Component stacks in various places (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a>, <a href="https://github.com/eps1lon"><code>@eps1lon</code></a>: <a href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>, <a href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>, <a href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>, <a href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li> <li>Add <code>cacheSignal</code> (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li> </ul> <h3>React DOM</h3> <ul> <li>Block on Suspensey Fonts during reveal of server-side-rendered content (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li> <li>Use underscore instead of <code>:</code> for IDs generated by <code>useId</code> (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a>, <a href="https://github.com/eps1lon"><code>@eps1lon</code></a>: <a href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>, <a href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a><a href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>, <a href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li> <li>Stop warning when ARIA 1.3 attributes are used (<a href="https://github.com/Abdul-Omira"><code>@Abdul-Omira</code></a> <a href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li> <li>Allow <code>nonce</code> to be used on hoistable styles (<a href="https://github.com/Andarist"><code>@Andarist</code></a> <a href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li> <li>Warn for using a React owned node as a Container if it also has text content (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/32774">#32774</a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/facebook/react/blob/main/CHANGELOG.md">react-dom's changelog</a>.</em></p> <blockquote> <h2>19.2.0 (October 1st, 2025)</h2> <p>Below is a list of all new features, APIs, and bug fixes.</p> <p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React 19.2 release post</a> for more information.</p> <h3>New React Features</h3> <ul> <li><a href="https://react.dev/reference/react/Activity"><code><Activity></code></a>: A new API to hide and restore the UI and internal state of its children.</li> <li><a href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a> is a React Hook that lets you extract non-reactive logic into an <a href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect Event</a>.</li> <li><a href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a> (for RSCs) lets your know when the <code>cache()</code> lifetime is over.</li> <li><a href="https://react.dev/reference/developer-tooling/react-performance-tracks">React Performance tracks</a> appear on the Performance panel’s timeline in your browser developer tools</li> </ul> <h3>New React DOM Features</h3> <ul> <li>Added resume APIs for partial pre-rendering with Web Streams: <ul> <li><a href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>: to resume a prerender to a stream.</li> <li><a href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>: to resume a prerender to HTML.</li> </ul> </li> <li>Added resume APIs for partial pre-rendering with Node Streams: <ul> <li><a href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>: to resume a prerender to a stream.</li> <li><a href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>: to resume a prerender to HTML.</li> </ul> </li> <li>Updated <a href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a> APIs to return a <code>postponed</code> state that can be passed to the <code>resume</code> APIs.</li> </ul> <h3>Notable changes</h3> <ul> <li>React DOM now batches suspense boundary reveals, matching the behavior of client side rendering. This change is especially noticeable when animating the reveal of Suspense boundaries e.g. with the upcoming <code><ViewTransition></code> Component. React will batch as much reveals as possible before the first paint while trying to hit popular first-contentful paint metrics.</li> <li>Add Node Web Streams (<code>prerender</code>, <code>renderToReadableStream</code>) to server-side-rendering APIs for Node.js</li> <li>Use underscore instead of <code>:</code> IDs generated by useId</li> </ul> <h3>All Changes</h3> <h4>React</h4> <ul> <li><code><Activity /></code> was developed over many years, starting before <code>ClassComponent.setState</code> (<a href="https://github.com/acdlite"><code>@acdlite</code></a> <a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> and many others)</li> <li>Stringify context as "SomeContext" instead of "SomeContext.Provider" (<a href="https://github.com/kassens"><code>@kassens</code></a> <a href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li> <li>Include stack of cause of React instrumentation errors with <code>%o</code> placeholder (<a href="https://github.com/eps1lon"><code>@eps1lon</code></a> <a href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li> <li>Fix infinite <code>useDeferredValue</code> loop in popstate event (<a href="https://github.com/acdlite"><code>@acdlite</code></a> <a href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li> <li>Fix a bug when an initial value was passed to <code>useDeferredValue</code> (<a href="https://github.com/acdlite"><code>@acdlite</code></a> <a href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li> <li>Fix a crash when submitting forms with Client Actions (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li> <li>Hide/unhide the content of dehydrated suspense boundaries if they resuspend (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li> <li>Avoid stack overflow on wide trees during Hot Reload (<a href="https://github.com/sophiebits"><code>@sophiebits</code></a> <a href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li> <li>Improve Owner and Component stacks in various places (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a>, <a href="https://github.com/eps1lon"><code>@eps1lon</code></a>: <a href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>, <a href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>, <a href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>, <a href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li> <li>Add <code>cacheSignal</code> (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li> </ul> <h4>React DOM</h4> <ul> <li>Block on Suspensey Fonts during reveal of server-side-rendered content (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a> <a href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li> <li>Use underscore instead of <code>:</code> for IDs generated by <code>useId</code> (<a href="https://github.com/sebmarkbage"><code>@sebmarkbage</code></a>, <a href="https://github.com/eps1lon"><code>@eps1lon</code></a>: <a href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>, <a href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a><a href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>, <a href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li> <li>Stop warning when ARIA 1.3 attributes are used (<a href="https://github.com/Abdul-Omira"><code>@Abdul-Omira</code></a> <a href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li> <li>Allow <code>nonce</code> to be used on hoistable styles (<a href="https://github.com/Andarist"><code>@Andarist</code></a> <a href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`861811347b`"><code>8618113</code></a> Bump scheduler version (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34671">#34671</a>)</li> <li><a href="`1bd1f01f2a`"><code>1bd1f01</code></a> Ship partial-prerendering APIs to Canary (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34633">#34633</a>)</li> <li><a href="`2f0649a0b2`"><code>2f0649a</code></a> [Fizz] Remove <code>nonce</code> option from resume-and-prerender APIs (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34664">#34664</a>)</li> <li><a href="`5667a41fe4`"><code>5667a41</code></a> Bump next prerelease version numbers (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34639">#34639</a>)</li> <li><a href="`e08f53b182`"><code>e08f53b</code></a> Match <code>react-dom/static</code> test entrypoints and published entrypoints (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34599">#34599</a>)</li> <li><a href="`8bb7241f4c`"><code>8bb7241</code></a> Bump useEffectEvent to Canary (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34610">#34610</a>)</li> <li><a href="`83c88ad470`"><code>83c88ad</code></a> Handle fabric root level fragment with compareDocumentPosition (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34533">#34533</a>)</li> <li><a href="`68f00c901c`"><code>68f00c9</code></a> Release Activity in Canary (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34374">#34374</a>)</li> <li><a href="`3168e08f83`"><code>3168e08</code></a> [flags] enable opt-in for enableDefaultTransitionIndicator (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34373">#34373</a>)</li> <li><a href="`3434ff4f4b`"><code>3434ff4</code></a> Add scrollIntoView to fragment instances (<a href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/32814">#32814</a>)</li> <li>Additional commits viewable in <a href="https://github.com/facebook/react/commits/v19.2.0/packages/react-dom">compare view</a></li> </ul> </details> <br /> Updates `@types/react-dom` from 19.1.9 to 19.2.0 <details> <summary>Commits</summary> <ul> <li>See full diff in <a href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-10-06 00:02:31 -04:00
dependabot[bot]	91c6a8a3a3	chore(ui-deps): bump next from 15.5.3 to 15.5.4 in /llama_stack/ui (#3694 ) Bumps [next](https://github.com/vercel/next.js) from 15.5.3 to 15.5.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/vercel/next.js/releases">next's releases</a>.</em></p> <blockquote> <h2>v15.5.4</h2> <blockquote> <p>[!NOTE]<br /> This release is backporting bug fixes. It does <strong>not</strong> include all pending features/changes on canary.</p> </blockquote> <h3>Core Changes</h3> <ul> <li>fix: ensure onRequestError is invoked when otel enabled (<a href="https://redirect.github.com/vercel/next.js/issues/83343">#83343</a>)</li> <li>fix: devtools initial position should be from next config (<a href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)</li> <li>[devtool] fix overlay styles are missing (<a href="https://redirect.github.com/vercel/next.js/issues/83721">#83721</a>)</li> <li>Turbopack: don't match dynamic pattern for node_modules packages (<a href="https://redirect.github.com/vercel/next.js/issues/83176">#83176</a>)</li> <li>Turbopack: don't treat metadata routes as RSC (<a href="https://redirect.github.com/vercel/next.js/issues/82911">#82911</a>)</li> <li>[turbopack] Improve handling of symlink resolution errors in track_glob and read_glob (<a href="https://redirect.github.com/vercel/next.js/issues/83357">#83357</a>)</li> <li>Turbopack: throw large static metadata error earlier (<a href="https://redirect.github.com/vercel/next.js/issues/82939">#82939</a>)</li> <li>fix: error overlay not closing when backdrop clicked (<a href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)</li> <li>Turbopack: flush Node.js worker IPC on error (<a href="https://redirect.github.com/vercel/next.js/issues/84077">#84077</a>)</li> </ul> <h3>Misc Changes</h3> <ul> <li>[CNA] use linter preference (<a href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)</li> <li>CI: use KV for test timing data (<a href="https://redirect.github.com/vercel/next.js/issues/83745">#83745</a>)</li> <li>docs: september improvements and fixes (<a href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li> </ul> <h3>Credits</h3> <p>Huge thanks to <a href="https://github.com/yiminghe"><code>@yiminghe</code></a>, <a href="https://github.com/huozhi"><code>@huozhi</code></a>, <a href="https://github.com/devjiwonchoi"><code>@devjiwonchoi</code></a>, <a href="https://github.com/mischnic"><code>@mischnic</code></a>, <a href="https://github.com/lukesandberg"><code>@lukesandberg</code></a>, <a href="https://github.com/ztanner"><code>@ztanner</code></a>, <a href="https://github.com/icyJoseph"><code>@icyJoseph</code></a>, <a href="https://github.com/leerob"><code>@leerob</code></a>, <a href="https://github.com/fufuShih"><code>@fufuShih</code></a>, <a href="https://github.com/dwrth"><code>@dwrth</code></a>, <a href="https://github.com/aymericzip"><code>@aymericzip</code></a>, <a href="https://github.com/obendev"><code>@obendev</code></a>, <a href="https://github.com/molebox"><code>@molebox</code></a>, <a href="https://github.com/OoMNoO"><code>@OoMNoO</code></a>, <a href="https://github.com/pontasan"><code>@pontasan</code></a>, <a href="https://github.com/styfle"><code>@styfle</code></a>, <a href="https://github.com/HondaYt"><code>@HondaYt</code></a>, <a href="https://github.com/ryuapp"><code>@ryuapp</code></a>, <a href="https://github.com/lpalmes"><code>@lpalmes</code></a>, and <a href="https://github.com/ijjk"><code>@ijjk</code></a> for helping!</p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`40f1d7814d`"><code>40f1d78</code></a> v15.5.4</li> <li><a href="`cb30f0a176`"><code>cb30f0a</code></a> [backport] docs: september improvements and fixes (<a href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li> <li><a href="`b6a32bb579`"><code>b6a32bb</code></a> [backport] [CNA] use linter preference (<a href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>) (<a href="https://redirect.github.com/vercel/next.js/issues/84087">#84087</a>)</li> <li><a href="`26d61f1e9a`"><code>26d61f1</code></a> [backport] Turbopack: flush Node.js worker IPC on error (<a href="https://redirect.github.com/vercel/next.js/issues/84079">#84079</a>)</li> <li><a href="`e11e87a547`"><code>e11e87a</code></a> [backport] fix: error overlay not closing when backdrop clicked (<a href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>) (<a href="https://redirect.github.com/vercel/next.js/issues/83">#83</a>...</li> <li><a href="`0a29888575`"><code>0a29888</code></a> [backport] fix: devtools initial position should be from next config (<a href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)...</li> <li><a href="`7a53950c13`"><code>7a53950</code></a> [backport] Turbopack: don't treat metadata routes as RSC (<a href="https://redirect.github.com/vercel/next.js/issues/83804">#83804</a>)</li> <li><a href="`050bdf1ae7`"><code>050bdf1</code></a> [backport] Turbopack: throw large static metadata error earlier (<a href="https://redirect.github.com/vercel/next.js/issues/83816">#83816</a>)</li> <li><a href="`1f6ea09f85`"><code>1f6ea09</code></a> [backport] Turbopack: Improve handling of symlink resolution errors (<a href="https://redirect.github.com/vercel/next.js/issues/83805">#83805</a>)</li> <li><a href="`c7d1855499`"><code>c7d1855</code></a> [backport] CI: use KV for test timing data (<a href="https://redirect.github.com/vercel/next.js/issues/83860">#83860</a>)</li> <li>Additional commits viewable in <a href="https://github.com/vercel/next.js/compare/v15.5.3...v15.5.4">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.5.3&new-version=15.5.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-10-06 00:01:38 -04:00
Matthew Farrellee	351c4b98e4	chore: inference=remote::llama-openai-compat does not support /v1/completion (#3683 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 8s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s Details Python Package Build Test / build (3.13) (push) Failing after 16s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s Details Python Package Build Test / build (3.12) (push) Failing after 18s Details Unit Tests / unit-tests (3.13) (push) Failing after 16s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 20s Details Unit Tests / unit-tests (3.12) (push) Failing after 18s Details UI Tests / ui-tests (22) (push) Successful in 44s Details Pre-commit / pre-commit (push) Successful in 1m22s Details ## What does this PR do? skip completion tests for inference=remote::llama-openai-compat ## Test Plan ci	2025-10-04 11:36:48 -07:00
Ashwin Bharambe	045a0c1d57	feat(tests): implement test isolation for inference recordings (#3681 ) Uses test_id in request hashes and test-scoped subdirectories to prevent cross-test contamination. Model list endpoints exclude test_id to enable merging recordings from different servers. Additionally, this PR adds a `record-if-missing` mode (which we will use instead of `record` which records everything) which is very useful. 🤖 Co-authored with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-04 11:34:18 -07:00
Ashwin Bharambe	3f36bfaeaa	chore(tests): normalize recording IDs and timestamps to reduce git diff noise (#3676 ) IDs are now deterministic hashes based on request content, and timestamps are normalized to constants, eliminating spurious changes when re-recording tests. ## Changes - Updated `inference_recorder.py` to normalize IDs and timestamps during recording - Added `scripts/normalize_recordings.py` utility to re-normalize existing recordings - Created documentation in `tests/integration/recordings/README.md` - Normalized 350 existing recording files	2025-10-03 17:26:11 -07:00
Ashwin Bharambe	61b4238912	feat(api): add extra_body parameter support with shields example (#3670 ) ## Summary Introduce `ExtraBodyField` annotation to enable parameters that arrive via extra_body in client SDKs but are accessible server-side with full typing. These parameters are documented in OpenAPI specs under `x-llama-stack-extra-body-params` but excluded from generated SDK signatures. Add `shields` parameter to `create_openai_response` as the first implementation using this pattern. ## Test Plan - added an integration test which checks that shields parameter passed via extra_body reaches server implementation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-03 13:25:09 -07:00
Matthew Farrellee	ce77c27ff8	chore: use remoteinferenceproviderconfig for remote inference providers (#3668 ) # What does this PR do? on the path to maintainable impls of inference providers. make all configs instances of RemoteInferenceProviderConfig. ## Test Plan ci	2025-10-03 08:48:42 -07:00
Francisco Arceo	a20e8eac8c	feat: Add OpenAI Conversations API (#3429 ) # What does this PR do? Initial implementation for `Conversations` and `ConversationItems` using `AuthorizedSqlStore` with endpoints to: - CREATE - UPDATE - GET/RETRIEVE/LIST - DELETE Set `level=LLAMA_STACK_API_V1`. NOTE: This does not currently incorporate changes for Responses, that'll be done in a subsequent PR. Closes https://github.com/llamastack/llama-stack/issues/3235 ## Test Plan - Unit tests - Integration tests Also comparison of [OpenAPI spec for OpenAI API](https://github.com/openai/openai-openapi/tree/manual_spec) ```bash oasdiff breaking --fail-on ERR docs/static/llama-stack-spec.yaml https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/manual_spec/openapi.yaml --strip-prefix-base "/v1/openai/v1" \ --match-path '(^/v1/openai/v1/conversations.\|^/conversations.)' ``` Note I still have some uncertainty about this, I borrowed this info from @cdoern on https://github.com/llamastack/llama-stack/pull/3514 but need to spend more time to confirm it's working, at the moment it suggests it does. UPDATE on `oasdiff`, I investigated the OpenAI spec further and it looks like currently the spec does not list Conversations, so that analysis is useless. Noting for future reference. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-10-03 08:47:18 -07:00
Matthew Farrellee	d266c59c2a	chore: remove deprecated inference.chat_completion implementations (#3654 ) # What does this PR do? remove unused chat_completion implementations vllm features ported - - requires max_tokens be set, use config value - set tool_choice to none if no tools provided ## Test Plan ci	2025-10-03 07:55:34 -04:00
Christian Zaccaria	bcdbb53be3	feat: implement keyword and hybrid search for Weaviate provider (#3264 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> - This PR implements keyword and hybrid search for Weaviate DB based on its inbuilt functions. - Added fixtures to conftest.py for Weaviate. - Enabled integration tests for remote Weaviate on all 3 search modes. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #3010 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Unit tests and integration tests should pass on this PR.	2025-10-03 10:22:30 +02:00
Doug Edgar	52c8df2322	feat: auto-detect Console width (#3327 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Addresses Issue #3271 - "Starting LLS server locally on a terminal with 120 chars width results in an output with empty lines". This removes the specific 150-character width limit specified for the Console, and will now auto-detect the terminal width instead. Now the formatting of Console output is consistent across different sizes of terminal windows. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #3271 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Launching the server with several different sizes of terminal windows results in Console output without unexpected spacing. e.g. `python -m llama_stack.core.server.server /tmp/run.yaml --port 8321` --------- Signed-off-by: Doug Edgar <dedgar@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-10-03 10:19:31 +02:00
Matthew Farrellee	0a41c4ead0	chore: OpenAIMixin implements ModelsProtocolPrivate (#3662 ) # What does this PR do? add ModelsProtocolPrivate methods to OpenAIMixin this will allow providers using OpenAIMixin to use a common interface ## Test Plan ci w/ new tests	2025-10-02 21:32:02 -07:00
ehhuang	14a94e9894	fix: responses <> chat completion input conversion (#3645 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details API Conformance Tests / check-schema-compatibility (push) Successful in 10s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Python Package Build Test / build (3.13) (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s Details Test External API and Providers / test-external (venv) (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details UI Tests / ui-tests (22) (push) Successful in 33s Details Pre-commit / pre-commit (push) Successful in 1m27s Details # What does this PR do? closes #3268 closes #3498 When resuming from previous response ID, currently we attempt to convert from the stored responses input to chat completion messages, which is not always possible, e.g. for tool calls where some data is lost once converted from chat completion message to repsonses input format. This PR stores the chat completion messages that correspond to the _last_ call to chat completion, which is sufficient to be resumed from in the next responses API call, where we load these saved messages and skip conversion entirely. Separate issue to optimize storage: https://github.com/llamastack/llama-stack/issues/3646 ## Test Plan existing CI tests	2025-10-02 16:01:08 -07:00
Ashwin Bharambe	ef0736527d	feat(tools)!: substantial clean up of "Tool" related datatypes (#3627 ) This is a sweeping change to clean up some gunk around our "Tool" definitions. First, we had two types `Tool` and `ToolDef`. The first of these was a "Resource" type for the registry but we had stopped registering tools inside the Registry long back (and only registered ToolGroups.) The latter was for specifying tools for the Agents API. This PR removes the former and adds an optional `toolgroup_id` field to the latter. Secondly, as pointed out by @bbrowning in https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132, we were doing a lossy conversion from a full JSON schema from the MCP tool specification into our ToolDefinition to send it to the model. There is no necessity to do this -- we ourselves aren't doing any execution at all but merely passing it to the chat completions API which supports this. By doing this (and by doing it poorly), we encountered limitations like not supporting array items, or not resolving $refs, etc. To fix this, we replaced the `parameters` field by `{ input_schema, output_schema }` which can be full blown JSON schemas. Finally, there were some types in our llama-related chat format conversion which needed some cleanup. We are taking this opportunity to clean those up. This PR is a substantial breaking change to the API. However, given our window for introducing breaking changes, this suits us just fine. I will be landing a concurrent `llama-stack-client` change as well since API shapes are changing.	2025-10-02 15:12:03 -07:00
ehhuang	ceca3c056f	chore: fix/add logging categories (#3658 ) # What does this PR do? These aren't controllable by LLAMA_STACK_LOGGING ``` tests/integration/agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This ...) [ 3%] tests/integration/agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This t...) [ 7%] tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True] instantiating llama_stack_client WARNING 2025-10-02 13:14:33,472 root:258 uncategorized: Unknown logging category: testing. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,477 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,960 root:258 uncategorized: Unknown logging category: tokenizer_utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,962 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,963 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,968 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,974 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,978 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,350 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,366 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,489 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,490 root:258 uncategorized: Unknown logging category: inference_store. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,697 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,918 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 INFO 2025-10-02 13:14:35,945 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid concurrency issues WARNING 2025-10-02 13:14:36,172 root:258 uncategorized: Unknown logging category: files. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,218 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,219 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,231 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,255 root:258 uncategorized: Unknown logging category: tool_runtime. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,486 root:258 uncategorized: Unknown logging category: responses_store. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,503 root:258 uncategorized: Unknown logging category: openai::responses. Falling back to default 'root' level: 20 INFO 2025-10-02 13:14:36,524 llama_stack.providers.utils.responses.responses_store:80 responses_store: Write queue disabled for SQLite to avoid concurrency issues WARNING 2025-10-02 13:14:36,528 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,703 root:258 uncategorized: Unknown logging category: uncategorized. Falling back to default 'root' level: 20 ``` ## Test Plan	2025-10-02 13:10:13 -07:00
Ashwin Bharambe	6afa96b0b9	fix(api): fix a mistake from #3636 which overwrote POST /responses	2025-10-02 13:03:17 -07:00
Sébastien Han	4161102100	chore!: add double routes for v1/openai/v1 (#3636 ) So that users get a warning in 0.3.0 and we remove them in 0.4.0. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-02 16:11:05 +02:00
Aakanksha Duggal	7e48cc48bc	refactor(agents): migrate to OpenAI chat completions API (#3323 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test Llama Stack Build / build-single-provider (push) Failing after 2s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 8s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s Details Python Package Build Test / build (3.13) (push) Failing after 14s Details Test Llama Stack Build / generate-matrix (push) Successful in 18s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Test Llama Stack Build / build (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 44s Details Pre-commit / pre-commit (push) Successful in 1m16s Details	2025-10-02 06:50:32 -04:00
Chacksu	426dc54883	docs: Fix Dell distro documentation code snippets (#3640 ) # What does this PR do? * Updates code snippets for Dell distribution, fixing specific user home directory in code (replacing with $HOME) and updates docker instructions to use `docker` instead of `podman`. ## Test Plan N.A. Co-authored-by: Connor Hack <connorhack@fb.com>	2025-10-02 11:11:30 +02:00
ehhuang	5adcf0e0cb	chore: Remove debug logging from telemetry adapter (#3643 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Spammy ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> n/a	2025-10-01 15:16:23 -07:00
Matthew Farrellee	4dbe0593f9	chore: add provider-data-api-key support to openaimixin (#3639 ) # What does this PR do? the LiteLLMOpenAIMixin provides support for reading key from provider data (headers users send). this adds the same functionality to the OpenAIMixin. this is infrastructure for migrating providers. ## Test Plan ci w/ new tests	2025-10-01 13:44:59 -07:00
Alexey Rybak	b6a5bccadf	docs: api separation (#3630 ) # What does this PR do? First step towards cleaning up the API reference section of the docs. - Separates API reference into 3 sections: stable (`v1`), experimental (`v1alpha` and `v1beta`), and deprecated (`deprecated=True`) - Each section is accessible via the dropdown menu and `docs/api-overview` <img width="1237" height="321" alt="Screenshot 2025-09-30 at 5 47 30 PM" src="https://github.com/user-attachments/assets/fe0e498c-b066-46ed-a48e-4739d3b6724c" /> <img width="860" height="510" alt="Screenshot 2025-09-30 at 5 47 49 PM" src="https://github.com/user-attachments/assets/a92a8d8c-94bf-42d5-9f5b-b47bb2b14f9c" /> - Deprecated APIs: Added styling to the sidebar, and a notice on the endpoint pages <img width="867" height="428" alt="Screenshot 2025-09-30 at 5 47 43 PM" src="https://github.com/user-attachments/assets/9e6e050d-c782-461b-8084-5ff6496d7bd9" /> Closes #3628 TODO in follow-up PRs: - Add the ability to annotate API groups with supplementary content (so we can have longer descriptions of complex APIs like Responses) - Clean up docstrings to show API endpoints (or short semantic titles) in the sidebar ## Test Plan - Local testing - Made sure API conformance test still passes	2025-10-01 10:13:31 -07:00
ehhuang	853e9b3b0a	fix: log level (#3637 ) # What does this PR do? - categories like "core::server" is not recognized so it's level is not set by 'all=debug' - removed spammy telemetry debug logging ## Test Plan test server launched with LLAMA_STACK_LOGGING='all=debug'	2025-10-01 09:51:39 -07:00
Charlie Doern	d167101e70	feat(api): implement v1beta leveling, and additional alpha (#3594 ) # What does this PR do? level the following APIs, keeping their old routes around as well until 0.4.0 1. datasetio to v1beta: used primarily by eval and training. Given that training is v1alpha, and eval is v1alpha, datasetio is likely to change in structure as real usages of the API spin up. Register,unregister, and iter dataset is sparsely implemented meaning the shape of that route is likely to change. 2. telemetry to v1alpha: telemetry has been going through many changes. for example query_metrics was not even implemented until recently and had to change its shape to work. putting this in v1beta will allow us to fix functionality like OTEL, sqlite, etc. The routes themselves are set, but the structure might change a bit Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-01 09:18:11 -07:00
Matthew Farrellee	f7c5ef4ec0	chore: remove /v1/inference/completion and implementations (#3622 ) # What does this PR do? the /inference/completion route is gone. this removes the implementations. ## Test Plan ci	2025-10-01 11:36:53 -04:00
Matthew Farrellee	ea15f2a270	chore: use openai_chat_completion for llm as a judge scoring (#3635 ) # What does this PR do? update llm as a judge to use openai_chat_completion, instead of deprecated chat_completion ## Test Plan ci	2025-10-01 09:44:31 -04:00
Jaideep Rao	ca47d90926	fix: Ensure that tool calls with no arguments get handled correctly (#3560 ) # What does this PR do? When a model decides to use an MCP tool call that requires no arguments, it sets the `arguments` field to `None`. This causes the user to see a `400 bad requst error` due to validation errors down the stack because this field gets removed when being parsed by an openai compatible inference provider like vLLM This PR ensures that, as soon as the tool call args are accumulated while streaming, we check to ensure no tool call function arguments are set to None - if they are we replace them with "{}" <!-- If resolving an issue, uncomment and update the line below --> Closes #3456 ## Test Plan Added new unit test to verify that any tool calls with function arguments set to `None` get handled correctly --------- Signed-off-by: Jaideep Rao <jrao@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-10-01 08:36:57 -04:00
Ashwin Bharambe	42414a1a1b	fix(logging): disable console telemetry sink by default (#3623 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Test Llama Stack Build / build (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 21s Details Test Llama Stack Build / build-single-provider (push) Failing after 25s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s Details Unit Tests / unit-tests (3.12) (push) Failing after 22s Details API Conformance Tests / check-schema-compatibility (push) Successful in 33s Details UI Tests / ui-tests (22) (push) Successful in 39s Details Pre-commit / pre-commit (push) Successful in 1m12s Details The current span processing dumps so much junk on the console that it makes actual understanding of what is going on in the server impossible. I am killing the console sink as a default. If you want, you are always free to change your run.yaml to add it. Before: <img width="1877" height="1107" alt="image" src="https://github.com/user-attachments/assets/3a7ad261-e2ba-4d40-9820-fcc282c8df37" /> After: <img width="1919" height="470" alt="image" src="https://github.com/user-attachments/assets/bc7cf763-fba9-4e95-a4b5-f65f6d1c5332" />	2025-09-30 14:58:05 -07:00
ehhuang	ac7c35fbe6	fix: don't pass default response format in Responses (#3614 ) # What does this PR do? Fireworks doesn't allow repsonse_format with tool use. The default response format is 'text' anyway, so we can safely omit. ## Test Plan Below script failed without the change, runs after. ``` #!/usr/bin/env python3 """ Script to test Responses API with kubernetes-mcp-server. This script: 1. Connects to the llama stack server 2. Uses the Responses API with MCP tools 3. Asks for the list of Kubernetes namespaces using the kubernetes-mcp-server """ import json from openai import OpenAI # Connect to the llama stack server base_url = "http://localhost:8321/v1" client = OpenAI(base_url=base_url, api_key="fake") # Define the MCP tool pointing to the kubernetes-mcp-server # The kubernetes-mcp-server is running on port 3000 with SSE endpoint at /sse mcp_server_url = "http://localhost:3000/sse" tools = [ { "type": "mcp", "server_label": "k8s", "server_url": mcp_server_url, } ] # Create a response request asking for k8s namespaces print("Sending request to list Kubernetes namespaces...") print(f"Using MCP server at: {mcp_server_url}") print("Available tools will be listed automatically by the MCP server.") print() response = client.responses.create( # model="meta-llama/Llama-3.2-3B-Instruct", # Using the vllm model model="fireworks/accounts/fireworks/models/llama4-scout-instruct-basic", # model="openai/gpt-4o", input="what are all the Kubernetes namespaces? Use tool call to `namespaces_list`. make sure to adhere to the tool calling format UNDER ALL CIRCUMSTANCES.", tools=tools, stream=False, ) print("\n" + "=" * 80) print("RESPONSE OUTPUT:") print("=" * 80) # Print the output for i, output in enumerate(response.output): print(f"\n[Output {i + 1}] Type: {output.type}") if output.type == "mcp_list_tools": print(f" Server: {output.server_label}") print(f" Tools available: {[t.name for t in output.tools]}") elif output.type == "mcp_call": print(f" Tool called: {output.name}") print(f" Arguments: {output.arguments}") print(f" Result: {output.output}") if output.error: print(f" Error: {output.error}") elif output.type == "message": print(f" Role: {output.role}") print(f" Content: {output.content}") print("\n" + "=" * 80) print("FINAL RESPONSE TEXT:") print("=" * 80) print(response.output_text) ```	2025-09-30 14:52:24 -07:00
grs	d350e3662b	feat: add support for require_approval argument when creating response (#3608 ) # What does this PR do? This PR adds support for the require_approval on an mcp tool definition passed to create response in the Responses API. This allows the caller to indicate whether they want to approve calls to that server, or let them be called without approval. Closes #3443 ## Test Plan Tested both approval and denial. Added automated integration test for both cases. --------- Signed-off-by: Gordon Sim <gsim@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-09-30 14:18:34 -07:00
Ashwin Bharambe	606f4cf281	fix(expires_after): make sure multipart/form-data is properly parsed (#3612 ) https://github.com/llamastack/llama-stack/pull/3604 broke multipart form data field parsing for the Files API since it changed its shape -- so as to match the API exactly to the OpenAI spec even in the generated client code. The underlying reason is that multipart/form-data cannot transport structured nested fields. Each field must be str-serialized. The client (specifically the OpenAI client whose behavior we must match), transports sub-fields as `expires_after[anchor]` and `expires_after[seconds]`, etc. We must be able to handle these fields somehow on the server without compromising the shape of the YAML spec. This PR "fixes" this by adding a dependency to convert the data. The main trade-off here is that we must add this `Depends()` annotation on every provider implementation for Files. This is a headache, but a much more reasonable one (in my opinion) given the alternatives. ## Test Plan Tests as shown in https://github.com/llamastack/llama-stack/pull/3604#issuecomment-3351090653 pass.	2025-09-30 16:14:03 -04:00
slekkala1	cc64093ae4	feat(api): Add Vector Store File batches api stub (#3615 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details API Conformance Tests / check-schema-compatibility (push) Successful in 7s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 34s Details Pre-commit / pre-commit (push) Successful in 1m14s Details # What does this PR do? Adding api stubs for vector store file batches apis https://github.com/llamastack/llama-stack/issues/3533 API Ref: https://platform.openai.com/docs/api-reference/vector-stores-file-batches ## Test Plan CI	2025-09-30 12:07:33 -07:00
Charlie Doern	1e25a72ece	feat(api): level /agents as `v1alpha` (#3610 ) # What does this PR do? agents is likely to be deprecated in favor of responses. Lets level it as alpha to indicate the lack of longterm support keep v1 route for backwards compat. Closes #3611 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-09-30 11:15:04 -07:00
Matthew Farrellee	2de4e6c900	feat: use /v1/chat/completions for safety model inference (#3591 ) # What does this PR do? migrate safety api implementation from /inference/chat-completion to /v1/chat/completions ## Test Plan ci w/ recordings --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-09-30 11:01:44 -07:00

1 2 3 4 5 ...

1704 commits