fix: nested claims mapping in OAuth2 token validation
The get_attributes_from_claims function was only checking for top-level
claim keys, causing token validation to fail when using nested claims
like "resource_access.llamastack.roles" (common in Keycloak JWT tokens).
Updated the function to support dot notation for traversing nested claim
structures. Give precedence to dot notation over literal keys with dots
in claims mapping.
Added test coverage.
Closes: #3812
Signed-off-by: Derek Higgins <derekh@redhat.com>
Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 2.0.41
to 2.0.44.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/sqlalchemy/sqlalchemy/releases">sqlalchemy's
releases</a>.</em></p>
<blockquote>
<h1>2.0.44</h1>
<p>Released: October 10, 2025</p>
<h2>platform</h2>
<ul>
<li><strong>[platform] [bug]</strong> Unblocked automatic greenlet
installation for Python 3.14 now that
there are greenlet wheels on pypi for python 3.14.</li>
</ul>
<h2>orm</h2>
<ul>
<li>
<p><strong>[orm] [usecase]</strong> The way ORM Annotated Declarative
interprets Python <a href="https://peps.python.org/pep-0695">PEP 695</a>
type aliases
in <code>Mapped[]</code> annotations has been refined to expand the
lookup scheme. A
<a href="https://peps.python.org/pep-0695">PEP 695</a> type can now be
resolved based on either its direct presence in
<code>_orm.registry.type_annotation_map</code> or its immediate resolved
value, as long as a recursive lookup across multiple <a
href="https://peps.python.org/pep-0695">PEP 695</a> types is
not required for it to resolve. This change reverses part of the
restrictions introduced in 2.0.37 as part of <a
href="https://www.sqlalchemy.org/trac/ticket/11955">#11955</a>, which
deprecated (and disallowed in 2.1) the ability to resolve any <a
href="https://peps.python.org/pep-0695">PEP 695</a>
type that was not explicitly present in
<code>_orm.registry.type_annotation_map</code>. Recursive lookups of
<a href="https://peps.python.org/pep-0695">PEP 695</a> types remains
deprecated in 2.0 and disallowed in version 2.1,
as do implicit lookups of <code>NewType</code> types without an entry in
<code>_orm.registry.type_annotation_map</code>.</p>
<p>Additionally, new support has been added for generic <a
href="https://peps.python.org/pep-0695">PEP 695</a> aliases that
refer to <a href="https://peps.python.org/pep-0593">PEP 593</a>
<code>Annotated</code> constructs containing
<code>_orm.mapped_column()</code> configurations. See the sections below
for
examples.</p>
<p>References: <a
href="https://www.sqlalchemy.org/trac/ticket/12829">#12829</a></p>
</li>
<li>
<p><strong>[orm] [bug]</strong> Fixed a caching issue where
<code>_orm.with_loader_criteria()</code> would
incorrectly reuse cached bound parameter values when used with
<code>_sql.CompoundSelect</code> constructs such as
<code>_sql.union()</code>. The
issue was caused by the cache key for compound selects not including the
execution options that are part of the <code>_sql.Executable</code> base
class,
which <code>_orm.with_loader_criteria()</code> uses to apply its
criteria
dynamically. The fix ensures that compound selects and other executable
constructs properly include execution options in their cache key
traversal.</p>
<p>References: <a
href="https://www.sqlalchemy.org/trac/ticket/12905">#12905</a></p>
</li>
</ul>
<h2>engine</h2>
<ul>
<li><strong>[engine] [bug]</strong> Implemented initial support for
free-threaded Python by adding new tests
and reworking the test harness to include Python 3.13t and Python 3.14t
in</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/sqlalchemy/sqlalchemy/commits">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# What does this PR do?
removes error:
ConnectionError: HTTPConnectionPool(host='localhost', port=4318): Max
retries exceeded with url: /v1/traces
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object
at 0x10fd98e60>: Failed to establish a
new connection: [Errno 61] Connection refused'))
## Test Plan
uv run llama stack run starter
curl http://localhost:8321/v1/models
observe no error in server logs
# What does this PR do?
relates to #2878
We introduce a Containerfile which is used to replaced the `llama stack
build` command (removal in a separate PR).
```
llama stack build --distro starter --image-type venv --run
```
is replaced by
```
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```
- See the updated workflow files for e2e workflow.
## Test Plan
CI
```
❯ docker build . -f docker/Dockerfile --build-arg DISTRO_NAME=starter --build-arg INSTALL_MODE=editable --tag test_starter
❯ docker run -p 8321:8321 test_starter
❯ curl http://localhost:8321/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
```
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3839).
* #3855
* __->__ #3839
# What does this PR do?
the sidebar currently has an extra `ii. Run the Script` because its
incorrectly put into the doc as an H3 not an H4 (like the other ones)
<img width="239" height="218" alt="Screenshot 2025-10-20 at 1 04 54 PM"
src="https://github.com/user-attachments/assets/eb8cb26e-7ea9-4b61-9101-d64965b39647"
/>
Fix this which will update the sidebar
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
- Fix examples in the NVIDIA inference documentation to align with
current API requirements.
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
N/A
Bumps [jest](https://github.com/jestjs/jest/tree/HEAD/packages/jest) and
[@types/jest](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/jest).
These dependencies needed to be updated together.
Updates `jest` from 29.7.0 to 30.2.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/releases">jest's
releases</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore & Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
<h2>30.1.2</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Correct snapshot header regexp to
work with newline across OSes (<a
href="https://redirect.github.com/jestjs/jest/pull/15803">#15803</a>)</li>
</ul>
<h2>30.1.1</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
warning not handling Windows end-of-line sequences (<a
href="https://redirect.github.com/jestjs/jest/pull/15800">#15800</a>)</li>
</ul>
<h2>30.1.0</h2>
<h2>Features</h2>
<ul>
<li><code>[jest-leak-detector]</code> Configurable GC aggressiveness
regarding to V8 heap snapshot generation (<a
href="https://redirect.github.com/jestjs/jest/pull/15793/">#15793</a>)</li>
<li><code>[jest-runtime]</code> Reduce redundant ReferenceError
messages</li>
<li><code>[jest-core]</code> Include test modules that failed to load
when --onlyFailures is active</li>
</ul>
<h3>Fixes</h3>
<ul>
<li>`[jest-snapshot-utils] Fix deprecated goo.gl snapshot guide link not
getting replaced with fully canonical URL (<a
href="https://redirect.github.com/jestjs/jest/pull/15787">#15787</a>)</li>
<li><code>[jest-circus]</code> Fix <code>it.concurrent</code> not
working with <code>describe.skip</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15765">#15765</a>)</li>
<li><code>[jest-snapshot]</code> Fix mangled inline snapshot updates
when used with Prettier 3 and CRLF line endings</li>
<li><code>[jest-runtime]</code> Importing from
<code>@jest/globals</code> in more than one file no longer breaks
relative paths (<a
href="https://redirect.github.com/jestjs/jest/issues/15772">#15772</a>)</li>
</ul>
<h1>Chore</h1>
<ul>
<li><code>[expect]</code> Update docblock for <code>toContain()</code>
to display info on substring check (<a
href="https://redirect.github.com/jestjs/jest/pull/15789">#15789</a>)</li>
</ul>
<h2>30.0.2</h2>
<h2>What's Changed</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/blob/main/CHANGELOG.md">jest's
changelog</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore & Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-matcher-utils]</code> Fix infinite recursion with
self-referential getters in <code>deepCyclicCopyReplaceable</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15831">#15831</a>)</li>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
<h2>30.1.2</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Correct snapshot header regexp to
work with newline across OSes (<a
href="https://redirect.github.com/jestjs/jest/pull/15803">#15803</a>)</li>
</ul>
<h2>30.1.1</h2>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
warning not handling Windows end-of-line sequences (<a
href="https://redirect.github.com/jestjs/jest/pull/15800">#15800</a>)</li>
<li><code>[jest-snapshot-utils]</code> Improve messaging about goo.gl
snapshot link change (<a
href="https://redirect.github.com/jestjs/jest/pull/15821">#15821</a>)</li>
</ul>
<h2>30.1.0</h2>
<h2>Features</h2>
<ul>
<li><code>[jest-leak-detector]</code> Configurable GC aggressiveness
regarding to V8 heap snapshot generation (<a
href="https://redirect.github.com/jestjs/jest/pull/15793/">#15793</a>)</li>
<li><code>[jest-runtime]</code> Reduce redundant ReferenceError
messages</li>
<li><code>[jest-core]</code> Include test modules that failed to load
when --onlyFailures is active</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-snapshot-utils]</code> Fix deprecated goo.gl snapshot
guide link not getting replaced with fully canonical URL (<a
href="https://redirect.github.com/jestjs/jest/pull/15787">#15787</a>)</li>
<li><code>[jest-circus]</code> Fix <code>it.concurrent</code> not
working with <code>describe.skip</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15765">#15765</a>)</li>
<li><code>[jest-snapshot]</code> Fix mangled inline snapshot updates
when used with Prettier 3 and CRLF line endings</li>
<li><code>[jest-runtime]</code> Importing from
<code>@jest/globals</code> in more than one file no longer breaks
relative paths (<a
href="https://redirect.github.com/jestjs/jest/issues/15772">#15772</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="855864e3f9"><code>855864e</code></a>
v30.2.0</li>
<li><a
href="da9b532f04"><code>da9b532</code></a>
v30.1.3</li>
<li><a
href="ebfa31cc97"><code>ebfa31c</code></a>
v30.1.2</li>
<li><a
href="d347c0f3f8"><code>d347c0f</code></a>
v30.1.1</li>
<li><a
href="4d5f41d088"><code>4d5f41d</code></a>
v30.1.0</li>
<li><a
href="22236cf58b"><code>22236cf</code></a>
v30.0.5</li>
<li><a
href="f4296d2bc8"><code>f4296d2</code></a>
v30.0.4</li>
<li><a
href="d4a6c94daf"><code>d4a6c94</code></a>
v30.0.3</li>
<li><a
href="393acbfac3"><code>393acbf</code></a>
v30.0.2</li>
<li><a
href="5ce865b406"><code>5ce865b</code></a>
v30.0.1</li>
<li>Additional commits viewable in <a
href="https://github.com/jestjs/jest/commits/v30.2.0/packages/jest">compare
view</a></li>
</ul>
</details>
<br />
Updates `@types/jest` from 29.5.14 to 30.0.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/jest">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps
[jest-environment-jsdom](https://github.com/jestjs/jest/tree/HEAD/packages/jest-environment-jsdom)
from 30.1.2 to 30.2.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/releases">jest-environment-jsdom's
releases</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore & Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jestjs/jest/blob/main/CHANGELOG.md">jest-environment-jsdom's
changelog</a>.</em></p>
<blockquote>
<h2>30.2.0</h2>
<h3>Chore & Maintenance</h3>
<ul>
<li><code>[*]</code> Update example repo for testing React Native
projects (<a
href="https://redirect.github.com/jestjs/jest/pull/15832">#15832</a>)</li>
<li><code>[*]</code> Update <code>jest-watch-typeahead</code> to v3 (<a
href="https://redirect.github.com/jestjs/jest/pull/15830">#15830</a>)</li>
</ul>
<h2>Features</h2>
<ul>
<li><code>[jest-environment-jsdom-abstract]</code> Add support for JSDOM
v27 (<a
href="https://redirect.github.com/jestjs/jest/pull/15834">#15834</a>)</li>
</ul>
<h3>Fixes</h3>
<ul>
<li><code>[jest-matcher-utils]</code> Fix infinite recursion with
self-referential getters in <code>deepCyclicCopyReplaceable</code> (<a
href="https://redirect.github.com/jestjs/jest/pull/15831">#15831</a>)</li>
<li><code>[babel-jest]</code> Export the <code>TransformerConfig</code>
interface (<a
href="https://redirect.github.com/jestjs/jest/pull/15820">#15820</a>)</li>
<li><code>[jest-config]</code> Fix <code>jest.config.ts</code> with TS
loader specified in docblock pragma (<a
href="https://redirect.github.com/jestjs/jest/pull/15839">#15839</a>)</li>
</ul>
<h2>30.1.3</h2>
<h3>Fixes</h3>
<ul>
<li>Fix <code>unstable_mockModule</code> with <code>node:</code>
prefixed core modules.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="855864e3f9"><code>855864e</code></a>
v30.2.0</li>
<li>See full diff in <a
href="https://github.com/jestjs/jest/commits/v30.2.0/packages/jest-environment-jsdom">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# What does this PR do?
## Test Plan
.venv ❯ sh ./scripts/install.sh
⚠️ Found existing container(s) for 'ollama-server', removing...
⚠️ Found existing container(s) for 'llama-stack', removing...
⚠️ Found existing container(s) for 'jaeger', removing...
⚠️ Found existing container(s) for 'otel-collector', removing...
⚠️ Found existing container(s) for 'prometheus', removing...
⚠️ Found existing container(s) for 'grafana', removing...
📡 Starting telemetry stack...
🦙 Starting Ollama...
⏳ Waiting for Ollama daemon...
📦 Ensuring model is pulled: llama3.2:3b...
🦙 Starting Llama Stack...
⏳ Waiting for Llama Stack API...
..
🎉 Llama Stack is ready!
👉 API endpoint: http://localhost:8321📖 Documentation:
https://llamastack.github.io/latest/references/api_reference/index.html💻 To access the llama stack CLI, exec into the container:
docker exec -ti llama-stack bash
📡 Telemetry dashboards:
Jaeger UI: http://localhost:16686
Prometheus UI: http://localhost:9090
Grafana UI: http://localhost:3000 (admin/admin)
OTEL Collector: http://localhost:4318🐛 Report an issue @ https://github.com/llamastack/llama-stack/issues if
you think it's a bug
# What does this PR do?
Adds a test and a standardized way to build future tests out for
telemetry in llama stack.
Contributes to https://github.com/llamastack/llama-stack/issues/3806
## Test Plan
This is the test plan 😎
In replay mode, inference is instantenous. We don't need to wait 15
seconds for the batch to be done. Fixing polling to do exp backoff makes
things work super fast.
# What does this PR do?
remove telemetry as a providable API from the codebase. This includes
removing it from generated distributions but also the provider registry,
the router, etc
since `setup_logger` is tied pretty strictly to `Api.telemetry` being in
impls we still need an "instantiated provider" in our implementations.
However it should not be auto-routed or provided. So in
validate_and_prepare_providers (called from resolve_impls) I made it so
that if run_config.telemetry.enabled, we set up the meta-reference
"provider" internally to be used so that log_event will work when
called.
This is the neatest way I think we can remove telemetry from the
provider configs but also not need to rip apart the whole "telemetry is
a provider" logic just yet, but we can do it internally later without
disrupting users.
so telemetry is removed from the registry such that if a user puts
`telemetry:` as an API in their build/run config it will err out, but
can still be used by us internally as we go through this transition.
relates to #3806
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
When stack config is set to server in docker
STACK_CONFIG_ARG=--stack-config=http://localhost:8321, the env variable
was not getting correctly set and test id not set, causing
This is needed for test-and-cut to work
E openai.BadRequestError: Error code: 400 - {'detail': 'Invalid value:
Test ID is required for file ID allocation'}
5286461406
## Test Plan
CI
# What does this PR do?
Adds a subpage of the OpenAI compatibility page in the documentation.
This subpage documents known limitations of the Responses API.
<!-- If resolving an issue, uncomment and update the line below -->
Closes#3575
---------
Signed-off-by: Bill Murdock <bmurdock@redhat.com>
As indicated in the title. Our `starter` distribution enables all remote
providers _very intentionally_ because we believe it creates an easier,
more welcoming experience to new folks using the software. If we do
that, and then slam the logs with errors making them question their life
choices, it is not so good :)
Note that this fix is limited in scope. If you ever try to actually
instantiate the OpenAI client from a code path without an API key being
present, you deserve to fail hard.
## Test Plan
Run `llama stack run starter` with `OPENAI_API_KEY` set. No more wall of
text, just one message saying "listed 96 models".
a bunch of logger.info()s are good for server code to help debug in
production, but we don't want them killing our unit test output :)
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
**!!BREAKING CHANGE!!**
The lookup is also straightforward -- we always look for this identifier
and don't try to find a match for something without the provider_id
prefix.
Note that, this ideally means we need to update the `register_model()`
API also (we should kill "identifier" from there) but I am not doing
that as part of this PR.
## Test Plan
Existing unit tests
Wanted to re-enable Responses CI but it seems to hang for some reason
due to some interactions with conversations_store or responses_store.
## Test Plan
```
# library client
./scripts/integration-tests.sh --stack-config ci-tests --suite responses
# server
./scripts/integration-tests.sh --stack-config server:ci-tests --suite responses
```
# What does this PR do?
Have closed the previous PR due to merge conflicts with multiple PRs
Addressed all comments from
https://github.com/llamastack/llama-stack/pull/3768 (sorry for carrying
over to this one)
## Test Plan
Added UTs and integration tests
Handle a base case when no stored messages exist because no Response
call has been made.
## Test Plan
```
./scripts/integration-tests.sh --stack-config server:ci-tests \
--suite responses --inference-mode record-if-missing --pattern test_conversation_responses
```
Fixed KeyError when chunks don't have document_id in metadata or
chunk_metadata. Updated logging to safely extract document_id using
getattr and RAG memory to handle different document_id locations. Added
test for missing document_id scenarios.
Fixes issue #3494 where /v1/vector-io/insert would crash with KeyError.
Fixed KeyError when chunks don't have document_id in metadata or
chunk_metadata. Updated logging to safely extract document_id using
getattr and RAG memory to handle different document_id locations. Added
test for missing document_id scenarios.
# What does this PR do?
Fixes a KeyError crash in `/v1/vector-io/insert` when chunks are missing
`document_id` fields. The API
was failing even though `document_id` is optional according to the
schema.
Closes#3494
## Test Plan
**Before fix:**
- POST to `/v1/vector-io/insert` with chunks → 500 KeyError
- Happened regardless of where `document_id` was placed
**After fix:**
- Same request works fine → 200 OK
- Tested with Postman using FAISS backend
- Added unit test covering missing `document_id` scenarios
This PR updates the Conversation item related types and improves a
couple critical parts of the implemenation:
- it creates a streaming output item for the final assistant message
output by
the model. until now we only added content parts and included that
message in the final response.
- rewrites the conversation update code completely to account for items
other than messages (tool calls, outputs, etc.)
## Test Plan
Used the test script from
https://github.com/llamastack/llama-stack-client-python/pull/281 for
this
```
TEST_API_BASE_URL=http://localhost:8321/v1 \
pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```
# Add support for Google Gemini `gemini-embedding-001` embedding model
and correctly registers model type
MR message created with the assistance of Claude-4.5-sonnet
This resolves https://github.com/llamastack/llama-stack/issues/3755
## What does this PR do?
This PR adds support for the `gemini-embedding-001` Google embedding
model to the llama-stack Gemini provider. This model provides
high-dimensional embeddings (3072 dimensions) compared to the existing
`text-embedding-004` model (768 dimensions). Old embeddings models (such
as text-embedding-004) will be deprecated soon according to Google
([Link](https://developers.googleblog.com/en/gemini-embedding-available-gemini-api/))
## Problem
The Gemini provider only supported the `text-embedding-004` embedding
model. The newer `gemini-embedding-001` model, which provides
higher-dimensional embeddings for improved semantic representation, was
not available through llama-stack.
## Solution
This PR consists of three commits that implement, fix the model
registration, and enable embedding generation:
### Commit 1: Initial addition of gemini-embedding-001
Added metadata for `gemini-embedding-001` to the
`embedding_model_metadata` dictionary:
```python
embedding_model_metadata: dict[str, dict[str, int]] = {
"text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
"gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048}, # NEW
}
```
**Issue discovered:** The model was not being registered correctly
because the dictionary keys didn't match the model IDs returned by
Gemini's API.
### Commit 2: Fix model ID matching with `models/` prefix
Updated both dictionary keys to include the `models/` prefix to match
Gemini's OpenAI-compatible API response format:
```python
embedding_model_metadata: dict[str, dict[str, int]] = {
"models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048}, # UPDATED
"models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048}, # UPDATED
}
```
**Root cause:** Gemini's OpenAI-compatible API returns model IDs with
the `models/` prefix (e.g., `models/text-embedding-004`). The
`OpenAIMixin.list_models()` method directly matches these IDs against
the `embedding_model_metadata` dictionary keys. Without the prefix, the
models were being registered as LLMs instead of embedding models.
### Commit 3: Fix embedding generation for providers without usage stats
Fixed a bug in `OpenAIMixin.openai_embeddings()` that prevented
embedding generation for providers (like Gemini) that don't return usage
statistics:
```python
# Before (Line 351-354):
usage = OpenAIEmbeddingUsage(
prompt_tokens=response.usage.prompt_tokens, # ← Crashed with AttributeError
total_tokens=response.usage.total_tokens,
)
# After (Lines 351-362):
if response.usage:
usage = OpenAIEmbeddingUsage(
prompt_tokens=response.usage.prompt_tokens,
total_tokens=response.usage.total_tokens,
)
else:
usage = OpenAIEmbeddingUsage(
prompt_tokens=0, # Default when not provided
total_tokens=0, # Default when not provided
)
```
**Impact:** This fix enables embedding generation for **all** Gemini
embedding models, not just the newly added one.
## Changes
### Modified Files
**`llama_stack/providers/remote/inference/gemini/gemini.py`**
- Line 17: Updated `text-embedding-004` key to
`models/text-embedding-004`
- Line 18: Added `models/gemini-embedding-001` with correct metadata
**`llama_stack/providers/utils/inference/openai_mixin.py`**
- Lines 351-362: Added null check for `response.usage` to handle
providers without usage statistics
## Key Technical Details
### Model ID Matching Flow
1. `list_provider_model_ids()` calls Gemini's `/v1/models` endpoint
2. API returns model IDs like: `models/text-embedding-004`,
`models/gemini-embedding-001`
3. `OpenAIMixin.list_models()` (line 410) checks: `if metadata :=
self.embedding_model_metadata.get(provider_model_id)`
4. If matched, registers as `model_type: "embedding"` with metadata;
otherwise registers as `model_type: "llm"`
### Why Both Keys Needed the Prefix
The `text-embedding-004` model was already working because there was
likely separate configuration or manual registration handling it. For
auto-discovery to work correctly for **both** models, both keys must
match the API's model ID format exactly.
## How to test this PR
Verified the changes by:
1. **Model Auto-Discovery**: Started llama-stack server and confirmed
models are auto-discovered from Gemini API
2. **Model Registration**: Confirmed both embedding models are correctly
registered and visible
```bash
curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'
```
**Results:**
- ✅ `gemini/models/text-embedding-004` - 768 dimensions - `model_type:
"embedding"`
- ✅ `gemini/models/gemini-embedding-001` - 3072 dimensions -
`model_type: "embedding"`
3. **Before Fix (Commit 1)**: Models appeared as `model_type: "llm"`
without embedding metadata
4. **After Fix (Commit 2)**: Models correctly identified as `model_type:
"embedding"` with proper metadata
5. **Generate Embeddings**: Verified embedding generation works
```bash
curl -X POST http://localhost:8325/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
jq '.data[0].embedding | length'
```
# What does this PR do?
Enables automatic embedding model detection for vector stores and by
using a `default_configured` boolean that can be defined in the
`run.yaml`.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
- Unit tests
- Integration tests
- Simple example below:
Spin up the stack:
```bash
uv run llama stack build --distro starter --image-type venv --run
```
Then test with OpenAI's client:
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
```
Previously you needed:
```python
vs = client.vector_stores.create(
extra_body={
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"embedding_dimension": 384,
}
)
```
The `extra_body` is now unnecessary.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Previously, the NVIDIA inference provider implemented a custom
`openai_embeddings` method with a hardcoded `input_type="query"`
parameter, which is required by NVIDIA asymmetric embedding
models([https://github.com/llamastack/llama-stack/pull/3205](https://github.com/llamastack/llama-stack/pull/3205)).
Recently `extra_body` parameter is added to the embeddings API
([https://github.com/llamastack/llama-stack/pull/3794](https://github.com/llamastack/llama-stack/pull/3794)).
So, this PR updates the NVIDIA inference provider to use the base
`OpenAIMixin.openai_embeddings` method instead and pass the `input_type`
through the `extra_body` parameter for asymmetric embedding models.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Run the following command for the ```embedding_model```:
```nvidia/llama-3.2-nv-embedqa-1b-v2```, ```nvidia/nv-embedqa-e5-v5```,
```nvidia/nv-embedqa-mistral-7b-v2```, and
```snowflake/arctic-embed-l```.
```
pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model={embedding_model} --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" --inference-mode=record
```
# What does this PR do?
As discussed on discord, we do not need to reinvent the wheel for
telemetry. Instead we'll lean into the canonical OTEL stack.
Logs/traces/metrics will still be sent via OTEL - they just won't be
stored on, queried through Stack.
This is the first of many PRs to remove telemetry API from Stack.
1) removed webmethod decorators to remove from API spec
2) removed tests as @iamemilio is adding them on otel directly.
## Test Plan
# What does this PR do?
Updates CONTRIBUTING.md with the following changes:
- Use Python 3.12 (and why)
- Use pre-commit==4.3.0
- Recommend using -v with pre-commit to get detailed info about why it
is failing if it fails.
- Instructs users to go to the docs/ directory before rebuilding the
docs (it doesn't work unless you do that).
Signed-off-by: Bill Murdock <bmurdock@redhat.com>
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this PR is to replace the Llama Stack's default embedding
model by nomic-embed-text-v1.5.
These are the key reasons why Llama Stack community decided to switch
from all-MiniLM-L6-v2 to nomic-embed-text-v1.5:
1. The training data for
[all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data)
includes a lot of data sets with various licensing terms, so it is
tricky to know when/whether it is appropriate to use this model for
commercial applications.
2. The model is not particularly competitive on major benchmarks. For
example, if you look at the [MTEB
Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click
on Miscellaneous/BEIR to see English information retrieval accuracy, you
see that the top of the leaderboard is dominated by enormous models but
also that there are many, many models of relatively modest size whith
much higher Retrieval scores. If you want to look closely at the data, I
recommend clicking "Download Table" because it is easier to browse that
way.
More discussion info can be founded
[here](https://github.com/llamastack/llama-stack/issues/2418)
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes#2418
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI wokrflow
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>