mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-17 07:42:36 +00:00
3241 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
700663028f
|
feat: convert Datasets API to use FastAPI router (#4359)
# What does this PR do?
Convert the Datasets API from webmethod decorators to the FastAPI router pattern.
Fixes: https://github.com/llamastack/llama-stack/issues/4344
## Test Plan
CI
Signed-off-by: Sébastien Han <seb@redhat.com>
|
||
|
|
56f946f3f5
|
feat: add support for tool_choice to responses api (#4106)
# What does this PR do?
Adds support for enforcing tool usage via the Responses API. See
https://platform.openai.com/docs/api-reference/responses/create#responses_create-tool_choice
for details from the official documentation.
Note: at present this PR only supports `file_search` and `web_search` as options to enforce builtin tool usage.
Closes #3548
## Test Plan
`./scripts/unit-tests.sh tests/unit/providers/agents/meta_reference/test_response_tool_context.py`
---------
Signed-off-by: Jaideep Rao <jrao@redhat.com>
|
||
|
|
62005dc1a9
|
feat: Making static prompt values in Rag/File Search configurable in Vector Store Config (#4368)
# What does this PR do?
- Enables users to configure the prompts used throughout File Search / Vector Retrieval
- Configuration is defined in the Vector Stores Config, so the prompts can be modified at runtime
- Backwards compatible: the new fields are optional and default to the previously used values
This is a summary of the new options in `run.yaml`:
```yaml
vector_stores:
  file_search_params:
    header_template: 'knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n'
    footer_template: 'END of knowledge_search tool results.\n'
  context_prompt_params:
    chunk_annotation_template: 'Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n'
    context_template: 'The above results were retrieved to help answer the user''s query: "{query}". Use them as supporting information only in answering this query.{annotation_instruction}\n'
  annotation_prompt_params:
    enable_annotations: true
    annotation_instruction_template: 'Cite sources immediately at the end of sentences before punctuation, using `<|file-id|>` format like ''This is a fact <|file-Cn3MSNn72ENTiiq11Qda4A|>.''. Do not add extra punctuation. Use only the file IDs provided, do not invent new ones.'
    chunk_annotation_template: '[{index}] {metadata_text} cite as <|{file_id}|>\n{chunk_text}\n'
```
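As a rough illustration of how templates like these could be filled in, here is a minimal Python sketch using `str.format`. It is not the llama-stack implementation: the `Chunk` class, the `render_results` helper, the 1-based result index, and treating `\n` as a real newline are all assumptions made for the example.

```python
from dataclasses import dataclass

# Template strings copied from the run.yaml summary above
# (assumption: the \n sequences are meant as newlines).
HEADER_TEMPLATE = "knowledge_search tool found {num_chunks} chunks:\nBEGIN of knowledge_search tool results.\n"
CHUNK_TEMPLATE = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
FOOTER_TEMPLATE = "END of knowledge_search tool results.\n"


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk; not the llama-stack type."""

    content: str
    metadata: dict


def render_results(chunks: list[Chunk]) -> str:
    """Fill header, per-chunk, and footer templates and join them."""
    parts = [HEADER_TEMPLATE.format(num_chunks=len(chunks))]
    for index, chunk in enumerate(chunks, start=1):
        # str.format supports attribute access, so {chunk.content} resolves here.
        parts.append(CHUNK_TEMPLATE.format(index=index, chunk=chunk, metadata=chunk.metadata))
    parts.append(FOOTER_TEMPLATE)
    return "".join(parts)


text = render_results([Chunk("banana banana apple", {"file": "a.txt"})])
print(text)
```

Because the placeholders are plain `str.format` fields, a custom template only needs to use the same field names to remain compatible with a renderer of this shape.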
## Test Plan
Added tests.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
|
||
|
|
4043dedeea
|
fix: correctly unwrap provider data api_key from secret string (#4380)
# What does this PR do?
Fix provider header API key handling by correctly unwrapping `SecretStr` values for provider data API keys.
Previously the validator cast header keys to `SecretStr`, but the value wasn't unwrapped before use, causing authentication failures with providers like Azure.
Closes https://github.com/llamastack/llama-stack/issues/4370
|
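The failure mode described in the commit above can be sketched in a few lines of Python. To stay dependency-free, the `SecretStr` class below is a minimal stand-in that mimics the masking behaviour of pydantic's `SecretStr` (`str()` yields a mask, `get_secret_value()` yields the real value); the header name and helper functions are hypothetical and not taken from the llama-stack code.

```python
class SecretStr:
    """Minimal stand-in mirroring pydantic.SecretStr's masking behaviour."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __str__(self) -> str:
        return "**********"


def auth_header_buggy(api_key: SecretStr) -> dict[str, str]:
    # Formatting the wrapper sends the mask, not the key.
    return {"api-key": str(api_key)}


def auth_header_fixed(api_key: SecretStr) -> dict[str, str]:
    # Unwrap explicitly at the point where the raw value is needed.
    return {"api-key": api_key.get_secret_value()}


key = SecretStr("sk-example")
print(auth_header_buggy(key))
print(auth_header_fixed(key))
```

Unwrapping as late as possible keeps the key masked in logs and reprs while still sending the real value to the provider.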
||
|
|
2b85600a7e
|
docs: make inference model configurable (#4385)
Allow users to specify the inference model through the INFERENCE_MODEL environment variable instead of hardcoding it, with a fallback to ollama/llama3.2:3b if it is not set.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
|
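A minimal Python sketch of the environment-variable fallback described in the commit above. The helper name `resolve_inference_model` is hypothetical, and treating an empty value as unset is an assumption; the docs changed by the commit may handle this differently.

```python
import os

DEFAULT_MODEL = "ollama/llama3.2:3b"


def resolve_inference_model(env=None) -> str:
    """Return INFERENCE_MODEL if set and non-empty, else the default model."""
    env = os.environ if env is None else env
    # `or` also covers the case where the variable is set to an empty string.
    return env.get("INFERENCE_MODEL") or DEFAULT_MODEL


print(resolve_inference_model({}))
print(resolve_inference_model({"INFERENCE_MODEL": "ollama/llama3.1:8b"}))
```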
||
|
|
62f7818051
|
chore(github-deps): bump astral-sh/setup-uv from 7.1.4 to 7.1.6 (#4386)
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 7.1.4 to 7.1.6.
Release notes, sourced from [astral-sh/setup-uv's releases](https://github.com/astral-sh/setup-uv/releases):
## v7.1.6 🌈 add OS version to cache key to prevent binary incompatibility
## Changes
This release will invalidate your existing cache keys!
The OS version, e.g. `ubuntu-22.04`, is now part of the cache key. This prevents failing builds when a cache got populated with wheels built with different tools (e.g. glibc) than are present on the runner where the cache got restored.
## 🐛 Bug fixes
- feat: add OS version to cache key to prevent binary incompatibility @eifinger ([#716](https://redirect.github.com/astral-sh/setup-uv/issues/716))
## 🧰 Maintenance
- chore: update known checksums for 0.9.17 @github-actions[bot] ([#714](https://redirect.github.com/astral-sh/setup-uv/issues/714))
## ⬆️ Dependency updates
- Bump actions/checkout from 5.0.0 to 6.0.1 @dependabot[bot] ([#712](https://redirect.github.com/astral-sh/setup-uv/issues/712))
- Bump actions/setup-node from 6.0.0 to 6.1.0 @dependabot[bot] ([#715](https://redirect.github.com/astral-sh/setup-uv/issues/715))
## v7.1.5 🌈 allow setting `cache-local-path` without `enable-cache: true`
## Changes
[astral-sh/setup-uv#612](https://redirect.github.com/astral-sh/setup-uv/pull/612) fixed a faulty behavior where this action set `UV_CACHE_DIR` even though `enable-cache` was `false`. It also fixed the cases where the cache dir is already configured in a settings file like `pyproject.toml` or `UV_CACHE_DIR` was already set. Here the action shouldn't overwrite or set `UV_CACHE_DIR`.
These fixes introduced an unwanted behavior: you could still set `cache-local-path`, but this action didn't do anything with it. This release fixes that.
You can now use `cache-local-path` to automatically set `UV_CACHE_DIR` even when `enable-cache` is `false` (or gets set to false by default, e.g. on self-hosted runners):
```yaml
- name: This is now possible
  uses: astral-sh/setup-uv@v7
  with:
    enable-cache: false
    cache-local-path: "/path/to/cache"
```
## 🐛 Bug fixes
- allow cache-local-path w/o enable-cache @eifinger ([#707](https://redirect.github.com/astral-sh/setup-uv/issues/707))
## 🧰 Maintenance
- set biome files.maxSize to 2MiB @eifinger ([#708](https://redirect.github.com/astral-sh/setup-uv/issues/708))
- chore: update known checksums for 0.9.16 @github-actions[bot] ([#706](https://redirect.github.com/astral-sh/setup-uv/issues/706))
- chore: update known checksums for 0.9.15 @github-actions[bot] ([#704](https://redirect.github.com/astral-sh/setup-uv/issues/704))
- chore: use `npm ci --ignore-scripts` everywhere @woodruffw ([#699](https://redirect.github.com/astral-sh/setup-uv/issues/699))
- chore: update known checksums for 0.9.14 @github-actions[bot] ([#700](https://redirect.github.com/astral-sh/setup-uv/issues/700))
- chore: update known checksums for 0.9.13 @github-actions[bot] ([#694](https://redirect.github.com/astral-sh/setup-uv/issues/694))
... (truncated)
Commits: ... (truncated)
|
||
|
|
9b346625bc
|
chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.7.1 to 1.8.1 (#4387)
Bumps [stainless-api/upload-openapi-spec-action](https://github.com/stainless-api/upload-openapi-spec-action) from 1.7.1 to 1.8.1.
Release notes, sourced from [stainless-api/upload-openapi-spec-action's releases](https://github.com/stainless-api/upload-openapi-spec-action/releases):
## v1.8.1
## [1.8.1](https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.8.0...v1.8.1) (2025-12-09)
### Bug Fixes
- re-enable 'targets' param in diagnostics call ([#148](https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/148)) ... (truncated)
|
||
|
|
f4df1a66e0
|
chore(github-deps): bump actions/upload-artifact from 5.0.0 to 6.0.0 (#4388)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 5.0.0 to 6.0.0.
Release notes, sourced from [actions/upload-artifact's releases](https://github.com/actions/upload-artifact/releases):
## v6.0.0
## v6 - What's new
> [!IMPORTANT] actions/upload-artifact@v6 now runs on Node.js 24 (`runs.using: node24`) and requires a minimum Actions Runner version of 2.327.1. If you are using self-hosted runners, ensure they are updated before upgrading.
### Node.js 24
This release updates the runtime to Node.js 24. v5 had preliminary support for Node.js 24; however, this action was still running on Node.js 20 by default. It now runs on Node.js 24 by default.
## What's Changed
- Upload Artifact Node 24 support by @salmanmkc in [actions/upload-artifact#719](https://redirect.github.com/actions/upload-artifact/pull/719)
- fix: update `@actions/artifact` for Node.js 24 punycode deprecation by @salmanmkc in [actions/upload-artifact#744](https://redirect.github.com/actions/upload-artifact/pull/744)
- prepare release v6.0.0 for Node.js 24 support by @salmanmkc in [actions/upload-artifact#745](https://redirect.github.com/actions/upload-artifact/pull/745)
**Full Changelog**: https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0
Commits: ... (truncated)
|
||
|
|
6efe0a2939
|
chore(github-deps): bump actions/cache from 4.3.0 to 5.0.1 (#4389)
Bumps [actions/cache](https://github.com/actions/cache) from 4.3.0 to 5.0.1.
Release notes, sourced from [actions/cache's releases](https://github.com/actions/cache/releases):
## v5.0.1
> [!IMPORTANT] **`actions/cache@v5` runs on the Node.js 24 runtime and requires a minimum Actions Runner version of `2.327.1`.**
> If you are using self-hosted runners, ensure they are updated before upgrading.
# v5.0.1
## What's Changed
- fix: update `@actions/cache` for Node.js 24 punycode deprecation by @salmanmkc in [actions/cache#1685](https://redirect.github.com/actions/cache/pull/1685)
- prepare release v5.0.1 by @salmanmkc in [actions/cache#1686](https://redirect.github.com/actions/cache/pull/1686)
# v5.0.0
## What's Changed
- Upgrade to use node24 by @salmanmkc in [actions/cache#1630](https://redirect.github.com/actions/cache/pull/1630)
- Prepare v5.0.0 release by @salmanmkc in [actions/cache#1684](https://redirect.github.com/actions/cache/pull/1684)
**Full Changelog**: https://github.com/actions/cache/compare/v5...v5.0.1
## v5.0.0
> [!IMPORTANT] **`actions/cache@v5` runs on the Node.js 24 runtime and requires a minimum Actions Runner version of `2.327.1`.**
> If you are using self-hosted runners, ensure they are updated before upgrading.
## What's Changed
- Upgrade to use node24 by @salmanmkc in [actions/cache#1630](https://redirect.github.com/actions/cache/pull/1630)
- Prepare v5.0.0 release by @salmanmkc in [actions/cache#1684](https://redirect.github.com/actions/cache/pull/1684)
**Full Changelog**: https://github.com/actions/cache/compare/v4.3.0...v5.0.0
Changelog, sourced from [actions/cache's changelog](https://github.com/actions/cache/blob/main/RELEASES.md):
# Releases
## Changelog
### 5.0.1
- Update `@azure/storage-blob` to `^12.29.1` via `@actions/cache@5.0.1` [#1685](https://redirect.github.com/actions/cache/pull/1685)
### 5.0.0
> [!IMPORTANT] `actions/cache@v5` runs on the Node.js 24 runtime and requires a minimum Actions Runner version of `2.327.1`. If you are using self-hosted runners, ensure they are updated before upgrading.
### 4.3.0
- Bump `@actions/cache` to [v4.1.0](https://redirect.github.com/actions/toolkit/pull/2132)
### 4.2.4
- Bump `@actions/cache` to v4.0.5
### 4.2.3
- Bump `@actions/cache` to v4.0.3 (obfuscates SAS token in debug logs for cache entries)
### 4.2.2
- Bump `@actions/cache` to v4.0.2
### 4.2.1
- Bump `@actions/cache` to v4.0.1
### 4.2.0
TLDR; The cache backend service has been rewritten from the ground up for improved performance and reliability. [actions/cache](https://github.com/actions/cache) now integrates with the new cache service (v2) APIs.
The new service will gradually roll out as of **February 1st, 2025**. The legacy service will also be sunset on the same date. Changes in this release are **fully backward compatible**.
**We are deprecating some versions of this action**. We recommend upgrading to version `v4` or `v3` as soon as possible before **February 1st, 2025.** (Upgrade instructions below).
If you are using pinned SHAs, please use the SHAs of versions `v4.2.0` or `v3.4.0`.
If you do not upgrade, all workflow runs using any of the deprecated [actions/cache](https://github.com/actions/cache) will fail.
Upgrading to the recommended versions will not break your workflows.
### 4.1.2
... (truncated)
Commits: ... (truncated)
|
||
|
|
c574db5f1d
|
fix(inference): AttributeError in streaming response cleanup (#4236)
This PR fixes issue #3185.
The code calls `await event_gen.aclose()`, but OpenAI's `AsyncStream` doesn't have an `aclose()` method; it has `close()` (which is async).
When clients cancel streaming requests, the server tries to clean up with:
```python
await event_gen.aclose()  # ❌ AsyncStream doesn't have aclose()!
```
But `AsyncStream` has never had a public `aclose()` method. The error message literally tells us:
```
AttributeError: 'AsyncStream' object has no attribute 'aclose'. Did you mean: 'close'?
                                                                              ^^^^^^^^
```
## Verification
* Reproduction script [`reproduce_issue_3185.sh`](https://gist.github.com/r-bit-rry/dea4f8fbb81c446f5db50ea7abd6379b) can be used to verify the fix.
* Manual checks, validation against original OpenAI library code
|
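One way to make this kind of cleanup tolerant of both method names is sketched below in Python. This is not the patch from the commit above: `FakeAsyncStream` is a hypothetical stand-in that, like the stream described there, exposes only an async `close()`, and `close_stream` is an illustrative helper name.

```python
import asyncio
import inspect


class FakeAsyncStream:
    """Hypothetical stand-in exposing only an async close(), no aclose()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def close_stream(stream) -> None:
    """Close via aclose() when available (async generators), else close()."""
    closer = getattr(stream, "aclose", None) or getattr(stream, "close", None)
    if closer is None:
        return
    result = closer()
    # close() may be sync or async depending on the object; await only if needed.
    if inspect.isawaitable(result):
        await result


stream = FakeAsyncStream()
asyncio.run(close_stream(stream))
print(stream.closed)
```

Looking the method up with `getattr` avoids the `AttributeError` on objects that only offer `close()`, while still working for async generators, which do provide `aclose()`.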
||
|
|
dfb9f6743a
|
docs: Adding initial updates to the RAG documentation and examples (#4377)
# What does this PR do?
This PR updates the RAG examples included in docs/quick_start.ipynb, docs/getting_started/demo_script.py, rag.mdx and index.md to remove references to the deprecated vector_io and vector_db APIs and to add examples that use /v1/vector_stores with responses and completions.
---------
Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
|
||
|
|
75ef052545
|
docs: Add details on model registration and refresh_models (#4383)
Document the refresh_models configuration option for remote providers that use RemoteInferenceProviderConfig.
- Add "Automatic vs Explicit Model Registration" section to resources.mdx
- Include examples for registering custom embedding models
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
|
||
|
|
10c878d782
|
feat: added oci-s3 compatibility (#4374)
# What does this PR do?
This PR validates and allows access to OCI object storage through the S3 compatibility API. Additional documentation for OCI is supplied, in notebook form, as well.
---------
Co-authored-by: raghotham <rsm@meta.com>
|
||
|
|
805abf573f
|
feat!: Implement include parameter specifically for adding logprobs in the output message (#4261)
# Problem
As an Application Developer, I want to use the include parameter with the value message.output_text.logprobs, so that I can receive log probabilities for output tokens to assess the model's confidence in its response.
# What does this PR do?
- Updates the include parameter in various resource definitions
- Updates the inline provider to return logprobs when "message.output_text.logprobs" is passed in the include parameter
- Converts the logprobs returned by the inference provider from chat completion format to responses format
Closes #[4260](https://github.com/llamastack/llama-stack/issues/4260)
## Test Plan
- Created a script to explore OpenAI behavior: https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/include.py
- Added integration tests and new recordings
---------
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
|
||
|
|
76e47d811a
|
feat(api): add readonly connectors API (#4258)
# What does this PR do?
Adds a new API for connectors and MCP registry support, along with the required types. It does not include any implementation.
Closes #4235 and #4061 (partially)
## Test Plan
No tests included.
---------
Signed-off-by: Jaideep Rao <jrao@redhat.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
|
||
|
|
470fe55e87
|
fix(inference): respect table_name config in InferenceStore (#4371)
# What does this PR do?
The InferenceStore class was ignoring the table_name field from InferenceStoreReference and always using the hardcoded value "chat_completions". This meant that any custom table_name configured in the run config (e.g., "inference_store" in run-with-postgres-store.yaml) was silently ignored.
This change updates all SQL operations in InferenceStore to use self.reference.table_name instead of the hardcoded string, ensuring the configured table name is properly respected.
A new test has been added to verify that custom table names work correctly for storing, retrieving, and listing chat completions.
## Test Plan
CI
Signed-off-by: Sébastien Han <seb@redhat.com>
|
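The idea behind the fix in the commit above, routing every SQL statement through the configured table name rather than a literal, can be illustrated with a small `sqlite3` sketch in Python. `TinyInferenceStore`, its columns, and its methods are hypothetical and far simpler than the real InferenceStore; only the default name "chat_completions" and the custom name "inference_store" come from the commit message.

```python
import sqlite3


class TinyInferenceStore:
    """Hypothetical, simplified store; not the llama-stack InferenceStore."""

    def __init__(self, conn: sqlite3.Connection, table_name: str = "chat_completions") -> None:
        # Table names cannot be bound as SQL parameters, so validate before interpolating.
        if not table_name.isidentifier():
            raise ValueError(f"invalid table name: {table_name!r}")
        self.conn = conn
        self.table_name = table_name
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table_name} (id TEXT PRIMARY KEY, model TEXT)"
        )

    def store(self, completion_id: str, model: str) -> None:
        # Every statement uses self.table_name, never a hardcoded literal.
        self.conn.execute(
            f"INSERT INTO {self.table_name} (id, model) VALUES (?, ?)",
            (completion_id, model),
        )

    def list_ids(self) -> list[str]:
        rows = self.conn.execute(f"SELECT id FROM {self.table_name} ORDER BY id").fetchall()
        return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
store = TinyInferenceStore(conn, table_name="inference_store")
store.store("chatcmpl-1", "llama3.2:3b")
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables, store.list_ids())
```

Checking which tables exist after a write, as the last lines do, is the kind of assertion that catches a silently ignored `table_name`.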
||
|
|
7308c8aef1
|
feat: add workflow_dispatch and self-trigger to stainless builds (#4361)
# What does this PR do?
It is currently impossible to test workflow changes (pull_request_target uses the base branch definition) or to manually trigger SDK builds. This adds both capabilities.
- Add workflow_dispatch with pr_number input for manual testing
- Add workflow file to path triggers for automatic testing
- Fetch PR details via gh CLI for manual runs
- Update jobs to use computed PR data for both trigger types
## Test Plan
Impossible to test until it merges, unfortunately. I am doing this in a smaller PR so that I can use it immediately in a follow-up.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
|
||
|
|
95b2948d11
|
feat: Add support for query rewrite in vector_store.search (#4171)
# What does this PR do?
Implements query rewriting in the search API and adds
`default_query_expansion_model` and `query_expansion_prompt` to
`VectorStoresConfig`.
Makes the `rewrite_query` parameter functional in vector store search.
- `rewrite_query=false` (default): Use original query
- `rewrite_query=true`: Expand query via LLM, or fail gracefully if no LLM available
Adds 4 parameters to `VectorStoresConfig`:
- `default_query_expansion_model`: LLM model for query expansion (optional)
- `query_expansion_prompt`: Custom prompt template (optional, uses built-in default)
- `query_expansion_max_tokens`: Configurable token limit (default: 100)
- `query_expansion_temperature`: Configurable temperature (default: 0.3)
Enabled `run.yaml`:
```yaml
vector_stores:
  rewrite_query_params:
    model:
      provider_id: "ollama"
      model_id: "llama3.2:3b-instruct-fp16"
    # prompt defaults to built-in
    # max_tokens defaults to 100
    # temperature defaults to 0.3
```
Fully customized `run.yaml`:
```yaml
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
  rewrite_query_params:
    model:
      provider_id: ollama
      model_id: llama3.2:3b-instruct-fp16
    prompt: "Rewrite this search query to improve retrieval results by expanding it with relevant synonyms and related terms: {query}"
    max_tokens: 100
    temperature: 0.3
```
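The `rewrite_query` behaviour described above can be sketched as a small Python function. This is an illustration under assumptions, not the llama-stack code: `maybe_rewrite_query` is a hypothetical name, the LLM is modelled as a plain callable, and "fail gracefully" is interpreted here as raising a clear error when no model is configured.

```python
from typing import Callable, Optional

# Prompt text taken from the fully customized run.yaml above.
DEFAULT_PROMPT = (
    "Rewrite this search query to improve retrieval results by expanding it "
    "with relevant synonyms and related terms: {query}"
)


def maybe_rewrite_query(
    query: str,
    rewrite_query: bool,
    llm: Optional[Callable[[str], str]],
    prompt: str = DEFAULT_PROMPT,
) -> str:
    """Return the original query, or an LLM-expanded one when rewriting is enabled."""
    if not rewrite_query:
        return query
    if llm is None:
        # Assumption: surface a clear error rather than silently skipping the rewrite.
        raise ValueError("rewrite_query=True requires a configured query rewrite model")
    rewritten = llm(prompt.format(query=query)).strip()
    # Fall back to the original query if the model returns nothing usable.
    return rewritten or query


def fake_llm(rendered_prompt: str) -> str:
    """Stand-in model that appends a fixed expansion to the query in the prompt."""
    return rendered_prompt.rsplit(": ", 1)[-1] + ", a small green fruit"


print(maybe_rewrite_query("kiwi", False, None))
print(maybe_rewrite_query("kiwi", True, fake_llm))
```

Passing the model in as a callable keeps the sketch testable without a running inference provider.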
## Test Plan
Added test and recording
Example script as well:
```python
import asyncio
from llama_stack_client import LlamaStackClient
from io import BytesIO


def gen_file(client, text: str = ""):
    file_buffer = BytesIO(text.encode('utf-8'))
    file_buffer.name = "my_file.txt"

    uploaded_file = client.files.create(
        file=file_buffer,
        purpose="assistants"
    )
    return uploaded_file


async def test_query_rewriting():
    client = LlamaStackClient(base_url="http://0.0.0.0:8321/")
    uploaded_file = gen_file(client, "banana banana apple")
    uploaded_file2 = gen_file(client, "orange orange kiwi")

    vs = client.vector_stores.create()
    xf_vs = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)
    xf_vs1 = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file2.id)

    response1 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="apple",
        max_num_results=3,
        rewrite_query=False
    )
    response2 = client.vector_stores.search(
        vector_store_id=vs.id,
        query="kiwi",
        max_num_results=3,
        rewrite_query=True,
    )

    print(f"\n🔵 Response 1 (rewrite_query=False):\n\033[94m{response1}\033[0m")
    print(f"\n🟢 Response 2 (rewrite_query=True):\n\033[92m{response2}\033[0m")

    for f in [uploaded_file.id, uploaded_file2.id]:
        client.files.delete(file_id=f)
    client.vector_stores.delete(vector_store_id=vs.id)


if __name__ == "__main__":
    asyncio.run(test_query_rewriting())
```
The screenshot of the server logs shows that it worked.
<img width="1111" height="826" alt="Screenshot 2025-11-19 at 1 16 03 PM"
src="https://github.com/user-attachments/assets/2d188b44-1fef-4df5-b465-2d6728ca49ce"
/>
Notice the log:
```bash
Query rewritten:
'kiwi' → 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.'
```
So `kiwi` was expanded.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
|
||
|
|
ff375f1abb
|
feat: convert Benchmarks API to use FastAPI router (#4309)
# What does this PR do?
Convert the Benchmarks API from @webmethod decorators to the FastAPI router pattern, matching the Batches API structure.
One notable change is the update of stack.py to handle request models in register_resources().
Closes: #4308
## Test Plan
CI and `curl http://localhost:8321/v1/inspect/routes | jq '.data[] | select(.route | contains("benchmark"))'`
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
|
||
|
|
661985e240
|
feat: remove usage of build yaml (#4192)
# What does this PR do?
The build.yaml is only used in the following ways:
1. list-deps
2. distribution code-gen
Since `llama stack build` no longer exists, I found myself asking "why do we need two different files for list-deps and run?" Removing the BuildConfig and altering the usage of the DistributionTemplate in llama stack list-deps is the first step in removing the build yaml entirely. Removing the BuildConfig and build.yaml cuts the files users need to maintain in half, and allows us to focus on the stability of _just_ the run.yaml.
This PR removes the build.yaml, the BuildConfig datatype, and its usage throughout the codebase. Users are now expected to point to run.yaml files when running list-deps, and our codebase automatically uses these types now for things like `get_provider_registry`.
**Additionally, two renames: `StackRunConfig` -> `StackConfig` and `run.yaml` -> `config.yaml`.**
The build.yaml made sense when we were managing the build process for the user and actually _producing_ a run.yaml _from_ the build.yaml, but now that we are just getting the provider registry and listing the deps, switching to config.yaml simplifies the scope here greatly.
## Test Plan
Existing list-deps usage should work in the tests.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
17e6912288
|
docs: Fix vector_store_create params (#4364)
|
||
|
|
fcea9893a4
|
feat(UI): Adding Files API to Admin UI (#4319)
# What does this PR do? ## Files Admin Page <img width="1919" height="1238" alt="Screenshot 2025-12-09 at 10 33 06 AM" src="https://github.com/user-attachments/assets/3dd545f0-32bc-45be-af2b-1823800015f2" /> ## Files Upload Modal <img width="1919" height="1287" alt="Screenshot 2025-12-09 at 10 33 38 AM" src="https://github.com/user-attachments/assets/776bb372-75d3-4ccd-b6b5-c9dfb3fcb350" /> ## Files Detail <img width="1918" height="1099" alt="Screenshot 2025-12-09 at 10 34 26 AM" src="https://github.com/user-attachments/assets/f256dbf8-4047-4d79-923d-404161b05f36" /> Note, content preview has some handling for JSON, CSV, and PDF to enable nicer rendering. Pure text rendering is trivial. ### Files Detail File Content Preview (TXT) <img width="1918" height="1341" alt="Screenshot 2025-12-09 at 10 41 20 AM" src="https://github.com/user-attachments/assets/4fa0ddb7-ffff-424b-b764-0bd4af6ed976" /> ### Files Detail File Content Preview (JSON) <img width="1909" height="1233" alt="Screenshot 2025-12-09 at 10 39 57 AM" src="https://github.com/user-attachments/assets/b912f07a-2dff-483b-b73c-2f69dd0d87ad" /> ### Files Detail File Content Preview (HTML) <img width="1916" height="1348" alt="Screenshot 2025-12-09 at 10 40 27 AM" src="https://github.com/user-attachments/assets/17ebec0a-8754-4552-977d-d3c44f7f6973" /> ### Files Detail File Content Preview (CSV) <img width="1919" height="1177" alt="Screenshot 2025-12-09 at 10 34 50 AM" src="https://github.com/user-attachments/assets/20bd0755-1757-4a3a-99d2-fbd072f81f49" /> ### Files Detail File Content Preview (PDF) <img width="1917" height="1154" alt="Screenshot 2025-12-09 at 10 36 48 AM" src="https://github.com/user-attachments/assets/2873e6fe-4da3-4cbd-941b-7d903270b749" /> Closes https://github.com/llamastack/llama-stack/issues/4144 ## Test Plan Added Tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
|
6ad5fb5577
|
feat: Adding OCI Embeddings (#4300)
# What does this PR do? Enabling usage of OCI embedding models. ## Test Plan Testing embedding model: `OCI_COMPARTMENT_OCID="" OCI_REGION="us-chicago-1" OCI_AUTH_TYPE=config_file pytest -sv tests/integration/inference/test_openai_embeddings.py --stack-config oci --embedding-model oci/openai.text-embedding-3-small --inference-mode live` Testing chat model: `OCI_COMPARTMENT_OCID="" OCI_REGION="us-chicago-1" OCI_AUTH_TYPE=config_file pytest -sv tests/integration/inference/ --stack-config oci --text-model oci/openai.gpt-4.1-nano-2025-04-14 --inference-mode live` Testing curl for embeddings: `curl -X POST http://localhost:8321/v1/embeddings -H "Content-Type: application/json" -d '{ "model": "oci/openai.text-embedding-3-small", "input": ["First text", "Second text"], "encoding_format": "float" }'` `{"object":"list","data":[{"object":"embedding","embedding":[-0.017190756...0.025272394],"index":1}],"model":"oci/openai.text-embedding-3-small","usage":{"prompt_tokens":4,"total_tokens":4}}` --------- Co-authored-by: Omar Abdelwahab <omaryashraf10@gmail.com> |
||
|
|
d82a2cd6f8
|
fix: httpcore deadlock in CI by properly closing streaming responses (#4335)
# What does this PR do?
The test_conversation_error_handling test was timing out in CI with a deadlock in httpcore's connection pool. The root cause was the preceding test_conversation_multi_turn_and_streaming test, which broke out of the streaming response iterator early without properly closing the underlying HTTP connection.
When a streaming response iterator is abandoned mid-stream, the HTTP connection remains in an incomplete state. Since the openai_client fixture is session-scoped, subsequent tests reuse the same httpcore connection pool. The dangling connection causes the pool's internal lock to deadlock when the next test attempts to acquire a new connection.
The fix wraps the streaming response in a context manager, which ensures the connection is properly closed when exiting the with block, even when breaking out of the loop early. This is a best practice when working with streaming HTTP responses that may not be fully consumed.
Signed-off-by: Sébastien Han <seb@redhat.com> |
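The pattern described in this fix can be illustrated with a minimal, self-contained sketch. `FakeStream` is a hypothetical stand-in for a streaming HTTP response, not a class from httpcore or the OpenAI client; it only models the "closed or not" state that matters here.

```python
from typing import Iterator, List, Optional


class FakeStream:
    """Hypothetical stand-in for a streaming HTTP response (assumption, not a real client class)."""

    def __init__(self, chunks: List[str]) -> None:
        self._chunks = chunks
        self.closed = False

    def __iter__(self) -> Iterator[str]:
        for chunk in self._chunks:
            yield chunk

    def close(self) -> None:
        # A real client would release the pooled connection here.
        self.closed = True

    def __enter__(self) -> "FakeStream":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


def first_chunk_leaky(stream: FakeStream) -> Optional[str]:
    # Leaving the loop early without closing keeps the stream open.
    for chunk in stream:
        return chunk
    return None


def first_chunk_safe(stream: FakeStream) -> Optional[str]:
    # The with block closes the stream even when the loop exits early.
    with stream:
        for chunk in stream:
            return chunk
    return None


leaky = FakeStream(["a", "b", "c"])
safe = FakeStream(["a", "b", "c"])
first_chunk_leaky(leaky)
first_chunk_safe(safe)
print(leaky.closed, safe.closed)  # prints: False True
```

With a real streaming client the same shape applies, provided the response object supports the context manager protocol, as the commit message indicates it does in the affected test.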
||
|
|
20c11d8fd4
|
chore(github-deps): bump stainless-api/upload-openapi-spec-action from 1.7.0 to 1.7.1 (#4334)
Bumps [stainless-api/upload-openapi-spec-action](https://github.com/stainless-api/upload-openapi-spec-action) from 1.7.0 to 1.7.1. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/stainless-api/upload-openapi-spec-action/releases">stainless-api/upload-openapi-spec-action's releases</a>.</em></p> <blockquote> <h2>v1.7.1</h2> <h2><a href="https://github.com/stainless-api/upload-openapi-spec-action/compare/v1.7.0...v1.7.1">1.7.1</a> (2025-12-01)</h2> <h3>Bug Fixes</h3> <ul> <li>improve getMergeBase to handle shallow clones more robustly (<a href="https://redirect.github.com/stainless-api/upload-openapi-spec-action/issues/138">#138</a>) (<a href=" |
||
|
|
912ab6b4a2
|
chore(github-deps): bump actions/setup-node from 6.0.0 to 6.1.0 (#4333)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 6.0.0 to 6.1.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/setup-node/releases">actions/setup-node's releases</a>.</em></p> <blockquote> <h2>v6.1.0</h2> <h2>What's Changed</h2> <h3>Enhancement:</h3> <ul> <li>Remove always-auth configuration handling by <a href="https://github.com/priyagupta108"><code>@priyagupta108</code></a> in <a href="https://redirect.github.com/actions/setup-node/pull/1436">actions/setup-node#1436</a></li> </ul> <h3>Dependency updates:</h3> <ul> <li>Upgrade <code>@actions/cache</code> from 4.0.3 to 4.1.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/setup-node/pull/1384">actions/setup-node#1384</a></li> <li>Upgrade actions/checkout from 5 to 6 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/setup-node/pull/1439">actions/setup-node#1439</a></li> <li>Upgrade js-yaml from 3.14.1 to 3.14.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/setup-node/pull/1435">actions/setup-node#1435</a></li> </ul> <h3>Documentation update:</h3> <ul> <li>Add example for restore-only cache in documentation by <a href="https://github.com/aparnajyothi-y"><code>@aparnajyothi-y</code></a> in <a href="https://redirect.github.com/actions/setup-node/pull/1419">actions/setup-node#1419</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/setup-node/compare/v6...v6.1.0">https://github.com/actions/setup-node/compare/v6...v6.1.0</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
|
39d23d9894
|
chore(github-deps): bump actions/stale from 10.1.0 to 10.1.1 (#4332)
Bumps [actions/stale](https://github.com/actions/stale) from 10.1.0 to 10.1.1. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/stale/releases">actions/stale's releases</a>.</em></p> <blockquote> <h2>v10.1.1</h2> <h2>What's Changed</h2> <h3>Bug Fix</h3> <ul> <li>Add Missing Input Reading for <code>only-issue-types</code> by <a href="https://github.com/Bibo-Joshi"><code>@Bibo-Joshi</code></a> in <a href="https://redirect.github.com/actions/stale/pull/1298">actions/stale#1298</a></li> </ul> <h3>Improvement</h3> <ul> <li>Improves error handling when rate limiting is disabled on GHES. by <a href="https://github.com/chiranjib-swain"><code>@chiranjib-swain</code></a> in <a href="https://redirect.github.com/actions/stale/pull/1300">actions/stale#1300</a></li> </ul> <h3>Dependency Upgrades</h3> <ul> <li>Upgrade eslint-config-prettier from 8.10.0 to 10.1.8 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/stale/pull/1276">actions/stale#1276</a></li> <li>Upgrade <code>@types/node</code> from 20.10.3 to 24.2.0 and document breaking changes in v10 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/stale/pull/1280">actions/stale#1280</a></li> <li>Upgrade actions/publish-action from 0.3.0 to 0.4.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/stale/pull/1291">actions/stale#1291</a></li> <li>Upgrade actions/checkout from 4 to 6 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/actions/stale/pull/1306">actions/stale#1306</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/chiranjib-swain"><code>@chiranjib-swain</code></a> made their first contribution in <a 
href="https://redirect.github.com/actions/stale/pull/1300">actions/stale#1300</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/stale/compare/v10...v10.1.1">https://github.com/actions/stale/compare/v10...v10.1.1</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
|
8f585e4c7a
|
chore(github-deps): bump actions/checkout from 6.0.0 to 6.0.1 (#4331)
Bumps [actions/checkout](https://github.com/actions/checkout) from 6.0.0 to 6.0.1. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/checkout/releases">actions/checkout's releases</a>.</em></p> <blockquote> <h2>v6.0.1</h2> <h2>What's Changed</h2> <ul> <li>Update all references from v5 and v4 to v6 by <a href="https://github.com/ericsciple"><code>@ericsciple</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2314">actions/checkout#2314</a></li> <li>Add worktree support for persist-credentials includeIf by <a href="https://github.com/ericsciple"><code>@ericsciple</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2327">actions/checkout#2327</a></li> <li>Clarify v6 README by <a href="https://github.com/ericsciple"><code>@ericsciple</code></a> in <a href="https://redirect.github.com/actions/checkout/pull/2328">actions/checkout#2328</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/checkout/compare/v6...v6.0.1">https://github.com/actions/checkout/compare/v6...v6.0.1</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
|
3ca0481e43
|
fix(ui): Fix model dropdown not displaying models in chat playground (#4329)
|
||
|
|
8998000aec
|
fix(security): redact JWT tokens in server logs (#4325)
Add "token" to sensitive field patterns in redact_sensitive_fields() to prevent JWT tokens from being logged in plaintext. Previously only api_key, api_token, password, and secret were filtered. This prevents tokens like server.auth.provider_config.jwks.token from being exposed in server logs. Closes: #4324 Signed-off-by: Derek Higgins <derekh@redhat.com> |
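A rough sketch of key-based redaction with the pattern list named above. This is not the actual `redact_sensitive_fields()` from llama-stack; the function name `redact_sketch`, the substring matching rule, and the `"*****"` placeholder are assumptions made for illustration.

```python
from typing import Any

# Patterns listed in the commit message; "token" is the newly added one.
SENSITIVE_PATTERNS = ("api_key", "api_token", "password", "secret", "token")


def redact_sketch(data: Any) -> Any:
    """Recursively mask values whose key contains a sensitive pattern (hypothetical helper)."""
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if any(pattern in str(key).lower() for pattern in SENSITIVE_PATTERNS):
                result[key] = "*****"
            else:
                result[key] = redact_sketch(value)
        return result
    if isinstance(data, list):
        return [redact_sketch(item) for item in data]
    return data


config = {
    "server": {
        "auth": {
            "provider_config": {
                "jwks": {"token": "eyJ-example", "uri": "https://example.invalid/jwks"}
            }
        }
    }
}
print(redact_sketch(config))
```

Substring matching is a deliberate simplification here: it also masks unrelated keys that merely contain "token", which may or may not match the real implementation's behavior.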
||
|
|
fc4fc03606
|
chore: Small Auth CI refactor (#4322)
In preparation for ABAC addition (next PR)
```
fix(ci): allow run_dir variable expansion in YAML heredoc
Remove single quotes from EOF delimiter to allow $run_dir to
be expanded by bash when creating the configuration file.
Previously the literal string "$run_dir" was being written
to the YAML instead of the actual temp directory path.
drwxr-xr-x 3 runner runner 4096 Dec 5 12:56 $run_dir
```
```
test(ci): add test_endpoint helper function to auth tests
Add reusable test_endpoint function to integration-auth-tests
workflow for consistent API testing:
```
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
|
||
|
|
06f7ff2c80
|
fix: Correct broken links in README (#4218)
# What does this PR do? Fixing broken README links that were still pointing to the https://llamastack.github.io/latest Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com> |
||
|
|
f14936035d
|
fix: runpod provider no longer crashes sans API key (#4316)
# What does this PR do?
Previously the runpod provider would fail if `RUNPOD_API_TOKEN` was not set. Modify the impl to default to an empty string, to align with similar providers' behavior.
Closes #4296
## Test Plan
Run `uv run llama stack run --providers inference=remote::runpod` with `RUNPOD_API_TOKEN` unset - server now boots where it previously crashed
```
INFO 2025-12-04 13:52:59,920 uvicorn.error:84 uncategorized: Started server process [233656]
INFO 2025-12-04 13:52:59,921 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO 2025-12-04 13:52:59,926 llama_stack.core.server.server:168 core::server: Starting up Llama Stack server (version: 0.4.0.dev0)
INFO 2025-12-04 13:52:59,927 llama_stack.core.stack:495 core: starting registry refresh task
INFO 2025-12-04 13:52:59,928 uvicorn.error:62 uncategorized: Application startup complete.
INFO 2025-12-04 13:52:59,929 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
Signed-off-by: Nathan Weinberg <nweinber@redhat.com> |
||
|
|
8bbcfc4f56
|
fix: nvidia provider no longer crashes sans API key (#4317)
# What does this PR do?
Previously the nvidia provider would throw an exception if a hosted instance was being used but no API key was set. Modify this behavior to instead log an error informing users that a key is needed to use a hosted NIM, but still allow the server to boot.
Closes #4295
## Test Plan
Run `uv run llama stack run --providers inference=remote::nvidia` with `NVIDIA_API_KEY` unset - server now boots with logged error, where it previously crashed
```
INFO 2025-12-04 14:16:26,156 llama_stack.providers.remote.inference.nvidia.nvidia:47 inference::nvidia: Initializing NVIDIAInferenceAdapter(https://integrate.api.nvidia.com/v1)...
ERROR 2025-12-04 14:16:26,157 llama_stack.providers.remote.inference.nvidia.nvidia:51 inference::nvidia: API key is required for hosted NVIDIA NIM. Either provide an API key or use a self-hosted NIM.
INFO 2025-12-04 14:16:26,239 uvicorn.error:84 uncategorized: Started server process [251651]
INFO 2025-12-04 14:16:26,240 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO 2025-12-04 14:16:26,244 llama_stack.core.server.server:168 core::server: Starting up Llama Stack server (version: 0.4.0.dev0)
INFO 2025-12-04 14:16:26,245 llama_stack.core.stack:495 core: starting registry refresh task
INFO 2025-12-04 14:16:26,246 uvicorn.error:62 uncategorized: Application startup complete.
INFO 2025-12-04 14:16:26,246 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
Signed-off-by: Nathan Weinberg <nweinber@redhat.com> |
||
|
|
686065fe27
|
fix: access control to fail-closed when owner attributes are missing (#4273)
|
||
|
|
b4903d6766
|
fix: llama_stack_api inspect API rename (#4311)
# What does this PR do?
When publishing llama_stack_api, `inspect.py` causes issues because it is
mistaken for the builtin stdlib `inspect` module.
This is due to the top-level `__init__.py` we have. We need to rename
inspect.py to inspect_api.py to avoid this conflict.
Also, uv sync
|
||
|
|
c4c6d39c54
|
feat: Implement keyword search and delete_chunk at ChromaDB (#3057)
|
||
|
|
c6609a84f5
|
fix(tests): handle http URLs as aliases for server mode (#4306)
Small fix needed for llama-stack-ops which invokes integration-tests.sh against docker by using a `http://` URL for stack-config |
||
|
|
1d9349c8d6
|
chore(deps): bump next from 15.5.4 to 15.5.7 in /src/llama_stack_ui (#4305)
Bumps [next](https://github.com/vercel/next.js) from 15.5.4 to 15.5.7. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/vercel/next.js/releases">next's releases</a>.</em></p> <blockquote> <h2>v15.5.7</h2> <p>Please see <a href="https://nextjs.org/blog/CVE-2025-66478">CVE-2025-66478</a> for additional details about this release.</p> <h2>v15.5.6</h2> <blockquote> <p>[!NOTE]<br /> This release is backporting bug fixes. It does <strong>not</strong> include all pending features/changes on canary.</p> </blockquote> <h3>Core Changes</h3> <ul> <li>Turbopack: don't define process.cwd() in node_modules <a href="https://redirect.github.com/vercel/next.js/issues/83452">#83452</a></li> </ul> <h3>Credits</h3> <p>Huge thanks to <a href="https://github.com/mischnic"><code>@mischnic</code></a> for helping!</p> <h2>v15.5.5</h2> <blockquote> <p>[!NOTE]<br /> This release is backporting bug fixes. It does <strong>not</strong> include all pending features/changes on canary.</p> </blockquote> <h3>Core Changes</h3> <ul> <li>Split code-frame into separate compiled package (<a href="https://redirect.github.com/vercel/next.js/issues/84238">#84238</a>)</li> <li>Add deprecation warning to Runtime config (<a href="https://redirect.github.com/vercel/next.js/issues/84650">#84650</a>)</li> <li>fix: unstable_cache should perform blocking revalidation during ISR revalidation (<a href="https://redirect.github.com/vercel/next.js/issues/84716">#84716</a>)</li> <li>feat: <code>experimental.middlewareClientMaxBodySize</code> body cloning limit (<a href="https://redirect.github.com/vercel/next.js/issues/84722">#84722</a>)</li> <li>fix: missing next/link types with typedRoutes (<a href="https://redirect.github.com/vercel/next.js/issues/84779">#84779</a>)</li> </ul> <h3>Misc Changes</h3> <ul> <li>docs: early October improvements and fixes (<a href="https://redirect.github.com/vercel/next.js/issues/84334">#84334</a>)</li> </ul> <h3>Credits</h3> <p>Huge thanks to <a 
href="https://github.com/devjiwonchoi"><code>@devjiwonchoi</code></a>, <a href="https://github.com/ztanner"><code>@ztanner</code></a>, and <a href="https://github.com/icyJoseph"><code>@icyJoseph</code></a> for helping!</p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
|
2bdcbe7963
|
fix(ci): standardize CI on node 22 (#4302)
# What does this PR do?
CI was previously using both node 20 and 22. Standardize on node 22.
Closes #4294
Signed-off-by: Nathan Weinberg <nweinber@redhat.com> |
||
|
|
c57c2ae562
|
fix(ci): use latest version of setup-uv and remove pin (#4299)
# What does this PR do?
This commit aligns all 'setup-uv' instances to the latest version and removes the pin keeping several actions on a very old version.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com> |
||
|
|
ee1e63e9b9
|
chore(ci): unify uv versions used in pre-commit (#4297)
# What does this PR do?
We had three different versions of uv being used in pre-commit. Bump all to the latest version. We should probably try to find some way to automate this.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com> |
||
|
|
c9b50b7e5b
|
fix: check if distro dirs exist before listing (#4301)
# What does this PR do?
DISTRO_DIR and DISTRIBS_BASE_DIR need to exist for them to be iterated. Our current logic allows us to iterdir without checking if they exist.
## Test Plan
rm ~/.llama/distributions
```
llama stack list-deps starter --format uv | sh
Using Python 3.12.11 environment at: venv
Audited 51 packages in 12ms
Using Python 3.12.11 environment at: venv
Audited 3 packages in 2ms
Using Python 3.12.11 environment at: venv
Audited 1 package in 3ms
Using Python 3.12.11 environment at: venv
Audited 3 packages in 5ms
```
Signed-off-by: Charlie Doern <cdoern@redhat.com> |
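The guard described here can be sketched as follows. This is an illustration of the general `pathlib` pattern, not the code from the PR; the helper name `list_distro_names` and the directory layout are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List


def list_distro_names(*dirs: Path) -> List[str]:
    """Collect subdirectory names, skipping base directories that do not exist (hypothetical helper)."""
    names: List[str] = []
    for directory in dirs:
        # Path.iterdir() raises FileNotFoundError on a missing directory,
        # so check before iterating.
        if not directory.is_dir():
            continue
        names.extend(sorted(p.name for p in directory.iterdir() if p.is_dir()))
    return names


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "existing" / "starter").mkdir(parents=True)
    # "missing" was never created; it is skipped instead of raising.
    print(list_distro_names(base / "existing", base / "missing"))  # prints: ['starter']
```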
||
|
|
743683ba26
|
feat(qdrant): implement hybrid and keyword search support (#4006)
# What does this PR do?
- Part of #3009
- Implement hybrid search using Qdrant's native query filtering
- Add keyword search support
- Update test suites to include qdrant for keyword and hybrid modes
## Test Plan
```
pytest -sv tests/unit/providers/vector_io/
.......
========================= slowest 10 durations =========================
0.20s call tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_max_concurrent_files_per_batch[qdrant]
0.20s call tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_max_concurrent_files_per_batch[pgvector]
0.20s call tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_max_concurrent_files_per_batch[sqlite_vec]
0.20s call tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_max_concurrent_files_per_batch[faiss]
0.06s setup tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py::test_insert_chunks_with_missing_document_id[pgvector]
0.04s call tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_tie_breaking
0.04s call tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_weighted_reranker_parametrization
0.03s call tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_selection
0.03s call tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_edge_cases
0.03s setup tests/unit/providers/vector_io/test_faiss.py::test_faiss_query_vector_returns_infinity_when_query_and_embedding_are_identical
================== 180 passed, 47 warnings in 2.78s ==================
```
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com> |
||
|
|
5873a316db
|
feat: Add debug logging for RBAC access control decisions (#4255)
Refactor is_action_allowed() to track decision outcome, matched rule index, and reason. Add structured debug log output for troubleshooting access control. Signed-off-by: Derek Higgins <derekh@redhat.com> |
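A rough sketch of the pattern this commit describes (track the outcome, the matched rule index, and a reason, then emit one debug line) is shown below. The `Rule` shape, the first-match semantics, and the default-deny fallback are assumptions made for illustration; the real `is_action_allowed()` takes different arguments and evaluates richer policies.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("rbac_sketch")


@dataclass
class Rule:
    # Hypothetical rule shape: effect is "permit" or "forbid".
    effect: str
    actions: set[str]
    principals: set[str]


def is_action_allowed(rules: list[Rule], action: str, principal: str) -> bool:
    # Track the outcome, which rule decided it, and why (assumed default deny).
    decision = False
    matched_index = None
    reason = "no rule matched (default deny)"
    for index, rule in enumerate(rules):
        if action in rule.actions and principal in rule.principals:
            # Assumption: the first matching rule decides the outcome.
            decision = rule.effect == "permit"
            matched_index = index
            reason = f"matched {rule.effect} rule"
            break
    # One structured debug line per decision for troubleshooting.
    logger.debug(
        "access decision: allowed=%s action=%s principal=%s rule_index=%s reason=%s",
        decision, action, principal, matched_index, reason,
    )
    return decision
```

Enabling `logging.DEBUG` for the logger then shows which rule, if any, produced each allow or deny.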
||
|
|
fcd6370b34
|
fix: set SqlRecord owner to None when owner_principal is empty (#4284)
Changes SqlRecord creation in AuthorizedSqlStore.fetch_all to use owner=None when owner_principal is empty/missing, matching the ResourceWithOwner pattern used in routing tables. This fixes an inconsistency where SQL store was creating User(principal="") while routing tables use owner=None for public resources.
Changes:
o Update ProtectedResource Protocol to allow owner: User | None
o Update SqlRecord.__init__ to accept owner: User | None
o Update fetch_all to create owner=None for records without owner_principal
Signed-off-by: Derek Higgins <derekh@redhat.com> |
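The owner handling described above might look roughly like this. The classes below are simplified stand-ins that only borrow the names `User` and `SqlRecord` from the commit message; `record_from_row` and the row layout are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class User:
    principal: str


@dataclass
class SqlRecord:
    record_id: str
    owner: User | None  # None marks a public resource


def record_from_row(row: dict) -> SqlRecord:
    """Build a record from a row; hypothetical helper for illustration."""
    principal = row.get("owner_principal")
    # An empty or missing principal means "no owner", not User(principal="").
    owner = User(principal=principal) if principal else None
    return SqlRecord(record_id=row["id"], owner=owner)
```

Treating the empty string and a missing key the same way keeps SQL-backed records consistent with resources that already use `owner=None`.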
||
|
|
aa3898f486
|
chore(cve): Update node-forge to 1.3.3 (#4289)
https://github.com/digitalbazaar/forge/security/advisories/GHSA-554w-wpv2-vw27
Taking on a direct dependency is not great:
1. We don't actually use node-forge - it's only needed by webpack-dev-server's dependency (selfsigned) for generating self-signed certificates during development
2. Adding a direct dependency would be misleading - it suggests our code uses node-forge when it doesn't
In the dependency chain:
```
@docusaurus/core@3.8.1
└─ webpack-dev-server@4.15.2
   └─ selfsigned@2.4.1
      └─ node-forge@1.3.1
```
Latest Docusaurus (3.9.2) uses webpack-dev-server 5.2.2, which still uses selfsigned 2.4.1.
So, overriding the dependency on node-forge is the only option. |
||
|
|
3c2d74f39a
|
chore: bump mcp package version (#4287)
# What does this PR do? Address https://github.com/modelcontextprotocol/python-sdk/security/advisories/GHSA-9h52-p55h-vw2f Signed-off-by: Sébastien Han <seb@redhat.com> |
||
|
|
8940be23c4
|
fix: RBAC bypass vulnerabilities in model access (#4270)
Closes security gaps where RBAC checks could be bypassed:
o Inference router: Added RBAC enforcement in the fallback path to ensure access control is applied consistently.
o Model listing: Dynamic models fetched via provider_data were returned without RBAC checks. Added filtering to ensure users only see models they have permission to access.
Both fixes create temporary ModelWithOwner objects for RBAC validation, maintaining security through consistent access control enforcement.
Closes: #4269
Signed-off-by: Derek Higgins <derekh@redhat.com> |
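As a sketch of the model-listing fix, the filtering step could be expressed like this. Only the name `ModelWithOwner` comes from the commit message; its fields here, the function `filter_accessible_models`, and the `is_allowed` callback signature are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelWithOwner:
    # Simplified stand-in; the real class carries more fields.
    identifier: str
    provider_id: str
    owner: str | None = None


def filter_accessible_models(
    dynamic_models: list[str],
    provider_id: str,
    user: str,
    is_allowed: Callable[[str, ModelWithOwner, str], bool],
) -> list[str]:
    """Return only the dynamic model ids the user is allowed to read."""
    visible = []
    for model_id in dynamic_models:
        # Wrap each dynamic model in a temporary resource object so it goes
        # through the same access check as registered models.
        candidate = ModelWithOwner(identifier=model_id, provider_id=provider_id)
        if is_allowed("read", candidate, user):
            visible.append(model_id)
    return visible
```

The point of the temporary wrapper is that dynamically discovered models have no stored resource record, so without it there is nothing for the policy check to evaluate.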
||
|
|
7f43051a63
|
feat: Implement FastAPI router system (#4191)
# What does this PR do?
This commit introduces a new FastAPI router-based system for defining API endpoints, enabling a migration path away from the legacy @webmethod decorator system. The implementation includes router infrastructure, migration of the Batches API as the first example, and updates to server, OpenAPI generation, and inspection systems to support both routing approaches.

The router infrastructure consists of a router registry system that allows APIs to register FastAPI router factories, which are then automatically discovered and included in the server application. Standard error responses are centralized in router_utils to ensure consistent OpenAPI specification generation with proper $ref references to component responses.

The Batches API has been migrated to demonstrate the new pattern. The protocol definition and models remain in llama_stack_api/batches, maintaining clear separation between API contracts and server implementation. The FastAPI router implementation lives in llama_stack/core/server/routers/batches, following the established pattern where API contracts are defined in llama_stack_api and server routing logic lives in llama_stack/core/server.

The server now checks for registered routers before falling back to the legacy webmethod-based route discovery, ensuring backward compatibility during the migration period. The OpenAPI generator has been updated to handle both router-based and webmethod-based routes, correctly extracting metadata from FastAPI route decorators and Pydantic Field descriptions. The inspect endpoint now includes routes from both systems, with proper filtering for deprecated routes and API levels.

Response descriptions are now explicitly defined in router decorators, ensuring the generated OpenAPI specification matches the previous format. Error responses use $ref references to component responses (BadRequest400, TooManyRequests429, etc.) as required by the specification.
This is neat and will allow us to remove a lot of boilerplate code from our generator once the migration is done. This implementation provides a foundation for incrementally migrating other APIs to the router system while maintaining full backward compatibility with existing webmethod-based APIs. Closes: https://github.com/llamastack/llama-stack/issues/4188 ## Test Plan CI, the server should start, same routes should be visible. ``` curl http://localhost:8321/v1/inspect/routes | jq '.data[] | select(.route | contains("batches"))' ``` Also: ``` uv run pytest tests/integration/batches/ -vv --stack-config=http://localhost:8321 ================================================== test session starts ================================================== platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 24 items tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_creation_and_retrieval[None] SKIPPED [ 4%] tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_listing[None] SKIPPED [ 8%] tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_immediate_cancellation[None] SKIPPED [ 12%] 
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_chat_completions[None] SKIPPED [ 16%] tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_completions[None] SKIPPED [ 20%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_endpoint[None] SKIPPED [ 25%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_completed[None] SKIPPED [ 29%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_missing_required_fields[None] SKIPPED [ 33%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_completion_window[None] SKIPPED [ 37%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_streaming_not_supported[None] SKIPPED [ 41%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_mixed_streaming_requests[None] SKIPPED [ 45%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_endpoint_mismatch[None] SKIPPED [ 50%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_missing_required_body_fields[None] SKIPPED [ 54%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_metadata_types[None] SKIPPED [ 58%] tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_embeddings[None] SKIPPED [ 62%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_nonexistent_file_id PASSED [ 66%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_malformed_jsonl PASSED [ 70%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[empty] XFAIL [ 75%] 
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[malformed] XFAIL [ 79%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_retrieve_nonexistent PASSED [ 83%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_nonexistent PASSED [ 87%] tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_error_handling_invalid_model PASSED [ 91%] tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotent_batch_creation_successful PASSED [ 95%] tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotency_conflict_with_different_params PASSED [100%] ================================================= slowest 10 durations ================================================== 1.01s call tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotent_batch_creation_successful 0.21s call tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_nonexistent_file_id 0.17s call tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_malformed_jsonl 0.12s call tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_error_handling_invalid_model 0.05s setup tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_creation_and_retrieval[None] 0.02s call tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[empty] 0.01s call tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotency_conflict_with_different_params 0.01s call tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[malformed] 0.01s call 
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_retrieve_nonexistent 0.00s call tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_nonexistent ======================================= 7 passed, 15 skipped, 2 xfailed in 1.78s ======================================== ``` --------- Signed-off-by: Sébastien Han <seb@redhat.com> |