# What does this PR do?
Adds a user-facing `authorization` parameter to MCP tool definitions
that lets users explicitly configure credentials per MCP server,
addressing GitHub issue #4034 in a secure manner.
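As an illustration, a request might look roughly like this (a sketch only; the token value and exact field placement are assumptions, not the final shape):
```python
# Sketch: pass a per-server credential via the new `authorization` field on an
# MCP tool definition (exact request shape may differ from this illustration).
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="List the open issues in my tracker",
    tools=[
        {
            "type": "mcp",
            "server_label": "issue-tracker",
            "server_url": "https://mcp.example.com/sse",
            "authorization": "<access-token>",  # credential scoped to this MCP server
        }
    ],
)
```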
## Test Plan
tests/integration/responses/test_mcp_authentication.py
---------
Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` via `extra_query`
- Updates the UI accordingly to display them.
- Updates the UI to support CRUD operations in the Vector Stores section
and adds a new modal exposing the functionality.
- Updates the Vector Store update endpoint to fail if a user tries to
change the Provider ID (which doesn't make sense to allow).
```python
In [1]: client.vector_stores.files.content(
vector_store_id=vector_store.id,
file_id=file.id,
extra_query={"include_embeddings": True, "include_metadata": True}
)
Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```
Screenshots of UI are displayed below:
### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47
25 PM"
src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3"
/>
### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47
49 PM"
src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158"
/>
### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48
32 PM"
src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414"
/>
### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54
32 PM"
src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27"
/>
### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55
00 PM"
src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c"
/>
## Test Plan
Tests added for Middleware extension and Provider failures.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
The inspect API lacked any mechanism to list all
non-deprecated APIs (v1, v1alpha, v1beta).
This changes the default to that behavior; the
'v1' filter can be used by users wanting a list
of stable APIs only.
## Test Plan
1. pull the PR
2. launch a LLS server
3. run `curl http://beanlab3.bss.redhat.com:8321/v1/inspect/routes`
4. note there are APIs for `v1`, `v1alpha`, and `v1beta` but no
deprecated APIs
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# Problem
The Responses API uses the max_tool_calls parameter to limit the number of tool
calls that can be generated in a response. Currently, the LLS implementation
of the Responses API does not support this parameter.
# What does this PR do?
This pull request adds the max_tool_calls field to the response object
definition and updates the inline provider. It also ensures that:
- the total number of calls to built-in and MCP tools does not exceed
max_tool_calls
- an error is thrown if max_tool_calls < 1 (behavior seen with the
OpenAI Responses API, but we can change this if needed)
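For illustration, a minimal request exercising the new parameter might look like this (model and tool choice are placeholders):
```python
# Sketch: cap the number of tool calls a single response may perform.
# Values below 1 are rejected, mirroring the OpenAI Responses API behavior.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What is the current weather in Boston?",
    tools=[{"type": "web_search"}],
    max_tool_calls=2,
)
print(response.max_tool_calls)  # the field is echoed back on the response object
```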
Closes #[3563](https://github.com/llamastack/llama-stack/issues/3563)
## Test Plan
- Tested manually for change in model response w.r.t supplied
max_tool_calls field.
- Added integration tests to test invalid max_tool_calls parameter.
- Added integration tests to check max_tool_calls parameter with
built-in and function tools.
- Added integration tests to check max_tool_calls parameter in the
returned response object.
- Recorded OpenAI Responses API behavior using a sample script:
https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/max_tool_calls.py
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Mark all register_* / unregister_* APIs as deprecated across models,
shields, tool groups, datasets, benchmarks, and scoring functions. This
is the first step toward moving resource mutations to an `/admin`
namespace as outlined in
https://github.com/llamastack/llama-stack/issues/3809#issuecomment-3492931585.
The deprecation flag will be reflected in the OpenAPI schema to warn API
users that these endpoints are being phased out. Next step will be
implementing the `/admin` route namespace for these resource management
operations.
- `register_model` / `unregister_model`
- `register_shield` / `unregister_shield`
- `register_tool_group` / `unregister_toolgroup`
- `register_dataset` / `unregister_dataset`
- `register_benchmark` / `unregister_benchmark`
- `register_scoring_function` / `unregister_scoring_function`
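For reference, the change per endpoint is roughly the following (a sketch that assumes the `@webmethod` decorator accepts a `deprecated` flag which is surfaced in the generated OpenAPI schema):
```python
# Sketch: mark a resource-mutation endpoint as deprecated so the OpenAPI schema
# warns clients it is being phased out (decorator signature assumed).
@webmethod(route="/models", method="POST", deprecated=True)
async def register_model(
    self,
    model_id: str,
    provider_model_id: str | None = None,
    provider_id: str | None = None,
) -> Model: ...
```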
# What does this PR do?
Resolves #4102
1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and
the `OpenAIResponseInputToolWebSearch.type` Literal union
2. No changes needed to tool execution logic - all `web_search` types
map to the same underlying tool
3. Backward compatibility is maintained - existing `web_search`,
`web_search_preview`, and `web_search_preview_2025_03_11` types continue
to work
4. Added an integration test case using {"type":
"web_search_2025_08_26"} to verify it works correctly
5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to
reflect that `web_search_2025_08_26` is now supported.
6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist
in the codebase)
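Roughly, the type change looks like this (simplified from the actual definitions):
```python
from typing import Literal

from pydantic import BaseModel

# Simplified sketch: the new alias joins the existing web_search variants, all
# of which map to the same underlying tool.
WebSearchToolTypes = [
    "web_search",
    "web_search_preview",
    "web_search_preview_2025_03_11",
    "web_search_2025_08_26",
]

class OpenAIResponseInputToolWebSearch(BaseModel):
    type: Literal[
        "web_search",
        "web_search_preview",
        "web_search_preview_2025_03_11",
        "web_search_2025_08_26",
    ] = "web_search"
```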
## Test Plan
---------
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
RAG aka file search is implemented via the Responses API by specifying
the file-search tool. The backend implementation remains unchanged. This
PR merely removes the directly exposed API surface which allowed users
to directly perform searches from the client.
This facility is now available via the `client.vector_store.search()`
OpenAI compatible API.
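For reference, a search that previously went through the dedicated RAG API now looks roughly like this (a sketch; the exact method name and response fields follow the OpenAI-compatible client and may vary):
```python
# Sketch: file search via the OpenAI-compatible vector store search endpoint.
results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="How do I configure the inference provider?",
    max_num_results=5,
)
for item in results.data:
    print(item.filename, item.score)
```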
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed
The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.
Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
This PR removes all routes which we had marked deprecated for the 0.3.0
release.
This includes:
- all the `/v1/openai/v1/` routes (the corresponding /v1 routes still
exist of course)
- the /agents API (which is superseded completely by Responses +
Conversations)
- several alpha routes which had a "v1" route to aid transitioning to
"v1alpha"
This is the corresponding client-python change:
https://github.com/llamastack/llama-stack-client-python/pull/294
# What does this PR do?
This API hasn't received any traction and has close to zero interest from
the community. Let's revisit in the future if things change.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
We need to remove `/v1/openai/v1` paths shortly. There is one trouble --
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.
This is step 1: adding a `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by looking at
`__pydantic_extra__` or other similar fields.
This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modifies `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
`custom_metadata`
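A rough sketch of how a consumer could pull the extra fields back out (attribute access is illustrative; the data may surface directly or via pydantic extras):
```python
# Sketch: recover llama-stack-specific metadata from an OpenAI-compatible model entry.
for m in client.models.list():
    extra = getattr(m, "custom_metadata", None) or (m.__pydantic_extra__ or {}).get(
        "custom_metadata", {}
    )
    print(m.id, extra)
```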
Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
# What does this PR do?
Allows filtering for v1alpha, v1beta, deprecated, and v1. This is a
backward-incompatible change since, by default, only v1 APIs are returned now.
## Test Plan
added unit test
# What does this PR do?
chunk_id in the Chunk class executes actual logic to compute a chunk ID.
This sort of logic should not live in the API spec.
Instead, the providers should be in charge of calling generate_chunk_id,
and pass it to `Chunk`.
this removes the incorrect dependency between Provider impl and API impl
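In other words, the provider side now looks roughly like this (import paths and field names are illustrative, not the exact ones in the tree):
```python
from llama_stack.apis.vector_io import Chunk  # illustrative import path
from llama_stack.providers.utils.vector_io import generate_chunk_id  # illustrative

document_id = "file-123"
chunk_text = "This is a test document."

# The provider computes the chunk ID and passes it to Chunk explicitly,
# instead of Chunk deriving it inside the API spec.
chunk = Chunk(
    chunk_id=generate_chunk_id(document_id, chunk_text),
    content=chunk_text,
    metadata={"document_id": document_id},
)
```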
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
This PR changes the Responses API schema to
introduce OpenAI-compatible prompts. It is an API-only change with no
implementation yet; a follow-up PR with the actual implementation will be
submitted after this one lands.
The need for this functionality originated in #3514.
> Note: #3514 is divided into three separate PRs. This is the second
of the three.
## Test Plan
CI
# What does this PR do?
Introduces two main fixes to enhance the stability of the Responses API when
dealing with tool-calling responses and structured outputs.
### Changes Made
1. Adds OpenAIResponseOutputMessageMCPCall and ListTools to
OpenAIResponseInput. https://github.com/llamastack/llama-stack/pull/3810
was merged and did the same in a different way, but this PR does it in a way
that keeps OpenAIResponseOutput and the allowed objects in
OpenAIResponseInput in sync.
2. Adds protection in case self.ctx.response_format does not have a type
attribute.
BREAKING CHANGE: OpenAIResponseInput now uses OpenAIResponseOutput union
type.
This is semantically equivalent - all previously accepted types are
still supported
via the OpenAIResponseOutput union. This improves type consistency and
maintainability.
This patch ensures that if max_tokens is not defined, it is set to None
instead of 0 when calling openai_chat_completion. This way, providers
(like Gemini) that cannot handle `max_tokens = 0` will not fail.
Issue: #3666
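The fix amounts to a guard of roughly this shape (helper name is illustrative):
```python
# Sketch: normalize an unset max_tokens so providers like Gemini never see 0.
def normalize_max_tokens(max_tokens: int | None) -> int | None:
    return max_tokens if max_tokens else None  # both 0 and None become None
```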
# What does this PR do?
https://platform.openai.com/docs/api-reference/moderations supports an
optional model parameter.
This PR adds support for using the moderations API with model=None if a
default shield ID is provided via the safety config.
## Test Plan
added tests
manual test:
```
> SAFETY_MODEL='together/meta-llama/Llama-Guard-4-12B' uv run llama stack run starter
> curl http://localhost:8321/v1/moderations \
-H "Content-Type: application/json" \
-d '{
"input": [
"hello"
]
}'
```
Bumps [openai](https://github.com/openai/openai-python) from 1.107.0 to
2.5.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/openai/openai-python/releases">openai's
releases</a>.</em></p>
<blockquote>
<h2>v2.5.0</h2>
<h2>2.5.0 (2025-10-17)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.4.0...v2.5.0">v2.4.0...v2.5.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> api update (<a
href="8b280d57d6">8b280d5</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>bump <code>httpx-aiohttp</code> version to 0.1.9 (<a
href="67f2f0afe5">67f2f0a</a>)</li>
</ul>
<h2>v2.4.0</h2>
<h2>2.4.0 (2025-10-16)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.3.0...v2.4.0">v2.3.0...v2.4.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> Add support for gpt-4o-transcribe-diarize on
audio/transcriptions endpoint (<a
href="bdbe9b8f44">bdbe9b8</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li>fix dangling comment (<a
href="da14e99606">da14e99</a>)</li>
<li><strong>internal:</strong> detect missing future annotations with
ruff (<a
href="2672b8f072">2672b8f</a>)</li>
</ul>
<h2>v2.3.0</h2>
<h2>2.3.0 (2025-10-10)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.2.0...v2.3.0">v2.2.0...v2.3.0</a></p>
<h3>Features</h3>
<ul>
<li><strong>api:</strong> comparison filter in/not in (<a
href="aa49f626a6">aa49f62</a>)</li>
</ul>
<h3>Chores</h3>
<ul>
<li><strong>package:</strong> bump jiter to >=0.10.0 to support
Python 3.14 (<a
href="https://redirect.github.com/openai/openai-python/issues/2618">#2618</a>)
(<a
href="aa445cab5c">aa445ca</a>)</li>
</ul>
<h2>v2.2.0</h2>
<h2>2.2.0 (2025-10-06)</h2>
<p>Full Changelog: <a
href="https://github.com/openai/openai-python/compare/v2.1.0...v2.2.0">v2.1.0...v2.2.0</a></p>
<h3>Features</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="513ae76253"><code>513ae76</code></a>
release: 2.5.0 (<a
href="https://redirect.github.com/openai/openai-python/issues/2694">#2694</a>)</li>
<li><a
href="ebf32212f7"><code>ebf3221</code></a>
release: 2.4.0</li>
<li><a
href="e043d7b164"><code>e043d7b</code></a>
chore: fix dangling comment</li>
<li><a
href="25cbb74f83"><code>25cbb74</code></a>
feat(api): Add support for gpt-4o-transcribe-diarize on
audio/transcriptions ...</li>
<li><a
href="8cdfd0650e"><code>8cdfd06</code></a>
codegen metadata</li>
<li><a
href="d5c64434b7"><code>d5c6443</code></a>
codegen metadata</li>
<li><a
href="b20a9e7b81"><code>b20a9e7</code></a>
chore(internal): detect missing future annotations with ruff</li>
<li><a
href="e5f93f5dae"><code>e5f93f5</code></a>
release: 2.3.0</li>
<li><a
href="044878859c"><code>0448788</code></a>
feat(api): comparison filter in/not in</li>
<li><a
href="85a91ade61"><code>85a91ad</code></a>
chore(package): bump jiter to >=0.10.0 to support Python 3.14 (<a
href="https://redirect.github.com/openai/openai-python/issues/2618">#2618</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/openai/openai-python/compare/v1.107.0...v2.5.0">compare
view</a></li>
</ul>
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
- Extend the model type to include rerank models.
- Implement `rerank()` method in inference router.
- Add `rerank_model_list` to `OpenAIMixin` to enable providers to
register and identify rerank models
- Update documentation.
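A rough sketch of the provider-side hook, assuming `OpenAIMixin` exposes a class-level `rerank_model_list` as described above (names are placeholders):
```python
# Hypothetical sketch: a provider adapter advertises the rerank models it
# serves so they are registered and identified as rerank models.
class MyRerankCapableAdapter(OpenAIMixin):
    rerank_model_list = ["my-provider/rerank-v1"]
```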
## Test Plan
```
pytest tests/unit/providers/utils/inference/test_openai_mixin.py
```
Kill the `builtin::rag` tool group completely since it is no longer
targeted. We use the Responses implementation for knowledge_search which
uses the `openai_vector_stores` pathway.
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
# Problem
The current inline provider appends the user provided instructions to
messages as a system prompt, but the returned response object does not
contain the instructions field (as specified in the OpenAI responses
spec).
# What does this PR do?
This pull request adds the instructions field to the response object
definition and updates the inline provider. It also ensures that
instructions from a previous response are not carried over to the next
response (as specified in the OpenAI spec).
Closes #[3566](https://github.com/llamastack/llama-stack/issues/3566)
## Test Plan
- Tested manually for change in model response w.r.t supplied
instructions field.
- Added a unit test to check that the instructions from a previous response
are not carried over to the next response.
- Added integration tests to check instructions parameter in the
returned response object.
- Added new recordings for the integration tests.
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
# What does this PR do?
Closed the previous PR due to merge conflicts with multiple PRs.
This PR addresses all comments from
https://github.com/llamastack/llama-stack/pull/3768 (sorry for carrying
them over to this one).
## Test Plan
Added UTs and integration tests
This PR updates the Conversation item related types and improves a
couple of critical parts of the implementation:
- It creates a streaming output item for the final assistant message
output by the model. Until now, we only added content parts and included
that message in the final response.
- It rewrites the conversation update code completely to account for items
other than messages (tool calls, outputs, etc.).
## Test Plan
Used the test script from
https://github.com/llamastack/llama-stack-client-python/pull/281 for
this
```
TEST_API_BASE_URL=http://localhost:8321/v1 \
pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```
# What does this PR do?
As discussed on Discord, we do not need to reinvent the wheel for
telemetry. Instead, we'll lean into the canonical OTEL stack.
Logs/traces/metrics will still be sent via OTEL; they just won't be
stored on, or queried through, the Stack.
This is the first of many PRs to remove the telemetry API from the Stack.
1) Removed webmethod decorators to remove them from the API spec
2) Removed tests, as @iamemilio is adding them on OTEL directly.
## Test Plan
Applies the same pattern from
https://github.com/llamastack/llama-stack/pull/3777 to embeddings and
vector_stores.create() endpoints.
This should _not_ be a breaking change since (a) our tests were already
using the `extra_body` parameter when passing to the backend, and (b)
the backend probably wasn't extracting the parameters correctly; this PR
fixes that.
Updated APIs: `openai_embeddings(), openai_create_vector_store(),
openai_create_vector_store_file_batch()`
Implements missing streaming events from OpenAI Responses API spec:
- reasoning text/summary events for o1/o3 models,
- refusal events for safety moderation
- annotation events for citations,
- and file search streaming events.
Added optional reasoning_content field to chat completion chunks to
support non-standard provider extensions.
**NOTE:** OpenAI does _not_ fill reasoning_content when users use the
chat_completion APIs. This means there is no way for us to implement
Responses (with reasoning) by using OpenAI chat completions! We'd need
to transparently punt to OpenAI's responses endpoints if we wish to do
that. For others though (vLLM, etc.) we can use it.
## Test Plan
File search streaming test passes:
```
./scripts/integration-tests.sh --stack-config server:ci-tests \
--suite responses --setup gpt --inference-mode replay --pattern test_response_file_search_streaming_events
```
Reasoning tests need a more complex setup and validation (a vLLM-powered
OSS model, maybe gpt-oss, which can return reasoning_content). I will do
that in a follow-up PR.
# What does this PR do?
Removes VectorDBs from API surface and our tests.
Moves tests to Vector Stores.
## Test Plan
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Allows passing through extra_body parameters to inference providers.
With this, we moved the two vLLM-specific parameters out of the completions
API and into `extra_body`.
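For example, a vLLM-specific option now rides along in `extra_body` rather than appearing in the portable completions signature (a sketch; the parameter name is taken from the guided-choice test below):
```python
# Sketch: provider-specific options are passed through extra_body instead of
# being first-class completions parameters.
completion = client.completions.create(
    model="vllm/Qwen/Qwen3-0.6B",
    prompt="The capital of France is",
    extra_body={"guided_choice": ["Paris", "London"]},
)
print(completion.choices[0].text)
```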
Before/After
<img width="1883" height="324" alt="image"
src="https://github.com/user-attachments/assets/acb27c08-c748-46c9-b1da-0de64e9908a1"
/>
Closes #2720
## Test Plan
CI and added new test
```
❯ uv run pytest -s -v tests/integration/ --stack-config=server:starter --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag ) and test_openai_completion_guided_choice' --setup=vllm --suite=base --color=yes
Uninstalled 3 packages in 125ms
Installed 3 packages in 19ms
INFO 2025-10-10 14:29:54,317 tests.integration.conftest:118 tests: Applying setup 'vllm' for suite base
INFO 2025-10-10 14:29:54,331 tests.integration.conftest:47 tests: Test stack config type: server
(stack_config=server:starter)
============================================================================================================== test session starts ==============================================================================================================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/erichuang/projects/llama-stack-1/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/erichuang/projects/llama-stack-1
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 285 items / 284 deselected / 1 selected
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
instantiating llama_stack_client
Starting llama stack server with config 'starter' on port 8321...
Waiting for server at http://localhost:8321... (0.0s elapsed)
Waiting for server at http://localhost:8321... (0.5s elapsed)
Waiting for server at http://localhost:8321... (5.1s elapsed)
Waiting for server at http://localhost:8321... (5.6s elapsed)
Waiting for server at http://localhost:8321... (10.1s elapsed)
Waiting for server at http://localhost:8321... (10.6s elapsed)
Server is ready at http://localhost:8321
llama_stack_client instantiated in 11.773s
PASSEDTerminating llama stack server process...
Terminating process 98444 and its group...
Server process and children terminated gracefully
============================================================================================================= slowest 10 durations ==============================================================================================================
11.88s setup tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
3.02s call tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
0.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
================================================================================================ 1 passed, 284 deselected, 3 warnings in 16.21s =================================================================================================
```
# What does this PR do?
Converts openai(_chat)_completions params to a pydantic BaseModel to
reduce code duplication across all providers.
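Illustrative shape of the change (class and field names here are hypothetical and abbreviated):
```python
from pydantic import BaseModel

# Hypothetical sketch: the long openai_chat_completion argument list becomes a
# single validated params object shared by all providers.
class OpenAIChatCompletionParams(BaseModel):
    model: str
    messages: list[dict]
    temperature: float | None = None
    max_tokens: int | None = None
    stream: bool | None = False
```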
## Test Plan
CI
# What does this PR do?
This PR checks whether, if a previous response is linked, there are
mcp_list_tools objects that can be reused instead of listing the tools
explicitly every time.
Closes #3106
## Test Plan
Tested manually.
Added unit tests to cover new behaviour.
---------
Signed-off-by: Gordon Sim <gsim@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
## Summary
Adds OpenAI-compatible usage tracking types to enable reporting token
consumption for both streaming and non-streaming responses.
## Type Definitions
**Chat Completion Usage** (inference API):
```python
class OpenAIChatCompletionUsage(BaseModel):
prompt_tokens: int
completion_tokens: int
total_tokens: int
prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None
completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None
```
**Response Usage** (responses API):
```python
class OpenAIResponseUsage(BaseModel):
input_tokens: int
output_tokens: int
total_tokens: int
input_tokens_details: OpenAIResponseUsageInputTokensDetails | None
output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None
```
This matches OpenAI's usage reporting format and enables PR #3766 to
implement usage tracking in streaming responses.
Co-authored-by: Claude <noreply@anthropic.com>
# What does this PR do?
* Cleans up API docstrings for better documentation rendering
<img width="2346" height="1126" alt="image"
src="https://github.com/user-attachments/assets/516b09a1-2d5b-4614-a3a9-13431fc21fc1"
/>
## Test Plan
* Manual testing
---------
Signed-off-by: Doug Edgar <dedgar@redhat.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Christian Zaccaria <73656840+ChristianZaccaria@users.noreply.github.com>
Co-authored-by: Anastas Stoyanovsky <contact@anastas.eu>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Young Han <110819238+seyeong-han@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Summary
Introduce `ExtraBodyField` annotation to enable parameters that arrive
via extra_body in client SDKs but are accessible server-side with full
typing.
These parameters are documented in OpenAPI specs under
**`x-llama-stack-extra-body-params`** but excluded from generated SDK
signatures.
Add `shields` parameter to `create_openai_response` as the first
implementation using this pattern.
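Client-side, the parameter travels through `extra_body` since it is excluded from the generated SDK signature (a sketch; the model and shield IDs are placeholders):
```python
# Sketch: `shields` is not in the SDK signature, so it is supplied via
# extra_body and arrives fully typed on the server side.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="Tell me about your safety policies.",
    extra_body={"shields": ["llama-guard"]},
)
```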
## Test Plan
- added an integration test which checks that shields parameter passed
via extra_body reaches server implementation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
# What does this PR do?
Initial implementation for `Conversations` and `ConversationItems` using
`AuthorizedSqlStore` with endpoints to:
- CREATE
- UPDATE
- GET/RETRIEVE/LIST
- DELETE
Set `level=LLAMA_STACK_API_V1`.
NOTE: This does not currently incorporate changes for Responses; that'll
be done in a subsequent PR.
Closes https://github.com/llamastack/llama-stack/issues/3235
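A rough sketch of the CRUD surface from the client side (method names follow the OpenAI-style client; exact signatures may differ):
```python
# Hypothetical sketch of the Conversations CRUD operations exposed by this PR.
conv = client.conversations.create(metadata={"topic": "support"})
client.conversations.update(conv.id, metadata={"topic": "support", "priority": "high"})
fetched = client.conversations.retrieve(conv.id)
client.conversations.delete(conv.id)
```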
## Test Plan
- Unit tests
- Integration tests
Also comparison of [OpenAPI spec for OpenAI
API](https://github.com/openai/openai-openapi/tree/manual_spec)
```bash
oasdiff breaking --fail-on ERR docs/static/llama-stack-spec.yaml https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/manual_spec/openapi.yaml --strip-prefix-base "/v1/openai/v1" \
--match-path '(^/v1/openai/v1/conversations.*|^/conversations.*)'
```
Note: I still have some uncertainty about this. I borrowed this approach from
@cdoern on https://github.com/llamastack/llama-stack/pull/3514 but need
to spend more time to confirm it's working; at the moment it suggests it
does.
UPDATE on `oasdiff`: I investigated the OpenAI spec further and it looks
like the spec currently does not list Conversations, so that analysis is
useless. Noting for future reference.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This is a sweeping change to clean up some gunk around our "Tool"
definitions.
First, we had two types `Tool` and `ToolDef`. The first of these was a
"Resource" type for the registry but we had stopped registering tools
inside the Registry long back (and only registered ToolGroups.) The
latter was for specifying tools for the Agents API. This PR removes the
former and adds an optional `toolgroup_id` field to the latter.
Secondly, as pointed out by @bbrowning in
https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132,
we were doing a lossy conversion from a full JSON schema from the MCP
tool specification into our ToolDefinition to send it to the model.
There is no necessity to do this -- we ourselves aren't doing any
execution at all but merely passing it to the chat completions API which
supports this. By doing this (and by doing it poorly), we encountered
limitations like not supporting array items, or not resolving $refs,
etc.
To fix this, we replaced the `parameters` field by `{ input_schema,
output_schema }` which can be full blown JSON schemas.
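Concretely, a tool definition can now carry the MCP server's schema verbatim, for example (a sketch; field values are illustrative):
```python
# Sketch: ToolDef carries full JSON Schemas, so nested arrays and $refs from an
# MCP server pass straight through to the chat completions API.
tool = ToolDef(
    name="search_issues",
    description="Search the issue tracker",
    toolgroup_id="mcp::issue-tracker",  # new optional field added by this PR
    input_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "labels": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["query"],
    },
    output_schema={
        "type": "object",
        "properties": {"ids": {"type": "array", "items": {"type": "string"}}},
    },
)
```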
Finally, there were some types in our llama-related chat format
conversion which needed some cleanup. We are taking this opportunity to
clean those up.
This PR is a substantial breaking change to the API. However, given our
window for introducing breaking changes, this suits us just fine. I will
be landing a concurrent `llama-stack-client` change as well since API
shapes are changing.
# What does this PR do?
* Adds stainless-llama-stack-spec.yaml for Stainless client generation,
which comprises stable + experimental APIs
## Test Plan
* Manual generation