llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Author	SHA1	Message	Date
Charlie Doern	d6b915ce0a	Merge remote-tracking branch 'upstream/main' into api-pkg Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-12 13:54:09 -05:00
Charlie Doern	37853ca558	fix(tests): add OpenAI client connection cleanup to prevent CI hangs (#4119 ) # What does this PR do? Add explicit connection cleanup and shorter timeouts to OpenAI client fixtures. Fixes CI deadlock after 25+ tests due to connection pool exhaustion. Also adds 60s timeout to test_conversation_context_loading as safety net. ## Test Plan tests pass Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-12 12:17:13 -05:00
Charlie Doern	85d407c2a0	feat: split API and provider specs into separate llama-stack-api pkg Extract API definitions, models, and provider specifications into a standalone llama-stack-api package that can be published to PyPI independently of the main llama-stack server. Motivation External providers currently import from llama-stack, which overrides the installed version and causes dependency conflicts. This separation allows external providers to: - Install only the type definitions they need without server dependencies - Avoid version conflicts with the installed llama-stack package - Be versioned and released independently This enables us to re-enable external provider module tests that were previously blocked by these import conflicts. Changes - Created llama-stack-api package with minimal dependencies (pydantic, jsonschema) - Moved APIs, providers datatypes, strong_typing, and schema_utils - Updated all imports from llama_stack.* to llama_stack_api.* - Preserved git history using git mv for moved files - Configured local editable install for development workflow - Updated linting and type-checking configuration for both packages - Rebased on top of upstream src/ layout changes Testing Package builds successfully and can be imported independently. All pre-commit hooks pass with expected exclusions maintained. Next Steps - Publish llama-stack-api to PyPI - Update external provider dependencies - Re-enable external provider module tests Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-12 09:19:40 -05:00
Ashwin Bharambe	aa2bd82b1d	fix(ci): add recordings for responses suite due to web search type changing (#4104 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Pre-commit / pre-commit (push) Failing after 2s Details Integration Tests (Replay) / generate-matrix (push) Successful in 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test Llama Stack Build / build-single-provider (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s Details Test llama stack list-deps / generate-matrix (push) Successful in 3s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s Details Test llama stack list-deps / list-deps-from-config (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 4s Details Test llama stack list-deps / list-deps (push) Failing after 4s Details Test llama stack list-deps / show-single-provider (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 10s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 1m3s Details #4103 broke (even though the PR itself was green) trunk	2025-11-07 10:42:07 -08:00
Aakanksha Duggal	b83184f7ef	feat(responses)!: Add web_search_2025_08_26 to the WebSearchToolTypes (#4103 ) # What does this PR do? Resolves #4102 1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and the `OpenAIResponseInputToolWebSearch.type` Literal union 2. No changes needed to tool execution logic - all `web_search` types map to the same underlying tool 3. Backward compatibility is maintained - existing `web_search`, `web_search_preview`, and `web_search_preview_2025_03_11` types continue to work 4. Added an integration test case using {"type": "web_search_2025_08_26"} to verify it works correctly 5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to reflect that `web_search_2025_08_26` is now supported. 6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist in the codebase) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> --------- Signed-off-by: Aakanksha Duggal <aduggal@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-07 10:01:12 -08:00
Ashwin Bharambe	f49cb0b717	chore: Stack server no longer depends on llama-stack-client (#4094 ) This dependency has been bothering folks for a long time (cc @leseb). We really needed it due to "library client" which is primarily used for our tests and is not a part of the Stack server. Anyone who needs to use the library client can certainly install `llama-stack-client` in their environment to make that work. Updated the notebook references to install `llama-stack-client` additionally when setting things up.	2025-11-07 09:54:09 -08:00
Ashwin Bharambe	4d3069bfa5	chore(ci): remove unused recordings (#4074 ) Added a script to cleanup recordings. While doing this, moved the CI matrix generation to a separate script so there is a single source of truth for the matrix. Ran the cleanup script as: ``` PYTHONPATH=. python scripts/cleanup_recordings.py ``` Also added this as part of the pre-commit workflow to ensure that the recordings are always up to date and that no stale recordings are left in the repo.	2025-11-05 09:21:58 -08:00
Ashwin Bharambe	cb40da210f	fix: update tests for OpenAI-style models endpoint (#4053 ) The llama-stack-client now uses /`v1/openai/v1/models` which returns OpenAI-compatible model objects with 'id' and 'custom_metadata' fields instead of the Resource-style 'identifier' field. Updated api_recorder to handle the new endpoint and modified tests to access model metadata appropriately. Deleted stale model recordings for re-recording. NOTE: CI will be red on this one since it is dependent on https://github.com/llamastack/llama-stack-client-python/pull/291/files landing. I verified locally that it is green.	2025-11-03 17:30:08 -08:00
Charlie Doern	e8ecc99524	fix!: remove chunk_id property from Chunk class (#3954 ) # What does this PR do? chunk_id in the Chunk class executes actual logic to compute a chunk ID. This sort of logic should not live in the API spec. Instead, the providers should be in charge of calling generate_chunk_id, and pass it to `Chunk`. this removes the incorrect dependency between Provider impl and API impl Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-29 18:59:59 -07:00
Ashwin Bharambe	7918188f1e	fix(ci): enable responses tests in CI; suppress expected MCP auth error logs (#3889 ) Let us enable responses suite in CI now. Also a minor fix: MCP tool tests intentionally trigger authentication failures to verify error handling, but the resulting error logs clutter test output.	2025-10-22 14:59:42 -07:00
Ashwin Bharambe	c0c0e337d9	misc(tests): add recordings for responses tests	2025-10-21 16:39:08 -07:00
Ashwin Bharambe	f205ab6f6c	fix(responses): fixes, re-record tests (#3820 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s Details Python Package Build Test / build (3.13) (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 17s Details UI Tests / ui-tests (22) (push) Successful in 55s Details Pre-commit / pre-commit (push) Successful in 1m43s Details Wanted to re-enable Responses CI but it seems to hang for some reason due to some interactions with conversations_store or responses_store. ## Test Plan ``` # library client ./scripts/integration-tests.sh --stack-config ci-tests --suite responses # server ./scripts/integration-tests.sh --stack-config server:ci-tests --suite responses ```	2025-10-15 16:37:42 -07:00
slekkala1	99141c29b1	feat: Add responses and safety impl extra_body (#3781 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Test Llama Stack Build / build-single-provider (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s Details Test External API and Providers / test-external (venv) (push) Failing after 8s Details Test Llama Stack Build / build (push) Failing after 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 9s Details API Conformance Tests / check-schema-compatibility (push) Successful in 19s Details UI Tests / ui-tests (22) (push) Successful in 37s Details Pre-commit / pre-commit (push) Successful in 1m33s Details # What does this PR do? Have closed the previous PR due to merge conflicts with multiple PRs Addressed all comments from https://github.com/llamastack/llama-stack/pull/3768 (sorry for carrying over to this one) ## Test Plan Added UTs and integration tests	2025-10-15 15:01:37 -07:00
Ashwin Bharambe	8e7e0ddfec	fix(responses): use conversation items when no stored messages exist (#3819 ) Handle a base case when no stored messages exist because no Response call has been made. ## Test Plan ``` ./scripts/integration-tests.sh --stack-config server:ci-tests \ --suite responses --inference-mode record-if-missing --pattern test_conversation_responses ```	2025-10-15 14:43:44 -07:00
Ashwin Bharambe	e9b4278a51	feat(responses)!: improve responses + conversations implementations (#3810 ) This PR updates the Conversation item related types and improves a couple critical parts of the implemenation: - it creates a streaming output item for the final assistant message output by the model. until now we only added content parts and included that message in the final response. - rewrites the conversation update code completely to account for items other than messages (tool calls, outputs, etc.) ## Test Plan Used the test script from https://github.com/llamastack/llama-stack-client-python/pull/281 for this ``` TEST_API_BASE_URL=http://localhost:8321/v1 \ pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs ```	2025-10-15 09:36:11 -07:00
Ashwin Bharambe	7c63aebd64	feat(responses)!: add reasoning and annotation added events (#3793 ) Implements missing streaming events from OpenAI Responses API spec: - reasoning text/summary events for o1/o3 models, - refusal events for safety moderation - annotation events for citations, - and file search streaming events. Added optional reasoning_content field to chat completion chunks to support non-standard provider extensions. NOTE: OpenAI does _not_ fill reasoning_content when users use the chat_completion APIs. This means there is no way for us to implement Responses (with reasoning) by using OpenAI chat completions! We'd need to transparently punt to OpenAI's responses endpoints if we wish to do that. For others though (vLLM, etc.) we can use it. ## Test Plan File search streaming test passes: ``` ./scripts/integration-tests.sh --stack-config server:ci-tests \ --suite responses --setup gpt --inference-mode replay --pattern test_response_file_search_streaming_events ``` Need more complex setup and validation for reasoning tests (need a vLLM powered OSS model maybe gpt-oss which can return reasoning_content). I will do that in a followup PR.	2025-10-11 16:47:14 -07:00
Ashwin Bharambe	1394403360	feat(responses): implement usage tracking in streaming responses (#3771 ) Implementats usage accumulation to StreamingResponseOrchestrator. The most important part was to pass `stream_options = { "include_usage": true }` to the chat_completion call. This means I will have to record all responses tests again because request hash will change :) Test changes: - Add usage assertions to streaming and non-streaming tests - Update test recordings with actual usage data from OpenAI	2025-10-10 12:27:03 -07:00
Francisco Arceo	e7d21e1ee3	feat: Add support for Conversations in Responses API (#3743 ) # What does this PR do? This PR adds support for Conversations in Responses. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Unit tests Integration tests <Details> <Summary>Manual testing with this script: (click to expand)</Summary> ```python from openai import OpenAI client = OpenAI() client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none") def test_conversation_create(): print("Testing conversation create...") conversation = client.conversations.create( metadata={"topic": "demo"}, items=[ {"type": "message", "role": "user", "content": "Hello!"} ] ) print(f"Created: {conversation}") return conversation def test_conversation_retrieve(conv_id): print(f"Testing conversation retrieve for {conv_id}...") retrieved = client.conversations.retrieve(conv_id) print(f"Retrieved: {retrieved}") return retrieved def test_conversation_update(conv_id): print(f"Testing conversation update for {conv_id}...") updated = client.conversations.update( conv_id, metadata={"topic": "project-x"} ) print(f"Updated: {updated}") return updated def test_conversation_delete(conv_id): print(f"Testing conversation delete for {conv_id}...") deleted = client.conversations.delete(conv_id) print(f"Deleted: {deleted}") return deleted def test_conversation_items_create(conv_id): print(f"Testing conversation items create for {conv_id}...") items = client.conversations.items.create( conv_id, items=[ { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "Hello!"}] }, { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "How are you?"}] } ] ) print(f"Items created: {items}") return items def test_conversation_items_list(conv_id): print(f"Testing conversation items list for {conv_id}...") items = client.conversations.items.list(conv_id, limit=10) print(f"Items list: {items}") return items def test_conversation_item_retrieve(conv_id, item_id): print(f"Testing conversation item retrieve for {conv_id}/{item_id}...") item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id) print(f"Item retrieved: {item}") return item def test_conversation_item_delete(conv_id, item_id): print(f"Testing conversation item delete for {conv_id}/{item_id}...") deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id) print(f"Item deleted: {deleted}") return deleted def test_conversation_responses_create(): print("\nTesting conversation create for a responses example...") conversation = client.conversations.create() print(f"Created: {conversation}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") return response, conversation def test_conversations_responses_create_followup( conversation, content="Repeat what you just said but add 'this is my second time saying this'", ): print(f"Using: {conversation.id}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": content}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") conv_items = client.conversations.items.list(conversation.id) print(f"\nRetrieving list of items for conversation {conversation.id}:") print(conv_items.model_dump_json(indent=2)) def test_response_with_fake_conv_id(): fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44" print(f"Using {fake_conv_id}") try: response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "say hello"}], conversation=fake_conv_id, ) print(f"Created response: {response} for conversation {fake_conv_id}") except Exception as e: print(f"failed to create response for conversation {fake_conv_id} with error {e}") def main(): print("Testing OpenAI Conversations API...") # Create conversation conversation = test_conversation_create() conv_id = conversation.id # Retrieve conversation test_conversation_retrieve(conv_id) # Update conversation test_conversation_update(conv_id) # Create items items = test_conversation_items_create(conv_id) # List items items_list = test_conversation_items_list(conv_id) # Retrieve specific item if items_list.data: item_id = items_list.data[0].id test_conversation_item_retrieve(conv_id, item_id) # Delete item test_conversation_item_delete(conv_id, item_id) # Delete conversation test_conversation_delete(conv_id) response, conversation2 = test_conversation_responses_create() print('\ntesting reseponse retrieval') test_conversation_retrieve(conversation2.id) print('\ntesting responses follow up') test_conversations_responses_create_followup(conversation2) print('\ntesting responses follow up x2!') test_conversations_responses_create_followup( conversation2, content="Repeat what you just said but add 'this is my third time saying this'", ) test_response_with_fake_conv_id() print("All tests completed!") if __name__ == "__main__": main() ``` </Details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-10-10 11:57:40 -07:00
Ashwin Bharambe	e039b61d26	feat(responses)!: add in_progress, failed, content part events (#3765 ) ## Summary - add schema + runtime support for response.in_progress / response.failed / response.incomplete - stream content parts with proper indexes and reasoning slots - align tests + docs with the richer event payloads ## Testing - uv run pytest tests/unit/providers/agents/meta_reference/test_openai_responses.py::test_create_openai_response_with_string_input - uv run pytest tests/unit/providers/agents/meta_reference/test_response_conversion_utils.py	2025-10-10 07:27:34 -07:00
Ashwin Bharambe	f50ce11a3b	feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403 ) Renames `inference_recorder.py` to `api_recorder.py` and extends it to support recording/replaying tool invocations in addition to inference calls. This allows us to record web-search, etc. tool calls and thereafter apply recordings for `tests/integration/responses` ## Test Plan ``` export OPENAI_API_KEY=... export TAVILY_SEARCH_API_KEY=... ./scripts/integration-tests.sh --stack-config ci-tests \ --suite responses --inference-mode record-if-missing ```	2025-10-09 14:27:51 -07:00
Ashwin Bharambe	61b4238912	feat(api): add extra_body parameter support with shields example (#3670 ) ## Summary Introduce `ExtraBodyField` annotation to enable parameters that arrive via extra_body in client SDKs but are accessible server-side with full typing. These parameters are documented in OpenAPI specs under `x-llama-stack-extra-body-params` but excluded from generated SDK signatures. Add `shields` parameter to `create_openai_response` as the first implementation using this pattern. ## Test Plan - added an integration test which checks that shields parameter passed via extra_body reaches server implementation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-03 13:25:09 -07:00
ehhuang	14a94e9894	fix: responses <> chat completion input conversion (#3645 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details API Conformance Tests / check-schema-compatibility (push) Successful in 10s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Python Package Build Test / build (3.13) (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s Details Test External API and Providers / test-external (venv) (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details UI Tests / ui-tests (22) (push) Successful in 33s Details Pre-commit / pre-commit (push) Successful in 1m27s Details # What does this PR do? closes #3268 closes #3498 When resuming from previous response ID, currently we attempt to convert from the stored responses input to chat completion messages, which is not always possible, e.g. for tool calls where some data is lost once converted from chat completion message to repsonses input format. This PR stores the chat completion messages that correspond to the _last_ call to chat completion, which is sufficient to be resumed from in the next responses API call, where we load these saved messages and skip conversion entirely. Separate issue to optimize storage: https://github.com/llamastack/llama-stack/issues/3646 ## Test Plan existing CI tests	2025-10-02 16:01:08 -07:00
grs	d350e3662b	feat: add support for require_approval argument when creating response (#3608 ) # What does this PR do? This PR adds support for the require_approval on an mcp tool definition passed to create response in the Responses API. This allows the caller to indicate whether they want to approve calls to that server, or let them be called without approval. Closes #3443 ## Test Plan Tested both approval and denial. Added automated integration test for both cases. --------- Signed-off-by: Gordon Sim <gsim@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-09-30 14:18:34 -07:00
Ashwin Bharambe	47b640370e	feat(tests): introduce a test "suite" concept to encompass dirs, options (#3339 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details UI Tests / ui-tests (22) (push) Successful in 33s Details Pre-commit / pre-commit (push) Successful in 1m15s Details Our integration tests need to be 'grouped' because each group often needs a specific set of models it works with. We separated vision tests due to this, and we have a separate set of tests which test "Responses" API. This PR makes this system a bit more official so it is very easy to target these groups and apply all testing infrastructure towards all the groups (for example, record-replay) uniformly. There are three suites declared: - base - vision - responses Note that our CI currently runs the "base" and "vision" suites. You can use the `--suite` option when running pytest (or any of the testing scripts or workflows.) For example: ``` OLLAMA_URL=http://localhost:11434 \ pytest -s -v tests/integration/ --stack-config starter --suite vision ```	2025-09-05 13:58:49 -07:00

24 commits