# What does this PR do?
Allows passing `extra_body` parameters through to inference providers.
With this, we moved the two vLLM-specific parameters out of the
completions API and into `extra_body`.
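A minimal sketch of what a client call might look like after this change, assuming the OpenAI-compatible client pointed at a local Llama Stack server (the base URL and the `guided_choice` parameter here are illustrative, not part of this PR's diff):

```python
# Sketch: provider-specific options ride along in extra_body instead of
# dedicated top-level parameters. Assumes an OpenAI-compatible endpoint
# served by a local Llama Stack server; adjust base_url as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

response = client.completions.create(
    model="vllm/Qwen/Qwen3-0.6B",
    prompt="Is Python compiled or interpreted? Answer in one word.",
    # Forwarded to the inference provider as-is; guided_choice is a
    # vLLM-specific option used here purely for illustration.
    extra_body={"guided_choice": ["compiled", "interpreted"]},
)
print(response.choices[0].text)
```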
Before/After
<img width="1883" height="324" alt="image"
src="https://github.com/user-attachments/assets/acb27c08-c748-46c9-b1da-0de64e9908a1"
/>
Closes #2720
## Test Plan
CI and added new test
```
❯ uv run pytest -s -v tests/integration/ --stack-config=server:starter --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag ) and test_openai_completion_guided_choice' --setup=vllm --suite=base --color=yes
Uninstalled 3 packages in 125ms
Installed 3 packages in 19ms
INFO 2025-10-10 14:29:54,317 tests.integration.conftest:118 tests: Applying setup 'vllm' for suite base
INFO 2025-10-10 14:29:54,331 tests.integration.conftest:47 tests: Test stack config type: server
(stack_config=server:starter)
============================================================================================================== test session starts ==============================================================================================================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/erichuang/projects/llama-stack-1/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/erichuang/projects/llama-stack-1
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 285 items / 284 deselected / 1 selected
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
instantiating llama_stack_client
Starting llama stack server with config 'starter' on port 8321...
Waiting for server at http://localhost:8321... (0.0s elapsed)
Waiting for server at http://localhost:8321... (0.5s elapsed)
Waiting for server at http://localhost:8321... (5.1s elapsed)
Waiting for server at http://localhost:8321... (5.6s elapsed)
Waiting for server at http://localhost:8321... (10.1s elapsed)
Waiting for server at http://localhost:8321... (10.6s elapsed)
Server is ready at http://localhost:8321
llama_stack_client instantiated in 11.773s
PASSEDTerminating llama stack server process...
Terminating process 98444 and its group...
Server process and children terminated gracefully
============================================================================================================= slowest 10 durations ==============================================================================================================
11.88s setup tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
3.02s call tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
0.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
================================================================================================ 1 passed, 284 deselected, 3 warnings in 16.21s =================================================================================================
```
# What does this PR do?
Converts the openai(_chat)_completion params to a Pydantic `BaseModel` to
reduce code duplication across all providers.
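A rough sketch of the idea (class and method names here are illustrative, not necessarily those in the diff):

```python
# Sketch: gather the many keyword arguments of the OpenAI-compatible
# completion endpoints into a single validated Pydantic model, so each
# provider consumes one object instead of re-declaring every parameter.
from pydantic import BaseModel


class OpenAIChatCompletionRequest(BaseModel):  # hypothetical name
    model: str
    messages: list[dict]
    temperature: float | None = None
    max_tokens: int | None = None
    stream: bool | None = None
    # ...remaining OpenAI-compatible parameters


async def openai_chat_completion(params: OpenAIChatCompletionRequest):
    # Providers receive the already-validated params object.
    ...
```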
## Test Plan
CI
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3761).
* #3777
* __->__ #3761
# What does this PR do?
When a previous response is linked, this PR checks whether it contains
mcp_list_tools objects that can be reused instead of listing the tools
explicitly on every request.
Closes #3106
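Roughly, the new behaviour looks like the sketch below (names and helpers are illustrative, not the actual implementation):

```python
# Sketch: reuse mcp_list_tools output attached to the linked previous
# response instead of calling the MCP server's list_tools every time.
def cached_tool_listings(previous_response) -> dict[str, list]:
    """Collect mcp_list_tools outputs from a prior response, keyed by server label."""
    cached = {}
    for item in getattr(previous_response, "output", []) or []:
        if getattr(item, "type", None) == "mcp_list_tools":
            cached[item.server_label] = item.tools
    return cached


async def resolve_mcp_tools(mcp_server, previous_response=None):
    cached = cached_tool_listings(previous_response) if previous_response else {}
    if mcp_server.server_label in cached:
        # Reuse the earlier listing and skip the round trip to the server.
        return cached[mcp_server.server_label]
    return await list_tools_from_server(mcp_server)  # hypothetical helper
```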
## Test Plan
Tested manually.
Added unit tests to cover new behaviour.
---------
Signed-off-by: Gordon Sim <gsim@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
## Summary
Adds OpenAI-compatible usage tracking types to enable reporting token
consumption for both streaming and non-streaming responses.
## Type Definitions
**Chat Completion Usage** (inference API):
```python
class OpenAIChatCompletionUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None
    completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None
```
**Response Usage** (responses API):
```python
class OpenAIResponseUsage(BaseModel):
    input_tokens: int
    output_tokens: int
    total_tokens: int
    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None
    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None
```
This matches OpenAI's usage reporting format and enables PR #3766 to
implement usage tracking in streaming responses.
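For illustration, the final chunk of a streaming response could then report usage like this, using the type defined above (a sketch, not code from this PR; the token counts are made up):

```python
usage = OpenAIResponseUsage(
    input_tokens=25,
    output_tokens=10,
    total_tokens=35,
    input_tokens_details=None,
    output_tokens_details=None,
)
```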
Co-authored-by: Claude <noreply@anthropic.com>
# What does this PR do?
* Cleans up API docstrings for better documentation rendering
<img width="2346" height="1126" alt="image"
src="https://github.com/user-attachments/assets/516b09a1-2d5b-4614-a3a9-13431fc21fc1"
/>
## Test Plan
* Manual testing
---------
Signed-off-by: Doug Edgar <dedgar@redhat.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Christian Zaccaria <73656840+ChristianZaccaria@users.noreply.github.com>
Co-authored-by: Anastas Stoyanovsky <contact@anastas.eu>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Young Han <110819238+seyeong-han@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Summary
Introduces an `ExtraBodyField` annotation for parameters that arrive
via `extra_body` in client SDKs but are accessible server-side with
full typing.
These parameters are documented in OpenAPI specs under
**`x-llama-stack-extra-body-params`** but excluded from generated SDK
signatures.
Adds a `shields` parameter to `create_openai_response` as the first
implementation of this pattern.
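A sketch of the pattern (the import path and the exact `ExtraBodyField` signature are assumptions, not the verbatim code from this PR):

```python
# Sketch: a parameter that clients send via extra_body but that the
# server-side method signature still declares with full typing.
from typing import Annotated

from llama_stack.schema_utils import ExtraBodyField  # illustrative import path


async def create_openai_response(
    input: str,
    model: str,
    shields: Annotated[
        list[str] | None,
        ExtraBodyField("List of shields to apply during response generation"),
    ] = None,
):
    # `shields` is excluded from generated SDK signatures; clients pass it
    # through extra_body and the server receives it fully typed.
    ...
```

On the client side the value would be supplied through `extra_body`, e.g. `extra_body={"shields": ["my-shield-id"]}` on `responses.create(...)`.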
## Test Plan
- Added an integration test which checks that the `shields` parameter
passed via `extra_body` reaches the server implementation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
This is a sweeping change to clean up some gunk around our "Tool"
definitions.
First, we had two types, `Tool` and `ToolDef`. The former was a
"Resource" type for the registry, but we stopped registering tools in
the registry long ago (and only registered ToolGroups). The latter was
for specifying tools for the Agents API. This PR removes the former and
adds an optional `toolgroup_id` field to the latter.
Secondly, as pointed out by @bbrowning in
https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132,
we were doing a lossy conversion of the full JSON schema in the MCP tool
specification into our ToolDefinition before sending it to the model.
There is no need to do this: we aren't doing any execution ourselves,
merely passing the schema to the chat completions API, which supports
full JSON schemas. Because the conversion was lossy (and done poorly),
we ran into limitations such as not supporting array items and not
resolving $refs.
To fix this, we replaced the `parameters` field with `{ input_schema,
output_schema }`, both of which can be full-blown JSON schemas.
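For example, a `ToolDef` can now carry an MCP tool's schema essentially verbatim (a sketch; the tool and its schema are made up for illustration):

```python
# Sketch: full JSON schemas on ToolDef, so array items and $refs survive
# instead of being flattened into a lossy parameter list.
tool = ToolDef(
    name="search_files",
    description="Search files matching the given filters",
    toolgroup_id="mcp::filesystem",  # the new optional field
    input_schema={
        "type": "object",
        "properties": {
            "patterns": {"type": "array", "items": {"type": "string"}},
            "filter": {"$ref": "#/$defs/Filter"},
        },
        "$defs": {
            "Filter": {
                "type": "object",
                "properties": {"max_size_kb": {"type": "integer"}},
            }
        },
        "required": ["patterns"],
    },
    output_schema={
        "type": "object",
        "properties": {"matches": {"type": "array", "items": {"type": "string"}}},
    },
)
```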
Finally, there were some types in our llama-related chat format
conversion which needed some cleanup. We are taking this opportunity to
clean those up.
This PR is a substantial breaking change to the API. However, given our
window for introducing breaking changes, this suits us just fine. I will
be landing a concurrent `llama-stack-client` change as well since API
shapes are changing.
# What does this PR do?
Adds supplementary static content to root API spec pages. This is useful for giving context behind a specific API group, adding information on supported features or work in progress, etc.
This PR introduces supplementary information for Agents (experimental, deprecated) and Responses (stable) APIs.
## Test Plan
Documentation server renders rich static content for the Agents API group:

# What does this PR do?
First step towards cleaning up the API reference section of the docs.
- Separates API reference into 3 sections: stable (`v1`), experimental (`v1alpha` and `v1beta`), and deprecated (`deprecated=True`)
- Each section is accessible via the dropdown menu and `docs/api-overview`
<img width="1237" height="321" alt="Screenshot 2025-09-30 at 5 47 30 PM" src="https://github.com/user-attachments/assets/fe0e498c-b066-46ed-a48e-4739d3b6724c" />
<img width="860" height="510" alt="Screenshot 2025-09-30 at 5 47 49 PM" src="https://github.com/user-attachments/assets/a92a8d8c-94bf-42d5-9f5b-b47bb2b14f9c" />
- Deprecated APIs: Added styling to the sidebar, and a notice on the endpoint pages
<img width="867" height="428" alt="Screenshot 2025-09-30 at 5 47 43 PM" src="https://github.com/user-attachments/assets/9e6e050d-c782-461b-8084-5ff6496d7bd9" />
Closes #3628
TODO in follow-up PRs:
- Add the ability to annotate API groups with supplementary content (so we can have longer descriptions of complex APIs like Responses)
- Clean up docstrings to show API endpoints (or short semantic titles) in the sidebar
## Test Plan
- Local testing
- Made sure API conformance test still passes