llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-11 21:48:36 +00:00

Author	SHA1	Message	Date
Eric Huang	9be300e1f5	merge commit for archive created by Sapling	2025-10-10 14:05:24 -07:00
Eric Huang	d7b57a8dd2	feat: support passing "extra body" through to providers # What does this PR do? Allows passing through extra_body parameters to inference providers. closes #2720 ## Test Plan CI and added new test	2025-10-10 14:05:18 -07:00
ehhuang	7e80cd0428	Merge `89ae6152c9` into sapling-pr-archive-ehhuang	2025-10-10 14:03:16 -07:00
Eric Huang	89ae6152c9	feat: support passing "extra body" through to providers # What does this PR do? Allows passing through extra_body parameters to inference providers. closes #2720 ## Test Plan CI and added new test	2025-10-10 14:03:10 -07:00
Eric Huang	9e05ebb5b5	merge commit for archive created by Sapling	2025-10-10 13:53:15 -07:00
Eric Huang	cb7fb0705b	refactor, body # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan	2025-10-10 13:52:36 -07:00
Eric Huang	237aaa0c93	merge commit for archive created by Sapling	2025-10-10 13:47:55 -07:00
Eric Huang	c40d8dcbee	feat: support passing "extra body" through to providers # What does this PR do? Allows passing through extra_body parameters to inference providers. closes #2720 ## Test Plan CI and added new test	2025-10-10 13:47:38 -07:00
Eric Huang	361fcaf2f3	refactor, body # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan	2025-10-10 13:47:38 -07:00
Eric Huang	5edb810b32	merge commit for archive created by Sapling	2025-10-10 13:43:28 -07:00
Eric Huang	b22d531600	refactor, body # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan	2025-10-10 13:43:23 -07:00
Derek Higgins	6954fe2274	fix(auth): allow unauthenticated access to health and version endpoints (#3736 ) The AuthenticationMiddleware was blocking all requests without an Authorization header, including health and version endpoints that are needed by monitoring tools, load balancers, and Kubernetes probes. This commit allows endpoints ending in /health or /version to bypass authentication, enabling operational tooling to function properly without requiring credentials. 
Closes: #3735 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-10-10 13:41:43 -07:00
Varsha	32fde8d9a8	feat: Add /v1/embeddings endpoint to batches API (#3384 ) # What does this PR do? This PR extends the Llama Stack Batches API to support the /v1/embeddings endpoint, enabling efficient batch processing of embedding requests alongside the existing /v1/chat/completions and /v1/completions support. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes: https://github.com/llamastack/llama-stack/issues/3145 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> ``` (stack-client) ➜ llama-stack git:(support/embeddings-api) conda activate stack-client && python -m pytest tests/unit/providers/batches/test_reference.py -v ============================================================================================================================================ test session starts ============================================================================================================================================= platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'xdist': '3.8.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, xdist-3.8.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0 asyncio: mode=Mode.AUTO collected 46 items tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_and_retrieve_batch_success PASSED [ 2%] 
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_without_metadata PASSED [ 4%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_completion_window PASSED [ 6%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[/v1/invalid/endpoint] PASSED [ 8%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[] PASSED [ 10%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_metadata PASSED [ 13%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_retrieve_batch_not_found PASSED [ 15%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_success PASSED [ 17%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[failed] PASSED [ 19%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[expired] PASSED [ 21%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[completed] PASSED [ 23%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_not_found PASSED [ 26%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_empty PASSED [ 28%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_single_batch PASSED [ 30%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_multiple_batches PASSED [ 32%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_limit PASSED [ 34%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_pagination PASSED [ 36%] 
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_invalid_after PASSED [ 39%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_kvstore_persistence PASSED [ 41%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_not_found PASSED [ 43%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_exists_empty_content PASSED [ 45%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_mixed_valid_invalid_json PASSED [ 47%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_model PASSED [ 50%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED [ 52%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED [ 54%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED [ 56%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED [ 58%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[model-body.model-invalid_request-Model parameter is required] PASSED [ 60%] 
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[messages-body.messages-invalid_request-Messages parameter is required] PASSED [ 63%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED [ 65%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED [ 67%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED [ 69%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED [ 71%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[model-body.model-invalid_request-Model parameter is required] PASSED [ 73%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[prompt-body.prompt-invalid_request-Prompt parameter is required] PASSED [ 76%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_url_mismatch PASSED [ 78%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_multiple_errors_per_request PASSED [ 80%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_request_format PASSED [ 82%] 
tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[custom_id-custom_id-12345-Custom_id must be a string] PASSED [ 84%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[url-url-123-URL must be a string] PASSED [ 86%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[method-method-invalid_value2-Method must be a string] PASSED [ 89%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[body-body-invalid_value3-Body must be a JSON dictionary object] PASSED [ 91%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[model-body.model-123-Model must be a string] PASSED [ 93%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[messages-body.messages-invalid messages format-Messages must be an array] PASSED [ 95%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_max_concurrent_batches PASSED [ 97%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_embeddings_endpoint PASSED [100%] ``` --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-10-10 13:25:58 -07:00
Ashwin Bharambe	1394403360	feat(responses): implement usage tracking in streaming responses (#3771 ) Implements usage accumulation in StreamingResponseOrchestrator. The most important part was to pass `stream_options = { "include_usage": true }` to the chat_completion call. This means I will have to record all responses tests again because the request hash will change :) Test changes: - Add usage assertions to streaming and non-streaming tests - Update test recordings with actual usage data from OpenAI	2025-10-10 12:27:03 -07:00
Eric Huang	26c09a9b89	merge commit for archive created by Sapling	2025-10-10 12:23:26 -07:00
Eric Huang	16adbdc51a	refactor, body # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan	2025-10-10 12:23:07 -07:00
Eric Huang	08d48f0db4	merge commit for archive created by Sapling	2025-10-10 12:14:19 -07:00
Eric Huang	f5b5a2d0d5	refactor, body # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan	2025-10-10 12:14:11 -07:00
Francisco Arceo	e7d21e1ee3	feat: Add support for Conversations in Responses API (#3743 ) # What does this PR do? This PR adds support for Conversations in Responses. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Unit tests Integration tests <Details> <Summary>Manual testing with this script: (click to expand)</Summary> ```python from openai import OpenAI client = OpenAI() client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none") def test_conversation_create(): print("Testing conversation create...") conversation = client.conversations.create( metadata={"topic": "demo"}, items=[ {"type": "message", "role": "user", "content": "Hello!"} ] ) print(f"Created: {conversation}") return conversation def test_conversation_retrieve(conv_id): print(f"Testing conversation retrieve for {conv_id}...") retrieved = client.conversations.retrieve(conv_id) print(f"Retrieved: {retrieved}") return retrieved def test_conversation_update(conv_id): print(f"Testing conversation update for {conv_id}...") updated = client.conversations.update( conv_id, metadata={"topic": "project-x"} ) print(f"Updated: {updated}") return updated def test_conversation_delete(conv_id): print(f"Testing conversation delete for {conv_id}...") deleted = client.conversations.delete(conv_id) print(f"Deleted: {deleted}") return deleted def test_conversation_items_create(conv_id): print(f"Testing conversation items create for {conv_id}...") items = client.conversations.items.create( conv_id, items=[ { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "Hello!"}] }, { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "How are you?"}] } ] ) print(f"Items created: {items}") return items def test_conversation_items_list(conv_id): print(f"Testing conversation items list for {conv_id}...") items = client.conversations.items.list(conv_id, limit=10) print(f"Items list: {items}") return items def 
test_conversation_item_retrieve(conv_id, item_id): print(f"Testing conversation item retrieve for {conv_id}/{item_id}...") item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id) print(f"Item retrieved: {item}") return item def test_conversation_item_delete(conv_id, item_id): print(f"Testing conversation item delete for {conv_id}/{item_id}...") deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id) print(f"Item deleted: {deleted}") return deleted def test_conversation_responses_create(): print("\nTesting conversation create for a responses example...") conversation = client.conversations.create() print(f"Created: {conversation}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") return response, conversation def test_conversations_responses_create_followup( conversation, content="Repeat what you just said but add 'this is my second time saying this'", ): print(f"Using: {conversation.id}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": content}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") conv_items = client.conversations.items.list(conversation.id) print(f"\nRetrieving list of items for conversation {conversation.id}:") print(conv_items.model_dump_json(indent=2)) def test_response_with_fake_conv_id(): fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44" print(f"Using {fake_conv_id}") try: response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "say hello"}], conversation=fake_conv_id, ) print(f"Created response: {response} for conversation {fake_conv_id}") except Exception as e: print(f"failed to create response for conversation {fake_conv_id} with error {e}") def main(): 
print("Testing OpenAI Conversations API...") # Create conversation conversation = test_conversation_create() conv_id = conversation.id # Retrieve conversation test_conversation_retrieve(conv_id) # Update conversation test_conversation_update(conv_id) # Create items items = test_conversation_items_create(conv_id) # List items items_list = test_conversation_items_list(conv_id) # Retrieve specific item if items_list.data: item_id = items_list.data[0].id test_conversation_item_retrieve(conv_id, item_id) # Delete item test_conversation_item_delete(conv_id, item_id) # Delete conversation test_conversation_delete(conv_id) response, conversation2 = test_conversation_responses_create() print('\ntesting response retrieval') test_conversation_retrieve(conversation2.id) print('\ntesting responses follow up') test_conversations_responses_create_followup( conversation2, content="Repeat what you just said but add 'this is my third time saying this'", ) test_response_with_fake_conv_id() print("All tests completed!") if __name__ == "__main__": main() ``` </Details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-10-10 11:57:40 -07:00
Ashwin Bharambe	932fea813a	fix(ci): remove responses from CI for now (#3773 ) There are many changes to responses which are landing. They are introducing fundamental new types. This means re-recordings even from the inference calls. Let's avoid that for now. Once everything lands I will re-record everything, make things pass and re-enable.	2025-10-10 11:52:17 -07:00
Eric Huang	fced954eba	merge commit for archive created by Sapling	2025-10-10 10:58:34 -07:00
Eric Huang	ab7888e927	refactor # What does this PR do? ## Test Plan	2025-10-10 10:55:29 -07:00
Eric Huang	33e7b8bc01	merge commit for archive created by Sapling	2025-10-10 10:50:39 -07:00
Eric Huang	b07711b869	refactor # What does this PR do? ## Test Plan	2025-10-10 10:50:32 -07:00
ehhuang	23730bc118	Merge `bd759b868b` into sapling-pr-archive-ehhuang	2025-10-10 10:39:34 -07:00
Eric Huang	bd759b868b	chore!: remove ALL telemetry APIs # What does this PR do? ## Test Plan	2025-10-10 10:39:29 -07:00
ehhuang	cbbc8dbe30	Merge `8fc91f97dc` into sapling-pr-archive-ehhuang	2025-10-10 10:35:17 -07:00
Eric Huang	8fc91f97dc	chore!: remove ALL telemetry APIs # What does this PR do? ## Test Plan	2025-10-10 10:34:18 -07:00
Eric Huang	fd3c678131	merge commit for archive created by Sapling	2025-10-10 09:51:44 -07:00
Eric Huang	68b645cbb3	test # What does this PR do? ## Test Plan	2025-10-10 09:51:38 -07:00
ehhuang	8957671718	Merge `af63e22372` into sapling-pr-archive-ehhuang	2025-10-10 09:49:56 -07:00
Eric Huang	af63e22372	test # What does this PR do? ## Test Plan	2025-10-10 09:49:47 -07:00
Eric Huang	2a8bd68351	merge commit for archive created by Sapling	2025-10-10 09:45:43 -07:00
Eric Huang	82314c686c	test # What does this PR do? ## Test Plan	2025-10-10 09:45:37 -07:00
Eric Huang	776d814c90	merge commit for archive created by Sapling	2025-10-10 09:38:56 -07:00
Eric Huang	7463e2a458	test # What does this PR do? ## Test Plan	2025-10-10 09:38:37 -07:00
Ashwin Bharambe	548ccff368	fix(mypy): fix wrong attribute access (#3770 )	2025-10-10 09:30:43 -07:00
grs	8bf07f91cb	feat: reuse previous mcp tool listings where possible (#3710 ) # What does this PR do? This PR checks whether, if a previous response is linked, there are mcp_list_tools objects that can be reused instead of listing the tools explicitly every time. Closes #3106 ## Test Plan Tested manually. Added unit tests to cover new behaviour. --------- Signed-off-by: Gordon Sim <gsim@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-10-10 09:28:25 -07:00
Matthew Farrellee	0066d986c5	feat: use SecretStr for inference provider auth credentials (#3724 ) # What does this PR do? use SecretStr for OpenAIMixin providers - RemoteInferenceProviderConfig now has auth_credential: SecretStr - the default alias is api_key (most common name) - some providers override to use api_token (RunPod, vLLM, Databricks) - some providers exclude it (Ollama, TGI, Vertex AI) addresses #3517 ## Test Plan ci w/ new tests	2025-10-10 07:32:50 -07:00
Derek Higgins	6d8f61206e	fix: update normalize to search all recordings dirs (#3767 ) Updated scripts/normalize_recordings.py to dynamically find and process all 'recordings' directories under tests/ using pathlib.rglob() instead of hardcoding a single path. Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-10-10 07:32:14 -07:00
Ashwin Bharambe	e039b61d26	feat(responses)!: add in_progress, failed, content part events (#3765 ) ## Summary - add schema + runtime support for response.in_progress / response.failed / response.incomplete - stream content parts with proper indexes and reasoning slots - align tests + docs with the richer event payloads ## Testing - uv run pytest tests/unit/providers/agents/meta_reference/test_openai_responses.py::test_create_openai_response_with_string_input - uv run pytest tests/unit/providers/agents/meta_reference/test_response_conversion_utils.py	2025-10-10 07:27:34 -07:00
Akram Ben Aissi	a548169b99	fix: allow skipping model availability check for vLLM (#3739 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Allows model check to fail gracefully instead of crashing on startup. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> set VLLM_URL to your VLLM server ``` (base) akram@Mac llama-stack % LAMA_STACK_LOGGING="all=debug" VLLM_ENABLE_MODEL_DISCOVERY=false MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run ``` ``` INFO 2025-10-08 20:11:24,637 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues INFO 2025-10-08 20:11:24,866 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues ERROR 2025-10-08 20:11:26,160 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&redirect_uri=ht tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&response_type=code&scope=user%3Ainfo+user%3Acheck-access&state=9fba207425 5851c718aca717a5887d76%3A%2Fmodels">Found</a>. [...] INFO 2025-10-08 20:11:26,295 uvicorn.error:84 uncategorized: Started server process [83144] INFO 2025-10-08 20:11:26,296 uvicorn.error:48 uncategorized: Waiting for application startup. 
INFO 2025-10-08 20:11:26,297 llama_stack.core.server.server:170 core::server: Starting up INFO 2025-10-08 20:11:26,297 llama_stack.core.stack:399 core: starting registry refresh task INFO 2025-10-08 20:11:26,311 uvicorn.error:62 uncategorized: Application startup complete. INFO 2025-10-08 20:11:26,312 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ERROR 2025-10-08 20:11:26,791 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&redirect_uri=ht tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&response_type=code&scope=user%3Ainfo+user%3Acheck-access&state=8ef0cba3e1 71a4f8b04cb445cfb91a4c%3A%2Fmodels">Found</a>. ```	2025-10-10 07:23:13 -07:00
Ashwin Bharambe	aaf5036235	feat(responses): add usage types to inference and responses APIs (#3764 ) ## Summary Adds OpenAI-compatible usage tracking types to enable reporting token consumption for both streaming and non-streaming responses. 
## Type Definitions Chat Completion Usage (inference API): ```python class OpenAIChatCompletionUsage(BaseModel): prompt_tokens: int completion_tokens: int total_tokens: int prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails \| None completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails \| None ``` Response Usage (responses API): ```python class OpenAIResponseUsage(BaseModel): input_tokens: int output_tokens: int total_tokens: int input_tokens_details: OpenAIResponseUsageInputTokensDetails \| None output_tokens_details: OpenAIResponseUsageOutputTokensDetails \| None ``` This matches OpenAI's usage reporting format and enables PR #3766 to implement usage tracking in streaming responses. Co-authored-by: Claude <noreply@anthropic.com>	2025-10-10 09:22:59 -04:00
Eric Huang	87073162ce	merge commit for archive created by Sapling	2025-10-09 23:35:23 -07:00
Eric Huang	275cc53f28	test # What does this PR do? ## Test Plan	2025-10-09 23:35:17 -07:00
Eric Huang	689c9f0762	merge commit for archive created by Sapling	2025-10-09 23:18:10 -07:00
Eric Huang	6cc9102097	test # What does this PR do? ## Test Plan	2025-10-09 23:18:02 -07:00
Eric Huang	fa59fc9f92	merge commit for archive created by Sapling	2025-10-09 23:04:19 -07:00
Eric Huang	9f5fdce86e	test # What does this PR do? ## Test Plan	2025-10-09 23:04:13 -07:00
Eric Huang	0cd78e2ba6	merge commit for archive created by Sapling	2025-10-09 22:46:38 -07:00
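
Commit 6954fe2274 above describes letting requests to paths ending in /health or /version bypass authentication. The rule can be sketched roughly as follows. This is a minimal illustration, not the project's middleware: `PUBLIC_SUFFIXES`, `requires_auth`, and `check_request` are hypothetical names, and the status codes are assumptions chosen for the example.

```python
# Hypothetical sketch of the bypass rule described in commit 6954fe2274.
# None of these names come from llama-stack; they only illustrate the idea.

# Paths ending in one of these suffixes skip the Authorization check.
PUBLIC_SUFFIXES = ("/health", "/version")


def requires_auth(path: str) -> bool:
    """Return False for paths that end in /health or /version."""
    return not path.endswith(PUBLIC_SUFFIXES)


def check_request(path: str, headers: dict[str, str]) -> int:
    """Return an assumed HTTP status: 200 if allowed, 401 if rejected.

    Header names are assumed to be lower-cased by the caller.
    """
    if not requires_auth(path):
        return 200  # operational endpoints are always reachable
    if not headers.get("authorization"):
        return 401  # everything else still needs credentials
    return 200
```

With this rule, a Kubernetes probe calling `/v1/health` without credentials would be let through, while an unauthenticated call to any other path would still be rejected.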

3021 commits
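
Commit 6d8f61206e above mentions switching scripts/normalize_recordings.py from a hardcoded path to discovering every 'recordings' directory under tests/ with pathlib's rglob(). A minimal sketch of that discovery step, assuming a helper name (`find_recordings_dirs`) that is not taken from the actual script:

```python
from pathlib import Path


def find_recordings_dirs(root: Path) -> list[Path]:
    """Return every directory named 'recordings' beneath root, sorted.

    Hypothetical helper: the real script may structure this differently.
    """
    # rglob walks the tree recursively; is_dir() filters out any plain
    # file that happens to be called "recordings".
    return sorted(p for p in root.rglob("recordings") if p.is_dir())
```

A caller would then loop over `find_recordings_dirs(Path("tests"))` and normalize the files in each directory.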