llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-05 10:23:44 +00:00

Author	SHA1	Message	Date
Eric Huang	f675b09b69	fix: log level # What does this PR do? - categories like "core::server" is not recognized so it's level is not set by 'all=debug' - removed spammy telemetry debug logging ## Test Plan test server launched with LLAMA_STACK_LOGGING='all=debug'	2025-10-01 09:39:51 -07:00
Charlie Doern	d167101e70	feat(api): implement v1beta leveling, and additional alpha (#3594 ) # What does this PR do? level the following APIs, keeping their old routes around as well until 0.4.0 1. datasetio to v1beta: used primarily by eval and training. Given that training is v1alpha, and eval is v1alpha, datasetio is likely to change in structure as real usages of the API spin up. Register,unregister, and iter dataset is sparsely implemented meaning the shape of that route is likely to change. 2. telemetry to v1alpha: telemetry has been going through many changes. for example query_metrics was not even implemented until recently and had to change its shape to work. putting this in v1beta will allow us to fix functionality like OTEL, sqlite, etc. The routes themselves are set, but the structure might change a bit Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-01 09:18:11 -07:00
Matthew Farrellee	f7c5ef4ec0	chore: remove /v1/inference/completion and implementations (#3622 ) # What does this PR do? the /inference/completion route is gone. this removes the implementations. ## Test Plan ci	2025-10-01 11:36:53 -04:00
Matthew Farrellee	ea15f2a270	chore: use openai_chat_completion for llm as a judge scoring (#3635 ) # What does this PR do? update llm as a judge to use openai_chat_completion, instead of deprecated chat_completion ## Test Plan ci	2025-10-01 09:44:31 -04:00
Jaideep Rao	ca47d90926	fix: Ensure that tool calls with no arguments get handled correctly (#3560 ) # What does this PR do? When a model decides to use an MCP tool call that requires no arguments, it sets the `arguments` field to `None`. This causes the user to see a `400 bad requst error` due to validation errors down the stack because this field gets removed when being parsed by an openai compatible inference provider like vLLM This PR ensures that, as soon as the tool call args are accumulated while streaming, we check to ensure no tool call function arguments are set to None - if they are we replace them with "{}" <!-- If resolving an issue, uncomment and update the line below --> Closes #3456 ## Test Plan Added new unit test to verify that any tool calls with function arguments set to `None` get handled correctly --------- Signed-off-by: Jaideep Rao <jrao@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-10-01 08:36:57 -04:00
Ashwin Bharambe	42414a1a1b	fix(logging): disable console telemetry sink by default (#3623 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Test Llama Stack Build / build (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 21s Details Test Llama Stack Build / build-single-provider (push) Failing after 25s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s Details Unit Tests / unit-tests (3.12) (push) Failing after 22s Details API Conformance Tests / check-schema-compatibility (push) Successful in 33s Details UI Tests / ui-tests (22) (push) Successful in 39s Details Pre-commit / pre-commit (push) Successful in 1m12s Details The current span processing dumps so much junk on the console that it makes actual understanding of what is going on in the server impossible. I am killing the console sink as a default. If you want, you are always free to change your run.yaml to add it. Before: <img width="1877" height="1107" alt="image" src="https://github.com/user-attachments/assets/3a7ad261-e2ba-4d40-9820-fcc282c8df37" /> After: <img width="1919" height="470" alt="image" src="https://github.com/user-attachments/assets/bc7cf763-fba9-4e95-a4b5-f65f6d1c5332" />	2025-09-30 14:58:05 -07:00
ehhuang	ac7c35fbe6	fix: don't pass default response format in Responses (#3614 ) # What does this PR do? Fireworks doesn't allow repsonse_format with tool use. The default response format is 'text' anyway, so we can safely omit. ## Test Plan Below script failed without the change, runs after. ``` #!/usr/bin/env python3 """ Script to test Responses API with kubernetes-mcp-server. This script: 1. Connects to the llama stack server 2. Uses the Responses API with MCP tools 3. Asks for the list of Kubernetes namespaces using the kubernetes-mcp-server """ import json from openai import OpenAI # Connect to the llama stack server base_url = "http://localhost:8321/v1" client = OpenAI(base_url=base_url, api_key="fake") # Define the MCP tool pointing to the kubernetes-mcp-server # The kubernetes-mcp-server is running on port 3000 with SSE endpoint at /sse mcp_server_url = "http://localhost:3000/sse" tools = [ { "type": "mcp", "server_label": "k8s", "server_url": mcp_server_url, } ] # Create a response request asking for k8s namespaces print("Sending request to list Kubernetes namespaces...") print(f"Using MCP server at: {mcp_server_url}") print("Available tools will be listed automatically by the MCP server.") print() response = client.responses.create( # model="meta-llama/Llama-3.2-3B-Instruct", # Using the vllm model model="fireworks/accounts/fireworks/models/llama4-scout-instruct-basic", # model="openai/gpt-4o", input="what are all the Kubernetes namespaces? Use tool call to `namespaces_list`. make sure to adhere to the tool calling format UNDER ALL CIRCUMSTANCES.", tools=tools, stream=False, ) print("\n" + "=" * 80) print("RESPONSE OUTPUT:") print("=" * 80) # Print the output for i, output in enumerate(response.output): print(f"\n[Output {i + 1}] Type: {output.type}") if output.type == "mcp_list_tools": print(f" Server: {output.server_label}") print(f" Tools available: {[t.name for t in output.tools]}") elif output.type == "mcp_call": print(f" Tool called: {output.name}") print(f" Arguments: {output.arguments}") print(f" Result: {output.output}") if output.error: print(f" Error: {output.error}") elif output.type == "message": print(f" Role: {output.role}") print(f" Content: {output.content}") print("\n" + "=" * 80) print("FINAL RESPONSE TEXT:") print("=" * 80) print(response.output_text) ```	2025-09-30 14:52:24 -07:00
grs	d350e3662b	feat: add support for require_approval argument when creating response (#3608 ) # What does this PR do? This PR adds support for the require_approval on an mcp tool definition passed to create response in the Responses API. This allows the caller to indicate whether they want to approve calls to that server, or let them be called without approval. Closes #3443 ## Test Plan Tested both approval and denial. Added automated integration test for both cases. --------- Signed-off-by: Gordon Sim <gsim@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-09-30 14:18:34 -07:00
Ashwin Bharambe	606f4cf281	fix(expires_after): make sure multipart/form-data is properly parsed (#3612 ) https://github.com/llamastack/llama-stack/pull/3604 broke multipart form data field parsing for the Files API since it changed its shape -- so as to match the API exactly to the OpenAI spec even in the generated client code. The underlying reason is that multipart/form-data cannot transport structured nested fields. Each field must be str-serialized. The client (specifically the OpenAI client whose behavior we must match), transports sub-fields as `expires_after[anchor]` and `expires_after[seconds]`, etc. We must be able to handle these fields somehow on the server without compromising the shape of the YAML spec. This PR "fixes" this by adding a dependency to convert the data. The main trade-off here is that we must add this `Depends()` annotation on every provider implementation for Files. This is a headache, but a much more reasonable one (in my opinion) given the alternatives. ## Test Plan Tests as shown in https://github.com/llamastack/llama-stack/pull/3604#issuecomment-3351090653 pass.	2025-09-30 16:14:03 -04:00
slekkala1	cc64093ae4	feat(api): Add Vector Store File batches api stub (#3615 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details API Conformance Tests / check-schema-compatibility (push) Successful in 7s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 34s Details Pre-commit / pre-commit (push) Successful in 1m14s Details # What does this PR do? Adding api stubs for vector store file batches apis https://github.com/llamastack/llama-stack/issues/3533 API Ref: https://platform.openai.com/docs/api-reference/vector-stores-file-batches ## Test Plan CI	2025-09-30 12:07:33 -07:00
Charlie Doern	1e25a72ece	feat(api): level /agents as `v1alpha` (#3610 ) # What does this PR do? agents is likely to be deprecated in favor of responses. Lets level it as alpha to indicate the lack of longterm support keep v1 route for backwards compat. Closes #3611 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-09-30 11:15:04 -07:00
Matthew Farrellee	2de4e6c900	feat: use /v1/chat/completions for safety model inference (#3591 ) # What does this PR do? migrate safety api implementation from /inference/chat-completion to /v1/chat/completions ## Test Plan ci w/ recordings --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-09-30 11:01:44 -07:00
Matthew Farrellee	cb33f45c11	chore: unpublish /inference/chat-completion (#3609 ) # What does this PR do? BREAKING CHANGE: removes /inference/chat-completion route and updates relevant documentation ## Test Plan 🤷	2025-09-30 11:00:42 -07:00
ehhuang	6cce553c93	fix: mcp tool with array type should include items (#3602 ) Some checks failed Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test External API and Providers / test-external (venv) (push) Failing after 6s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 11s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 17s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (push) Failing after 19s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 21s Details Python Package Build Test / build (3.12) (push) Failing after 20s Details Python Package Build Test / build (3.13) (push) Failing after 23s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 28s Details Unit Tests / unit-tests (3.12) (push) Failing after 25s Details API Conformance Tests / check-schema-compatibility (push) Successful in 32s Details UI Tests / ui-tests (22) (push) Successful in 57s Details Pre-commit / pre-commit (push) Successful in 1m18s Details # What does this PR do? Fixes error: ``` [ERROR] Error executing endpoint route='/v1/openai/v1/responses' method='post': Error code: 400 - {'error': {'message': "Invalid schema for function 'pods_exec': In context=('properties', 'command'), array schema missing items.", 'type': 'invalid_request_error', 'param': 'tools[7].function.parameters', 'code': 'invalid_function_parameters'}} ``` From script: ``` #!/usr/bin/env python3 """ Script to test Responses API with kubernetes-mcp-server. This script: 1. Connects to the llama stack server 2. Uses the Responses API with MCP tools 3. Asks for the list of Kubernetes namespaces using the kubernetes-mcp-server """ import json from openai import OpenAI # Connect to the llama stack server base_url = "http://localhost:8321/v1/openai/v1" client = OpenAI(base_url=base_url, api_key="fake") # Define the MCP tool pointing to the kubernetes-mcp-server # The kubernetes-mcp-server is running on port 3000 with SSE endpoint at /sse mcp_server_url = "http://localhost:3000/sse" tools = [ { "type": "mcp", "server_label": "k8s", "server_url": mcp_server_url, } ] # Create a response request asking for k8s namespaces print("Sending request to list Kubernetes namespaces...") print(f"Using MCP server at: {mcp_server_url}") print("Available tools will be listed automatically by the MCP server.") print() response = client.responses.create( # model="meta-llama/Llama-3.2-3B-Instruct", # Using the vllm model model="openai/gpt-4o", input="what are all the Kubernetes namespaces? Use tool call to `namespaces_list`. make sure to adhere to the tool calling format.", tools=tools, stream=False, ) print("\n" + "=" * 80) print("RESPONSE OUTPUT:") print("=" * 80) # Print the output for i, output in enumerate(response.output): print(f"\n[Output {i + 1}] Type: {output.type}") if output.type == "mcp_list_tools": print(f" Server: {output.server_label}") print(f" Tools available: {[t.name for t in output.tools]}") elif output.type == "mcp_call": print(f" Tool called: {output.name}") print(f" Arguments: {output.arguments}") print(f" Result: {output.output}") if output.error: print(f" Error: {output.error}") elif output.type == "message": print(f" Role: {output.role}") print(f" Content: {output.content}") print("\n" + "=" * 80) print("FINAL RESPONSE TEXT:") print("=" * 80) print(response.output_text) ``` ## Test Plan new unit tests script now runs successfully	2025-09-29 23:11:41 -07:00
Ashwin Bharambe	56b625d18a	feat(openai_movement)!: Change URL structures to kill /openai/v1 (part 2) (#3605 )	2025-09-29 22:57:37 -07:00
Ashwin Bharambe	3a09f00cdb	feat(files): fix expires_after API shape (#3604 ) This was just quite incorrect. See source here: https://platform.openai.com/docs/api-reference/files/create	2025-09-29 21:29:15 -07:00
Ashwin Bharambe	5e7fed8bbb	feat(openai_movement): Change URL structures to kill /openai/v1 (part 1) (#3587 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details API Conformance Tests / check-schema-compatibility (push) Successful in 6s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Pre-commit / pre-commit (push) Successful in 1m19s Details Test External API and Providers / test-external (venv) (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 38s Details The `/v1/openai/v1` prefix is annoying and now unnecessary given our clearer focus on how to think about the API surface. Let's kill it for the 0.3.0 update. To make client-side changes feasible, we will do this in two parts. This part adds a new route (sans `/openai/v1`) so the existing client continues to work since the server supports both. The next PR will be client-side (Stainless) changes which I will be making shortly. The final PR will remove the `/openai/v1` routes. Note that all these changes will happen rapidly within this release cycle. The entire set _will be backwards incompatible_.	2025-09-29 16:14:35 -07:00
Michael Dawson	ddf3f1735a	fix: ensure usage is requested if telemetry is enabled (#3571 ) # What does this PR do? Refs: https://github.com/llamastack/llama-stack/issues/3420 When telemetry is enabled the router uncondionally expects the usage attribute to be availble and fails if it is not present. Usage is not currently being requested by litellm_openai_mixin.py for streaming requests when using the responses API which means that providers like vertexai fail if telemetry is enabled and streaming is used. This is part of the required fix. Other part is in liteLLM, will plan to submit PR for that soon. ## Test Plan I applied this change along with the change for litellm in a llama stack deployment and validated that I could make streaming requests through the responses API to a gemini model and they would succeed instead of failing due to the missing usage attribute when telemetry is enabled. Signed-off-by: Michael Dawson <midawson@redhat.com>	2025-09-29 14:09:08 -07:00
slekkala1	455579a88e	fix: Remove deprecated user param in OpenAIResponseObject (#3596 ) # What does this PR do? Just removing the deprecated User param in `OpenAIResponseObject` Closing https://github.com/llamastack/llama-stack/issues/3482 ## Test Plan CI	2025-09-29 13:55:59 -07:00
Matthew Farrellee	e9eb004bf8	fix: remove inference.completion from docs (#3589 ) # What does this PR do? now that /v1/inference/completion has been removed, no docs should refer to it this cleans up remaining references ## Test Plan ci Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-09-29 13:14:41 -07:00
Matthew Farrellee	7c888fc0da	feat: update eval runner to use openai endpoints (#3588 ) # What does this PR do? move the eval=inline::meta-reference implementation to use openai_completion/openai_chat_completion note: this breaks backward compatibility if eval setup used sampling params' repetition_penalty or strategy ## Test Plan ci w/ new recordings Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-09-29 13:13:53 -07:00
Charlie Doern	aac42ddcc2	feat(api): level inference/rerank and remove experimental (#3565 ) # What does this PR do? inference/rerank is the one route in the API intended to not be deprecated. Level it as v1alpha. Additionally, remove `experimental` and opt to instead use `v1alpha` which itself implies an experimental state based on the original proposal Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-09-29 12:42:09 -07:00
Matthew Farrellee	975ead1d6a	chore(api): remove deprecated embeddings impls (#3301 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 7s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 10s Details UI Tests / ui-tests (22) (push) Successful in 39s Details Pre-commit / pre-commit (push) Successful in 1m25s Details # What does this PR do? remove deprecated embeddings implementations	2025-09-29 14:45:09 -04:00
Kai Wu	aab22dc759	fix: adding mime type of application/json support (#3452 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR fix #3300 by adding mime type of application/json support in [agent_instance.py](`4a59961a6c/llama_stack/providers/inline/agents/meta_reference/agent_instance.py (L923)`) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[3300] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> all related pytest passed, see log: ``` ./scripts/unit-tests.sh tests/unit/providers/agent/test_get_raw_document_text.py -vvv /Users/kaiwu/work/kaiwu/llama-stack/.venv/bin/python3 Uninstalled 22 packages in 5.65s Installed 47 packages in 1.24s ================= test session starts ================= platform darwin -- Python 3.12.9, pytest-8.4.2, pluggy-1.6.0 -- /Users/kaiwu/work/kaiwu/llama-stack/.venv/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.9', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}} rootdir: /Users/kaiwu/work/kaiwu/llama-stack configfile: pyproject.toml plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 14 items tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_text_mime_types PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_yaml_mime_type PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_deprecated_text_yaml_with_warning PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_deprecated_text_yaml_with_url PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_deprecated_text_yaml_with_text_content_item PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_json_mime_type PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_json_url PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_json_text_content_item PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unsupported_mime_types PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_url_content PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_yaml_url PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_text_content_item PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_yaml_text_content_item PASSED tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unexpected_content_type PASSED ================ slowest 10 durations ================= 0.00s call tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_deprecated_text_yaml_with_url 0.00s call tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unsupported_mime_types 0.00s call tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unexpected_content_type 0.00s setup tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_text_mime_types 0.00s teardown tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_text_mime_types 0.00s call tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_yaml_url 0.00s call tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_url_content 0.00s teardown tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_rejects_unsupported_mime_types 0.00s call tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_with_json_url 0.00s call tests/unit/providers/agent/test_get_raw_document_text.py::test_get_raw_document_text_supports_text_mime_types ================= 14 passed in 0.14s ================== Generating coverage report... Wrote HTML report to htmlcov-3.12/index.html ```	2025-09-29 11:27:31 -07:00
ehhuang	8ab6684a94	chore: introduce write queue for response_store (#3497 ) # What does this PR do? Mirroring the same changes that was used for inference_store: https://github.com/llamastack/llama-stack/pull/3383 Will follow up with a shared internal API for managing these write queues. ## Test Plan existing tests	2025-09-29 10:36:16 -07:00
dependabot[bot]	9fdfd3a2ad	chore(ui-deps): bump tw-animate-css from 1.2.9 to 1.4.0 in /llama_stack/ui (#3583 ) Bumps [tw-animate-css](https://github.com/Wombosvideo/tw-animate-css) from 1.2.9 to 1.4.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/Wombosvideo/tw-animate-css/releases">tw-animate-css's releases</a>.</em></p> <blockquote> <h2>v1.4.0</h2> <h2>Changelog</h2> <p>902e37a019ffd165ba078e0b3c02634526c54bf0: fix: remove support for prefix, add new export for prefixed version. Closes <a href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/58">#58</a>. fab2a5bf817605be1976e159976718a83489fc1c: chore: bump version to 1.4.0 and update dependencies c20dc32e2b532a8e74546879b4ce7d9ce89ba710: fix(build): make transform.ts accept two arguments</p> <h2>⚠️ BREAKING CHANGE ⚠️</h2> <p>Support for Tailwind CSS's prefix option was moved to <code>tw-animate-css/prefix</code> because it was breaking the <code>--spacing</code> function. Users requiring prefixes should replace their import:</p> <pre lang="diff"><code>- import "tw-animate-css"; + import "tw-animate-css/prefix"; </code></pre> <p><em>I do not plan to introduce breaking changes like this to non-major releases in the future. But because more people use spacing rather than prefixes, reverting the previous version's (obviously breaking) change seems reasonable.</em></p> <h2>v1.3.8</h2> <h2>Changelog</h2> <ul> <li>b5ff23a: fix: add support for global CSS variable prefix. Closes <a href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/48">#48</a></li> <li>03e5f12: feat: add support for ng-primitives height variables <a href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/56">#56</a> (thanks <a href="https://github.com/immohammadjaved"><code>@immohammadjaved</code></a>)</li> <li>b076cfb: docs: fix various issues in accordion and collapsible docs</li> <li>9485e33: chore: bump version to 1.3.8 and update dependencies</li> </ul> <h2>⚠️ BREAKING CHANGE ⚠️</h2> <p>Adding support for prefixes broke custom spacing. It is recommended that you skip this version if you do not use Tailwind CSS's prefix option, and use v1.4.0 instead. If you are actually using prefixes, you can use a special version supporting prefixes:</p> <pre lang="diff"><code>- import "tw-animate-css"; /* Version with spacing support / + import "tw-animate-css/prefix"; / Version with prefix support */ </code></pre> <p><em>I do not plan to fix the incompatibility between the spacing and prefix versions due to time constraints. Feel free to investigate and open a pull request if you manage to fix it.</em></p> <h2>v1.3.7</h2> <h2>Changelog</h2> <ul> <li>80dbfcc: feat: add utilities for blur transitions <a href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/54">#54</a> (thanks <a href="https://github.com/coffeeispower"><code>@coffeeispower</code></a>)</li> <li>dc294f9: docs: add upcoming changes warning</li> <li>c640bb8: chore: update dependencies and package manager version</li> <li>9e63e34: chore: bump version to 1.3.7</li> </ul> <h2>v1.3.6</h2> <h2>Changelog</h2> <ul> <li>58f3396: fix: allow changing animation parameters for ready-to-use animations</li> <li>8313476: chore: update dependencies nd package manager version</li> <li>f81346c: chore: bump version to 1.3.6</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`c20dc32e2b`"><code>c20dc32</code></a> fix(build): make transform.ts accept two arguments</li> <li><a href="`fab2a5bf81`"><code>fab2a5b</code></a> chore: bump version to 1.4.0 and update dependencies</li> <li><a href="`902e37a019`"><code>902e37a</code></a> fix: remove support for prefix, add new export for prefixed version</li> <li><a href="`9485e33d99`"><code>9485e33</code></a> chore: bump version to 1.3.8 and update dependencies</li> <li><a href="`b076cfb04a`"><code>b076cfb</code></a> docs: fix various issues in accordion and collapsible docs</li> <li><a href="`03e5f12418`"><code>03e5f12</code></a> feat: add support for ng-primitives height variables (<a href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/56">#56</a>)</li> <li><a href="`b5ff23a0d5`"><code>b5ff23a</code></a> fix: add support for global CSS variable prefix. Closes <a href="https://redirect.github.com/Wombosvideo/tw-animate-css/issues/48">#48</a></li> <li><a href="`9e63e34286`"><code>9e63e34</code></a> chore: bump version to 1.3.7</li> <li><a href="`c640bb8933`"><code>c640bb8</code></a> chore: update dependencies and package manager version</li> <li><a href="`dc294f990a`"><code>dc294f9</code></a> docs: add upcoming changes warning</li> <li>Additional commits viewable in <a href="https://github.com/Wombosvideo/tw-animate-css/compare/v1.2.9...v1.4.0">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tw-animate-css&package-manager=npm_and_yarn&previous-version=1.2.9&new-version=1.4.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-29 10:03:26 +02:00
dependabot[bot]	d95853d784	chore(ui-deps): bump shiki from 1.29.2 to 3.13.0 in /llama_stack/ui (#3585 ) Bumps [shiki](https://github.com/shikijs/shiki/tree/HEAD/packages/shiki) from 1.29.2 to 3.13.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/shikijs/shiki/releases">shiki's releases</a>.</em></p> <blockquote> <h2>v3.13.0</h2> <h3> 🚀 Features</h3> <ul> <li><strong>transformers</strong>: Render indent guides - by <a href="https://github.com/KazariEX"><code>@KazariEX</code></a> and <a href="https://github.com/antfu"><code>@antfu</code></a> in <a href="https://redirect.github.com/shikijs/shiki/issues/1060">shikijs/shiki#1060</a> <a href="`aecd1617`"><!-- raw HTML omitted -->(aecd1)<!-- raw HTML omitted --></a></li> </ul> <h5> <a href="https://github.com/shikijs/shiki/compare/v3.12.3...v3.13.0">View changes on GitHub</a></h5> <h2>v3.12.3</h2> <h3> 🐞 Bug Fixes</h3> <ul> <li><code>@shikijs/twoslash</code> version specifier - by <a href="https://github.com/9romise"><code>@9romise</code></a> in <a href="https://redirect.github.com/shikijs/shiki/issues/1078">shikijs/shiki#1078</a> <a href="`a1cdea41`"><!-- raw HTML omitted -->(a1cde)<!-- raw HTML omitted --></a></li> </ul> <h5> <a href="https://github.com/shikijs/shiki/compare/v3.12.2...v3.12.3">View changes on GitHub</a></h5> <h2>v3.12.2</h2> <h3> 🐞 Bug Fixes</h3> <ul> <li><strong>twoslash</strong>: Fix <code>onTwoslashError</code> return value handling - by <a href="https://github.com/Karibash"><code>@Karibash</code></a> in <a href="https://redirect.github.com/shikijs/shiki/issues/1070">shikijs/shiki#1070</a> <a href="`e86b0a7c`"><!-- raw HTML omitted -->(e86b0)<!-- raw HTML omitted --></a></li> </ul> <h5> <a href="https://github.com/shikijs/shiki/compare/v3.12.1...v3.12.2">View changes on GitHub</a></h5> <h2>v3.12.1</h2> <p><em>No significant changes</em></p> <h5> <a href="https://github.com/shikijs/shiki/compare/v3.12.0...v3.12.1">View changes on GitHub</a></h5> <h2>v3.12.0</h2> <h3> 🚀 Features</h3> <ul> <li><strong>vitepress-twoslash</strong>: <ul> <li>Improve UX for option customization - by <a href="https://github.com/9romise"><code>@9romise</code></a> in <a href="https://redirect.github.com/shikijs/shiki/issues/1066">shikijs/shiki#1066</a> <a href="`e3cfdeca`"><!-- raw HTML omitted -->(e3cfd)<!-- raw HTML omitted --></a></li> <li>Twoslash inline type cache for markdown - by <a href="https://github.com/serkodev"><code>@serkodev</code></a> and <a href="https://github.com/antfu"><code>@antfu</code></a> in <a href="https://redirect.github.com/shikijs/shiki/issues/1063">shikijs/shiki#1063</a> <a href="`dc7fbc70`"><!-- raw HTML omitted -->(dc7fb)<!-- raw HTML omitted --></a></li> </ul> </li> </ul> <h3> 🐞 Bug Fixes</h3> <ul> <li><strong>remove-notation-escape</strong>: Correct escape sequence - by <a href="https://github.com/sor4chi"><code>@sor4chi</code></a> in <a href="https://redirect.github.com/shikijs/shiki/issues/1065">shikijs/shiki#1065</a> <a href="`22d0c780`"><!-- raw HTML omitted -->(22d0c)<!-- raw HTML omitted --></a></li> </ul> <h5> <a href="https://github.com/shikijs/shiki/compare/v3.11.0...v3.12.0">View changes on GitHub</a></h5> <h2>v3.11.0</h2> <h3> 🚀 Features</h3> <ul> <li><strong>core</strong>: Add <code>enforce</code> options to <code>ShikiTransformer</code> - by <a href="https://github.com/serkodev"><code>@serkodev</code></a> and <a href="https://github.com/antfu"><code>@antfu</code></a> in <a href="https://redirect.github.com/shikijs/shiki/issues/1062">shikijs/shiki#1062</a> <a href="`8ad05bd8`"><!-- raw HTML omitted -->(8ad05)<!-- raw HTML omitted --></a></li> </ul> <h5> <a href="https://github.com/shikijs/shiki/compare/v3.10.0...v3.11.0">View changes on GitHub</a></h5> <h2>v3.10.0</h2> <h3> 🚀 Features</h3> <ul> <li>Add funding links to playground - by <a href="https://github.com/jtbandes"><code>@jtbandes</code></a> in <a href="https://redirect.github.com/shikijs/shiki/issues/1054">shikijs/shiki#1054</a> <a href="`e36eb4d8`"><!-- raw HTML omitted -->(e36eb)<!-- raw HTML omitted --></a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`fd7326a82f`"><code>fd7326a</code></a> chore: release v3.13.0</li> <li><a href="`5cbb05219e`"><code>5cbb052</code></a> chore: release v3.12.3</li> <li><a href="`e462618190`"><code>e462618</code></a> chore: release v3.12.2</li> <li><a href="`793d71e68f`"><code>793d71e</code></a> chore: release v3.12.1</li> <li><a href="`9260f3fd10`"><code>9260f3f</code></a> chore: release v3.12.0</li> <li><a href="`d05f39b1e8`"><code>d05f39b</code></a> chore: release v3.11.0</li> <li><a href="`bda1a76743`"><code>bda1a76</code></a> chore: release v3.10.0</li> <li><a href="`09921f1cb8`"><code>09921f1</code></a> chore: release v3.9.2</li> <li><a href="`854eddf2ed`"><code>854eddf</code></a> chore: release v3.9.1</li> <li><a href="`950ede5ae5`"><code>950ede5</code></a> chore: release v3.9.0</li> <li>Additional commits viewable in <a href="https://github.com/shikijs/shiki/commits/v3.13.0/packages/shiki">compare view</a></li> </ul> </details> <details> <summary>Maintainer changes</summary> <p>This version was pushed to npm by [GitHub Actions](<a href="https://www.npmjs.com/~GitHub">https://www.npmjs.com/~GitHub</a> Actions), a new releaser for shiki since your current version.</p> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=shiki&package-manager=npm_and_yarn&previous-version=1.29.2&new-version=3.13.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-09-29 10:02:51 +02:00
Tami Takamiya	65f7b81e98	feat: Add items and title to ToolParameter/ToolParamDefinition (#3003 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s Details Python Package Build Test / build (3.12) (push) Failing after 17s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 19s Details Unit Tests / unit-tests (3.13) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (push) Failing after 20s Details Test External API and Providers / test-external (venv) (push) Failing after 3s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s Details Python Package Build Test / build (3.13) (push) Failing after 16s Details Unit Tests / unit-tests (3.12) (push) Failing after 16s Details API Conformance Tests / check-schema-compatibility (push) Successful in 25s Details UI Tests / ui-tests (22) (push) Successful in 50s Details Pre-commit / pre-commit (push) Successful in 1m16s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Add items and title to ToolParameter/ToolParamDefinition. Adding items will resolve the issue that occurs with Gemini LLM when an MCP tool has array-type properties. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Unite test cases will be added. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kai Wu <kaiwu@meta.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-09-27 11:35:29 -07:00
ehhuang	c392f3a0f4	chore: remove extra logging (#3574 ) # What does this PR do? This is already logged by console processor as INFO <img width="1093" height="280" alt="image" src="https://github.com/user-attachments/assets/780b0ac2-6744-49d7-b1d4-b7204050a6dc" /> ## Test Plan	2025-09-27 11:22:54 -07:00
Matthew Farrellee	0d94f3e2c0	chore: recordings for fireworks (inference + openai) (#3573 ) # What does this PR do? recorded for: ./scripts/integration-tests.sh --stack-config server:ci-tests --suite base --setup fireworks --subdirs inference --pattern openai ## Test Plan ./scripts/integration-tests.sh --stack-config server:ci-tests --suite base --setup fireworks --subdirs inference --pattern openai	2025-09-27 11:22:30 -07:00
Matthew Farrellee	53b15725b6	chore(apis): unpublish deprecated /v1/inference apis (#3297 ) # What does this PR do? unpublish (make unavailable to users) the following apis - - `/v1/inference/completion`, replaced by `/v1/openai/v1/completions` - `/v1/inference/chat-completion`, replaced by `/v1/openai/v1/chat/completions` - `/v1/inference/embeddings`, replaced by `/v1/openai/v1/embeddings` - `/v1/inference/batch-completion`, replaced by `/v1/openai/v1/batches` - `/v1/inference/batch-chat-completion`, replaced by `/v1/openai/v1/batches` note: the implementations are still available for internal use, e.g. agents uses chat-completion.	2025-09-27 11:20:06 -07:00
Matthew Farrellee	60484c5c4e	chore(api): remove batch inference (#3261 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Test Llama Stack Build / build (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test Llama Stack Build / build-single-provider (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details API Conformance Tests / check-schema-compatibility (push) Successful in 7s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 39s Details Pre-commit / pre-commit (push) Successful in 1m18s Details # What does this PR do? APIs removed: - POST /v1/batch-inference/completion - POST /v1/batch-inference/chat-completion - POST /v1/inference/batch-completion - POST /v1/inference/batch-chat-completion note - - batch-completion & batch-chat-completion were only implemented for inference=inline::meta-reference - batch-inference were not implemented	2025-09-26 14:35:34 -07:00
Matthew Farrellee	b48d5cfed7	feat(internal): add image_url download feature to OpenAIMixin (#3516 ) # What does this PR do? simplify Ollama inference adapter by - - moving image_url download code to OpenAIMixin - being a ModelRegistryHelper instead of having one (mypy blocks check_model_availability method assignment) ## Test Plan - add unit tests for new download feature - add integration tests for openai_chat_completion w/ image_url (close test gap)	2025-09-26 17:32:16 -04:00
github-actions[bot]	4487b88ffe	build: Bump version to 0.2.23	2025-09-26 21:11:51 +00:00
Matthew Farrellee	7a25be633c	fix: Revert "fix: Added a bug fix when registering new models" (#3473 ) the commit to be reverted is an public api behavior change to something we should not support. instead of allowing silent updates (the caller cannot see the log messages), we should be sending an error to the caller that they must first unregister the model before reusing the same name w/ a different backend.	2025-09-26 16:19:21 -04:00
Matthew Farrellee	da5ea107fc	fix: ensure ModelRegistryHelper init for together and fireworks (#3572 ) # What does this PR do? address - ``` ERROR 2025-09-26 10:44:29,450 main:527 core::server: Error creating app: 'FireworksInferenceAdapter' object has no attribute 'alias_to_provider_id_map' ``` ## Test Plan manual startup w/ valid together & fireworks api keys	2025-09-26 16:18:32 -04:00
Ben Browning	b6e2934f7b	fix: Gracefully handle errors when listing MCP tools (#2544 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 6s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Test Llama Stack Build / build-single-provider (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Test Llama Stack Build / build (push) Failing after 3s Details UI Tests / ui-tests (22) (push) Successful in 38s Details Pre-commit / pre-commit (push) Successful in 1m17s Details # What does this PR do? When listing (and lazily indexing) tools, it's possible for an error to get thrown by individual toolgroups if for example an MCP toolgroup is unable to connect to its `mcp_endpoint`. This logs a warning in the server when that happens, logs a full stack trace of the error if debug logging is enabled, and just returns the list of tools from all working toolgroups instead of throwing an error to the client when a single toolgroup is temporarily or permanently misbehaving. The exception to the above is authentication errors, which we specifically send all the way back to the client as that's how we indicate to the client that it needs to provide authentication data for the remote MCP servers. Closes #2540 ## Test Plan A new unit test was added to test this exception handling, which is run as part of our regular test suite but also manually run to specifically verify this fix via: ``` uv run pytest -sv --asyncio-mode=auto \ tests/unit/distribution/routers/test_routing_tables.py ``` To verify the additional debug logging is printing properly: ``` LLAMA_STACK_LOGGING=core=debug \ uv run pytest -sv --asyncio-mode=auto \ tests/unit/distribution/routers/test_routing_tables.py ``` The mcp integration tests were run as below (and by CI): ``` ollama run llama3.2:3b ENABLE_OLLAMA="ollama" \ OLLAMA_INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ LLAMA_STACK_CONFIG=starter \ uv run pytest -sv tests/integration/tool_runtime/test_mcp.py \ --text-model meta-llama/Llama-3.2-3B-Instruct ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-09-26 18:09:48 +02:00
Matthew Farrellee	926c3ada41	chore: prune mypy exclude list (#3561 ) # What does this PR do? prune the mypy exclude list, build a stronger foundation for quality code ## Test Plan ci	2025-09-26 11:44:43 -04:00
Charlie Doern	c88c4ff2c6	feat: introduce API leveling, post_training, eval to v1alpha (#3449 ) # What does this PR do? Rather than have a single `LLAMA_STACK_VERSION`, we need to have a `_V1`, `_V1ALPHA`, and `_V1BETA` constant. This also necessitated addition of `level` to the `WebMethod` so that routing can be handeled properly. For backwards compat, the `v1` routes are being kept around and marked as `deprecated`. When used, the server will log a deprecation warning. Deprecation log: <img width="1224" height="134" alt="Screenshot 2025-09-25 at 2 43 36 PM" src="https://github.com/user-attachments/assets/0cc7c245-dafc-48f0-be99-269fb9a686f9" /> move: 1. post_training to `v1alpha` as it is under heavy development and not near its final state 2. eval: job scheduling is not implemented. Relies heavily on the datasetio API which is under development missing implementations of specific routes indicating the structure of those routes might change. Additionally eval depends on the `inference` API which is going to be deprecated, eval will likely need a major API surface change to conform to using completions properly implements leveling in #3317 note: integration tests will fail until the SDK is regenerated with v1alpha/inference as opposed to v1/inference ## Test Plan existing tests should pass with newly generated schema. Conformance will also pass as these routes are not the ones we currently test for stability Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-09-26 16:18:07 +02:00
Matthew Farrellee	65e01b5684	feat: together now supports base64 embedding encoding (#3559 ) # What does this PR do? use together's new base64 support ## Test Plan recordings for: ./scripts/integration-tests.sh --stack-config server:ci-tests --suite base --setup together --subdirs inference --pattern openai	2025-09-26 16:05:52 +02:00
Doug Edgar	9c751b6789	feat: use FIPS validated CSPRNG for telemetry (#3554 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s Details API Conformance Tests / check-schema-compatibility (push) Successful in 8s Details Test External API and Providers / test-external (venv) (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 17s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (push) Failing after 18s Details Python Package Build Test / build (3.12) (push) Failing after 18s Details UI Tests / ui-tests (22) (push) Successful in 53s Details Pre-commit / pre-commit (push) Successful in 1m14s Details # What does this PR do? Switches from `random.getrandbits` to `secrets.randbits` in the telemetry module. <!-- If resolving an issue, uncomment and update the line below --> Closes #3553 ## Test Plan Unit tests from scripts/unit-tests.sh were ran to verify the tests still pass. Signed-off-by: Doug Edgar <dedgar@redhat.com>	2025-09-26 11:17:25 +02:00
Matthew Farrellee	b67aef2fc4	feat: add static embedding metadata to dynamic model listings for providers using OpenAIMixin (#3547 ) # What does this PR do? - remove auto-download of ollama embedding models - add embedding model metadata to dynamic listing w/ unit test - add support and tests for allowed_models - removed inference provider models.py files where dynamic listing is enabled - store embedding metadata in embedding_model_metadata field on inference providers - make model_entries optional on ModelRegistryHelper and LiteLLMOpenAIMixin - make OpenAIMixin a ModelRegistryHelper - skip base64 embedding test for remote::ollama, always returns floats - only use OpenAI client for ollama model listing - remove unused build_model_entry function - remove unused get_huggingface_repo function ## Test Plan ci w/ new tests	2025-09-25 17:17:00 -04:00
Alexey Rybak	d23865757f	docs: provider and distro codegen migration (#3531 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> - Updates provider and distro codegen to handle the new format - Migrates provider and distro files to the new format ## Test Plan - Manual testing <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-09-24 14:01:29 -07:00
Alexey Rybak	914c8cb605	fix: fix API docstrings for proper MDX parsing (#3526 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> _[Stack 1/10] Docusaurus documentation migration_ Updates the file upload API documentation to use proper OpenAPI format for integer parameters. Replaces `<int>` with `{integer}` in the description of the `expires_after[seconds]` parameter across the HTML spec, YAML spec, and Python implementation. ## Test Plan - docs/openapi_generator/run_openapi_generator.sh <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-09-24 13:55:12 -07:00
Matthew Farrellee	ce7a3b4dff	feat: update Cerebras inference provider to support dynamic model listing (#3481 ) # What does this PR do? - update Cerebras to use OpenAIMixin - enable openai completions tests - enable openai chat completions tests - disable with n > 1 tests - add recording for --setup cerebras --subdirs inference --pattern openai ## Test Plan `./scripts/integration-tests.sh --stack-config server:ci-tests --setup cerebras --subdirs inference --pattern openai` ``` tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity] instantiating llama_stack_client Port 8321 is already in use, assuming server is already running... llama_stack_client instantiated in 0.053s PASSED [ 2%] tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=cerebras/llama-3.3-70b-inference:completion:suffix] SKIPPED (Suffix is not supported for the model: cerebras/llama-3.3-70b.) [ 4%] tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity] PASSED [ 6%] tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=cerebras/llama-3.3-70b-1] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.) [ 8%] tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=cerebras/llama-3.3-70b] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.) [ 10%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01] PASSED [ 12%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] PASSED [ 14%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cere...) [ 17%] tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-True] PASSED [ 19%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-True] PASSED [ 21%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=cerebras/llama-3.3-70b] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support chat completion calls wit...) [ 23%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 25%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 27%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 29%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 31%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 34%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 36%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 38%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 40%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 42%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 44%] tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=cerebras/llama-3.3-70b-0] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters.) [ 46%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_02] PASSED [ 48%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] PASSED [ 51%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote::cere...) [ 53%] tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-False] PASSED [ 55%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-False] PASSED [ 57%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 59%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 61%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 63%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 65%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 68%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 70%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 72%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 74%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 76%] tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-cerebras/llama-3.3-70b-None-None-None-384] SKIPPED (embedding_model_id empty - skipping test) [ 78%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01] PASSED [ 80%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] PASSED [ 82%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_01] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote:...) [ 85%] tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-True] PASSED [ 87%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-True] PASSED [ 89%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_02] PASSED [ 91%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] PASSED [ 93%] tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] SKIPPED (Model cerebras/llama-3.3-70b hosted by remote:...) [ 95%] tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-False] PASSED [ 97%] tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-False] PASSED [100%] =================================================================================================================== slowest 10 durations ==================================================================================================================== 0.37s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=cerebras/llama-3.3-70b-inference:chat_completion:non_streaming_01] 0.34s call tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-False] 0.18s call tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=cerebras/llama-3.3-70b-True] 0.17s setup tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=cerebras/llama-3.3-70b-inference:completion:sanity] 0.15s call tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-True] 0.13s call tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-True] 0.12s call tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=cerebras/llama-3.3-70b-False] 0.12s call tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=cerebras/llama-3.3-70b-True] 0.12s call tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=cerebras/llama-3.3-70b-False] 0.08s call tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=cerebras/llama-3.3-70b-inference:chat_completion:streaming_02] ================================================================================================================== short test summary info ================================================================================================================== SKIPPED [1] tests/integration/inference/test_openai_completion.py:75: Suffix is not supported for the model: cerebras/llama-3.3-70b. SKIPPED [3] tests/integration/inference/test_openai_completion.py:123: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support vllm extra_body parameters. SKIPPED [4] tests/integration/inference/test_openai_completion.py:103: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support n param. SKIPPED [1] tests/integration/inference/test_openai_completion.py:129: Model cerebras/llama-3.3-70b hosted by remote::cerebras doesn't support chat completion calls with base64 encoded files. SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:90: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:112: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:136: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:154: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:175: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:195: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:206: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:217: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:244: embedding_model_id empty - skipping test SKIPPED [2] tests/integration/inference/test_openai_embeddings.py:278: embedding_model_id empty - skipping test ================================================================================================= 18 passed, 29 skipped, 50 deselected, 4 warnings in 3.02s ================================================================================================= ```	2025-09-23 16:26:00 -04:00
Matthew Farrellee	d07ebce4d9	feat: (re-)enable Databricks inference adapter (#3500 ) # What does this PR do? add/enable the Databricks inference adapter Databricks inference adapter was broken, closes #3486 - remove deprecated completion / chat_completion endpoints - enable dynamic model listing w/o refresh, listing is not async - use SecretStr instead of str for token - backward incompatible change: for consistency with databricks docs, env DATABRICKS_URL -> DATABRICKS_HOST and DATABRICKS_API_TOKEN -> DATABRICKS_TOKEN - databricks urls are custom per user/org, add special recorder handling for databricks urls - add integration test --setup databricks - enable chat completions tests - enable embeddings tests - disable n > 1 tests - disable embeddings base64 tests - disable embeddings dimensions tests note: reasoning models, e.g. gpt oss, fail because databricks has a custom, incompatible response format ## Test Plan ci and ``` ./scripts/integration-tests.sh --stack-config server:ci-tests --setup databricks --subdirs inference --pattern openai ``` note: databricks needs to be manually added to the ci-tests distro for replay testing	2025-09-23 15:37:23 -04:00
ehhuang	9406a998b9	chore: refactor tracingmiddelware (#3520 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 6s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details UI Tests / ui-tests (22) (push) Successful in 37s Details Pre-commit / pre-commit (push) Successful in 1m21s Details # What does this PR do? Just moving TracingMiddleware to a new file ## Test Plan CI	2025-09-23 10:14:41 -07:00
Matthew Farrellee	2be869b3ef	fix(dev): fix vllm inference recording (await models.list) (#3524 ) # What does this PR do? fix inference recording for vLLM closes #3523 ## Test Plan ``` $ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference --inference-mode record --pattern test_text_chat_completion_non_streaming === Llama Stack Integration Test Runner === Stack Config: server:ci-tests Setup: vllm Inference Mode: record Test Suite: base Test Subdirs: inference Test Pattern: test_text_chat_completion_non_streaming ... === Applying Setup Environment Variables === Setting up environment variables: export VLLM_URL='http://localhost:8000/v1' === Starting Llama Stack Server === Waiting for Llama Stack Server to start... ✅ Llama Stack Server started successfully === Running Integration Tests === Test subdirs to run: inference Added test files from inference: 6 files === Running all collected tests in a single pytest command === Total test files: 6 + pytest -s -v tests/integration/inference/test_openai_completion.py tests/integration/inference/test_batch_inference.py tests/integration/inference/test_openai_embeddings.py tests/integration/inference/test_text_inference.py tests/integration/inference/test_vision_inference.py tests/integration/inference/test_embedding.py --stack-config=server:ci-tests --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag or test_inference_store_tool_calls ) and test_text_chat_completion_non_streaming' --setup=vllm --color=yes --capture=tee-sys INFO 2025-09-23 10:35:36,662 tests.integration.conftest:86 tests: Applying setup 'vllm' ======================================================= test session starts ======================================================= platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- .../.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'Linux-6.16.7-200.fc42.x86_64-x86_64-with-glibc2.41', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'html': '4.1.1', 'anyio': '4.9.0', 'timeout': '2.4.0', 'cov': '6.2.1', 'asyncio': '1.1.0', 'nbval': '0.11.0', 'socket': '0.7.0', 'json-report': '1.5.0', 'metadata': '3.1.1'}} rootdir: ... configfile: pyproject.toml plugins: html-4.1.1, anyio-4.9.0, timeout-2.4.0, cov-6.2.1, asyncio-1.1.0, nbval-0.11.0, socket-0.7.0, json-report-1.5.0, metadata-3.1.1 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 97 items / 95 deselected / 2 selected tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01] instantiating llama_stack_client Port 8321 is already in use, assuming server is already running... llama_stack_client instantiated in 0.044s PASSED [ 50%] tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_02] PASSED [100%] ====================================================== slowest 10 durations ======================================================= 1.62s call tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_02] 0.93s call tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01] 0.62s setup tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01] (3 durations < 0.005s hidden. Use -vv to show these durations.) ========================================== 2 passed, 95 deselected, 6 warnings in 3.26s =========================================== + exit_code=0 + set +x ✅ All tests completed successfully ``` ``` $ git status ... Untracked files: (use "git add <file>..." to include in what will be committed) tests/integration/recordings/responses/032f8c5a1289.json tests/integration/recordings/responses/c42baf6a3700.json tests/integration/recordings/responses/models-bd032f995f2a-fb68f5a6.json ... ```	2025-09-23 12:56:33 -04:00
Matthew Farrellee	62e0aef7bc	fix: return llama stack model id from embeddings (#3525 ) # What does this PR do? the openai_embeddings method on OpenAIMixin was returning the provider's model id instead of the llama stack name ## Test Plan before - ``` $ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup gpt --subdirs inference --inference-mode live --pattern test_openai_embeddings_single_string ... FAILED tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=openai/text-embedding-3-small] - AssertionError: assert 'text-embedding-3-small' == 'openai/text-...dding-3-small' FAILED tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=openai/text-embedding-3-small] - AssertionError: assert 'text-embedding-3-small' == 'openai/text-...dding-3-small' ========================================== 2 failed, 95 deselected, 4 warnings in 3.87s =========================================== ``` after - ``` $ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup gpt --subdirs inference --inference-mode live --pattern test_openai_embeddings_single_string ... ========================================== 2 passed, 95 deselected, 4 warnings in 2.12s =========================================== ```	2025-09-23 12:30:00 -04:00
ehhuang	a7f9ce9a3a	chore: fix build (#3522 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test Llama Stack Build / generate-matrix (push) Successful in 2s Details API Conformance Tests / check-schema-compatibility (push) Successful in 6s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test Llama Stack Build / build-single-provider (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Test Llama Stack Build / build (push) Failing after 3s Details UI Tests / ui-tests (22) (push) Successful in 39s Details Pre-commit / pre-commit (push) Successful in 1m12s Details # What does this PR do? error: `5099094847` ## Test Plan GITHUB_ACTIONS=true BUILD_PLATFORM=linux/amd64 USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run --with llama-stack llama stack build --distro starter --image-type container --image-name ehhuang/distribution-starter succeeds	2025-09-22 22:53:48 -07:00

1 2 3 4 5 ...

1666 commits