From 1be66d754e7fb4f8dcf35d388afbb8ddc85e7449 Mon Sep 17 00:00:00 2001 From: Yuan Tang Date: Thu, 10 Apr 2025 04:04:17 -0400 Subject: [PATCH 01/39] docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923) # What does this PR do? vLLM website just added a [new index page for installing for different hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html). This PR adds a link to that page with additional edits to make sure readers are aware that the use of GPUs on this page are for demonstration purposes only. This closes https://github.com/meta-llama/llama-stack/issues/1813. Signed-off-by: Yuan Tang --- .../source/distributions/self_hosted_distro/remote-vllm.md | 7 +++++-- llama_stack/templates/remote-vllm/doc_template.md | 7 +++++-- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/docs/source/distributions/self_hosted_distro/remote-vllm.md b/docs/source/distributions/self_hosted_distro/remote-vllm.md index 457d703b3..e18b5bf40 100644 --- a/docs/source/distributions/self_hosted_distro/remote-vllm.md +++ b/docs/source/distributions/self_hosted_distro/remote-vllm.md @@ -25,7 +25,7 @@ The `llamastack/distribution-remote-vllm` distribution consists of the following | vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` | -You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference. +You can use this distribution if you want to run an independent vLLM server for inference. ### Environment Variables @@ -41,7 +41,10 @@ The following environment variables can be configured: ## Setting up vLLM server -Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider. +In the following sections, we'll use either AMD and NVIDIA GPUs to serve as hardware accelerators for the vLLM +server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also +[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and +that we only use GPUs here for demonstration purposes. ### Setting up vLLM server on AMD GPU diff --git a/llama_stack/templates/remote-vllm/doc_template.md b/llama_stack/templates/remote-vllm/doc_template.md index 7543e8239..efcdb62c6 100644 --- a/llama_stack/templates/remote-vllm/doc_template.md +++ b/llama_stack/templates/remote-vllm/doc_template.md @@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following {{ providers_table }} -You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference. +You can use this distribution if you want to run an independent vLLM server for inference. {% if run_config_env_vars %} ### Environment Variables @@ -28,7 +28,10 @@ The following environment variables can be configured: ## Setting up vLLM server -Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider. +In the following sections, we'll use either AMD and NVIDIA GPUs to serve as hardware accelerators for the vLLM +server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also +[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and +that we only use GPUs here for demonstration purposes. ### Setting up vLLM server on AMD GPU From 1f2df59ecee2070e49053173d57b1ee44a5f049e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Han?= Date: Thu, 10 Apr 2025 18:37:48 +0200 Subject: [PATCH 02/39] docs: fix model name (#1926) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? Use llama3.2:3b for consistency. Signed-off-by: Sébastien Han --- docs/source/getting_started/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md index e9ad51961..82329e60e 100644 --- a/docs/source/getting_started/index.md +++ b/docs/source/getting_started/index.md @@ -9,10 +9,10 @@ In this guide, we'll walk through how to build a RAG agent locally using Llama S ### 1. Download a Llama model with Ollama ```bash -ollama pull llama3.2:3b-instruct-fp16 +ollama pull llama3.2:3b ``` -This will instruct the Ollama service to download the Llama 3.2 3B Instruct model, which we'll use in the rest of this guide. +This will instruct the Ollama service to download the Llama 3.2 3B model, which we'll use in the rest of this guide. ```{admonition} Note :class: tip @@ -176,7 +176,7 @@ python inference.py ``` Sample output: ``` -Model: llama3.2:3b-instruct-fp16 +Model: llama3.2:3b Here is a haiku about coding: Lines of code unfold From 09a83b1ec1767242b7949532b07f68ac5b1c97b5 Mon Sep 17 00:00:00 2001 From: Francisco Arceo Date: Thu, 10 Apr 2025 10:38:57 -0600 Subject: [PATCH 03/39] docs: Updating background color for code in darkmode (#1930) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? A small quality of life adjustment to make the code background for darkmode black. Makes it much easier to differentiate between code and non-code text. From: Screenshot 2025-04-10 at 9 22 23 AM To: Screenshot 2025-04-10 at 9 22 43 AM The CSS was sourced from here: https://github.com/MrDogeBro/sphinx_rtd_dark_mode/blob/main/sphinx_rtd_dark_mode/static/dark_mode_css/dark.css Signed-off-by: Francisco Javier Arceo --- docs/_static/css/my_theme.css | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/_static/css/my_theme.css b/docs/_static/css/my_theme.css index ccd7d2060..470452661 100644 --- a/docs/_static/css/my_theme.css +++ b/docs/_static/css/my_theme.css @@ -16,3 +16,7 @@ .hide-title h1 { display: none; } + +html[data-theme="dark"] .rst-content div[class^="highlight"] { + background-color: #0b0b0b; +} From 14146e4b3f2757b03f449d74b3498d17353bdcb5 Mon Sep 17 00:00:00 2001 From: ehhuang Date: Thu, 10 Apr 2025 10:26:19 -0700 Subject: [PATCH 04/39] feat(verification): various improvements (#1921) # What does this PR do? - provider and their models now live in config.yaml - better distinguish different cases within a test - add model key to surface provider's model_id - include example command to rerun single test case ## Test Plan image --- tests/verifications/REPORT.md | 125 +- tests/verifications/conf/cerebras.yaml | 10 + tests/verifications/conf/fireworks.yaml | 14 + tests/verifications/conf/groq.yaml | 14 + tests/verifications/conf/openai.yaml | 9 + tests/verifications/conf/together.yaml | 14 + tests/verifications/conftest.py | 67 +- tests/verifications/generate_report.py | 415 +-- .../verifications/openai/fixtures/fixtures.py | 97 - .../openai/test_chat_completion.py | 202 -- .../{openai => openai_api}/__init__.py | 0 .../fixtures/__init__.py | 0 .../openai_api/fixtures/fixtures.py | 105 + .../{openai => openai_api}/fixtures/load.py | 0 .../fixtures/test_cases/chat_completion.yaml | 53 +- .../openai_api/test_chat_completion.py | 271 ++ .../test_results/fireworks_1744154308.json | 2744 ---------------- .../test_results/fireworks_1744264202.json | 1329 ++++++++ .../test_results/openai_1744154522.json | 2672 ---------------- .../test_results/openai_1744264304.json | 868 +++++ .../test_results/together_1744154399.json | 2830 ----------------- .../test_results/together_1744264258.json | 1420 +++++++++ 22 files changed, 4449 insertions(+), 8810 deletions(-) create mode 100644 tests/verifications/conf/cerebras.yaml create mode 100644 tests/verifications/conf/fireworks.yaml create mode 100644 tests/verifications/conf/groq.yaml create mode 100644 tests/verifications/conf/openai.yaml create mode 100644 tests/verifications/conf/together.yaml delete mode 100644 tests/verifications/openai/fixtures/fixtures.py delete mode 100644 tests/verifications/openai/test_chat_completion.py rename tests/verifications/{openai => openai_api}/__init__.py (100%) rename tests/verifications/{openai => openai_api}/fixtures/__init__.py (100%) create mode 100644 tests/verifications/openai_api/fixtures/fixtures.py rename tests/verifications/{openai => openai_api}/fixtures/load.py (100%) rename tests/verifications/{openai => openai_api}/fixtures/test_cases/chat_completion.yaml (78%) create mode 100644 tests/verifications/openai_api/test_chat_completion.py delete mode 100644 tests/verifications/test_results/fireworks_1744154308.json create mode 100644 tests/verifications/test_results/fireworks_1744264202.json delete mode 100644 tests/verifications/test_results/openai_1744154522.json create mode 100644 tests/verifications/test_results/openai_1744264304.json delete mode 100644 tests/verifications/test_results/together_1744154399.json create mode 100644 tests/verifications/test_results/together_1744264258.json diff --git a/tests/verifications/REPORT.md b/tests/verifications/REPORT.md index d5715ae21..449499382 100644 --- a/tests/verifications/REPORT.md +++ b/tests/verifications/REPORT.md @@ -1,6 +1,6 @@ # Test Results Report -*Generated on: 2025-04-08 21:14:02* +*Generated on: 2025-04-09 22:52:19* *This report was generated by running `python tests/verifications/generate_report.py`* @@ -23,66 +23,107 @@ ## Together -*Tests run on: 2025-04-08 16:19:59* +*Tests run on: 2025-04-09 22:50:58* ```bash -pytest tests/verifications/openai/test_chat_completion.py --provider=together -v +# Run all tests for this provider: +pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -v + +# Example: Run only the 'earth' case of test_chat_non_streaming_basic: +pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -k "test_chat_non_streaming_basic and earth" ``` -| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-17B-128E-Instruct | Llama-4-Scout-17B-16E-Instruct | + +**Model Key (Together)** + +| Display Name | Full Model ID | +| --- | --- | +| Llama-3.3-70B-Instruct | `meta-llama/Llama-3.3-70B-Instruct-Turbo` | +| Llama-4-Maverick-Instruct | `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` | +| Llama-4-Scout-Instruct | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | + + +| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct | | --- | --- | --- | --- | -| test_chat_non_streaming_basic (case 0) | ✅ | ✅ | ✅ | -| test_chat_non_streaming_basic (case 1) | ✅ | ✅ | ✅ | -| test_chat_non_streaming_image (case 0) | ⚪ | ✅ | ✅ | -| test_chat_non_streaming_structured_output (case 0) | ✅ | ✅ | ✅ | -| test_chat_non_streaming_structured_output (case 1) | ✅ | ✅ | ✅ | -| test_chat_non_streaming_tool_calling (case 0) | ✅ | ✅ | ✅ | -| test_chat_streaming_basic (case 0) | ✅ | ❌ | ❌ | -| test_chat_streaming_basic (case 1) | ✅ | ❌ | ❌ | -| test_chat_streaming_image (case 0) | ⚪ | ❌ | ❌ | -| test_chat_streaming_structured_output (case 0) | ✅ | ❌ | ❌ | -| test_chat_streaming_structured_output (case 1) | ✅ | ❌ | ❌ | +| test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_basic (saturn) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_image | ⚪ | ✅ | ✅ | +| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_structured_output (math) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_tool_calling | ✅ | ✅ | ✅ | +| test_chat_streaming_basic (earth) | ✅ | ❌ | ❌ | +| test_chat_streaming_basic (saturn) | ✅ | ❌ | ❌ | +| test_chat_streaming_image | ⚪ | ❌ | ❌ | +| test_chat_streaming_structured_output (calendar) | ✅ | ❌ | ❌ | +| test_chat_streaming_structured_output (math) | ✅ | ❌ | ❌ | ## Fireworks -*Tests run on: 2025-04-08 16:18:28* +*Tests run on: 2025-04-09 22:50:02* ```bash -pytest tests/verifications/openai/test_chat_completion.py --provider=fireworks -v +# Run all tests for this provider: +pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -v + +# Example: Run only the 'earth' case of test_chat_non_streaming_basic: +pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -k "test_chat_non_streaming_basic and earth" ``` -| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-17B-128E-Instruct | Llama-4-Scout-17B-16E-Instruct | + +**Model Key (Fireworks)** + +| Display Name | Full Model ID | +| --- | --- | +| Llama-3.3-70B-Instruct | `accounts/fireworks/models/llama-v3p3-70b-instruct` | +| Llama-4-Maverick-Instruct | `accounts/fireworks/models/llama4-maverick-instruct-basic` | +| Llama-4-Scout-Instruct | `accounts/fireworks/models/llama4-scout-instruct-basic` | + + +| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct | | --- | --- | --- | --- | -| test_chat_non_streaming_basic (case 0) | ✅ | ✅ | ✅ | -| test_chat_non_streaming_basic (case 1) | ✅ | ✅ | ✅ | -| test_chat_non_streaming_image (case 0) | ⚪ | ✅ | ✅ | -| test_chat_non_streaming_structured_output (case 0) | ✅ | ✅ | ✅ | -| test_chat_non_streaming_structured_output (case 1) | ✅ | ✅ | ✅ | -| test_chat_non_streaming_tool_calling (case 0) | ✅ | ❌ | ❌ | -| test_chat_streaming_basic (case 0) | ✅ | ✅ | ✅ | -| test_chat_streaming_basic (case 1) | ✅ | ✅ | ✅ | -| test_chat_streaming_image (case 0) | ⚪ | ✅ | ✅ | -| test_chat_streaming_structured_output (case 0) | ✅ | ✅ | ✅ | -| test_chat_streaming_structured_output (case 1) | ❌ | ✅ | ✅ | +| test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_basic (saturn) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_image | ⚪ | ✅ | ✅ | +| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_structured_output (math) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_tool_calling | ❌ | ❌ | ❌ | +| test_chat_streaming_basic (earth) | ✅ | ✅ | ✅ | +| test_chat_streaming_basic (saturn) | ✅ | ✅ | ✅ | +| test_chat_streaming_image | ⚪ | ✅ | ✅ | +| test_chat_streaming_structured_output (calendar) | ✅ | ✅ | ✅ | +| test_chat_streaming_structured_output (math) | ✅ | ✅ | ✅ | ## Openai -*Tests run on: 2025-04-08 16:22:02* +*Tests run on: 2025-04-09 22:51:44* ```bash -pytest tests/verifications/openai/test_chat_completion.py --provider=openai -v +# Run all tests for this provider: +pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -v + +# Example: Run only the 'earth' case of test_chat_non_streaming_basic: +pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -k "test_chat_non_streaming_basic and earth" ``` + +**Model Key (Openai)** + +| Display Name | Full Model ID | +| --- | --- | +| gpt-4o | `gpt-4o` | +| gpt-4o-mini | `gpt-4o-mini` | + + | Test | gpt-4o | gpt-4o-mini | | --- | --- | --- | -| test_chat_non_streaming_basic (case 0) | ✅ | ✅ | -| test_chat_non_streaming_basic (case 1) | ✅ | ✅ | -| test_chat_non_streaming_image (case 0) | ✅ | ✅ | -| test_chat_non_streaming_structured_output (case 0) | ✅ | ✅ | -| test_chat_non_streaming_structured_output (case 1) | ✅ | ✅ | -| test_chat_non_streaming_tool_calling (case 0) | ✅ | ✅ | -| test_chat_streaming_basic (case 0) | ✅ | ✅ | -| test_chat_streaming_basic (case 1) | ✅ | ✅ | -| test_chat_streaming_image (case 0) | ✅ | ✅ | -| test_chat_streaming_structured_output (case 0) | ✅ | ✅ | -| test_chat_streaming_structured_output (case 1) | ✅ | ✅ | +| test_chat_non_streaming_basic (earth) | ✅ | ✅ | +| test_chat_non_streaming_basic (saturn) | ✅ | ✅ | +| test_chat_non_streaming_image | ✅ | ✅ | +| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | +| test_chat_non_streaming_structured_output (math) | ✅ | ✅ | +| test_chat_non_streaming_tool_calling | ✅ | ✅ | +| test_chat_streaming_basic (earth) | ✅ | ✅ | +| test_chat_streaming_basic (saturn) | ✅ | ✅ | +| test_chat_streaming_image | ✅ | ✅ | +| test_chat_streaming_structured_output (calendar) | ✅ | ✅ | +| test_chat_streaming_structured_output (math) | ✅ | ✅ | diff --git a/tests/verifications/conf/cerebras.yaml b/tests/verifications/conf/cerebras.yaml new file mode 100644 index 000000000..32a60e766 --- /dev/null +++ b/tests/verifications/conf/cerebras.yaml @@ -0,0 +1,10 @@ +base_url: https://api.cerebras.ai/v1 +api_key_var: CEREBRAS_API_KEY +models: +- llama-3.3-70b +model_display_names: + llama-3.3-70b: Llama-3.3-70B-Instruct +test_exclusions: + llama-3.3-70b: + - test_chat_non_streaming_image + - test_chat_streaming_image \ No newline at end of file diff --git a/tests/verifications/conf/fireworks.yaml b/tests/verifications/conf/fireworks.yaml new file mode 100644 index 000000000..30d6e4d75 --- /dev/null +++ b/tests/verifications/conf/fireworks.yaml @@ -0,0 +1,14 @@ +base_url: https://api.fireworks.ai/inference/v1 +api_key_var: FIREWORKS_API_KEY +models: +- accounts/fireworks/models/llama-v3p3-70b-instruct +- accounts/fireworks/models/llama4-scout-instruct-basic +- accounts/fireworks/models/llama4-maverick-instruct-basic +model_display_names: + accounts/fireworks/models/llama-v3p3-70b-instruct: Llama-3.3-70B-Instruct + accounts/fireworks/models/llama4-scout-instruct-basic: Llama-4-Scout-Instruct + accounts/fireworks/models/llama4-maverick-instruct-basic: Llama-4-Maverick-Instruct +test_exclusions: + accounts/fireworks/models/llama-v3p3-70b-instruct: + - test_chat_non_streaming_image + - test_chat_streaming_image \ No newline at end of file diff --git a/tests/verifications/conf/groq.yaml b/tests/verifications/conf/groq.yaml new file mode 100644 index 000000000..ef31a66e5 --- /dev/null +++ b/tests/verifications/conf/groq.yaml @@ -0,0 +1,14 @@ +base_url: https://api.groq.com/openai/v1 +api_key_var: GROQ_API_KEY +models: +- llama-3.3-70b-versatile +- llama-4-scout-17b-16e-instruct +- llama-4-maverick-17b-128e-instruct +model_display_names: + llama-3.3-70b-versatile: Llama-3.3-70B-Instruct + llama-4-scout-17b-16e-instruct: Llama-4-Scout-Instruct + llama-4-maverick-17b-128e-instruct: Llama-4-Maverick-Instruct +test_exclusions: + llama-3.3-70b-versatile: + - test_chat_non_streaming_image + - test_chat_streaming_image \ No newline at end of file diff --git a/tests/verifications/conf/openai.yaml b/tests/verifications/conf/openai.yaml new file mode 100644 index 000000000..89ae698f3 --- /dev/null +++ b/tests/verifications/conf/openai.yaml @@ -0,0 +1,9 @@ +base_url: https://api.openai.com/v1 +api_key_var: OPENAI_API_KEY +models: +- gpt-4o +- gpt-4o-mini +model_display_names: + gpt-4o: gpt-4o + gpt-4o-mini: gpt-4o-mini +test_exclusions: {} \ No newline at end of file diff --git a/tests/verifications/conf/together.yaml b/tests/verifications/conf/together.yaml new file mode 100644 index 000000000..80e86fa77 --- /dev/null +++ b/tests/verifications/conf/together.yaml @@ -0,0 +1,14 @@ +base_url: https://api.together.xyz/v1 +api_key_var: TOGETHER_API_KEY +models: +- meta-llama/Llama-3.3-70B-Instruct-Turbo +- meta-llama/Llama-4-Scout-17B-16E-Instruct +- meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 +model_display_names: + meta-llama/Llama-3.3-70B-Instruct-Turbo: Llama-3.3-70B-Instruct + meta-llama/Llama-4-Scout-17B-16E-Instruct: Llama-4-Scout-Instruct + meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8: Llama-4-Maverick-Instruct +test_exclusions: + meta-llama/Llama-3.3-70B-Instruct-Turbo: + - test_chat_non_streaming_image + - test_chat_streaming_image \ No newline at end of file diff --git a/tests/verifications/conftest.py b/tests/verifications/conftest.py index 08967e834..0b4a6feb7 100644 --- a/tests/verifications/conftest.py +++ b/tests/verifications/conftest.py @@ -4,6 +4,10 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. +import re + +import pytest + def pytest_addoption(parser): parser.addoption( @@ -14,7 +18,7 @@ def pytest_addoption(parser): parser.addoption( "--api-key", action="store", - help="API key", + help="API key to use for the provider", ) parser.addoption( "--provider", @@ -24,5 +28,64 @@ def pytest_addoption(parser): pytest_plugins = [ - "tests.verifications.openai.fixtures.fixtures", + "pytest_jsonreport", + "tests.verifications.openai_api.fixtures.fixtures", + "tests.verifications.openai_api.fixtures.load", ] + + +@pytest.hookimpl(optionalhook=True) +def pytest_json_runtest_metadata(item, call): + """Add model and case_id to pytest-json report metadata.""" + metadata = {} + nodeid = item.nodeid + + # 1. Extract model from callspec if available + model = item.callspec.params.get("model") if hasattr(item, "callspec") else None + if model: + metadata["model"] = model + else: + # Fallback: Try parsing from nodeid (less reliable) + match_model = re.search(r"\[(.*?)-", nodeid) + if match_model: + model = match_model.group(1) # Store model even if found via fallback + metadata["model"] = model + else: + print(f"Warning: Could not determine model for test {nodeid}") + model = None # Ensure model is None if not found + + # 2. Extract case_id using the known model string if possible + if model: + # Construct a regex pattern to find the case_id *after* the model name and a hyphen. + # Escape the model name in case it contains regex special characters. + pattern = re.escape(model) + r"-(.*?)\]$" + match_case = re.search(pattern, nodeid) + if match_case: + case_id = match_case.group(1) + metadata["case_id"] = case_id + else: + # Fallback if the pattern didn't match (e.g., nodeid format unexpected) + # Try the old less specific regex as a last resort. + match_case_fallback = re.search(r"-(.*?)\]$", nodeid) + if match_case_fallback: + case_id = match_case_fallback.group(1) + metadata["case_id"] = case_id + print(f"Warning: Used fallback regex to parse case_id from nodeid {nodeid}") + else: + print(f"Warning: Could not parse case_id from nodeid {nodeid} even with fallback.") + if "case" in (item.callspec.params if hasattr(item, "callspec") else {}): + metadata["case_id"] = "parsing_failed" + elif "case" in (item.callspec.params if hasattr(item, "callspec") else {}): + # Cannot reliably parse case_id without model, but we know it's a case test. + # Try the generic fallback regex. + match_case_fallback = re.search(r"-(.*?)\]$", nodeid) + if match_case_fallback: + case_id = match_case_fallback.group(1) + metadata["case_id"] = case_id + print(f"Warning: Used fallback regex to parse case_id from nodeid {nodeid} (model unknown)") + else: + print(f"Warning: Could not parse case_id from nodeid {nodeid} (model unknown)") + metadata["case_id"] = "parsing_failed_no_model" + # else: Not a test with a model or case param we need to handle. + + return metadata diff --git a/tests/verifications/generate_report.py b/tests/verifications/generate_report.py index 98a5930da..1c760ca19 100755 --- a/tests/verifications/generate_report.py +++ b/tests/verifications/generate_report.py @@ -4,27 +4,48 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. +# /// script +# requires-python = ">=3.10" +# dependencies = [ +# "pytest-json-report", +# "pyyaml", +# ] +# /// """ Test Report Generator -Requirements: - pip install pytest-json-report +Description: + This script runs pytest tests (specifically designed for OpenAI API compatibility checks) + for different providers, aggregates the results from JSON reports, and generates + a markdown summary report (REPORT.md). + + It automatically cleans up old test result files, keeping only the latest + per provider. + + +Configuration: + - Provider details (models, display names) are loaded from `tests/verifications/config.yaml`. + - Test cases are defined in YAML files within `tests/verifications/openai_api/fixtures/test_cases/`. + - Test results are stored in `tests/verifications/test_results/`. Usage: - # Generate a report using existing test results + # Generate a report using the latest existing test results python tests/verifications/generate_report.py - # Run tests and generate a report + # Run tests for all configured providers and generate a report python tests/verifications/generate_report.py --run-tests - # Run tests for specific providers + # Run tests only for specific providers (space-separated) python tests/verifications/generate_report.py --run-tests --providers fireworks openai + # Run tests matching a keyword expression (uses pytest -k) + python tests/verifications/generate_report.py --run-tests --providers fireworks --k "streaming" + + # Run a specific test case for a provider + python tests/verifications/generate_report.py --run-tests --providers fireworks --k "test_chat_streaming_basic and basic_earth" + # Save the report to a custom location python tests/verifications/generate_report.py --output custom_report.md - - # Clean up old test result files - python tests/verifications/generate_report.py --cleanup """ import argparse @@ -35,6 +56,9 @@ import subprocess import time from collections import defaultdict from pathlib import Path +from typing import Any, DefaultDict, Dict, Set, Tuple + +from tests.verifications.openai_api.fixtures.fixtures import _load_all_verification_configs # Define the root directory for test results RESULTS_DIR = Path(__file__).parent / "test_results" @@ -43,17 +67,12 @@ RESULTS_DIR.mkdir(exist_ok=True) # Maximum number of test result files to keep per provider MAX_RESULTS_PER_PROVIDER = 1 -# Custom order of providers PROVIDER_ORDER = ["together", "fireworks", "groq", "cerebras", "openai"] -# Dictionary to store providers and their models (will be populated dynamically) -PROVIDERS = defaultdict(set) - -# Tests will be dynamically extracted from results -ALL_TESTS = set() +VERIFICATION_CONFIG = _load_all_verification_configs() -def run_tests(provider): +def run_tests(provider, keyword=None): """Run pytest for a specific provider and save results""" print(f"Running tests for provider: {provider}") @@ -61,20 +80,28 @@ def run_tests(provider): result_file = RESULTS_DIR / f"{provider}_{timestamp}.json" temp_json_file = RESULTS_DIR / f"temp_{provider}_{timestamp}.json" + # Determine project root directory relative to this script + project_root = Path(__file__).parent.parent.parent + # Run pytest with JSON output cmd = [ "python", "-m", "pytest", - "tests/verifications/openai/test_chat_completion.py", + "tests/verifications/openai_api/test_chat_completion.py", f"--provider={provider}", "-v", "--json-report", f"--json-report-file={temp_json_file}", ] + # Append -k argument if provided + if keyword: + cmd.extend(["-k", keyword]) + try: - result = subprocess.run(cmd, capture_output=True, text=True) + # Run subprocess with cwd set to project root + result = subprocess.run(cmd, capture_output=True, text=True, cwd=project_root) print(f"Pytest exit code: {result.returncode}") # Check if the JSON file was created @@ -103,18 +130,30 @@ def run_tests(provider): return None -def parse_results(result_file): - """Parse the test results file and extract pass/fail by model and test""" +def parse_results( + result_file, +) -> Tuple[DefaultDict[str, DefaultDict[str, Dict[str, bool]]], DefaultDict[str, Set[str]], Set[str]]: + """Parse a single test results file. + + Returns: + Tuple containing: + - parsed_results: DefaultDict[provider, DefaultDict[model, Dict[test_name, pass_status]]] + - providers_in_file: DefaultDict[provider, Set[model]] found in this file. + - tests_in_file: Set[test_name] found in this file. + """ if not os.path.exists(result_file): print(f"Results file does not exist: {result_file}") - return {} + # Return empty defaultdicts/set matching the type hint + return defaultdict(lambda: defaultdict(dict)), defaultdict(set), set() with open(result_file, "r") as f: results = json.load(f) - # Initialize results dictionary - parsed_results = defaultdict(lambda: defaultdict(dict)) - provider = os.path.basename(result_file).split("_")[0] + # Initialize results dictionary with specific types + parsed_results: DefaultDict[str, DefaultDict[str, Dict[str, bool]]] = defaultdict(lambda: defaultdict(dict)) + providers_in_file: DefaultDict[str, Set[str]] = defaultdict(set) + tests_in_file: Set[str] = set() + provider: str = os.path.basename(result_file).split("_")[0] # Debug: Print summary of test results print(f"Test results summary for {provider}:") @@ -127,124 +166,72 @@ def parse_results(result_file): # Extract test results if "tests" not in results or not results["tests"]: print(f"No test results found in {result_file}") - return parsed_results + # Return empty defaultdicts/set matching the type hint + return defaultdict(lambda: defaultdict(dict)), defaultdict(set), set() - # Map for normalizing model names - model_name_map = { - "Llama-3.3-8B-Instruct": "Llama-3.3-8B-Instruct", - "Llama-3.3-70B-Instruct": "Llama-3.3-70B-Instruct", - "Llama-3.2-11B-Vision-Instruct": "Llama-3.2-11B-Vision-Instruct", - "Llama-4-Scout-17B-16E": "Llama-4-Scout-17B-16E-Instruct", - "Llama-4-Scout-17B-16E-Instruct": "Llama-4-Scout-17B-16E-Instruct", - "Llama-4-Maverick-17B-128E": "Llama-4-Maverick-17B-128E-Instruct", - "Llama-4-Maverick-17B-128E-Instruct": "Llama-4-Maverick-17B-128E-Instruct", - "gpt-4o": "gpt-4o", - "gpt-4o-mini": "gpt-4o-mini", - } - - # Keep track of all models found for this provider - provider_models = set() - - # Track all unique test cases for each base test - test_case_counts = defaultdict(int) - - # First pass: count the number of cases for each test + # Process the tests for test in results["tests"]: test_id = test.get("nodeid", "") - if "call" in test: - test_name = test_id.split("::")[1].split("[")[0] - input_output_match = re.search(r"\[input_output(\d+)-", test_id) - if input_output_match: - test_case_counts[test_name] += 1 + if not (call_phase := test.get("call")): + continue + call_outcome = call_phase.get("outcome") + if call_outcome not in ("passed", "failed"): + continue - # Second pass: process the tests with case numbers only for tests with multiple cases - for test in results["tests"]: - test_id = test.get("nodeid", "") - outcome = test.get("outcome", "") + # --- Extract data from metadata --- + metadata = test.get("metadata", {}) + model = metadata.get("model") + case_id = metadata.get("case_id") # String ID (if provided) + case_index = metadata.get("case_index") # Integer index (if no ID provided) - # Only process tests that have been executed (not setup errors) - if "call" in test: - # Regular test that actually ran - test_name = test_id.split("::")[1].split("[")[0] + # Check if we have a model and at least one case identifier + if not model or (case_id is None and case_index is None): + print( + f"Warning: Missing 'model' or case identifier ('case_id'/'case_index') metadata for test: {test_id}. Skipping." + ) + continue - # Extract input_output parameter to differentiate between test cases - input_output_match = re.search(r"\[input_output(\d+)-", test_id) - input_output_index = input_output_match.group(1) if input_output_match else "" + try: + test_name_base = test_id.split("::")[1].split("[")[0] + except (IndexError, ValueError) as e: + print(f"Warning: Could not parse base test name for {test_id}. Error: {e}. Skipping.") + continue - # Create a more detailed test name with case number only if there are multiple cases - detailed_test_name = test_name - if input_output_index and test_case_counts[test_name] > 1: - detailed_test_name = f"{test_name} (case {input_output_index})" + # Construct detailed test name using ID or index + if case_id is not None: + detailed_test_name = f"{test_name_base} ({case_id})" + elif case_index == 0: + # If case_id is missing and index is 0, assume single case, use base name only + detailed_test_name = test_name_base + elif case_index is not None: # case_index > 0 + # Use case_index for naming if case_id wasn't provided and index > 0 + detailed_test_name = f"{test_name_base} (case{case_index})" + else: + # This case should be prevented by the earlier check, but handle defensively + print(f"Error: No case identifier found for test {test_id} after initial check. Skipping.") + continue - # Track all unique test names - ALL_TESTS.add(detailed_test_name) + # Populate collections for this file + tests_in_file.add(detailed_test_name) + providers_in_file[provider].add(model) - # Extract model name from test_id using a more robust pattern - model_match = re.search(r"\[input_output\d+-([^\]]+)\]", test_id) - if model_match: - raw_model = model_match.group(1) - model = model_name_map.get(raw_model, raw_model) + if call_outcome == "passed": + parsed_results[provider][model][detailed_test_name] = True + elif call_outcome == "failed": + parsed_results[provider][model][detailed_test_name] = False - # Add to set of known models for this provider - provider_models.add(model) + # Final Summary Warning (Optional) + if not parsed_results.get(provider): + print(f"Warning: No valid test results parsed for provider {provider} from file {result_file}") - # Also update the global PROVIDERS dictionary - PROVIDERS[provider].add(model) - - # Store the result - if outcome == "passed": - parsed_results[provider][model][detailed_test_name] = True - else: - parsed_results[provider][model][detailed_test_name] = False - - print(f"Parsed test result: {detailed_test_name} for model {model}: {outcome}") - elif outcome == "error" and "setup" in test and test.get("setup", {}).get("outcome") == "failed": - # This is a setup failure, which likely means a configuration issue - # Extract the base test name and model name - parts = test_id.split("::") - if len(parts) > 1: - test_name = parts[1].split("[")[0] - - # Extract input_output parameter to differentiate between test cases - input_output_match = re.search(r"\[input_output(\d+)-", test_id) - input_output_index = input_output_match.group(1) if input_output_match else "" - - # Create a more detailed test name with case number only if there are multiple cases - detailed_test_name = test_name - if input_output_index and test_case_counts[test_name] > 1: - detailed_test_name = f"{test_name} (case {input_output_index})" - - if detailed_test_name in ALL_TESTS: - # Use a more robust pattern for model extraction - model_match = re.search(r"\[input_output\d+-([^\]]+)\]", test_id) - if model_match: - raw_model = model_match.group(1) - model = model_name_map.get(raw_model, raw_model) - - # Add to set of known models for this provider - provider_models.add(model) - - # Also update the global PROVIDERS dictionary - PROVIDERS[provider].add(model) - - # Mark setup failures as false (failed) - parsed_results[provider][model][detailed_test_name] = False - print(f"Parsed setup failure: {detailed_test_name} for model {model}") - - # Debug: Print parsed results - if not parsed_results[provider]: - print(f"Warning: No test results parsed for provider {provider}") - else: - for model, tests in parsed_results[provider].items(): - print(f"Model {model}: {len(tests)} test results") - - return parsed_results + return parsed_results, providers_in_file, tests_in_file -def cleanup_old_results(): - """Clean up old test result files, keeping only the newest N per provider""" - for provider in PROVIDERS.keys(): +def cleanup_old_results(providers_to_clean: Dict[str, Set[str]]): + """Clean up old test result files, keeping only the newest N per provider.""" + # Use the passed-in providers dictionary + for provider in providers_to_clean.keys(): # Get all result files for this provider provider_files = list(RESULTS_DIR.glob(f"{provider}_*.json")) @@ -289,8 +276,17 @@ def get_latest_results_by_provider(): return provider_results -def generate_report(results_dict, output_file=None): - """Generate the markdown report""" +def generate_report( + results_dict: Dict[str, Any], providers: Dict[str, Set[str]], all_tests: Set[str], output_file=None +): + """Generate the markdown report. + + Args: + results_dict: Aggregated results [provider][model][test_name] -> status. + providers: Dict of all providers and their models {provider: {models}}. + all_tests: Set of all test names found. + output_file: Optional path to save the report. + """ if output_file is None: # Default to creating the report in the same directory as this script output_file = Path(__file__).parent / "REPORT.md" @@ -299,8 +295,8 @@ def generate_report(results_dict, output_file=None): # Get the timestamp from result files provider_timestamps = {} - provider_results = get_latest_results_by_provider() - for provider, result_file in provider_results.items(): + provider_results_files = get_latest_results_by_provider() + for provider, result_file in provider_results_files.items(): # Extract timestamp from filename (format: provider_timestamp.json) try: timestamp_str = result_file.stem.split("_")[1] @@ -310,12 +306,33 @@ def generate_report(results_dict, output_file=None): except (IndexError, ValueError): provider_timestamps[provider] = "Unknown" - # Convert provider model sets to sorted lists - for provider in PROVIDERS: - PROVIDERS[provider] = sorted(PROVIDERS[provider]) + # Convert provider model sets to sorted lists (use passed-in providers dict) + providers_sorted = {prov: sorted(models) for prov, models in providers.items()} - # Sort tests alphabetically - sorted_tests = sorted(ALL_TESTS) + # Sort tests alphabetically (use passed-in all_tests set) + sorted_tests = sorted(all_tests) + + # Calculate counts for each base test name + base_test_case_counts: DefaultDict[str, int] = defaultdict(int) + base_test_name_map: Dict[str, str] = {} + for test_name in sorted_tests: + match = re.match(r"^(.*?)( \([^)]+\))?$", test_name) + if match: + base_name = match.group(1).strip() + base_test_case_counts[base_name] += 1 + base_test_name_map[test_name] = base_name + else: + # Should not happen with current naming, but handle defensively + base_test_case_counts[test_name] += 1 + base_test_name_map[test_name] = test_name + + if not sorted_tests: + print("Warning: No test results found to generate a report.") + # Optionally create an empty report or return early + with open(output_file, "w") as f: + f.write("# Test Results Report\n\nNo test results found.\n") + print(f"Generated empty report: {output_file}") + return report = ["# Test Results Report\n"] report.append(f"*Generated on: {time.strftime('%Y-%m-%d %H:%M:%S')}*\n") @@ -336,19 +353,15 @@ def generate_report(results_dict, output_file=None): # Add a summary section report.append("## Summary\n") - # Count total tests and passes + # Count total tests and passes (use passed-in providers and all_tests) total_tests = 0 passed_tests = 0 provider_totals = {} - - # Prepare summary data - for provider in PROVIDERS.keys(): + for provider, models in providers_sorted.items(): provider_passed = 0 provider_total = 0 - if provider in results_dict: - provider_models = PROVIDERS[provider] - for model in provider_models: + for model in models: if model in results_dict[provider]: model_results = results_dict[provider][model] for test in sorted_tests: @@ -358,33 +371,26 @@ def generate_report(results_dict, output_file=None): if model_results[test]: provider_passed += 1 passed_tests += 1 - provider_totals[provider] = (provider_passed, provider_total) - # Add summary table + # Add summary table (use passed-in providers dict) report.append("| Provider | Pass Rate | Tests Passed | Total Tests |") report.append("| --- | --- | --- | --- |") - - # Use the custom order for summary table - for provider in [p for p in PROVIDER_ORDER if p in PROVIDERS]: + for provider in [p for p in PROVIDER_ORDER if p in providers]: # Check against keys of passed-in dict passed, total = provider_totals.get(provider, (0, 0)) pass_rate = f"{(passed / total * 100):.1f}%" if total > 0 else "N/A" report.append(f"| {provider.capitalize()} | {pass_rate} | {passed} | {total} |") - - # Add providers not in the custom order - for provider in [p for p in PROVIDERS if p not in PROVIDER_ORDER]: + for provider in [p for p in providers if p not in PROVIDER_ORDER]: # Check against keys of passed-in dict passed, total = provider_totals.get(provider, (0, 0)) pass_rate = f"{(passed / total * 100):.1f}%" if total > 0 else "N/A" report.append(f"| {provider.capitalize()} | {pass_rate} | {passed} | {total} |") - report.append("\n") - # Process each provider in the custom order, then any additional providers for provider in sorted( - PROVIDERS.keys(), key=lambda p: (PROVIDER_ORDER.index(p) if p in PROVIDER_ORDER else float("inf"), p) + providers_sorted.keys(), key=lambda p: (PROVIDER_ORDER.index(p) if p in PROVIDER_ORDER else float("inf"), p) ): - if not PROVIDERS[provider]: - # Skip providers with no models + provider_models = providers_sorted[provider] # Use sorted models + if not provider_models: continue report.append(f"\n## {provider.capitalize()}\n") @@ -394,34 +400,70 @@ def generate_report(results_dict, output_file=None): report.append(f"*Tests run on: {provider_timestamps[provider]}*\n") # Add test command for reproducing results - test_cmd = f"pytest tests/verifications/openai/test_chat_completion.py --provider={provider} -v" - report.append(f"```bash\n{test_cmd}\n```\n") + test_cmd_all = f"pytest tests/verifications/openai_api/test_chat_completion.py --provider={provider} -v" + report.append(f"```bash\n# Run all tests for this provider:\n{test_cmd_all}\n") - # Get the relevant models for this provider - provider_models = PROVIDERS[provider] + # Find an example test with a case ID + example_base_test_name = None + example_case_id = None + # Get first test as fallback base, handle empty list + first_test_name = sorted_tests[0] if sorted_tests else "unknown_test" - # Create table header with models as columns - header = "| Test | " + " | ".join(provider_models) + " |" + match = re.match(r"^(.*?) \((.*?)\)$", first_test_name) + if match: + example_base_test_name = match.group(1).strip() + example_case_id = match.group(2).strip() + else: + example_base_test_name = first_test_name + + base_name = base_test_name_map.get(test, test) # Get base name + case_count = base_test_case_counts.get(base_name, 1) # Get count + filter_str = f"{example_base_test_name} and {example_case_id}" if case_count > 1 else example_base_test_name + + test_cmd_specific_case = ( + f'pytest tests/verifications/openai_api/test_chat_completion.py --provider={provider} -k "{filter_str}"' + ) + report.append( + f"# Example: Run only the '{example_case_id}' case of {example_base_test_name}:\n{test_cmd_specific_case}\n```\n" + ) + + # Get display names (use passed-in providers dict) + provider_config = VERIFICATION_CONFIG.get("providers", {}).get(provider, {}) + display_name_map = provider_config.get("model_display_names", {}) + + # Add Model Key Table (use provider_models) + report.append(f"\n**Model Key ({provider.capitalize()})**\n") + provider_key_lines = ["| Display Name | Full Model ID |", "| --- | --- |"] + for model_id in provider_models: + display_name = display_name_map.get(model_id, model_id) + provider_key_lines.append(f"| {display_name} | `{model_id}` |") + report.extend(provider_key_lines) + report.append("\n") + + # Create results table header (use provider_models) + display_names = [display_name_map.get(m, m) for m in provider_models] + header = "| Test | " + " | ".join(display_names) + " |" separator = "| --- | " + " | ".join(["---"] * len(provider_models)) + " |" - report.append(header) report.append(separator) - # Get results for this provider - provider_results = results_dict.get(provider, {}) + # Get results for this provider from results_dict + provider_results_data = results_dict.get(provider, {}) - # Add rows for each test + # Add rows for each test (use sorted_tests) for test in sorted_tests: - row = f"| {test} |" + # Determine display name based on case count + base_name = base_test_name_map.get(test, test) # Get base name + case_count = base_test_case_counts.get(base_name, 1) # Get count + display_test_name = base_name if case_count == 1 else test # Choose display name + row = f"| {display_test_name} |" # Use display name - # Add results for each model in this test - for model in provider_models: - if model in provider_results and test in provider_results[model]: - result = pass_icon if provider_results[model][test] else fail_icon + for model_id in provider_models: + if model_id in provider_results_data and test in provider_results_data[model_id]: + result = pass_icon if provider_results_data[model_id][test] else fail_icon else: result = na_icon row += f" {result} |" - report.append(row) # Write to file @@ -442,9 +484,13 @@ def main(): help="Specify providers to test (comma-separated or space-separated, default: all)", ) parser.add_argument("--output", type=str, help="Output file location (default: tests/verifications/REPORT.md)") + parser.add_argument("--k", type=str, help="Keyword expression to filter tests (passed to pytest -k)") args = parser.parse_args() all_results = {} + # Initialize collections to aggregate results in main + aggregated_providers = defaultdict(set) + aggregated_tests = set() if args.run_tests: # Get list of available providers from command line or use detected providers @@ -463,22 +509,31 @@ def main(): for provider in test_providers: provider = provider.strip() # Remove any whitespace - result_file = run_tests(provider) + result_file = run_tests(provider, keyword=args.k) if result_file: - provider_results = parse_results(result_file) - all_results.update(provider_results) + # Parse and aggregate results + parsed_results, providers_in_file, tests_in_file = parse_results(result_file) + all_results.update(parsed_results) + for prov, models in providers_in_file.items(): + aggregated_providers[prov].update(models) + aggregated_tests.update(tests_in_file) else: # Use existing results provider_result_files = get_latest_results_by_provider() for result_file in provider_result_files.values(): - provider_results = parse_results(result_file) - all_results.update(provider_results) + # Parse and aggregate results + parsed_results, providers_in_file, tests_in_file = parse_results(result_file) + all_results.update(parsed_results) + for prov, models in providers_in_file.items(): + aggregated_providers[prov].update(models) + aggregated_tests.update(tests_in_file) - # Generate the report - generate_report(all_results, args.output) + # Generate the report, passing aggregated data + generate_report(all_results, aggregated_providers, aggregated_tests, args.output) - cleanup_old_results() + # Cleanup, passing aggregated providers + cleanup_old_results(aggregated_providers) if __name__ == "__main__": diff --git a/tests/verifications/openai/fixtures/fixtures.py b/tests/verifications/openai/fixtures/fixtures.py deleted file mode 100644 index b86de3662..000000000 --- a/tests/verifications/openai/fixtures/fixtures.py +++ /dev/null @@ -1,97 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -import os - -import pytest -from openai import OpenAI - - -@pytest.fixture -def providers_model_mapping(): - """ - Mapping from model names used in test cases to provider's model names. - """ - return { - "fireworks": { - "Llama-3.3-70B-Instruct": "accounts/fireworks/models/llama-v3p1-70b-instruct", - "Llama-3.2-11B-Vision-Instruct": "accounts/fireworks/models/llama-v3p2-11b-vision-instruct", - "Llama-4-Scout-17B-16E-Instruct": "accounts/fireworks/models/llama4-scout-instruct-basic", - "Llama-4-Maverick-17B-128E-Instruct": "accounts/fireworks/models/llama4-maverick-instruct-basic", - }, - "together": { - "Llama-3.3-70B-Instruct": "meta-llama/Llama-3.3-70B-Instruct-Turbo", - "Llama-3.2-11B-Vision-Instruct": "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo", - "Llama-4-Scout-17B-16E-Instruct": "meta-llama/Llama-4-Scout-17B-16E-Instruct", - "Llama-4-Maverick-17B-128E-Instruct": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", - }, - "groq": { - "Llama-3.3-70B-Instruct": "llama-3.3-70b-versatile", - "Llama-3.2-11B-Vision-Instruct": "llama-3.2-11b-vision-preview", - "Llama-4-Scout-17B-16E-Instruct": "llama-4-scout-17b-16e-instruct", - "Llama-4-Maverick-17B-128E-Instruct": "llama-4-maverick-17b-128e-instruct", - }, - "cerebras": { - "Llama-3.3-70B-Instruct": "llama-3.3-70b", - }, - "openai": { - "gpt-4o": "gpt-4o", - "gpt-4o-mini": "gpt-4o-mini", - }, - } - - -@pytest.fixture -def provider_metadata(): - return { - "fireworks": ("https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY"), - "together": ("https://api.together.xyz/v1", "TOGETHER_API_KEY"), - "groq": ("https://api.groq.com/openai/v1", "GROQ_API_KEY"), - "cerebras": ("https://api.cerebras.ai/v1", "CEREBRAS_API_KEY"), - "openai": ("https://api.openai.com/v1", "OPENAI_API_KEY"), - } - - -@pytest.fixture -def provider(request, provider_metadata): - provider = request.config.getoption("--provider") - base_url = request.config.getoption("--base-url") - - if provider and base_url and provider_metadata[provider][0] != base_url: - raise ValueError(f"Provider {provider} is not supported for base URL {base_url}") - - if not provider: - if not base_url: - raise ValueError("Provider and base URL are not provided") - for provider, metadata in provider_metadata.items(): - if metadata[0] == base_url: - provider = provider - break - - return provider - - -@pytest.fixture -def base_url(request, provider, provider_metadata): - return request.config.getoption("--base-url") or provider_metadata[provider][0] - - -@pytest.fixture -def api_key(request, provider, provider_metadata): - return request.config.getoption("--api-key") or os.getenv(provider_metadata[provider][1]) - - -@pytest.fixture -def model_mapping(provider, providers_model_mapping): - return providers_model_mapping[provider] - - -@pytest.fixture -def openai_client(base_url, api_key): - return OpenAI( - base_url=base_url, - api_key=api_key, - ) diff --git a/tests/verifications/openai/test_chat_completion.py b/tests/verifications/openai/test_chat_completion.py deleted file mode 100644 index c6a10de7b..000000000 --- a/tests/verifications/openai/test_chat_completion.py +++ /dev/null @@ -1,202 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from typing import Any - -import pytest -from pydantic import BaseModel - -from tests.verifications.openai.fixtures.load import load_test_cases - -chat_completion_test_cases = load_test_cases("chat_completion") - - -@pytest.fixture -def correct_model_name(model, provider, providers_model_mapping): - """Return the provider-specific model name based on the generic model name.""" - mapping = providers_model_mapping[provider] - if model not in mapping: - pytest.skip(f"Provider {provider} does not support model {model}") - return mapping[model] - - -@pytest.mark.parametrize("model", chat_completion_test_cases["test_chat_basic"]["test_params"]["model"]) -@pytest.mark.parametrize( - "input_output", - chat_completion_test_cases["test_chat_basic"]["test_params"]["input_output"], -) -def test_chat_non_streaming_basic(openai_client, input_output, correct_model_name): - response = openai_client.chat.completions.create( - model=correct_model_name, - messages=input_output["input"]["messages"], - stream=False, - ) - assert response.choices[0].message.role == "assistant" - assert input_output["output"].lower() in response.choices[0].message.content.lower() - - -@pytest.mark.parametrize("model", chat_completion_test_cases["test_chat_basic"]["test_params"]["model"]) -@pytest.mark.parametrize( - "input_output", - chat_completion_test_cases["test_chat_basic"]["test_params"]["input_output"], -) -def test_chat_streaming_basic(openai_client, input_output, correct_model_name): - response = openai_client.chat.completions.create( - model=correct_model_name, - messages=input_output["input"]["messages"], - stream=True, - ) - content = "" - for chunk in response: - content += chunk.choices[0].delta.content or "" - - # TODO: add detailed type validation - - assert input_output["output"].lower() in content.lower() - - -@pytest.mark.parametrize("model", chat_completion_test_cases["test_chat_image"]["test_params"]["model"]) -@pytest.mark.parametrize( - "input_output", - chat_completion_test_cases["test_chat_image"]["test_params"]["input_output"], -) -def test_chat_non_streaming_image(openai_client, input_output, correct_model_name): - response = openai_client.chat.completions.create( - model=correct_model_name, - messages=input_output["input"]["messages"], - stream=False, - ) - assert response.choices[0].message.role == "assistant" - assert input_output["output"].lower() in response.choices[0].message.content.lower() - - -@pytest.mark.parametrize("model", chat_completion_test_cases["test_chat_image"]["test_params"]["model"]) -@pytest.mark.parametrize( - "input_output", - chat_completion_test_cases["test_chat_image"]["test_params"]["input_output"], -) -def test_chat_streaming_image(openai_client, input_output, correct_model_name): - response = openai_client.chat.completions.create( - model=correct_model_name, - messages=input_output["input"]["messages"], - stream=True, - ) - content = "" - for chunk in response: - content += chunk.choices[0].delta.content or "" - - # TODO: add detailed type validation - - assert input_output["output"].lower() in content.lower() - - -@pytest.mark.parametrize( - "model", - chat_completion_test_cases["test_chat_structured_output"]["test_params"]["model"], -) -@pytest.mark.parametrize( - "input_output", - chat_completion_test_cases["test_chat_structured_output"]["test_params"]["input_output"], -) -def test_chat_non_streaming_structured_output(openai_client, input_output, correct_model_name): - response = openai_client.chat.completions.create( - model=correct_model_name, - messages=input_output["input"]["messages"], - response_format=input_output["input"]["response_format"], - stream=False, - ) - - assert response.choices[0].message.role == "assistant" - maybe_json_content = response.choices[0].message.content - - validate_structured_output(maybe_json_content, input_output["output"]) - - -@pytest.mark.parametrize( - "model", - chat_completion_test_cases["test_chat_structured_output"]["test_params"]["model"], -) -@pytest.mark.parametrize( - "input_output", - chat_completion_test_cases["test_chat_structured_output"]["test_params"]["input_output"], -) -def test_chat_streaming_structured_output(openai_client, input_output, correct_model_name): - response = openai_client.chat.completions.create( - model=correct_model_name, - messages=input_output["input"]["messages"], - response_format=input_output["input"]["response_format"], - stream=True, - ) - maybe_json_content = "" - for chunk in response: - maybe_json_content += chunk.choices[0].delta.content or "" - validate_structured_output(maybe_json_content, input_output["output"]) - - -@pytest.mark.parametrize( - "model", - chat_completion_test_cases["test_tool_calling"]["test_params"]["model"], -) -@pytest.mark.parametrize( - "input_output", - chat_completion_test_cases["test_tool_calling"]["test_params"]["input_output"], -) -def test_chat_non_streaming_tool_calling(openai_client, input_output, correct_model_name): - response = openai_client.chat.completions.create( - model=correct_model_name, - messages=input_output["input"]["messages"], - tools=input_output["input"]["tools"], - stream=False, - ) - - assert response.choices[0].message.role == "assistant" - assert len(response.choices[0].message.tool_calls) > 0 - assert input_output["output"] == "get_weather_tool_call" - assert response.choices[0].message.tool_calls[0].function.name == "get_weather" - # TODO: add detailed type validation - - -def get_structured_output(maybe_json_content: str, schema_name: str) -> Any | None: - if schema_name == "valid_calendar_event": - - class CalendarEvent(BaseModel): - name: str - date: str - participants: list[str] - - try: - calendar_event = CalendarEvent.model_validate_json(maybe_json_content) - return calendar_event - except Exception: - return None - elif schema_name == "valid_math_reasoning": - - class Step(BaseModel): - explanation: str - output: str - - class MathReasoning(BaseModel): - steps: list[Step] - final_answer: str - - try: - math_reasoning = MathReasoning.model_validate_json(maybe_json_content) - return math_reasoning - except Exception: - return None - - return None - - -def validate_structured_output(maybe_json_content: str, schema_name: str) -> None: - structured_output = get_structured_output(maybe_json_content, schema_name) - assert structured_output is not None - if schema_name == "valid_calendar_event": - assert structured_output.name is not None - assert structured_output.date is not None - assert len(structured_output.participants) == 2 - elif schema_name == "valid_math_reasoning": - assert len(structured_output.final_answer) > 0 diff --git a/tests/verifications/openai/__init__.py b/tests/verifications/openai_api/__init__.py similarity index 100% rename from tests/verifications/openai/__init__.py rename to tests/verifications/openai_api/__init__.py diff --git a/tests/verifications/openai/fixtures/__init__.py b/tests/verifications/openai_api/fixtures/__init__.py similarity index 100% rename from tests/verifications/openai/fixtures/__init__.py rename to tests/verifications/openai_api/fixtures/__init__.py diff --git a/tests/verifications/openai_api/fixtures/fixtures.py b/tests/verifications/openai_api/fixtures/fixtures.py new file mode 100644 index 000000000..4f8c2e017 --- /dev/null +++ b/tests/verifications/openai_api/fixtures/fixtures.py @@ -0,0 +1,105 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import os +from pathlib import Path + +import pytest +import yaml +from openai import OpenAI + + +# --- Helper Function to Load Config --- +def _load_all_verification_configs(): + """Load and aggregate verification configs from the conf/ directory.""" + # Note: Path is relative to *this* file (fixtures.py) + conf_dir = Path(__file__).parent.parent.parent / "conf" + if not conf_dir.is_dir(): + # Use pytest.fail if called during test collection, otherwise raise error + # For simplicity here, we'll raise an error, assuming direct calls + # are less likely or can handle it. + raise FileNotFoundError(f"Verification config directory not found at {conf_dir}") + + all_provider_configs = {} + yaml_files = list(conf_dir.glob("*.yaml")) + if not yaml_files: + raise FileNotFoundError(f"No YAML configuration files found in {conf_dir}") + + for config_path in yaml_files: + provider_name = config_path.stem + try: + with open(config_path, "r") as f: + provider_config = yaml.safe_load(f) + if provider_config: + all_provider_configs[provider_name] = provider_config + else: + # Log warning if possible, or just skip empty files silently + print(f"Warning: Config file {config_path} is empty or invalid.") + except Exception as e: + raise IOError(f"Error loading config file {config_path}: {e}") from e + + return {"providers": all_provider_configs} + + +# --- End Helper Function --- + + +@pytest.fixture(scope="session") +def verification_config(): + """Pytest fixture to provide the loaded verification config.""" + try: + return _load_all_verification_configs() + except (FileNotFoundError, IOError) as e: + pytest.fail(str(e)) # Fail test collection if config loading fails + + +@pytest.fixture +def provider(request, verification_config): + provider = request.config.getoption("--provider") + base_url = request.config.getoption("--base-url") + + if provider and base_url and verification_config["providers"][provider]["base_url"] != base_url: + raise ValueError(f"Provider {provider} is not supported for base URL {base_url}") + + if not provider: + if not base_url: + raise ValueError("Provider and base URL are not provided") + for provider, metadata in verification_config["providers"].items(): + if metadata["base_url"] == base_url: + provider = provider + break + + return provider + + +@pytest.fixture +def base_url(request, provider, verification_config): + return request.config.getoption("--base-url") or verification_config["providers"][provider]["base_url"] + + +@pytest.fixture +def api_key(request, provider, verification_config): + provider_conf = verification_config.get("providers", {}).get(provider, {}) + api_key_env_var = provider_conf.get("api_key_var") + + key_from_option = request.config.getoption("--api-key") + key_from_env = os.getenv(api_key_env_var) if api_key_env_var else None + + final_key = key_from_option or key_from_env + return final_key + + +@pytest.fixture +def model_mapping(provider, providers_model_mapping): + return providers_model_mapping[provider] + + +@pytest.fixture +def openai_client(base_url, api_key): + return OpenAI( + base_url=base_url, + api_key=api_key, + ) diff --git a/tests/verifications/openai/fixtures/load.py b/tests/verifications/openai_api/fixtures/load.py similarity index 100% rename from tests/verifications/openai/fixtures/load.py rename to tests/verifications/openai_api/fixtures/load.py diff --git a/tests/verifications/openai/fixtures/test_cases/chat_completion.yaml b/tests/verifications/openai_api/fixtures/test_cases/chat_completion.yaml similarity index 78% rename from tests/verifications/openai/fixtures/test_cases/chat_completion.yaml rename to tests/verifications/openai_api/fixtures/test_cases/chat_completion.yaml index 2c302a704..78ea8245d 100644 --- a/tests/verifications/openai/fixtures/test_cases/chat_completion.yaml +++ b/tests/verifications/openai_api/fixtures/test_cases/chat_completion.yaml @@ -1,31 +1,24 @@ test_chat_basic: test_name: test_chat_basic test_params: - input_output: - - input: + case: + - case_id: "earth" + input: messages: - content: Which planet do humans live on? role: user output: Earth - - input: + - case_id: "saturn" + input: messages: - content: Which planet has rings around it with a name starting with letter S? role: user output: Saturn - model: - - Llama-3.3-8B-Instruct - - Llama-3.3-70B-Instruct - - Llama-4-Scout-17B-16E - - Llama-4-Scout-17B-16E-Instruct - - Llama-4-Maverick-17B-128E - - Llama-4-Maverick-17B-128E-Instruct - - gpt-4o - - gpt-4o-mini test_chat_image: test_name: test_chat_image test_params: - input_output: + case: - input: messages: - content: @@ -36,18 +29,12 @@ test_chat_image: type: image_url role: user output: llama - model: - - Llama-4-Scout-17B-16E - - Llama-4-Scout-17B-16E-Instruct - - Llama-4-Maverick-17B-128E - - Llama-4-Maverick-17B-128E-Instruct - - gpt-4o - - gpt-4o-mini test_chat_structured_output: test_name: test_chat_structured_output test_params: - input_output: - - input: + case: + - case_id: "calendar" + input: messages: - content: Extract the event information. role: system @@ -77,7 +64,8 @@ test_chat_structured_output: type: object type: json_schema output: valid_calendar_event - - input: + - case_id: "math" + input: messages: - content: You are a helpful math tutor. Guide the user through the solution step by step. @@ -118,19 +106,10 @@ test_chat_structured_output: type: object type: json_schema output: valid_math_reasoning - model: - - Llama-3.3-8B-Instruct - - Llama-3.3-70B-Instruct - - Llama-4-Scout-17B-16E - - Llama-4-Scout-17B-16E-Instruct - - Llama-4-Maverick-17B-128E - - Llama-4-Maverick-17B-128E-Instruct - - gpt-4o - - gpt-4o-mini test_tool_calling: test_name: test_tool_calling test_params: - input_output: + case: - input: messages: - content: You are a helpful assistant that can use tools to get information. @@ -152,11 +131,3 @@ test_tool_calling: type: object type: function output: get_weather_tool_call - model: - - Llama-3.3-70B-Instruct - - Llama-4-Scout-17B-16E - - Llama-4-Scout-17B-16E-Instruct - - Llama-4-Maverick-17B-128E - - Llama-4-Maverick-17B-128E-Instruct - - gpt-4o - - gpt-4o-mini diff --git a/tests/verifications/openai_api/test_chat_completion.py b/tests/verifications/openai_api/test_chat_completion.py new file mode 100644 index 000000000..dc08ec944 --- /dev/null +++ b/tests/verifications/openai_api/test_chat_completion.py @@ -0,0 +1,271 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import re +from typing import Any + +import pytest +from pydantic import BaseModel + +from tests.verifications.openai_api.fixtures.fixtures import _load_all_verification_configs +from tests.verifications.openai_api.fixtures.load import load_test_cases + +chat_completion_test_cases = load_test_cases("chat_completion") + + +def case_id_generator(case): + """Generate a test ID from the case's 'case_id' field, or use a default.""" + case_id = case.get("case_id") + if isinstance(case_id, (str, int)): + return re.sub(r"\\W|^(?=\\d)", "_", str(case_id)) + return None + + +def pytest_generate_tests(metafunc): + """Dynamically parametrize tests based on the selected provider and config.""" + if "model" in metafunc.fixturenames: + provider = metafunc.config.getoption("provider") + if not provider: + print("Warning: --provider not specified. Skipping model parametrization.") + metafunc.parametrize("model", []) + return + + try: + config_data = _load_all_verification_configs() + except (FileNotFoundError, IOError) as e: + print(f"ERROR loading verification configs: {e}") + config_data = {"providers": {}} + + provider_config = config_data.get("providers", {}).get(provider) + if provider_config: + models = provider_config.get("models", []) + if models: + metafunc.parametrize("model", models) + else: + print(f"Warning: No models found for provider '{provider}' in config.") + metafunc.parametrize("model", []) # Parametrize empty if no models found + else: + print(f"Warning: Provider '{provider}' not found in config. No models parametrized.") + metafunc.parametrize("model", []) # Parametrize empty if provider not found + + +def should_skip_test(verification_config, provider, model, test_name_base): + """Check if a test should be skipped based on config exclusions.""" + provider_config = verification_config.get("providers", {}).get(provider) + if not provider_config: + return False # No config for provider, don't skip + + exclusions = provider_config.get("test_exclusions", {}).get(model, []) + return test_name_base in exclusions + + +# Helper to get the base test name from the request object +def get_base_test_name(request): + return request.node.originalname + + +# --- Test Functions --- + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_chat_basic"]["test_params"]["case"], + ids=case_id_generator, +) +def test_chat_non_streaming_basic(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + stream=False, + ) + assert response.choices[0].message.role == "assistant" + assert case["output"].lower() in response.choices[0].message.content.lower() + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_chat_basic"]["test_params"]["case"], + ids=case_id_generator, +) +def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + stream=True, + ) + content = "" + for chunk in response: + content += chunk.choices[0].delta.content or "" + + # TODO: add detailed type validation + + assert case["output"].lower() in content.lower() + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_chat_image"]["test_params"]["case"], + ids=case_id_generator, +) +def test_chat_non_streaming_image(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + stream=False, + ) + assert response.choices[0].message.role == "assistant" + assert case["output"].lower() in response.choices[0].message.content.lower() + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_chat_image"]["test_params"]["case"], + ids=case_id_generator, +) +def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + stream=True, + ) + content = "" + for chunk in response: + content += chunk.choices[0].delta.content or "" + + # TODO: add detailed type validation + + assert case["output"].lower() in content.lower() + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_chat_structured_output"]["test_params"]["case"], + ids=case_id_generator, +) +def test_chat_non_streaming_structured_output(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + response_format=case["input"]["response_format"], + stream=False, + ) + + assert response.choices[0].message.role == "assistant" + maybe_json_content = response.choices[0].message.content + + validate_structured_output(maybe_json_content, case["output"]) + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_chat_structured_output"]["test_params"]["case"], + ids=case_id_generator, +) +def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + response_format=case["input"]["response_format"], + stream=True, + ) + maybe_json_content = "" + for chunk in response: + maybe_json_content += chunk.choices[0].delta.content or "" + validate_structured_output(maybe_json_content, case["output"]) + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_tool_calling"]["test_params"]["case"], + ids=case_id_generator, +) +def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + tools=case["input"]["tools"], + stream=False, + ) + + assert response.choices[0].message.role == "assistant" + assert len(response.choices[0].message.tool_calls) > 0 + assert case["output"] == "get_weather_tool_call" + assert response.choices[0].message.tool_calls[0].function.name == "get_weather" + # TODO: add detailed type validation + + +# --- Helper functions (structured output validation) --- + + +def get_structured_output(maybe_json_content: str, schema_name: str) -> Any | None: + if schema_name == "valid_calendar_event": + + class CalendarEvent(BaseModel): + name: str + date: str + participants: list[str] + + try: + calendar_event = CalendarEvent.model_validate_json(maybe_json_content) + return calendar_event + except Exception: + return None + elif schema_name == "valid_math_reasoning": + + class Step(BaseModel): + explanation: str + output: str + + class MathReasoning(BaseModel): + steps: list[Step] + final_answer: str + + try: + math_reasoning = MathReasoning.model_validate_json(maybe_json_content) + return math_reasoning + except Exception: + return None + + return None + + +def validate_structured_output(maybe_json_content: str, schema_name: str) -> None: + structured_output = get_structured_output(maybe_json_content, schema_name) + assert structured_output is not None + if schema_name == "valid_calendar_event": + assert structured_output.name is not None + assert structured_output.date is not None + assert len(structured_output.participants) == 2 + elif schema_name == "valid_math_reasoning": + assert len(structured_output.final_answer) > 0 diff --git a/tests/verifications/test_results/fireworks_1744154308.json b/tests/verifications/test_results/fireworks_1744154308.json deleted file mode 100644 index 691f6e474..000000000 --- a/tests/verifications/test_results/fireworks_1744154308.json +++ /dev/null @@ -1,2744 +0,0 @@ -{ - "created": 1744154399.039055, - "duration": 87.73799800872803, - "exitcode": 1, - "root": "/Users/erichuang/projects/llama-stack", - "environment": {}, - "summary": { - "skipped": 52, - "passed": 28, - "failed": 3, - "total": 83, - "collected": 83 - }, - "collectors": [ - { - "nodeid": "", - "outcome": "passed", - "result": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py", - "type": "Module" - } - ] - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py", - "outcome": "passed", - "result": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 138 - } - ] - } - ], - "tests": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.17320987500716, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.000177707988768816, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009193749981932342, - "outcome": "passed" - }, - "call": { - "duration": 1.1473859580000862, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00043337501119822264, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01645291701424867, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0002898749662563205, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01562033302616328, - "outcome": "passed" - }, - "call": { - "duration": 0.8782661251025274, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0002795408945530653, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008571124984882772, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0003043749602511525, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00842841702979058, - "outcome": "passed" - }, - "call": { - "duration": 1.3863223339430988, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0009970410028472543, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007089875056408346, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00017958390526473522, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005809499998576939, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00016495899762958288, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0119722920935601, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00016962504014372826, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005716291954740882, - "outcome": "passed" - }, - "call": { - "duration": 0.6822018750244752, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005292498972266912, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.025827708072029054, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.000295999925583601, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010980832972563803, - "outcome": "passed" - }, - "call": { - "duration": 0.7537062909686938, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0008091670460999012, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006567832897417247, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0001545000122860074, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005985083989799023, - "outcome": "passed" - }, - "call": { - "duration": 0.7263387079583481, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0006324589485302567, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0171962499152869, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.000780042028054595, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01365620899014175, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00016758404672145844, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0064070840599015355, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0002031669719144702, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010951624950394034, - "outcome": "passed" - }, - "call": { - "duration": 0.5433399169705808, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0013178749941289425, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.022056750021874905, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0006570409750565886, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008314333041198552, - "outcome": "passed" - }, - "call": { - "duration": 0.7779882500180975, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0006799160037189722, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.03601404093205929, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.000610582996159792, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.014321292052045465, - "outcome": "passed" - }, - "call": { - "duration": 1.0243758750148118, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0010485410457476974, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.021133000031113625, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.0005400830414146185, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007212458993308246, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00026770797558128834, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012334750033915043, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00042683398351073265, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011477917083539069, - "outcome": "passed" - }, - "call": { - "duration": 1.670572166913189, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005759169580414891, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.024620208074338734, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0005166250048205256, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008708957931958139, - "outcome": "passed" - }, - "call": { - "duration": 0.6654335829662159, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0002927089808508754, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.018128167022950947, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0001929170684888959, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0063874589977785945, - "outcome": "passed" - }, - "call": { - "duration": 0.8047525839647278, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00039245898369699717, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01366533397231251, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00028241705149412155, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010844790958799422, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.000258082989603281, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00936354196164757, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00020533299539238214, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 60, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008578249951824546, - "outcome": "passed" - }, - "call": { - "duration": 2.6288582499837503, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0006052498938515782, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.02061279199551791, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00029320805333554745, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 60, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00995812495239079, - "outcome": "passed" - }, - "call": { - "duration": 3.0904540000483394, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0003214169992133975, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0261635419446975, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00032716698478907347, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.027220541960559785, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.0003192499279975891, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010883458075113595, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0002687909873202443, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 75, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0171177500160411, - "outcome": "passed" - }, - "call": { - "duration": 1.6752691670553759, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004877089522778988, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011608208995312452, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00017137499526143074, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 75, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009284624946303666, - "outcome": "passed" - }, - "call": { - "duration": 3.537356249988079, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005068340105935931, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.016660499968566, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00029341597110033035, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o-mini]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01374066702555865, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.0002625000197440386, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.013120374991558492, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00021954195108264685, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.015080374898388982, - "outcome": "passed" - }, - "call": { - "duration": 1.157175041968003, - "outcome": "passed" - }, - "teardown": { - "duration": 0.000495875021442771, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.013946042046882212, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0002954580122604966, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011617792071774602, - "outcome": "passed" - }, - "call": { - "duration": 0.9537639999762177, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004819999448955059, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.027436082949861884, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00030274991877377033, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.016110333963297307, - "outcome": "passed" - }, - "call": { - "duration": 0.8493227910948917, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004883749643340707, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.017850833013653755, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.0003287500003352761, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012523208046332002, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00023500004317611456, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007516667013987899, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00018912507221102715, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007337165996432304, - "outcome": "passed" - }, - "call": { - "duration": 3.124099582899362, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0006703329272568226, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.014259999967180192, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00030262500513345003, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010863124975003302, - "outcome": "passed" - }, - "call": { - "duration": 1.3330956250429153, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00018679199274629354, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005797958001494408, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00017529097385704517, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005647709011100233, - "outcome": "passed" - }, - "call": { - "duration": 3.2295467499643564, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005654999986290932, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007151791942305863, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00015316694043576717, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006435790914110839, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00015954102855175734, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006164791993796825, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00014074996579438448, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010064583038911223, - "outcome": "passed" - }, - "call": { - "duration": 1.1676458748988807, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0002513329964131117, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011011417023837566, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00020608294289559126, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011654542060568929, - "outcome": "passed" - }, - "call": { - "duration": 0.7950789160095155, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0002690000692382455, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0066834589233621955, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00017270795069634914, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011390416999347508, - "outcome": "passed" - }, - "call": { - "duration": 0.7844940840732306, - "outcome": "passed" - }, - "teardown": { - "duration": 0.000511458027176559, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005813500029034913, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00015495799016207457, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0075639160349965096, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00014358304906636477, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008526541059836745, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00015841599088162184, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 117, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007805416011251509, - "outcome": "passed" - }, - "call": { - "duration": 13.25898533302825, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 196, - "message": "assert None is not None" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 136, - "message": "" - }, - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 196, - "message": "AssertionError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solution step by step.',... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\ncorrect_model_name = 'accounts/fireworks/models/llama-v3p1-70b-instruct'\n\n @pytest.mark.parametrize(\n \"model\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"model\"],\n )\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_structured_output(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n response_format=input_output[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n maybe_json_content += chunk.choices[0].delta.content or \"\"\n> validate_structured_output(maybe_json_content, input_output[\"output\"])\n\ntests/verifications/openai/test_chat_completion.py:136: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nmaybe_json_content = '{ \"final_answer\": \"}To solve the equation 8x + 7 = -23, we need to isolate the variable x. We can do this by followin...tassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistant'\nschema_name = 'valid_math_reasoning'\n\n def validate_structured_output(maybe_json_content: str, schema_name: str) -> None:\n structured_output = get_structured_output(maybe_json_content, schema_name)\n> assert structured_output is not None\nE assert None is not None\n\ntests/verifications/openai/test_chat_completion.py:196: AssertionError" - }, - "teardown": { - "duration": 0.00022583396639674902, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006412541959434748, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0001449589617550373, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010353000019676983, - "outcome": "passed" - }, - "call": { - "duration": 4.559281209018081, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00021179206669330597, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011320417048409581, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0001623749267309904, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005637791007757187, - "outcome": "passed" - }, - "call": { - "duration": 2.9282109580235556, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00019149994477629662, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.021475916961207986, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.0002605828922241926, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012046082993037999, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00016966694965958595, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 138, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00782629195600748, - "outcome": "passed" - }, - "call": { - "duration": 0.9290615000063553, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004110001027584076, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00842183397617191, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider fireworks does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00023745803628116846, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 138, - "outcome": "failed", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010762874968349934, - "outcome": "passed" - }, - "call": { - "duration": 23.62101216695737, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 156, - "message": "TypeError: object of type 'NoneType' has no len()" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 156, - "message": "TypeError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\ncorrect_model_name = 'accounts/fireworks/models/llama4-scout-instruct-basic'\n\n @pytest.mark.parametrize(\n \"model\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"model\"],\n )\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_non_streaming_tool_calling(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n tools=input_output[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai/test_chat_completion.py:156: TypeError" - }, - "teardown": { - "duration": 0.0004520840011537075, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00953104195650667, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider fireworks does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00017912499606609344, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 138, - "outcome": "failed", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010302042006514966, - "outcome": "passed" - }, - "call": { - "duration": 5.55651158397086, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 156, - "message": "TypeError: object of type 'NoneType' has no len()" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 156, - "message": "TypeError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\ncorrect_model_name = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\n\n @pytest.mark.parametrize(\n \"model\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"model\"],\n )\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_non_streaming_tool_calling(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n tools=input_output[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai/test_chat_completion.py:156: TypeError" - }, - "teardown": { - "duration": 0.0003929579397663474, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01593891705852002, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider fireworks does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.0003579579060897231, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01874550001230091, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider fireworks does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00031995808240026236, - "outcome": "passed" - } - } - ] -} diff --git a/tests/verifications/test_results/fireworks_1744264202.json b/tests/verifications/test_results/fireworks_1744264202.json new file mode 100644 index 000000000..d14738be9 --- /dev/null +++ b/tests/verifications/test_results/fireworks_1744264202.json @@ -0,0 +1,1329 @@ +{ + "created": 1744264258.730061, + "duration": 53.86071586608887, + "exitcode": 1, + "root": "/Users/erichuang/projects/llama-stack", + "environment": {}, + "summary": { + "passed": 28, + "skipped": 2, + "failed": 3, + "total": 33, + "collected": 33 + }, + "collectors": [ + { + "nodeid": "", + "outcome": "passed", + "result": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py", + "type": "Module" + } + ] + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py", + "outcome": "passed", + "result": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "type": "Function", + "lineno": 115 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "type": "Function", + "lineno": 115 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "type": "Function", + "lineno": 115 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "type": "Function", + "lineno": 134 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "type": "Function", + "lineno": 134 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "type": "Function", + "lineno": 134 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "type": "Function", + "lineno": 203 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "type": "Function", + "lineno": 203 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "type": "Function", + "lineno": 203 + } + ] + } + ], + "tests": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "earth" + }, + "setup": { + "duration": 0.05236550001427531, + "outcome": "passed" + }, + "call": { + "duration": 0.5364967910572886, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015075004193931818, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "saturn" + }, + "setup": { + "duration": 0.00699599995277822, + "outcome": "passed" + }, + "call": { + "duration": 0.5843954589217901, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0003858329728245735, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "earth" + }, + "setup": { + "duration": 0.009176500025205314, + "outcome": "passed" + }, + "call": { + "duration": 0.9258683329680935, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015787500888109207, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "saturn" + }, + "setup": { + "duration": 0.011275375029072165, + "outcome": "passed" + }, + "call": { + "duration": 0.6890578339807689, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004926669644191861, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "earth" + }, + "setup": { + "duration": 0.007520624902099371, + "outcome": "passed" + }, + "call": { + "duration": 0.6675686669768766, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00016137503553181887, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "saturn" + }, + "setup": { + "duration": 0.0076431670458987355, + "outcome": "passed" + }, + "call": { + "duration": 1.6813415409997106, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004928340204060078, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "earth" + }, + "setup": { + "duration": 0.01302404107991606, + "outcome": "passed" + }, + "call": { + "duration": 1.3206909999717027, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0002220839960500598, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "saturn" + }, + "setup": { + "duration": 0.0071772499941289425, + "outcome": "passed" + }, + "call": { + "duration": 0.4109888339880854, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005431669997051358, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "earth" + }, + "setup": { + "duration": 0.012043708004057407, + "outcome": "passed" + }, + "call": { + "duration": 0.4509220840409398, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00016408402007073164, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "saturn" + }, + "setup": { + "duration": 0.007165874936617911, + "outcome": "passed" + }, + "call": { + "duration": 0.6527335830032825, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0006419579731300473, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "earth" + }, + "setup": { + "duration": 0.007546542095951736, + "outcome": "passed" + }, + "call": { + "duration": 0.9360042089829221, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00020483299158513546, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "saturn" + }, + "setup": { + "duration": 0.046697250101715326, + "outcome": "passed" + }, + "call": { + "duration": 0.668349124956876, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005031249020248652, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "lineno": 115, + "outcome": "skipped", + "keywords": [ + "test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.012287458986975253, + "outcome": "passed" + }, + "call": { + "duration": 0.00015287497080862522, + "outcome": "skipped", + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 124, 'Skipped: Skipping test_chat_non_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" + }, + "teardown": { + "duration": 0.00012162502389401197, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "lineno": 115, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.007204124936833978, + "outcome": "passed" + }, + "call": { + "duration": 1.8676417920505628, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0001557499635964632, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "lineno": 115, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.008226625039242208, + "outcome": "passed" + }, + "call": { + "duration": 3.2724285409785807, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0002898330567404628, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "lineno": 134, + "outcome": "skipped", + "keywords": [ + "test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.011927249957807362, + "outcome": "passed" + }, + "call": { + "duration": 0.00017358292825520039, + "outcome": "skipped", + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 143, 'Skipped: Skipping test_chat_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" + }, + "teardown": { + "duration": 0.00014037499204277992, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "lineno": 134, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.008731417008675635, + "outcome": "passed" + }, + "call": { + "duration": 2.8333610829431564, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005132080987095833, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "lineno": 134, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.016569208004511893, + "outcome": "passed" + }, + "call": { + "duration": 2.302010750048794, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00016108399722725153, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "calendar" + }, + "setup": { + "duration": 0.039960999973118305, + "outcome": "passed" + }, + "call": { + "duration": 7.661373125039972, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015833403449505568, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "math" + }, + "setup": { + "duration": 0.006928625050932169, + "outcome": "passed" + }, + "call": { + "duration": 2.762534625013359, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0006561250193044543, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "calendar" + }, + "setup": { + "duration": 0.008602249901741743, + "outcome": "passed" + }, + "call": { + "duration": 0.8311484589939937, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005021670367568731, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "math" + }, + "setup": { + "duration": 0.015500334091484547, + "outcome": "passed" + }, + "call": { + "duration": 2.505719291046262, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0002619170118123293, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "calendar" + }, + "setup": { + "duration": 0.01948041608557105, + "outcome": "passed" + }, + "call": { + "duration": 0.6336237500654534, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00016637507360428572, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "math" + }, + "setup": { + "duration": 0.006810749997384846, + "outcome": "passed" + }, + "call": { + "duration": 1.9086956249084324, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00018824997823685408, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "calendar" + }, + "setup": { + "duration": 0.007881582947447896, + "outcome": "passed" + }, + "call": { + "duration": 0.7142562499502674, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0007035828894004226, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "math" + }, + "setup": { + "duration": 0.00848070892971009, + "outcome": "passed" + }, + "call": { + "duration": 1.5210869159782305, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00021216599270701408, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "calendar" + }, + "setup": { + "duration": 0.009669666993431747, + "outcome": "passed" + }, + "call": { + "duration": 1.3105999580584466, + "outcome": "passed" + }, + "teardown": { + "duration": 0.000588166993111372, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "math" + }, + "setup": { + "duration": 0.007745541981421411, + "outcome": "passed" + }, + "call": { + "duration": 3.250162083073519, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0001455000601708889, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "calendar" + }, + "setup": { + "duration": 0.009726207936182618, + "outcome": "passed" + }, + "call": { + "duration": 0.5564592910232022, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00019470800179988146, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "math" + }, + "setup": { + "duration": 0.018431040924042463, + "outcome": "passed" + }, + "call": { + "duration": 3.8501765420660377, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015279196668416262, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "lineno": 203, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.007509749964810908, + "outcome": "passed" + }, + "call": { + "duration": 0.4906975000631064, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 222, + "message": "TypeError: object of type 'NoneType' has no len()" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 222, + "message": "TypeError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:222: TypeError" + }, + "teardown": { + "duration": 0.00023995805531740189, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "lineno": 203, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.007144959061406553, + "outcome": "passed" + }, + "call": { + "duration": 3.818257624981925, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 222, + "message": "TypeError: object of type 'NoneType' has no len()" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 222, + "message": "TypeError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:222: TypeError" + }, + "teardown": { + "duration": 0.0002668750239536166, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "lineno": 203, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.015290249953977764, + "outcome": "passed" + }, + "call": { + "duration": 1.5883799999719486, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 222, + "message": "TypeError: object of type 'NoneType' has no len()" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 222, + "message": "TypeError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:222: TypeError" + }, + "teardown": { + "duration": 0.0008049579337239265, + "outcome": "passed" + } + } + ] +} diff --git a/tests/verifications/test_results/openai_1744154522.json b/tests/verifications/test_results/openai_1744154522.json deleted file mode 100644 index 310f3500d..000000000 --- a/tests/verifications/test_results/openai_1744154522.json +++ /dev/null @@ -1,2672 +0,0 @@ -{ - "created": 1744154576.251519, - "duration": 51.50739002227783, - "exitcode": 0, - "root": "/Users/erichuang/projects/llama-stack", - "environment": {}, - "summary": { - "skipped": 61, - "passed": 22, - "total": 83, - "collected": 83 - }, - "collectors": [ - { - "nodeid": "", - "outcome": "passed", - "result": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py", - "type": "Module" - } - ] - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py", - "outcome": "passed", - "result": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 138 - } - ] - } - ], - "tests": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0531630830373615, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0001657919492572546, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006063499953597784, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.00014004099648445845, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005356832989491522, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00016508297994732857, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006139832898043096, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.00014450005255639553, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00542324990965426, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00014112505596131086, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.004965625004842877, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.00013720791321247816, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005054084002040327, - "outcome": "passed" - }, - "call": { - "duration": 0.6271341659594327, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00043925002682954073, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0159178749890998, - "outcome": "passed" - }, - "call": { - "duration": 0.44088316697161645, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0006467089988291264, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.016705541987903416, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0005769169656559825, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012067249976098537, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.00016683305148035288, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009295083000324667, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00017204193864017725, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009534333017654717, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.00020175008103251457, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006628665956668556, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0003687090938910842, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0061322919791564345, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.0003664169926196337, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00623433303553611, - "outcome": "passed" - }, - "call": { - "duration": 0.7898445830214769, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0006602079374715686, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.014758958015590906, - "outcome": "passed" - }, - "call": { - "duration": 1.1555478329537436, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0011781250359490514, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.03454475000035018, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.000967124942690134, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.025206666090525687, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.000189624959602952, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.014331333106383681, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00023133307695388794, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009339665994048119, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.00020329200197011232, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010387042071670294, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00018254201859235764, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012297999928705394, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.00018662505317479372, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006984042003750801, - "outcome": "passed" - }, - "call": { - "duration": 0.32529433304443955, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0033042499562725425, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01832079200539738, - "outcome": "passed" - }, - "call": { - "duration": 0.48440287495031953, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00047233293298631907, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.02893691696226597, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0001747499918565154, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006553041050210595, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.00016829196829348803, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.013746666954830289, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00019237503875046968, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007175332983024418, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.0001873329747468233, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006127291941083968, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00019004102796316147, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006421791040338576, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.0001611249754205346, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009806249989196658, - "outcome": "passed" - }, - "call": { - "duration": 0.9556747920578346, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004937920020893216, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.03146500000730157, - "outcome": "passed" - }, - "call": { - "duration": 1.082494750036858, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0006242080125957727, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.021534667001105845, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0003469999646767974, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.025929750059731305, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.0008774169255048037, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012507125036790967, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00022008304949849844, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008156375028192997, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.0002079169498756528, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o]", - "lineno": 60, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_image[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012587749981321394, - "outcome": "passed" - }, - "call": { - "duration": 2.7379885419504717, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00044579198583960533, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "lineno": 60, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.017111250082962215, - "outcome": "passed" - }, - "call": { - "duration": 2.599374584038742, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0009177909232676029, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.02198700001463294, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00042749999556690454, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.015032917028293014, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.00041016703471541405, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.013976250076666474, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00027600000612437725, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00799729092977941, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.00020320899784564972, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o]", - "lineno": 75, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_image[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010483540943823755, - "outcome": "passed" - }, - "call": { - "duration": 4.249965250026435, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0008596250554546714, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o-mini]", - "lineno": 75, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_image[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.018141582957468927, - "outcome": "passed" - }, - "call": { - "duration": 2.297856790944934, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005075830267742276, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.017144332989118993, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0006829580524936318, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009827250032685697, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.00024204188957810402, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006737958989106119, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00022729102056473494, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006030917051248252, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.00022229203023016453, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009183833957649767, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00022629194427281618, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007097500027157366, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.00826825003605336, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006604874972254038, - "outcome": "passed" - }, - "call": { - "duration": 1.4057738750707358, - "outcome": "passed" - }, - "teardown": { - "duration": 0.000506040989421308, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.015966624952852726, - "outcome": "passed" - }, - "call": { - "duration": 0.540478374925442, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0009536249563097954, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.020631707971915603, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0004928340204060078, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.016745459055528045, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.0003412909572944045, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012252667103894055, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00028650008607655764, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01128904102370143, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.00027041707653552294, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009191332967020571, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0002074999501928687, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007687666919082403, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.0002027079463005066, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007542708073742688, - "outcome": "passed" - }, - "call": { - "duration": 4.244797708000988, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0012778330128639936, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.026919999974779785, - "outcome": "passed" - }, - "call": { - "duration": 9.006108874920756, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00046324997674673796, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01554666692391038, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0004023330984637141, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007354958914220333, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.0002900830004364252, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.017274250043556094, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0002668329980224371, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006813667016103864, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.00024500000290572643, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007385291974060237, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00017024995759129524, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00857366609852761, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.00016850000247359276, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005570041947066784, - "outcome": "passed" - }, - "call": { - "duration": 0.8564215000951663, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004029169213026762, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00786762498319149, - "outcome": "passed" - }, - "call": { - "duration": 0.6419672920601442, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005102079594507813, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.017147499951533973, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00032350001856684685, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01194737502373755, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.0005004579434171319, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010250666993670166, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00022554199676960707, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007847042055800557, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.000283458037301898, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008078000042587519, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0001794169656932354, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007204750087112188, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.00017725001089274883, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006797667010687292, - "outcome": "passed" - }, - "call": { - "duration": 5.411579457926564, - "outcome": "passed" - }, - "teardown": { - "duration": 0.001134666963480413, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.025059624924324453, - "outcome": "passed" - }, - "call": { - "duration": 9.112342999898829, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0009202499641105533, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.024287916952744126, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider openai does not support model Llama-3.3-70B-Instruct')" - }, - "teardown": { - "duration": 0.00015587499365210533, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006531457998789847, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00014670798555016518, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006190375075675547, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider openai does not support model Llama-4-Scout-17B-16E-Instruct')" - }, - "teardown": { - "duration": 0.0001603750279173255, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005670750048011541, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0001479999627918005, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005662833107635379, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider openai does not support model Llama-4-Maverick-17B-128E-Instruct')" - }, - "teardown": { - "duration": 0.0001480829669162631, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "lineno": 138, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00573637499473989, - "outcome": "passed" - }, - "call": { - "duration": 0.6269576249178499, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0010142088867723942, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "lineno": 138, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01623620803002268, - "outcome": "passed" - }, - "call": { - "duration": 0.7144521250156686, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0011040839599445462, - "outcome": "passed" - } - } - ] -} diff --git a/tests/verifications/test_results/openai_1744264304.json b/tests/verifications/test_results/openai_1744264304.json new file mode 100644 index 000000000..fe9c2fcac --- /dev/null +++ b/tests/verifications/test_results/openai_1744264304.json @@ -0,0 +1,868 @@ +{ + "created": 1744264338.9923031, + "duration": 32.825536012649536, + "exitcode": 0, + "root": "/Users/erichuang/projects/llama-stack", + "environment": {}, + "summary": { + "passed": 22, + "total": 22, + "collected": 22 + }, + "collectors": [ + { + "nodeid": "", + "outcome": "passed", + "result": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py", + "type": "Module" + } + ] + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py", + "outcome": "passed", + "result": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-saturn]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-earth]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-saturn]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-earth]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-saturn]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-earth]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-saturn]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-case0]", + "type": "Function", + "lineno": 115 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-mini-case0]", + "type": "Function", + "lineno": 115 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-case0]", + "type": "Function", + "lineno": 134 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-mini-case0]", + "type": "Function", + "lineno": 134 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-calendar]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-math]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-math]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-calendar]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-math]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-calendar]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-math]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-case0]", + "type": "Function", + "lineno": 203 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", + "type": "Function", + "lineno": 203 + } + ] + } + ], + "tests": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[gpt-4o-earth]", + "parametrize", + "pytestmark", + "gpt-4o-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "earth" + }, + "setup": { + "duration": 0.05381445901002735, + "outcome": "passed" + }, + "call": { + "duration": 0.49848275003023446, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00018287496641278267, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-saturn]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[gpt-4o-saturn]", + "parametrize", + "pytestmark", + "gpt-4o-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "saturn" + }, + "setup": { + "duration": 0.007965500000864267, + "outcome": "passed" + }, + "call": { + "duration": 0.9293275829404593, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00018229195848107338, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-earth]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[gpt-4o-mini-earth]", + "parametrize", + "pytestmark", + "gpt-4o-mini-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "earth" + }, + "setup": { + "duration": 0.00875679193995893, + "outcome": "passed" + }, + "call": { + "duration": 0.5793640419142321, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005307920509949327, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-saturn]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[gpt-4o-mini-saturn]", + "parametrize", + "pytestmark", + "gpt-4o-mini-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "saturn" + }, + "setup": { + "duration": 0.01076845801435411, + "outcome": "passed" + }, + "call": { + "duration": 0.8752291660057381, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004834589781239629, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-earth]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[gpt-4o-earth]", + "parametrize", + "pytestmark", + "gpt-4o-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "earth" + }, + "setup": { + "duration": 0.01662245800253004, + "outcome": "passed" + }, + "call": { + "duration": 0.8336971249664202, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0024086670018732548, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-saturn]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[gpt-4o-saturn]", + "parametrize", + "pytestmark", + "gpt-4o-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "saturn" + }, + "setup": { + "duration": 0.009416291955858469, + "outcome": "passed" + }, + "call": { + "duration": 0.43594495789147913, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0009131249971687794, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-earth]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[gpt-4o-mini-earth]", + "parametrize", + "pytestmark", + "gpt-4o-mini-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "earth" + }, + "setup": { + "duration": 0.013155042077414691, + "outcome": "passed" + }, + "call": { + "duration": 0.6119836670113727, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00023804197553545237, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-saturn]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[gpt-4o-mini-saturn]", + "parametrize", + "pytestmark", + "gpt-4o-mini-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "saturn" + }, + "setup": { + "duration": 0.009004916995763779, + "outcome": "passed" + }, + "call": { + "duration": 0.8327413749648258, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00046841695439070463, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-case0]", + "lineno": 115, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_image[gpt-4o-case0]", + "parametrize", + "pytestmark", + "gpt-4o-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "case0" + }, + "setup": { + "duration": 0.009574208059348166, + "outcome": "passed" + }, + "call": { + "duration": 2.221839000005275, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015945907216519117, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-mini-case0]", + "lineno": 115, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_image[gpt-4o-mini-case0]", + "parametrize", + "pytestmark", + "gpt-4o-mini-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "case0" + }, + "setup": { + "duration": 0.0084402080392465, + "outcome": "passed" + }, + "call": { + "duration": 2.298736457945779, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0002423750702291727, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-case0]", + "lineno": 134, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_image[gpt-4o-case0]", + "parametrize", + "pytestmark", + "gpt-4o-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "case0" + }, + "setup": { + "duration": 0.007330416003242135, + "outcome": "passed" + }, + "call": { + "duration": 4.062959833070636, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015470804646611214, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-mini-case0]", + "lineno": 134, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_image[gpt-4o-mini-case0]", + "parametrize", + "pytestmark", + "gpt-4o-mini-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "case0" + }, + "setup": { + "duration": 0.019998832955025136, + "outcome": "passed" + }, + "call": { + "duration": 2.609432084020227, + "outcome": "passed" + }, + "teardown": { + "duration": 0.005618917057290673, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-calendar]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[gpt-4o-calendar]", + "parametrize", + "pytestmark", + "gpt-4o-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "calendar" + }, + "setup": { + "duration": 0.00867662497330457, + "outcome": "passed" + }, + "call": { + "duration": 0.6856697499752045, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00018445902969688177, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-math]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[gpt-4o-math]", + "parametrize", + "pytestmark", + "gpt-4o-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "math" + }, + "setup": { + "duration": 0.01139050000347197, + "outcome": "passed" + }, + "call": { + "duration": 2.764390083961189, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0003164170775562525, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", + "parametrize", + "pytestmark", + "gpt-4o-mini-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "calendar" + }, + "setup": { + "duration": 0.01321374997496605, + "outcome": "passed" + }, + "call": { + "duration": 0.8284227909753099, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00030170800164341927, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-math]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[gpt-4o-mini-math]", + "parametrize", + "pytestmark", + "gpt-4o-mini-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "math" + }, + "setup": { + "duration": 0.013477458036504686, + "outcome": "passed" + }, + "call": { + "duration": 2.4146235829684883, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00025754200760275126, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-calendar]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[gpt-4o-calendar]", + "parametrize", + "pytestmark", + "gpt-4o-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "calendar" + }, + "setup": { + "duration": 0.006940583931282163, + "outcome": "passed" + }, + "call": { + "duration": 0.5102092920569703, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00023379107005894184, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-math]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[gpt-4o-math]", + "parametrize", + "pytestmark", + "gpt-4o-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "math" + }, + "setup": { + "duration": 0.007166999974288046, + "outcome": "passed" + }, + "call": { + "duration": 3.5751801669830456, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015041697770357132, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-calendar]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[gpt-4o-mini-calendar]", + "parametrize", + "pytestmark", + "gpt-4o-mini-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "calendar" + }, + "setup": { + "duration": 0.010652625001966953, + "outcome": "passed" + }, + "call": { + "duration": 0.6648182499920949, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0008647920330986381, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-math]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[gpt-4o-mini-math]", + "parametrize", + "pytestmark", + "gpt-4o-mini-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "math" + }, + "setup": { + "duration": 0.007372208056040108, + "outcome": "passed" + }, + "call": { + "duration": 2.80747462506406, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00028124998789280653, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-case0]", + "lineno": 203, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_calling[gpt-4o-case0]", + "parametrize", + "pytestmark", + "gpt-4o-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "case0" + }, + "setup": { + "duration": 0.01625587500166148, + "outcome": "passed" + }, + "call": { + "duration": 0.6878769160248339, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0002637499710544944, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", + "lineno": 203, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", + "parametrize", + "pytestmark", + "gpt-4o-mini-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "case0" + }, + "setup": { + "duration": 0.008817250025458634, + "outcome": "passed" + }, + "call": { + "duration": 0.7181202919455245, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0017147079342976213, + "outcome": "passed" + } + } + ] +} diff --git a/tests/verifications/test_results/together_1744154399.json b/tests/verifications/test_results/together_1744154399.json deleted file mode 100644 index ae801e83b..000000000 --- a/tests/verifications/test_results/together_1744154399.json +++ /dev/null @@ -1,2830 +0,0 @@ -{ - "created": 1744154470.9868789, - "duration": 59.6187219619751, - "exitcode": 1, - "root": "/Users/erichuang/projects/llama-stack", - "environment": {}, - "summary": { - "skipped": 52, - "passed": 21, - "failed": 10, - "total": 83, - "collected": 83 - }, - "collectors": [ - { - "nodeid": "", - "outcome": "passed", - "result": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py", - "type": "Module" - } - ] - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py", - "outcome": "passed", - "result": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 25 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 40 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 60 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 75 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 95 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "type": "Function", - "lineno": 117 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "type": "Function", - "lineno": 138 - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "type": "Function", - "lineno": 138 - } - ] - } - ], - "tests": [ - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.39231995795853436, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0002014160854741931, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0071710830088704824, - "outcome": "passed" - }, - "call": { - "duration": 0.7968309168936685, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004362498875707388, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012780916062183678, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00029158301185816526, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.013563874992541969, - "outcome": "passed" - }, - "call": { - "duration": 0.5071627920260653, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005456249928101897, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.020708917058072984, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00030325003899633884, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.014170082984492183, - "outcome": "passed" - }, - "call": { - "duration": 1.2383921250002459, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0009597090538591146, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.013402250013314188, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00028245802968740463, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008693707990460098, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00016249995678663254, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005904874997213483, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0001960420049726963, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006532749976031482, - "outcome": "passed" - }, - "call": { - "duration": 0.5410778749501333, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00019516597967594862, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009374375105835497, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00015524995978921652, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007205875008367002, - "outcome": "passed" - }, - "call": { - "duration": 0.42584729101508856, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0009506250498816371, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.029625958995893598, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0001860830234363675, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 25, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.023576707928441465, - "outcome": "passed" - }, - "call": { - "duration": 1.2249365829629824, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004278330598026514, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.014816291979514062, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00029558304231613874, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "lineno": 25, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_basic[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012769333901815116, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 26, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00024329195730388165, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009145625052042305, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00021195888984948397, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0133140409598127, - "outcome": "passed" - }, - "call": { - "duration": 0.7228892090497538, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004301250446587801, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.013998750015161932, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0002961249556392431, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 40, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012570249964483082, - "outcome": "passed" - }, - "call": { - "duration": 0.7193170419195667, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 54, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 54, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\ncorrect_model_name = 'meta-llama/Llama-4-Scout-17B-16E-Instruct'\n\n @pytest.mark.parametrize(\"model\", chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"model\"])\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_basic(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:54: IndexError" - }, - "teardown": { - "duration": 0.00022504094522446394, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006660082959569991, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0001445829402655363, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 40, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_basic[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.021228999947197735, - "outcome": "passed" - }, - "call": { - "duration": 1.5670281670754775, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 54, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 54, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\ncorrect_model_name = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\n\n @pytest.mark.parametrize(\"model\", chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"model\"])\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_basic(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:54: IndexError" - }, - "teardown": { - "duration": 0.0004656669916585088, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009595917072147131, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00025625003036111593, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009242708911187947, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.0002484159776940942, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00905474997125566, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00023312494158744812, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 40, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007183165987953544, - "outcome": "passed" - }, - "call": { - "duration": 1.0667660840554163, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005163750611245632, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.05233616603072733, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0003471659729257226, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 40, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.015932541922666132, - "outcome": "passed" - }, - "call": { - "duration": 0.41540695796720684, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 54, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 54, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\ncorrect_model_name = 'meta-llama/Llama-4-Scout-17B-16E-Instruct'\n\n @pytest.mark.parametrize(\"model\", chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"model\"])\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_basic(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:54: IndexError" - }, - "teardown": { - "duration": 0.0002845840062946081, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007243875064887106, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00016258296091109514, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 40, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_basic[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009275624994188547, - "outcome": "passed" - }, - "call": { - "duration": 1.43309554096777, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 54, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 54, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\ncorrect_model_name = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\n\n @pytest.mark.parametrize(\"model\", chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"model\"])\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_basic(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:54: IndexError" - }, - "teardown": { - "duration": 0.0003690000157803297, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011570582981221378, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00024937500711530447, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "lineno": 40, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_basic[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010756584000773728, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 41, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00026183295994997025, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.008863041992299259, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00023283297196030617, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 60, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.007975792046636343, - "outcome": "passed" - }, - "call": { - "duration": 2.1585817909799516, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005107080796733499, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.05228079203516245, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0017226670170202851, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 60, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009964749915525317, - "outcome": "passed" - }, - "call": { - "duration": 4.6593364590080455, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0009852920193225145, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.023214041953906417, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.0003567079547792673, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "lineno": 60, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_image[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01705008395947516, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 61, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.0003085409989580512, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.014711958006955683, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0003121249610558152, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 75, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01843333407305181, - "outcome": "passed" - }, - "call": { - "duration": 2.8683876669965684, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 89, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 89, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\ncorrect_model_name = 'meta-llama/Llama-4-Scout-17B-16E-Instruct'\n\n @pytest.mark.parametrize(\"model\", chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"model\"])\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_image(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:89: IndexError" - }, - "teardown": { - "duration": 0.00028662499971687794, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00653208396397531, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.00021291698794811964, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 75, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_image[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.006028458010405302, - "outcome": "passed" - }, - "call": { - "duration": 4.981105040991679, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 89, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 89, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\ncorrect_model_name = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\n\n @pytest.mark.parametrize(\"model\", chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"model\"])\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_image(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:89: IndexError" - }, - "teardown": { - "duration": 0.0010110830189660192, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01591233303770423, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.0003783750580623746, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_image[input_output0-gpt-4o-mini]", - "lineno": 75, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_image[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010691000032238662, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 76, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00027445796877145767, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01258529198821634, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.0002044580178335309, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010904791066423059, - "outcome": "passed" - }, - "call": { - "duration": 0.8311828339938074, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00048687495291233063, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.029216791968792677, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0002269580727443099, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.013182583032175899, - "outcome": "passed" - }, - "call": { - "duration": 1.7446029160637408, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0008087089518085122, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.02009516698308289, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.000320291961543262, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.015216833096928895, - "outcome": "passed" - }, - "call": { - "duration": 0.8049291669158265, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005109170451760292, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0171551660168916, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.0005707499803975224, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01131124992389232, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.0003044159384444356, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0054290409898385406, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00014645792543888092, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011368000064976513, - "outcome": "passed" - }, - "call": { - "duration": 4.363120499998331, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0003998749889433384, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.04945958300959319, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0002401659730821848, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.011090958025306463, - "outcome": "passed" - }, - "call": { - "duration": 4.699277375009842, - "outcome": "passed" - }, - "teardown": { - "duration": 0.000689250067807734, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.020744459005072713, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0001836250303313136, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 95, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005926624988205731, - "outcome": "passed" - }, - "call": { - "duration": 2.7814464160474017, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0009554170537739992, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.03027112502604723, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.0003245410043746233, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "lineno": 95, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_structured_output[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.009138708002865314, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 96, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.0001919999485835433, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0064505410846322775, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00015720794908702374, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00582624995149672, - "outcome": "passed" - }, - "call": { - "duration": 0.8302567919017747, - "outcome": "passed" - }, - "teardown": { - "duration": 0.00020354206208139658, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.014151416951790452, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.00034970801789313555, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 117, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012150791939347982, - "outcome": "passed" - }, - "call": { - "duration": 0.7078855830477551, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 135, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 135, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'content': 'Alice and Bob ar...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\ncorrect_model_name = 'meta-llama/Llama-4-Scout-17B-16E-Instruct'\n\n @pytest.mark.parametrize(\n \"model\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"model\"],\n )\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_structured_output(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n response_format=input_output[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:135: IndexError" - }, - "teardown": { - "duration": 0.0008542909054085612, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.022667833953164518, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0006820419803261757, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 117, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.01285991701297462, - "outcome": "passed" - }, - "call": { - "duration": 0.6888671671040356, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 135, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 135, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'content': 'Alice and Bob ar...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\ncorrect_model_name = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\n\n @pytest.mark.parametrize(\n \"model\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"model\"],\n )\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_structured_output(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n response_format=input_output[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:135: IndexError" - }, - "teardown": { - "duration": 0.0007953330641612411, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.015029000001959503, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00015666603576391935, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.00622316705994308, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.0001533749746158719, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-3.3-8B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-8B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005598834017291665, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model Llama-3.3-8B-Instruct')" - }, - "teardown": { - "duration": 0.00013062497600913048, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "lineno": 117, - "outcome": "passed", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.005876541952602565, - "outcome": "passed" - }, - "call": { - "duration": 7.561108374968171, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0004579999949783087, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.018791542039252818, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0004900830099359155, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 117, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0065952910808846354, - "outcome": "passed" - }, - "call": { - "duration": 2.6826554159633815, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 135, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 135, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solution step by step.',... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\ncorrect_model_name = 'meta-llama/Llama-4-Scout-17B-16E-Instruct'\n\n @pytest.mark.parametrize(\n \"model\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"model\"],\n )\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_structured_output(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n response_format=input_output[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:135: IndexError" - }, - "teardown": { - "duration": 0.0009669580031186342, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.019489208003506064, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0007419160101562738, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 117, - "outcome": "failed", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output1-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012299792026169598, - "outcome": "passed" - }, - "call": { - "duration": 2.829678333015181, - "outcome": "failed", - "crash": { - "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py", - "lineno": 135, - "message": "IndexError: list index out of range" - }, - "traceback": [ - { - "path": "tests/verifications/openai/test_chat_completion.py", - "lineno": 135, - "message": "IndexError" - } - ], - "longrepr": "openai_client = \ninput_output = {'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solution step by step.',... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\ncorrect_model_name = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\n\n @pytest.mark.parametrize(\n \"model\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"model\"],\n )\n @pytest.mark.parametrize(\n \"input_output\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"input_output\"],\n )\n def test_chat_streaming_structured_output(openai_client, input_output, correct_model_name):\n response = openai_client.chat.completions.create(\n model=correct_model_name,\n messages=input_output[\"input\"][\"messages\"],\n response_format=input_output[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai/test_chat_completion.py:135: IndexError" - }, - "teardown": { - "duration": 0.0010418329620733857, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-gpt-4o]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.016189916990697384, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.00027966592460870743, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "lineno": 117, - "outcome": "skipped", - "keywords": [ - "test_chat_streaming_structured_output[input_output1-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output1-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.010247125057503581, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 118, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.00023291702382266521, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "lineno": 138, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-3.3-70B-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-3.3-70B-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012632582918740809, - "outcome": "passed" - }, - "call": { - "duration": 0.40774812502786517, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0007319580763578415, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.019890791969373822, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider together does not support model Llama-4-Scout-17B-16E')" - }, - "teardown": { - "duration": 0.0006391670322045684, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "lineno": 138, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Scout-17B-16E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Scout-17B-16E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.0178165000397712, - "outcome": "passed" - }, - "call": { - "duration": 0.38229950005188584, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0010000420734286308, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.024259291938506067, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider together does not support model Llama-4-Maverick-17B-128E')" - }, - "teardown": { - "duration": 0.0003602079814299941, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "lineno": 138, - "outcome": "passed", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-Llama-4-Maverick-17B-128E-Instruct]", - "parametrize", - "pytestmark", - "input_output0-Llama-4-Maverick-17B-128E-Instruct", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012425708002410829, - "outcome": "passed" - }, - "call": { - "duration": 0.7610744580160826, - "outcome": "passed" - }, - "teardown": { - "duration": 0.0005935420049354434, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-gpt-4o]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.018717541941441596, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider together does not support model gpt-4o')" - }, - "teardown": { - "duration": 0.000659791985526681, - "outcome": "passed" - } - }, - { - "nodeid": "tests/verifications/openai/test_chat_completion.py::test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "lineno": 138, - "outcome": "skipped", - "keywords": [ - "test_chat_non_streaming_tool_calling[input_output0-gpt-4o-mini]", - "parametrize", - "pytestmark", - "input_output0-gpt-4o-mini", - "test_chat_completion.py", - "openai", - "verifications", - "tests", - "llama-stack", - "" - ], - "setup": { - "duration": 0.012784749967977405, - "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai/test_chat_completion.py', 139, 'Skipped: Provider together does not support model gpt-4o-mini')" - }, - "teardown": { - "duration": 0.0002145830076187849, - "outcome": "passed" - } - } - ] -} diff --git a/tests/verifications/test_results/together_1744264258.json b/tests/verifications/test_results/together_1744264258.json new file mode 100644 index 000000000..c38dd52b5 --- /dev/null +++ b/tests/verifications/test_results/together_1744264258.json @@ -0,0 +1,1420 @@ +{ + "created": 1744264304.064288, + "duration": 42.470197916030884, + "exitcode": 1, + "root": "/Users/erichuang/projects/llama-stack", + "environment": {}, + "summary": { + "passed": 21, + "failed": 10, + "skipped": 2, + "total": 33, + "collected": 33 + }, + "collectors": [ + { + "nodeid": "", + "outcome": "passed", + "result": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py", + "type": "Module" + } + ] + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py", + "outcome": "passed", + "result": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", + "type": "Function", + "lineno": 72 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", + "type": "Function", + "lineno": 91 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "type": "Function", + "lineno": 115 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "type": "Function", + "lineno": 115 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "type": "Function", + "lineno": 115 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "type": "Function", + "lineno": 134 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "type": "Function", + "lineno": 134 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "type": "Function", + "lineno": 134 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", + "type": "Function", + "lineno": 158 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", + "type": "Function", + "lineno": 181 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "type": "Function", + "lineno": 203 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "type": "Function", + "lineno": 203 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "type": "Function", + "lineno": 203 + } + ] + } + ], + "tests": [ + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "earth" + }, + "setup": { + "duration": 0.06113254197407514, + "outcome": "passed" + }, + "call": { + "duration": 1.0720349580515176, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015966698992997408, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "saturn" + }, + "setup": { + "duration": 0.006908083101734519, + "outcome": "passed" + }, + "call": { + "duration": 0.5013210839824751, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005375830223783851, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "earth" + }, + "setup": { + "duration": 0.006910792086273432, + "outcome": "passed" + }, + "call": { + "duration": 0.5142245410243049, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004069580463692546, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "saturn" + }, + "setup": { + "duration": 0.009730000048875809, + "outcome": "passed" + }, + "call": { + "duration": 0.40133179200347513, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004558749496936798, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "earth" + }, + "setup": { + "duration": 0.008247417048551142, + "outcome": "passed" + }, + "call": { + "duration": 0.7914331250358373, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00020262505859136581, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", + "lineno": 72, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "saturn" + }, + "setup": { + "duration": 0.00922900007572025, + "outcome": "passed" + }, + "call": { + "duration": 1.2742049579974264, + "outcome": "passed" + }, + "teardown": { + "duration": 0.000688415952026844, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "earth" + }, + "setup": { + "duration": 0.006949124974198639, + "outcome": "passed" + }, + "call": { + "duration": 0.4681705000111833, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00017795804888010025, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", + "lineno": 91, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "saturn" + }, + "setup": { + "duration": 0.008564374991692603, + "outcome": "passed" + }, + "call": { + "duration": 1.7430362500017509, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00015312491450458765, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", + "lineno": 91, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "earth" + }, + "setup": { + "duration": 0.007404124946333468, + "outcome": "passed" + }, + "call": { + "duration": 0.515926624997519, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 109, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 109, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:109: IndexError" + }, + "teardown": { + "duration": 0.0002389999572187662, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", + "lineno": 91, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "saturn" + }, + "setup": { + "duration": 0.0071305419551208615, + "outcome": "passed" + }, + "call": { + "duration": 0.37054662499576807, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 109, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 109, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:109: IndexError" + }, + "teardown": { + "duration": 0.0006014580139890313, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", + "lineno": 91, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "earth" + }, + "setup": { + "duration": 0.007489709067158401, + "outcome": "passed" + }, + "call": { + "duration": 0.7767745839664713, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 109, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 109, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:109: IndexError" + }, + "teardown": { + "duration": 0.00025491707492619753, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", + "lineno": 91, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "saturn" + }, + "setup": { + "duration": 0.006736499955877662, + "outcome": "passed" + }, + "call": { + "duration": 0.43948554201051593, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 109, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 109, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:109: IndexError" + }, + "teardown": { + "duration": 0.0002264160430058837, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "lineno": 115, + "outcome": "skipped", + "keywords": [ + "test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "case0" + }, + "setup": { + "duration": 0.007171708042733371, + "outcome": "passed" + }, + "call": { + "duration": 0.00013554200995713472, + "outcome": "skipped", + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 124, 'Skipped: Skipping test_chat_non_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" + }, + "teardown": { + "duration": 0.0001235839445143938, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "lineno": 115, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.008639499894343317, + "outcome": "passed" + }, + "call": { + "duration": 1.4001279999502003, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00014812499284744263, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "lineno": 115, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "case0" + }, + "setup": { + "duration": 0.015450250008143485, + "outcome": "passed" + }, + "call": { + "duration": 3.3522649579681456, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00041629199404269457, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "lineno": 134, + "outcome": "skipped", + "keywords": [ + "test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "case0" + }, + "setup": { + "duration": 0.007634000037796795, + "outcome": "passed" + }, + "call": { + "duration": 0.0001563339028507471, + "outcome": "skipped", + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 143, 'Skipped: Skipping test_chat_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" + }, + "teardown": { + "duration": 0.0001324999611824751, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "lineno": 134, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.007050334010273218, + "outcome": "passed" + }, + "call": { + "duration": 1.7063317500287667, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 152, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 152, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:152: IndexError" + }, + "teardown": { + "duration": 0.0002109999768435955, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "lineno": 134, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "case0" + }, + "setup": { + "duration": 0.006729208980686963, + "outcome": "passed" + }, + "call": { + "duration": 3.829621708020568, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 152, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 152, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:152: IndexError" + }, + "teardown": { + "duration": 0.0002882500411942601, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "calendar" + }, + "setup": { + "duration": 0.007713916013017297, + "outcome": "passed" + }, + "call": { + "duration": 2.48285808309447, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00020350003615021706, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "math" + }, + "setup": { + "duration": 0.010098082944750786, + "outcome": "passed" + }, + "call": { + "duration": 1.6994713749736547, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00014512497000396252, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "calendar" + }, + "setup": { + "duration": 0.006934792036190629, + "outcome": "passed" + }, + "call": { + "duration": 1.277176082949154, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004985419800505042, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "math" + }, + "setup": { + "duration": 0.012558708898723125, + "outcome": "passed" + }, + "call": { + "duration": 2.442075416096486, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0003505420172587037, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "calendar" + }, + "setup": { + "duration": 0.012642999994568527, + "outcome": "passed" + }, + "call": { + "duration": 0.9305703329155222, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00016004196368157864, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", + "lineno": 158, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "math" + }, + "setup": { + "duration": 0.008792415959760547, + "outcome": "passed" + }, + "call": { + "duration": 2.194098167004995, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0003667499404400587, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "calendar" + }, + "setup": { + "duration": 0.01219504198525101, + "outcome": "passed" + }, + "call": { + "duration": 2.045097667025402, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00029958400409668684, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", + "lineno": 181, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "math" + }, + "setup": { + "duration": 0.014203459024429321, + "outcome": "passed" + }, + "call": { + "duration": 1.3079068749211729, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0001914579188451171, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", + "lineno": 181, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "calendar" + }, + "setup": { + "duration": 0.04714570892974734, + "outcome": "passed" + }, + "call": { + "duration": 0.44743770791683346, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 200, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 200, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:200: IndexError" + }, + "teardown": { + "duration": 0.00022199994418770075, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", + "lineno": 181, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "math" + }, + "setup": { + "duration": 0.012237709015607834, + "outcome": "passed" + }, + "call": { + "duration": 3.180020791012794, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 200, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 200, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:200: IndexError" + }, + "teardown": { + "duration": 0.000273333047516644, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", + "lineno": 181, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "calendar" + }, + "setup": { + "duration": 0.013312208000570536, + "outcome": "passed" + }, + "call": { + "duration": 0.4110311249969527, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 200, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 200, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:200: IndexError" + }, + "teardown": { + "duration": 0.00022975006140768528, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", + "lineno": 181, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "math" + }, + "setup": { + "duration": 0.006676917080767453, + "outcome": "passed" + }, + "call": { + "duration": 2.316411833046004, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 200, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 200, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:200: IndexError" + }, + "teardown": { + "duration": 0.000245374976657331, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "lineno": 203, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "case0" + }, + "setup": { + "duration": 0.007064500008709729, + "outcome": "passed" + }, + "call": { + "duration": 0.606806542025879, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00046320806723088026, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "lineno": 203, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.009071375010535121, + "outcome": "passed" + }, + "call": { + "duration": 0.41908070899080485, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00026074994821101427, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "lineno": 203, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "case0" + }, + "setup": { + "duration": 0.0068333749659359455, + "outcome": "passed" + }, + "call": { + "duration": 0.8904451669659466, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005833340110257268, + "outcome": "passed" + } + } + ] +} From de6ec5803e18e336c936c5d5f8d9d8a9302b14bf Mon Sep 17 00:00:00 2001 From: Francisco Arceo Date: Thu, 10 Apr 2025 11:37:31 -0600 Subject: [PATCH 05/39] fix: Fix linter failures from #1921 (#1932) # What does this PR do? fix: Fix linter failures from #1921 Signed-off-by: Francisco Javier Arceo --- tests/verifications/conf/cerebras.yaml | 2 +- tests/verifications/conf/fireworks.yaml | 2 +- tests/verifications/conf/groq.yaml | 2 +- tests/verifications/conf/openai.yaml | 2 +- tests/verifications/conf/together.yaml | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tests/verifications/conf/cerebras.yaml b/tests/verifications/conf/cerebras.yaml index 32a60e766..5b19b4916 100644 --- a/tests/verifications/conf/cerebras.yaml +++ b/tests/verifications/conf/cerebras.yaml @@ -7,4 +7,4 @@ model_display_names: test_exclusions: llama-3.3-70b: - test_chat_non_streaming_image - - test_chat_streaming_image \ No newline at end of file + - test_chat_streaming_image diff --git a/tests/verifications/conf/fireworks.yaml b/tests/verifications/conf/fireworks.yaml index 30d6e4d75..f55b707ba 100644 --- a/tests/verifications/conf/fireworks.yaml +++ b/tests/verifications/conf/fireworks.yaml @@ -11,4 +11,4 @@ model_display_names: test_exclusions: accounts/fireworks/models/llama-v3p3-70b-instruct: - test_chat_non_streaming_image - - test_chat_streaming_image \ No newline at end of file + - test_chat_streaming_image diff --git a/tests/verifications/conf/groq.yaml b/tests/verifications/conf/groq.yaml index ef31a66e5..7871036dc 100644 --- a/tests/verifications/conf/groq.yaml +++ b/tests/verifications/conf/groq.yaml @@ -11,4 +11,4 @@ model_display_names: test_exclusions: llama-3.3-70b-versatile: - test_chat_non_streaming_image - - test_chat_streaming_image \ No newline at end of file + - test_chat_streaming_image diff --git a/tests/verifications/conf/openai.yaml b/tests/verifications/conf/openai.yaml index 89ae698f3..95a6259f7 100644 --- a/tests/verifications/conf/openai.yaml +++ b/tests/verifications/conf/openai.yaml @@ -6,4 +6,4 @@ models: model_display_names: gpt-4o: gpt-4o gpt-4o-mini: gpt-4o-mini -test_exclusions: {} \ No newline at end of file +test_exclusions: {} diff --git a/tests/verifications/conf/together.yaml b/tests/verifications/conf/together.yaml index 80e86fa77..258616662 100644 --- a/tests/verifications/conf/together.yaml +++ b/tests/verifications/conf/together.yaml @@ -11,4 +11,4 @@ model_display_names: test_exclusions: meta-llama/Llama-3.3-70B-Instruct-Turbo: - test_chat_non_streaming_image - - test_chat_streaming_image \ No newline at end of file + - test_chat_streaming_image From 79fc81f78f737057a4af3567fa533db20774513a Mon Sep 17 00:00:00 2001 From: Ilya Kolchinsky <58424190+ilya-kolchinsky@users.noreply.github.com> Date: Thu, 10 Apr 2025 22:38:31 +0200 Subject: [PATCH 06/39] fix: Playground RAG page errors (#1928) # What does this PR do? This PR fixes two issues with the RAG page of the Playground UI: 1. When the user modifies a configurable setting via a widget (e.g., system prompt, temperature, etc.), the agent is not recreated. Thus, the change has no effect and the user gets no indication of that. 2. After the first issue is fixed, it becomes possible to recreate the agent mid-conversation or even mid-generation. To mitigate this, widgets related to agent configuration are now disabled when a conversation is in progress (i.e., when the chat is non-empty). They are automatically enabled again when the user resets the chat history. ## Test Plan - Launch the Playground and go to the RAG page; - Select the vector DB ID; - Send a message to the agent via the chat; - The widgets in charge of the agent parameters will become disabled at this point; - Send a second message asking the model about the content of the first message; - The reply will indicate that the two messages were sent over the same session, that is, the agent was not recreated; - Click the 'Clear Chat' button; - All widgets will be enabled and a new agent will be created (which can be validated by sending another message). --- .../distribution/ui/page/playground/rag.py | 59 ++++++++++++++----- 1 file changed, 44 insertions(+), 15 deletions(-) diff --git a/llama_stack/distribution/ui/page/playground/rag.py b/llama_stack/distribution/ui/page/playground/rag.py index bb31bd2a7..be222f840 100644 --- a/llama_stack/distribution/ui/page/playground/rag.py +++ b/llama_stack/distribution/ui/page/playground/rag.py @@ -16,6 +16,13 @@ from llama_stack.distribution.ui.modules.utils import data_url_from_file def rag_chat_page(): st.title("🦙 RAG") + def reset_agent_and_chat(): + st.session_state.clear() + st.cache_resource.clear() + + def should_disable_input(): + return "messages" in st.session_state and len(st.session_state.messages) > 0 + with st.sidebar: # File/Directory Upload Section st.subheader("Upload Documents") @@ -69,21 +76,27 @@ def rag_chat_page(): vector_dbs = llama_stack_api.client.vector_dbs.list() vector_dbs = [vector_db.identifier for vector_db in vector_dbs] selected_vector_dbs = st.multiselect( - "Select Vector Databases", - vector_dbs, + label="Select Vector Databases", + options=vector_dbs, + on_change=reset_agent_and_chat, + disabled=should_disable_input(), ) available_models = llama_stack_api.client.models.list() available_models = [model.identifier for model in available_models if model.model_type == "llm"] selected_model = st.selectbox( - "Choose a model", - available_models, + label="Choose a model", + options=available_models, index=0, + on_change=reset_agent_and_chat, + disabled=should_disable_input(), ) system_prompt = st.text_area( "System Prompt", value="You are a helpful assistant. ", help="Initial instructions given to the AI to set its behavior and context", + on_change=reset_agent_and_chat, + disabled=should_disable_input(), ) temperature = st.slider( "Temperature", @@ -92,6 +105,8 @@ def rag_chat_page(): value=0.0, step=0.1, help="Controls the randomness of the response. Higher values make the output more creative and unexpected, lower values make it more conservative and predictable", + on_change=reset_agent_and_chat, + disabled=should_disable_input(), ) top_p = st.slider( @@ -100,12 +115,14 @@ def rag_chat_page(): max_value=1.0, value=0.95, step=0.1, + on_change=reset_agent_and_chat, + disabled=should_disable_input(), ) # Add clear chat button to sidebar if st.button("Clear Chat", use_container_width=True): - st.session_state.clear() - st.cache_resource.clear() + reset_agent_and_chat() + st.rerun() # Chat Interface if "messages" not in st.session_state: @@ -151,15 +168,8 @@ def rag_chat_page(): session_id = st.session_state["agent_session_id"] - # Chat input - if prompt := st.chat_input("Ask a question about your documents"): - # Add user message to chat history - st.session_state.messages.append({"role": "user", "content": prompt}) - - # Display user message - with st.chat_message("user"): - st.markdown(prompt) - + def process_prompt(prompt): + # Send the prompt to the agent response = agent.create_turn( messages=[ { @@ -188,5 +198,24 @@ def rag_chat_page(): st.session_state.messages.append({"role": "assistant", "content": full_response}) + # Chat input + if prompt := st.chat_input("Ask a question about your documents"): + # Add user message to chat history + st.session_state.messages.append({"role": "user", "content": prompt}) + + # Display user message + with st.chat_message("user"): + st.markdown(prompt) + + # store the prompt to process it after page refresh + st.session_state.prompt = prompt + + # force page refresh to disable the settings widgets + st.rerun() + + if "prompt" in st.session_state and st.session_state.prompt is not None: + process_prompt(st.session_state.prompt) + st.session_state.prompt = None + rag_chat_page() From edd9aaac3b22fe91e8f45e7c6bc6e3d9f97cb250 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Han?= Date: Thu, 10 Apr 2025 22:39:20 +0200 Subject: [PATCH 07/39] fix: use torchao 0.8.0 for inference (#1925) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? While building the "experimental-post-training" distribution, we encountered a version conflict between torchao with inference requiring version 0.5.0 and training currently depending on version 0.8.0. Resolves this error: ``` × No solution found when resolving dependencies: ╰─▶ Because you require torchao==0.5.0 and torchao==0.8.0, we can conclude that your requirements are unsatisfiable. ERROR 2025-04-10 10:41:22,597 llama_stack.distribution.build:128 uncategorized: Failed to build target test with return code 1 ``` Signed-off-by: Sébastien Han --- llama_stack/providers/registry/inference.py | 2 +- llama_stack/templates/dependencies.json | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/llama_stack/providers/registry/inference.py b/llama_stack/providers/registry/inference.py index aabb3bbdf..3c54cabcf 100644 --- a/llama_stack/providers/registry/inference.py +++ b/llama_stack/providers/registry/inference.py @@ -24,7 +24,7 @@ META_REFERENCE_DEPS = [ "zmq", "lm-format-enforcer", "sentence-transformers", - "torchao==0.5.0", + "torchao==0.8.0", "fbgemm-gpu-genai==1.1.2", ] diff --git a/llama_stack/templates/dependencies.json b/llama_stack/templates/dependencies.json index 053d6ef8a..b96191752 100644 --- a/llama_stack/templates/dependencies.json +++ b/llama_stack/templates/dependencies.json @@ -381,7 +381,7 @@ "sentence-transformers", "sentencepiece", "torch", - "torchao==0.5.0", + "torchao==0.8.0", "torchvision", "tqdm", "transformers", From 49955a06b10814058de9cab85331dd76433a31bd Mon Sep 17 00:00:00 2001 From: Francisco Arceo Date: Thu, 10 Apr 2025 15:09:00 -0600 Subject: [PATCH 08/39] docs: Update quickstart page to structure things a little more for the novices (#1873) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? Another doc enhancement for https://github.com/meta-llama/llama-stack/issues/1818 Summary of changes: - `docs/source/distributions/configuration.md` - Updated dropdown title to include a more user-friendly description. - `docs/_static/css/my_theme.css` - Added styling for `

` elements to set a normal font weight. - `docs/source/distributions/starting_llama_stack_server.md` - Changed section headers from bold text to proper markdown headers (e.g., `##`). - Improved descriptions for starting Llama Stack server using different methods (library, container, conda, Kubernetes). - Enhanced clarity and structure by converting instructions into markdown headers and improved formatting. - `docs/source/getting_started/index.md` - Major restructuring of the "Quick Start" guide: - Added new introductory section for Llama Stack and its capabilities. - Reorganized steps into clearer subsections with proper markdown headers. - Replaced dropdowns with tabbed content for OS-specific instructions. - Added detailed steps for setting up and running the Llama Stack server and client. - Introduced new sections for running basic inference and building agents. - Enhanced readability and visual structure with emojis, admonitions, and examples. - `docs/source/providers/index.md` - Updated the list of LLM inference providers to include "Ollama." - Expanded the list of vector databases to include "SQLite-Vec." Let me know if you need further details! ## Test Plan Renders locally, included screenshot. # Documentation For https://github.com/meta-llama/llama-stack/issues/1818 Screenshot 2025-04-09 at 11 07 12 AM --------- Signed-off-by: Francisco Javier Arceo --- docs/_static/css/my_theme.css | 3 + docs/source/distributions/configuration.md | 2 +- .../starting_llama_stack_server.md | 8 +- .../getting_started/detailed_tutorial.md | 545 ++++++++++++++++++ docs/source/getting_started/index.md | 497 +++------------- docs/source/index.md | 3 +- docs/source/providers/index.md | 4 +- 7 files changed, 633 insertions(+), 429 deletions(-) create mode 100644 docs/source/getting_started/detailed_tutorial.md diff --git a/docs/_static/css/my_theme.css b/docs/_static/css/my_theme.css index 470452661..6f82f6358 100644 --- a/docs/_static/css/my_theme.css +++ b/docs/_static/css/my_theme.css @@ -17,6 +17,9 @@ display: none; } +h3 { + font-weight: normal; +} html[data-theme="dark"] .rst-content div[class^="highlight"] { background-color: #0b0b0b; } diff --git a/docs/source/distributions/configuration.md b/docs/source/distributions/configuration.md index 6cd5e161f..c06632991 100644 --- a/docs/source/distributions/configuration.md +++ b/docs/source/distributions/configuration.md @@ -2,7 +2,7 @@ The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution: -```{dropdown} Sample Configuration File +```{dropdown} 👋 Click here for a Sample Configuration File ```yaml version: 2 diff --git a/docs/source/distributions/starting_llama_stack_server.md b/docs/source/distributions/starting_llama_stack_server.md index 9be2e9ec5..f74de6d48 100644 --- a/docs/source/distributions/starting_llama_stack_server.md +++ b/docs/source/distributions/starting_llama_stack_server.md @@ -2,22 +2,22 @@ You can run a Llama Stack server in one of the following ways: -**As a Library**: +## As a Library: This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (eg. fireworks, together, groq, etc.) See [Using Llama Stack as a Library](importing_as_library) -**Container**: +## Container: Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details. -**Conda**: +## Conda: If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details. -**Kubernetes**: +## Kubernetes: If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details. diff --git a/docs/source/getting_started/detailed_tutorial.md b/docs/source/getting_started/detailed_tutorial.md new file mode 100644 index 000000000..65582e8d8 --- /dev/null +++ b/docs/source/getting_started/detailed_tutorial.md @@ -0,0 +1,545 @@ +# Detailed Tutorial + +In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple agent. +A Llama Stack agent is a simple integrated system that can perform tasks by combining a Llama model for reasoning with +tools (e.g., RAG, web search, code execution, etc.) for taking actions. +In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers. + +Llama Stack is a stateful service with REST APIs to support seamless transition of AI applications across different environments. The server can be run in a variety of ways, including as a standalone binary, Docker container, or hosted service. You can build and test using a local server first and deploy to a hosted endpoint for production. + +In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/) +as the inference [provider](../providers/index.md#inference) for a Llama Model. + +## Step 1: Installation and Setup + +Install Ollama by following the instructions on the [Ollama website](https://ollama.com/download), then +download Llama 3.2 3B model, and then start the Ollama service. +```bash +ollama pull llama3.2:3b +ollama run llama3.2:3b --keepalive 60m +``` + +Install [uv](https://docs.astral.sh/uv/) to setup your virtual environment + +::::{tab-set} + +:::{tab-item} macOS and Linux +Use `curl` to download the script and execute it with `sh`: +```console +curl -LsSf https://astral.sh/uv/install.sh | sh +``` +::: + +:::{tab-item} Windows +Use `irm` to download the script and execute it with `iex`: + +```console +powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" +``` +::: +:::: + +Setup your virtual environment. + +```bash +uv venv --python 3.10 +source .venv/bin/activate +``` +## Step 2: Run Llama Stack +Llama Stack is a server that exposes multiple APIs, you connect with it using the Llama Stack client SDK. + +::::{tab-set} + +:::{tab-item} Using `venv` +You can use Python to build and run the Llama Stack server, which is useful for testing and development. + +Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup, +which defines the providers and their settings. +Now let's build and run the Llama Stack config for Ollama. + +```bash +INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run +``` +::: +:::{tab-item} Using `conda` +You can use Python to build and run the Llama Stack server, which is useful for testing and development. + +Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup, +which defines the providers and their settings. +Now let's build and run the Llama Stack config for Ollama. + +```bash +INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type conda --run +``` +::: +:::{tab-item} Using a Container +You can use a container image to run the Llama Stack server. We provide several container images for the server +component that works with different inference providers out of the box. For this guide, we will use +`llamastack/distribution-ollama` as the container image. If you'd like to build your own image or customize the +configurations, please check out [this guide](../references/index.md). + +First lets setup some environment variables and create a local directory to mount into the container’s file system. +```bash +export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" +export LLAMA_STACK_PORT=8321 +mkdir -p ~/.llama +``` +Then start the server using the container tool of your choice. For example, if you are running Docker you can use the +following command: +```bash +docker run -it \ + --pull always \ + -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ + -v ~/.llama:/root/.llama \ + llamastack/distribution-ollama \ + --port $LLAMA_STACK_PORT \ + --env INFERENCE_MODEL=$INFERENCE_MODEL \ + --env OLLAMA_URL=http://host.docker.internal:11434 +``` +Note to start the container with Podman, you can do the same but replace `docker` at the start of the command with +`podman`. If you are using `podman` older than `4.7.0`, please also replace `host.docker.internal` in the `OLLAMA_URL` +with `host.containers.internal`. + +The configuration YAML for the Ollama distribution is available at `distributions/ollama/run.yaml`. + +```{tip} + +Docker containers run in their own isolated network namespaces on Linux. To allow the container to communicate with services running on the host via `localhost`, you need `--network=host`. This makes the container use the host’s network directly so it can connect to Ollama running on `localhost:11434`. + +Linux users having issues running the above command should instead try the following: +```bash +docker run -it \ + --pull always \ + -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ + -v ~/.llama:/root/.llama \ + --network=host \ + llamastack/distribution-ollama \ + --port $LLAMA_STACK_PORT \ + --env INFERENCE_MODEL=$INFERENCE_MODEL \ + --env OLLAMA_URL=http://localhost:11434 +``` +::: +:::: +You will see output like below: +``` +INFO: Application startup complete. +INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) +``` + +Now you can use the Llama Stack client to run inference and build agents! + +You can reuse the server setup or use the [Llama Stack Client](https://github.com/meta-llama/llama-stack-client-python/). +Note that the client package is already included in the `llama-stack` package. + +## Step 3: Run Client CLI + +Open a new terminal and navigate to the same directory you started the server from. Then set up a new or activate your +existing server virtual environment. + +::::{tab-set} + +:::{tab-item} Reuse Server `venv` +```bash +# The client is included in the llama-stack package so we just activate the server venv +source .venv/bin/activate +``` +::: + +:::{tab-item} Install with `venv` +```bash +uv venv client --python 3.10 +source client/bin/activate +pip install llama-stack-client +``` +::: + +:::{tab-item} Install with `conda` +```bash +yes | conda create -n stack-client python=3.10 +conda activate stack-client +pip install llama-stack-client +``` +::: +:::: + +Now let's use the `llama-stack-client` [CLI](../references/llama_stack_client_cli_reference.md) to check the +connectivity to the server. + +```bash +llama-stack-client configure --endpoint http://localhost:8321 --api-key none +``` +You will see the below: +``` +Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321 +``` + +#### iii. List Available Models +List the models +``` +llama-stack-client models list +Available Models + +┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ +┃ model_type ┃ identifier ┃ provider_resource_id ┃ metadata ┃ provider_id ┃ +┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ +│ embedding │ all-MiniLM-L6-v2 │ all-minilm:latest │ {'embedding_dimension': 384.0} │ ollama │ +├─────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼───────────────────────────────────────────┼─────────────────┤ +│ llm │ llama3.2:3b │ llama3.2:3b │ │ ollama │ +└─────────────────┴─────────────────────────────────────┴─────────────────────────────────────┴───────────────────────────────────────────┴─────────────────┘ + +Total models: 2 + +``` + +## Step 4: Run the Demos + +Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md). +Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options. + +::::{tab-set} + +:::{tab-item} Basic Inference with the CLI +You can test basic Llama inference completion using the CLI. + +```bash +llama-stack-client inference chat-completion --message "tell me a joke" +``` +Sample output: +```python +ChatCompletionResponse( + completion_message=CompletionMessage( + content="Here's one:\n\nWhat do you call a fake noodle?\n\nAn impasta!", + role="assistant", + stop_reason="end_of_turn", + tool_calls=[], + ), + logprobs=None, + metrics=[ + Metric(metric="prompt_tokens", value=14.0, unit=None), + Metric(metric="completion_tokens", value=27.0, unit=None), + Metric(metric="total_tokens", value=41.0, unit=None), + ], +) +``` +::: + +:::{tab-item} Basic Inference with a Script +Alternatively, you can run inference using the Llama Stack client SDK. + +### i. Create the Script +Create a file `inference.py` and add the following code: +```python +from llama_stack_client import LlamaStackClient + +client = LlamaStackClient(base_url="http://localhost:8321") + +# List available models +models = client.models.list() + +# Select the first LLM +llm = next(m for m in models if m.model_type == "llm") +model_id = llm.identifier + +print("Model:", model_id) + +response = client.inference.chat_completion( + model_id=model_id, + messages=[ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a haiku about coding"}, + ], +) +print(response.completion_message.content) +``` + +### ii. Run the Script +Let's run the script using `uv` +```bash +uv run python inference.py +``` +Which will output: +``` +Model: llama3.2:3b +Here is a haiku about coding: + +Lines of code unfold +Logic flows through digital night +Beauty in the bits +``` +::: + +:::{tab-item} Build a Simple Agent +Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server. +### i. Create the Script +Create a file `agent.py` and add the following code: + +```python +from llama_stack_client import LlamaStackClient +from llama_stack_client import Agent, AgentEventLogger +from rich.pretty import pprint +import uuid + +client = LlamaStackClient(base_url=f"http://localhost:8321") + +models = client.models.list() +llm = next(m for m in models if m.model_type == "llm") +model_id = llm.identifier + +agent = Agent(client, model=model_id, instructions="You are a helpful assistant.") + +s_id = agent.create_session(session_name=f"s{uuid.uuid4().hex}") + +print("Non-streaming ...") +response = agent.create_turn( + messages=[{"role": "user", "content": "Who are you?"}], + session_id=s_id, + stream=False, +) +print("agent>", response.output_message.content) + +print("Streaming ...") +stream = agent.create_turn( + messages=[{"role": "user", "content": "Who are you?"}], session_id=s_id, stream=True +) +for event in stream: + pprint(event) + +print("Streaming with print helper...") +stream = agent.create_turn( + messages=[{"role": "user", "content": "Who are you?"}], session_id=s_id, stream=True +) +for event in AgentEventLogger().log(stream): + event.print() +``` +### ii. Run the Script +Let's run the script using `uv` +```bash +uv run python agent.py +``` + +```{dropdown} 👋 Click here to see the sample output + Non-streaming ... + agent> I'm an artificial intelligence designed to assist and communicate with users like you. I don't have a personal identity, but I'm here to provide information, answer questions, and help with tasks to the best of my abilities. + + I can be used for a wide range of purposes, such as: + + * Providing definitions and explanations + * Offering suggestions and ideas + * Helping with language translation + * Assisting with writing and proofreading + * Generating text or responses to questions + * Playing simple games or chatting about topics of interest + + I'm constantly learning and improving my abilities, so feel free to ask me anything, and I'll do my best to help! + + Streaming ... + AgentTurnResponseStreamChunk( + │ event=TurnResponseEvent( + │ │ payload=AgentTurnResponseStepStartPayload( + │ │ │ event_type='step_start', + │ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', + │ │ │ step_type='inference', + │ │ │ metadata={} + │ │ ) + │ ) + ) + AgentTurnResponseStreamChunk( + │ event=TurnResponseEvent( + │ │ payload=AgentTurnResponseStepProgressPayload( + │ │ │ delta=TextDelta(text='As', type='text'), + │ │ │ event_type='step_progress', + │ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', + │ │ │ step_type='inference' + │ │ ) + │ ) + ) + AgentTurnResponseStreamChunk( + │ event=TurnResponseEvent( + │ │ payload=AgentTurnResponseStepProgressPayload( + │ │ │ delta=TextDelta(text=' a', type='text'), + │ │ │ event_type='step_progress', + │ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', + │ │ │ step_type='inference' + │ │ ) + │ ) + ) + ... + AgentTurnResponseStreamChunk( + │ event=TurnResponseEvent( + │ │ payload=AgentTurnResponseStepCompletePayload( + │ │ │ event_type='step_complete', + │ │ │ step_details=InferenceStep( + │ │ │ │ api_model_response=CompletionMessage( + │ │ │ │ │ content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?', + │ │ │ │ │ role='assistant', + │ │ │ │ │ stop_reason='end_of_turn', + │ │ │ │ │ tool_calls=[] + │ │ │ │ ), + │ │ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', + │ │ │ │ step_type='inference', + │ │ │ │ turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca', + │ │ │ │ completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 716174, tzinfo=TzInfo(UTC)), + │ │ │ │ started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28823, tzinfo=TzInfo(UTC)) + │ │ │ ), + │ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', + │ │ │ step_type='inference' + │ │ ) + │ ) + ) + AgentTurnResponseStreamChunk( + │ event=TurnResponseEvent( + │ │ payload=AgentTurnResponseTurnCompletePayload( + │ │ │ event_type='turn_complete', + │ │ │ turn=Turn( + │ │ │ │ input_messages=[UserMessage(content='Who are you?', role='user', context=None)], + │ │ │ │ output_message=CompletionMessage( + │ │ │ │ │ content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?', + │ │ │ │ │ role='assistant', + │ │ │ │ │ stop_reason='end_of_turn', + │ │ │ │ │ tool_calls=[] + │ │ │ │ ), + │ │ │ │ session_id='abd4afea-4324-43f4-9513-cfe3970d92e8', + │ │ │ │ started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28722, tzinfo=TzInfo(UTC)), + │ │ │ │ steps=[ + │ │ │ │ │ InferenceStep( + │ │ │ │ │ │ api_model_response=CompletionMessage( + │ │ │ │ │ │ │ content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?', + │ │ │ │ │ │ │ role='assistant', + │ │ │ │ │ │ │ stop_reason='end_of_turn', + │ │ │ │ │ │ │ tool_calls=[] + │ │ │ │ │ │ ), + │ │ │ │ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', + │ │ │ │ │ │ step_type='inference', + │ │ │ │ │ │ turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca', + │ │ │ │ │ │ completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 716174, tzinfo=TzInfo(UTC)), + │ │ │ │ │ │ started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28823, tzinfo=TzInfo(UTC)) + │ │ │ │ │ ) + │ │ │ │ ], + │ │ │ │ turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca', + │ │ │ │ completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 727364, tzinfo=TzInfo(UTC)), + │ │ │ │ output_attachments=[] + │ │ │ ) + │ │ ) + │ ) + ) + + + Streaming with print helper... + inference> Déjà vu! + + As I mentioned earlier, I'm an artificial intelligence language model. I don't have a personal identity or consciousness like humans do. I exist solely to process and respond to text-based inputs, providing information and assistance on a wide range of topics. + + I'm a computer program designed to simulate human-like conversations, using natural language processing (NLP) and machine learning algorithms to understand and generate responses. My purpose is to help users like you with their questions, provide information, and engage in conversation. + + Think of me as a virtual companion, a helpful tool designed to make your interactions more efficient and enjoyable. I don't have personal opinions, emotions, or biases, but I'm here to provide accurate and informative responses to the best of my abilities. + + So, who am I? I'm just a computer program designed to help you! +``` +::: + +:::{tab-item} Build a RAG Agent + +For our last demo, we can build a RAG agent that can answer questions about the Torchtune project using the documents +in a vector database. +### i. Create the Script +Create a file `rag_agent.py` and add the following code: + +```python +from llama_stack_client import LlamaStackClient +from llama_stack_client import Agent, AgentEventLogger +from llama_stack_client.types import Document +import uuid +from termcolor import cprint + +client = LlamaStackClient(base_url="http://localhost:8321") + +# Create a vector database instance +embed_lm = next(m for m in client.models.list() if m.model_type == "embedding") +embedding_model = embed_lm.identifier +vector_db_id = f"v{uuid.uuid4().hex}" +client.vector_dbs.register( + vector_db_id=vector_db_id, + embedding_model=embedding_model, +) + +# Create Documents +urls = [ + "memory_optimizations.rst", + "chat.rst", + "llama3.rst", + "datasets.rst", + "qat_finetune.rst", + "lora_finetune.rst", +] +documents = [ + Document( + document_id=f"num-{i}", + content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}", + mime_type="text/plain", + metadata={}, + ) + for i, url in enumerate(urls) +] + +# Insert documents +client.tool_runtime.rag_tool.insert( + documents=documents, + vector_db_id=vector_db_id, + chunk_size_in_tokens=512, +) + +# Get the model being served +llm = next(m for m in client.models.list() if m.model_type == "llm") +model = llm.identifier + +# Create the RAG agent +rag_agent = Agent( + client, + model=model, + instructions="You are a helpful assistant. Use the RAG tool to answer questions as needed.", + tools=[ + { + "name": "builtin::rag/knowledge_search", + "args": {"vector_db_ids": [vector_db_id]}, + } + ], +) + +session_id = rag_agent.create_session(session_name=f"s{uuid.uuid4().hex}") + +turns = ["what is torchtune", "tell me about dora"] + +for t in turns: + print("user>", t) + stream = rag_agent.create_turn( + messages=[{"role": "user", "content": t}], session_id=session_id, stream=True + ) + for event in AgentEventLogger().log(stream): + event.print() +``` +### ii. Run the Script +Let's run the script using `uv` +```bash +uv run python rag_agent.py +``` + +```{dropdown} 👋 Click here to see the sample output + user> what is torchtune + inference> [knowledge_search(query='TorchTune')] + tool_execution> Tool:knowledge_search Args:{'query': 'TorchTune'} + tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text='Result 1:\nDocument_id:num-1\nContent: conversational data, :func:`~torchtune.datasets.chat_dataset` seems to be a good fit. ..., type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')] + inference> Here is a high-level overview of the text: + + **LoRA Finetuning with PyTorch Tune** + + PyTorch Tune provides a recipe for LoRA (Low-Rank Adaptation) finetuning, which is a technique to adapt pre-trained models to new tasks. The recipe uses the `lora_finetune_distributed` command. + ... + Overall, DORA is a powerful reinforcement learning algorithm that can learn complex tasks from human demonstrations. However, it requires careful consideration of the challenges and limitations to achieve optimal results. +``` +::: + +:::: + +## You're Ready to Build Your Own Apps! + +Congrats! 🥳 Now you're ready to [build your own Llama Stack applications](../building_applications/index)! 🚀 diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md index 82329e60e..63fa5ae6e 100644 --- a/docs/source/getting_started/index.md +++ b/docs/source/getting_started/index.md @@ -1,455 +1,110 @@ -# Quick Start +# Quickstart +Get started with Llama Stack in minutes! -Llama Stack is a stateful service with REST APIs to support seamless transition of AI applications across different environments. The server can be run in a variety of ways, including as a standalone binary, Docker container, or hosted service. You can build and test using a local server first and deploy to a hosted endpoint for production. +Llama Stack is a stateful service with REST APIs to support the seamless transition of AI applications across different +environments. You can build and test using a local server first and deploy to a hosted endpoint for production. -In this guide, we'll walk through how to build a RAG agent locally using Llama Stack with [Ollama](https://ollama.com/) to run inference on a Llama Model. - - -### 1. Download a Llama model with Ollama +In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/) +as the inference [provider](../providers/index.md#inference) for a Llama Model. +## Step 1. Install and Setup +Install [uv](https://docs.astral.sh/uv/), setup your virtual environment, and run inference on a Llama model with +[Ollama](https://ollama.com/download). ```bash -ollama pull llama3.2:3b -``` - -This will instruct the Ollama service to download the Llama 3.2 3B model, which we'll use in the rest of this guide. - -```{admonition} Note -:class: tip - -If you do not have ollama, you can install it from [here](https://ollama.com/download). -``` - -### 2. Run Llama Stack locally - -We use `uv` to setup a virtual environment and install the Llama Stack package. - -:::{dropdown} [Click to Open] Instructions to setup uv - -Install [uv](https://docs.astral.sh/uv/) to setup your virtual environment. - - -#### For macOS and Linux: -```bash -curl -LsSf https://astral.sh/uv/install.sh | sh -``` -#### For Windows: -Use `irm` to download the script and execute it with `iex`: -```powershell -powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" -``` - -Setup venv -```bash -uv venv --python 3.10 +uv pip install llama-stack aiosqlite faiss-cpu ollama openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals source .venv/bin/activate +export INFERENCE_MODEL="llama3.2:3b" +ollama run llama3.2:3b --keepalive 60m ``` -::: - -**Install the Llama Stack package** -```bash -uv pip install -U llama-stack -``` - -**Build and Run the Llama Stack server for Ollama.** +## Step 2: Run the Llama Stack Server ```bash INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run ``` - -You will see the output end like below: -``` -... -INFO: Application startup complete. -INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) -``` - -Now you can use the llama stack client to run inference and build agents! - -### 3. Client CLI - -Install the client package -```bash -pip install llama-stack-client -``` - -:::{dropdown} OR reuse server setup -Open a new terminal and navigate to the same directory you started the server from. - -Setup venv (llama-stack already includes the llama-stack-client package) -```bash -source .venv/bin/activate -``` -::: - -#### 3.1 Configure the client to point to the local server -```bash -llama-stack-client configure --endpoint http://localhost:8321 --api-key none -``` -You will see the below: -``` -Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321 -``` - -#### 3.2 List available models -``` -llama-stack-client models list -``` - -``` -Available Models - -┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ -┃ model_type ┃ identifier ┃ provider_resource_id ┃ metadata ┃ provider_id ┃ -┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ -│ embedding │ all-MiniLM-L6-v2 │ all-minilm:latest │ {'embedding_dimension': 384.0} │ ollama │ -├─────────────────┼─────────────────────────────────────┼─────────────────────────────────────┼───────────────────────────────────────────┼─────────────────┤ -│ llm │ llama3.2:3b │ llama3.2:3b │ │ ollama │ -└─────────────────┴─────────────────────────────────────┴─────────────────────────────────────┴───────────────────────────────────────────┴─────────────────┘ - -Total models: 2 - -``` - -#### 3.3 Test basic inference -```bash -llama-stack-client inference chat-completion --message "tell me a joke" -``` -Sample output: +## Step 3: Run the Demo +Now open up a new terminal using the same virtual environment and you can run this demo as a script using `uv run demo_script.py` or in an interactive shell. ```python -ChatCompletionResponse( - completion_message=CompletionMessage( - content="Here's one:\n\nWhat do you call a fake noodle?\n\nAn impasta!", - role="assistant", - stop_reason="end_of_turn", - tool_calls=[], - ), - logprobs=None, - metrics=[ - Metric(metric="prompt_tokens", value=14.0, unit=None), - Metric(metric="completion_tokens", value=27.0, unit=None), - Metric(metric="total_tokens", value=41.0, unit=None), - ], -) -``` - -### 4. Python SDK -Install the python client -```bash -pip install llama-stack-client -``` -:::{dropdown} OR reuse server setup -Open a new terminal and navigate to the same directory you started the server from. - -Setup venv (llama-stack already includes the llama-stack-client package) -```bash -source .venv/bin/activate -``` -::: -#### 4.1 Basic Inference -Create a file `inference.py` and add the following code: -```python -from llama_stack_client import LlamaStackClient - -client = LlamaStackClient(base_url=f"http://localhost:8321") - -# List available models -models = client.models.list() - -# Select the first LLM -llm = next(m for m in models if m.model_type == "llm") -model_id = llm.identifier - -print("Model:", model_id) - -response = client.inference.chat_completion( - model_id=model_id, - messages=[ - {"role": "system", "content": "You are a helpful assistant."}, - {"role": "user", "content": "Write a haiku about coding"}, - ], -) -print(response.completion_message.content) -``` -Run the script -```bash -python inference.py -``` -Sample output: -``` -Model: llama3.2:3b -Here is a haiku about coding: - -Lines of code unfold -Logic flows through digital night -Beauty in the bits -``` - -#### 4.2. Basic Agent - -Create a file `agent.py` and add the following code: -```python -from llama_stack_client import LlamaStackClient -from llama_stack_client import Agent, AgentEventLogger -from rich.pretty import pprint -import uuid - -client = LlamaStackClient(base_url=f"http://localhost:8321") - -models = client.models.list() -llm = next(m for m in models if m.model_type == "llm") -model_id = llm.identifier - -agent = Agent(client, model=model_id, instructions="You are a helpful assistant.") - -s_id = agent.create_session(session_name=f"s{uuid.uuid4().hex}") - -print("Non-streaming ...") -response = agent.create_turn( - messages=[{"role": "user", "content": "Who are you?"}], - session_id=s_id, - stream=False, -) -print("agent>", response.output_message.content) - -print("Streaming ...") -stream = agent.create_turn( - messages=[{"role": "user", "content": "Who are you?"}], session_id=s_id, stream=True -) -for event in stream: - pprint(event) - -print("Streaming with print helper...") -stream = agent.create_turn( - messages=[{"role": "user", "content": "Who are you?"}], session_id=s_id, stream=True -) -for event in AgentEventLogger().log(stream): - event.print() -``` - -Run the script: -```bash -python agent.py -``` - -:::{dropdown} `Sample output` -``` -Non-streaming ... -agent> I'm an artificial intelligence designed to assist and communicate with users like you. I don't have a personal identity, but I'm here to provide information, answer questions, and help with tasks to the best of my abilities. - -I can be used for a wide range of purposes, such as: - -* Providing definitions and explanations -* Offering suggestions and ideas -* Helping with language translation -* Assisting with writing and proofreading -* Generating text or responses to questions -* Playing simple games or chatting about topics of interest - -I'm constantly learning and improving my abilities, so feel free to ask me anything, and I'll do my best to help! - -Streaming ... -AgentTurnResponseStreamChunk( -│ event=TurnResponseEvent( -│ │ payload=AgentTurnResponseStepStartPayload( -│ │ │ event_type='step_start', -│ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', -│ │ │ step_type='inference', -│ │ │ metadata={} -│ │ ) -│ ) -) -AgentTurnResponseStreamChunk( -│ event=TurnResponseEvent( -│ │ payload=AgentTurnResponseStepProgressPayload( -│ │ │ delta=TextDelta(text='As', type='text'), -│ │ │ event_type='step_progress', -│ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', -│ │ │ step_type='inference' -│ │ ) -│ ) -) -AgentTurnResponseStreamChunk( -│ event=TurnResponseEvent( -│ │ payload=AgentTurnResponseStepProgressPayload( -│ │ │ delta=TextDelta(text=' a', type='text'), -│ │ │ event_type='step_progress', -│ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', -│ │ │ step_type='inference' -│ │ ) -│ ) -) -... -AgentTurnResponseStreamChunk( -│ event=TurnResponseEvent( -│ │ payload=AgentTurnResponseStepCompletePayload( -│ │ │ event_type='step_complete', -│ │ │ step_details=InferenceStep( -│ │ │ │ api_model_response=CompletionMessage( -│ │ │ │ │ content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?', -│ │ │ │ │ role='assistant', -│ │ │ │ │ stop_reason='end_of_turn', -│ │ │ │ │ tool_calls=[] -│ │ │ │ ), -│ │ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', -│ │ │ │ step_type='inference', -│ │ │ │ turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca', -│ │ │ │ completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 716174, tzinfo=TzInfo(UTC)), -│ │ │ │ started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28823, tzinfo=TzInfo(UTC)) -│ │ │ ), -│ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', -│ │ │ step_type='inference' -│ │ ) -│ ) -) -AgentTurnResponseStreamChunk( -│ event=TurnResponseEvent( -│ │ payload=AgentTurnResponseTurnCompletePayload( -│ │ │ event_type='turn_complete', -│ │ │ turn=Turn( -│ │ │ │ input_messages=[UserMessage(content='Who are you?', role='user', context=None)], -│ │ │ │ output_message=CompletionMessage( -│ │ │ │ │ content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?', -│ │ │ │ │ role='assistant', -│ │ │ │ │ stop_reason='end_of_turn', -│ │ │ │ │ tool_calls=[] -│ │ │ │ ), -│ │ │ │ session_id='abd4afea-4324-43f4-9513-cfe3970d92e8', -│ │ │ │ started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28722, tzinfo=TzInfo(UTC)), -│ │ │ │ steps=[ -│ │ │ │ │ InferenceStep( -│ │ │ │ │ │ api_model_response=CompletionMessage( -│ │ │ │ │ │ │ content='As a conversational AI, I don\'t have a personal identity in the classical sense. I exist as a program running on computer servers, designed to process and respond to text-based inputs.\n\nI\'m an instance of a type of artificial intelligence called a "language model," which is trained on vast amounts of text data to generate human-like responses. My primary function is to understand and respond to natural language inputs, like our conversation right now.\n\nThink of me as a virtual assistant, a chatbot, or a conversational interface – I\'m here to provide information, answer questions, and engage in conversation to the best of my abilities. I don\'t have feelings, emotions, or consciousness like humans do, but I\'m designed to simulate human-like interactions to make our conversations feel more natural and helpful.\n\nSo, that\'s me in a nutshell! What can I help you with today?', -│ │ │ │ │ │ │ role='assistant', -│ │ │ │ │ │ │ stop_reason='end_of_turn', -│ │ │ │ │ │ │ tool_calls=[] -│ │ │ │ │ │ ), -│ │ │ │ │ │ step_id='69831607-fa75-424a-949b-e2049e3129d1', -│ │ │ │ │ │ step_type='inference', -│ │ │ │ │ │ turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca', -│ │ │ │ │ │ completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 716174, tzinfo=TzInfo(UTC)), -│ │ │ │ │ │ started_at=datetime.datetime(2025, 4, 3, 1, 15, 14, 28823, tzinfo=TzInfo(UTC)) -│ │ │ │ │ ) -│ │ │ │ ], -│ │ │ │ turn_id='8b360202-f7cb-4786-baa9-166a1b46e2ca', -│ │ │ │ completed_at=datetime.datetime(2025, 4, 3, 1, 15, 21, 727364, tzinfo=TzInfo(UTC)), -│ │ │ │ output_attachments=[] -│ │ │ ) -│ │ ) -│ ) -) - - -Streaming with print helper... -inference> Déjà vu! - -As I mentioned earlier, I'm an artificial intelligence language model. I don't have a personal identity or consciousness like humans do. I exist solely to process and respond to text-based inputs, providing information and assistance on a wide range of topics. - -I'm a computer program designed to simulate human-like conversations, using natural language processing (NLP) and machine learning algorithms to understand and generate responses. My purpose is to help users like you with their questions, provide information, and engage in conversation. - -Think of me as a virtual companion, a helpful tool designed to make your interactions more efficient and enjoyable. I don't have personal opinions, emotions, or biases, but I'm here to provide accurate and informative responses to the best of my abilities. - -So, who am I? I'm just a computer program designed to help you! - -``` -::: - -#### 4.3. RAG agent - -Create a file `rag_agent.py` and add the following code: - -```python -from llama_stack_client import LlamaStackClient -from llama_stack_client import Agent, AgentEventLogger +from termcolor import cprint from llama_stack_client.types import Document -import uuid +from llama_stack_client import LlamaStackClient -client = LlamaStackClient(base_url=f"http://localhost:8321") -# Create a vector database instance -embedlm = next(m for m in client.models.list() if m.model_type == "embedding") -embedding_model = embedlm.identifier -vector_db_id = f"v{uuid.uuid4().hex}" -client.vector_dbs.register( - vector_db_id=vector_db_id, - embedding_model=embedding_model, -) - -# Create Documents -urls = [ - "memory_optimizations.rst", - "chat.rst", - "llama3.rst", - "datasets.rst", - "qat_finetune.rst", - "lora_finetune.rst", -] +vector_db = "faiss" +vector_db_id = "test-vector-db" +model_id = "llama3.2:3b-instruct-fp16" +query = "Can you give me the arxiv link for Lora Fine Tuning in Pytorch?" documents = [ Document( - document_id=f"num-{i}", - content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}", + document_id="document_1", + content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/lora_finetune.rst", mime_type="text/plain", metadata={}, ) - for i, url in enumerate(urls) ] -# Insert documents +client = LlamaStackClient(base_url="http://localhost:8321") +client.vector_dbs.register( + provider_id=vector_db, + vector_db_id=vector_db_id, + embedding_model="all-MiniLM-L6-v2", + embedding_dimension=384, +) + client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=vector_db_id, - chunk_size_in_tokens=512, + chunk_size_in_tokens=50, ) -# Get the model being served -llm = next(m for m in client.models.list() if m.model_type == "llm") -model = llm.identifier - -# Create RAG agent -ragagent = Agent( - client, - model=model, - instructions="You are a helpful assistant. Use the RAG tool to answer questions as needed.", - tools=[ - { - "name": "builtin::rag/knowledge_search", - "args": {"vector_db_ids": [vector_db_id]}, - } - ], +response = client.tool_runtime.rag_tool.query( + vector_db_ids=[vector_db_id], + content=query, ) -s_id = ragagent.create_session(session_name=f"s{uuid.uuid4().hex}") +cprint("" + "-" * 50, "yellow") +cprint(f"Query> {query}", "red") +cprint("" + "-" * 50, "yellow") +for chunk in response.content: + cprint(f"Chunk ID> {chunk.text}", "green") + cprint("" + "-" * 50, "yellow") +``` +And you should see output like below. +``` +-------------------------------------------------- +Query> Can you give me the arxiv link for Lora Fine Tuning in Pytorch? +-------------------------------------------------- +Chunk ID> knowledge_search tool found 5 chunks: +BEGIN of knowledge_search tool results. -turns = ["what is torchtune", "tell me about dora"] +-------------------------------------------------- +Chunk ID> Result 1: +Document_id:docum +Content: .. _lora_finetune_label: -for t in turns: - print("user>", t) - stream = ragagent.create_turn( - messages=[{"role": "user", "content": t}], session_id=s_id, stream=True - ) - for event in AgentEventLogger().log(stream): - event.print() -``` -Run the script: -``` -python rag_agent.py -``` -:::{dropdown} `Sample output` -``` -user> what is torchtune -inference> [knowledge_search(query='TorchTune')] -tool_execution> Tool:knowledge_search Args:{'query': 'TorchTune'} -tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text='Result 1:\nDocument_id:num-1\nContent: conversational data, :func:`~torchtune.datasets.chat_dataset` seems to be a good fit. ..., type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')] -inference> Here is a high-level overview of the text: +============================ +Fine-Tuning Llama2 with LoRA +============================ -**LoRA Finetuning with PyTorch Tune** +This guide will teach you about `LoRA `_, a -PyTorch Tune provides a recipe for LoRA (Low-Rank Adaptation) finetuning, which is a technique to adapt pre-trained models to new tasks. The recipe uses the `lora_finetune_distributed` command. -... -Overall, DORA is a powerful reinforcement learning algorithm that can learn complex tasks from human demonstrations. However, it requires careful consideration of the challenges and limitations to achieve optimal results. +-------------------------------------------------- ``` -::: +Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳 + ## Next Steps -- Go through the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) -- Checkout more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks) -- See [References](../references/index.md) for more details about the llama CLI and Python SDK -- For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository. + +Now you're ready to dive deeper into Llama Stack! +- Explore the [Detailed Tutorial](./detailed_tutorial.md). +- Try the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb). +- Browse more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks). +- Learn about Llama Stack [Concepts](../concepts/index.md). +- Discover how to [Build Llama Stacks](../distributions/index.md). +- Refer to our [References](../references/index.md) for details on the Llama CLI and Python SDK. +- Check out the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository for example applications and tutorials. + +```{toctree} +:maxdepth: 0 +:hidden: + +detailed_tutorial +``` diff --git a/docs/source/index.md b/docs/source/index.md index a0ac95957..99b0e1a3e 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -1,3 +1,5 @@ +# Llama Stack +Welcome to Llama Stack, the open-source framework for building generative AI applications. ```{admonition} Llama 4 is here! :class: tip @@ -9,7 +11,6 @@ Check out [Getting Started with Llama 4](https://colab.research.google.com/githu Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_version_link }} for more details. ``` -# Llama Stack ## What is Llama Stack? diff --git a/docs/source/providers/index.md b/docs/source/providers/index.md index 75faf7c00..1d1a6e081 100644 --- a/docs/source/providers/index.md +++ b/docs/source/providers/index.md @@ -1,8 +1,8 @@ # Providers Overview The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include: -- LLM inference providers (e.g., Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.), -- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, etc.), +- LLM inference providers (e.g., Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.), +- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, SQLite-Vec, etc.), - Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.) Providers come in two flavors: From a4cc4b7e3160d4df2f97eb2ce6aa7325bf908c50 Mon Sep 17 00:00:00 2001 From: ehhuang Date: Thu, 10 Apr 2025 16:58:06 -0700 Subject: [PATCH 09/39] test(verification): add streaming tool calling test (#1933) # What does this PR do? ## Test Plan --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1933). * #1934 * __->__ #1933 --- .../openai_api/test_chat_completion.py | 55 +++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/tests/verifications/openai_api/test_chat_completion.py b/tests/verifications/openai_api/test_chat_completion.py index dc08ec944..6aee29c3a 100644 --- a/tests/verifications/openai_api/test_chat_completion.py +++ b/tests/verifications/openai_api/test_chat_completion.py @@ -4,6 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. +import json import re from typing import Any @@ -225,6 +226,60 @@ def test_chat_non_streaming_tool_calling(request, openai_client, model, provider # TODO: add detailed type validation +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_tool_calling"]["test_params"]["case"], + ids=case_id_generator, +) +def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + stream = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + tools=case["input"]["tools"], + stream=True, + ) + + # Accumulate partial tool_calls here + tool_calls_buffer = {} + current_id = None + # Process streaming chunks + for chunk in stream: + choice = chunk.choices[0] + delta = choice.delta + + if delta.tool_calls is None: + continue + + for tool_call_delta in delta.tool_calls: + if tool_call_delta.id: + current_id = tool_call_delta.id + call_id = current_id + func_delta = tool_call_delta.function + + if call_id not in tool_calls_buffer: + tool_calls_buffer[call_id] = { + "id": call_id, + "type": tool_call_delta.type, + "name": func_delta.name, + "arguments": "", + } + + if func_delta.arguments: + tool_calls_buffer[call_id]["arguments"] += func_delta.arguments + + assert len(tool_calls_buffer) == 1 + for call in tool_calls_buffer.values(): + assert len(call["id"]) > 0 + assert call["name"] == "get_weather" + + args_dict = json.loads(call["arguments"]) + assert "san francisco" in args_dict["location"].lower() + + # --- Helper functions (structured output validation) --- From 2fcb70b78921b89ef69bd868834958776a1e16aa Mon Sep 17 00:00:00 2001 From: ehhuang Date: Thu, 10 Apr 2025 16:59:28 -0700 Subject: [PATCH 10/39] test(verification): overwrite test result instead of creating new ones (#1934) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? ## Test Plan (myenv) ➜ llama-stack python tests/verifications/generate_report.py --providers fireworks,together,openai --run-tests --- tests/verifications/REPORT.md | 17 +- tests/verifications/generate_report.py | 113 ++-- ...reworks_1744264202.json => fireworks.json} | 518 +++++++++++------ .../{openai_1744264304.json => openai.json} | 309 ++++++---- ...together_1744264258.json => together.json} | 549 +++++++++++------- 5 files changed, 926 insertions(+), 580 deletions(-) rename tests/verifications/test_results/{fireworks_1744264202.json => fireworks.json} (68%) rename tests/verifications/test_results/{openai_1744264304.json => openai.json} (77%) rename tests/verifications/test_results/{together_1744264258.json => together.json} (77%) diff --git a/tests/verifications/REPORT.md b/tests/verifications/REPORT.md index 449499382..2309c6404 100644 --- a/tests/verifications/REPORT.md +++ b/tests/verifications/REPORT.md @@ -1,6 +1,6 @@ # Test Results Report -*Generated on: 2025-04-09 22:52:19* +*Generated on: 2025-04-10 16:48:18* *This report was generated by running `python tests/verifications/generate_report.py`* @@ -15,15 +15,15 @@ | Provider | Pass Rate | Tests Passed | Total Tests | | --- | --- | --- | --- | -| Together | 67.7% | 21 | 31 | -| Fireworks | 90.3% | 28 | 31 | -| Openai | 100.0% | 22 | 22 | +| Together | 64.7% | 22 | 34 | +| Fireworks | 82.4% | 28 | 34 | +| Openai | 100.0% | 24 | 24 | ## Together -*Tests run on: 2025-04-09 22:50:58* +*Tests run on: 2025-04-10 16:46:35* ```bash # Run all tests for this provider: @@ -56,10 +56,11 @@ pytest tests/verifications/openai_api/test_chat_completion.py --provider=togethe | test_chat_streaming_image | ⚪ | ❌ | ❌ | | test_chat_streaming_structured_output (calendar) | ✅ | ❌ | ❌ | | test_chat_streaming_structured_output (math) | ✅ | ❌ | ❌ | +| test_chat_streaming_tool_calling | ✅ | ❌ | ❌ | ## Fireworks -*Tests run on: 2025-04-09 22:50:02* +*Tests run on: 2025-04-10 16:44:44* ```bash # Run all tests for this provider: @@ -92,10 +93,11 @@ pytest tests/verifications/openai_api/test_chat_completion.py --provider=firewor | test_chat_streaming_image | ⚪ | ✅ | ✅ | | test_chat_streaming_structured_output (calendar) | ✅ | ✅ | ✅ | | test_chat_streaming_structured_output (math) | ✅ | ✅ | ✅ | +| test_chat_streaming_tool_calling | ❌ | ❌ | ❌ | ## Openai -*Tests run on: 2025-04-09 22:51:44* +*Tests run on: 2025-04-10 16:47:28* ```bash # Run all tests for this provider: @@ -127,3 +129,4 @@ pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai | test_chat_streaming_image | ✅ | ✅ | | test_chat_streaming_structured_output (calendar) | ✅ | ✅ | | test_chat_streaming_structured_output (math) | ✅ | ✅ | +| test_chat_streaming_tool_calling | ✅ | ✅ | diff --git a/tests/verifications/generate_report.py b/tests/verifications/generate_report.py index 1c760ca19..6a7c39ee2 100755 --- a/tests/verifications/generate_report.py +++ b/tests/verifications/generate_report.py @@ -77,8 +77,9 @@ def run_tests(provider, keyword=None): print(f"Running tests for provider: {provider}") timestamp = int(time.time()) - result_file = RESULTS_DIR / f"{provider}_{timestamp}.json" - temp_json_file = RESULTS_DIR / f"temp_{provider}_{timestamp}.json" + # Use a constant filename for the final result and temp file + result_file = RESULTS_DIR / f"{provider}.json" + temp_json_file = RESULTS_DIR / f"temp_{provider}.json" # Determine project root directory relative to this script project_root = Path(__file__).parent.parent.parent @@ -106,11 +107,12 @@ def run_tests(provider, keyword=None): # Check if the JSON file was created if temp_json_file.exists(): - # Read the JSON file and save it to our results format with open(temp_json_file, "r") as f: test_results = json.load(f) - # Save results to our own format with a trailing newline + test_results["run_timestamp"] = timestamp + + # Save results to the final (overwritten) file with open(result_file, "w") as f: json.dump(test_results, f, indent=2) f.write("\n") # Add a trailing newline for precommit @@ -132,7 +134,7 @@ def run_tests(provider, keyword=None): def parse_results( result_file, -) -> Tuple[DefaultDict[str, DefaultDict[str, Dict[str, bool]]], DefaultDict[str, Set[str]], Set[str]]: +) -> Tuple[DefaultDict[str, DefaultDict[str, Dict[str, bool]]], DefaultDict[str, Set[str]], Set[str], str]: """Parse a single test results file. Returns: @@ -140,11 +142,12 @@ def parse_results( - parsed_results: DefaultDict[provider, DefaultDict[model, Dict[test_name, pass_status]]] - providers_in_file: DefaultDict[provider, Set[model]] found in this file. - tests_in_file: Set[test_name] found in this file. + - run_timestamp: Timestamp when the test was run """ if not os.path.exists(result_file): print(f"Results file does not exist: {result_file}") # Return empty defaultdicts/set matching the type hint - return defaultdict(lambda: defaultdict(dict)), defaultdict(set), set() + return defaultdict(lambda: defaultdict(dict)), defaultdict(set), set(), "" with open(result_file, "r") as f: results = json.load(f) @@ -153,7 +156,16 @@ def parse_results( parsed_results: DefaultDict[str, DefaultDict[str, Dict[str, bool]]] = defaultdict(lambda: defaultdict(dict)) providers_in_file: DefaultDict[str, Set[str]] = defaultdict(set) tests_in_file: Set[str] = set() - provider: str = os.path.basename(result_file).split("_")[0] + # Extract provider from filename (e.g., "openai.json" -> "openai") + provider: str = result_file.stem + + # Extract run timestamp from the JSON data + run_timestamp_unix = results.get("run_timestamp") + run_timestamp_str = ( + time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(run_timestamp_unix)) + if run_timestamp_unix is not None + else "Unknown" + ) # Debug: Print summary of test results print(f"Test results summary for {provider}:") @@ -167,7 +179,7 @@ def parse_results( if "tests" not in results or not results["tests"]: print(f"No test results found in {result_file}") # Return empty defaultdicts/set matching the type hint - return defaultdict(lambda: defaultdict(dict)), defaultdict(set), set() + return defaultdict(lambda: defaultdict(dict)), defaultdict(set), set(), "" # Process the tests for test in results["tests"]: @@ -225,59 +237,29 @@ def parse_results( if not parsed_results.get(provider): print(f"Warning: No valid test results parsed for provider {provider} from file {result_file}") - return parsed_results, providers_in_file, tests_in_file + return parsed_results, providers_in_file, tests_in_file, run_timestamp_str -def cleanup_old_results(providers_to_clean: Dict[str, Set[str]]): - """Clean up old test result files, keeping only the newest N per provider.""" - # Use the passed-in providers dictionary - for provider in providers_to_clean.keys(): - # Get all result files for this provider - provider_files = list(RESULTS_DIR.glob(f"{provider}_*.json")) - - # Sort by timestamp (newest first) - provider_files.sort(key=lambda x: int(x.stem.split("_")[1]), reverse=True) - - # Remove old files beyond the max to keep - if len(provider_files) > MAX_RESULTS_PER_PROVIDER: - for old_file in provider_files[MAX_RESULTS_PER_PROVIDER:]: - try: - old_file.unlink() - print(f"Removed old result file: {old_file}") - except Exception as e: - print(f"Error removing file {old_file}: {e}") - - -def get_latest_results_by_provider(): - """Get the latest test result file for each provider""" +def get_all_result_files_by_provider(): + """Get all test result files, keyed by provider.""" provider_results = {} - # Get all result files result_files = list(RESULTS_DIR.glob("*.json")) - # Extract all provider names from filenames - all_providers = set() for file in result_files: - # File format is provider_timestamp.json - parts = file.stem.split("_") - if len(parts) >= 2: - all_providers.add(parts[0]) - - # Group by provider - for provider in all_providers: - provider_files = [f for f in result_files if f.name.startswith(f"{provider}_")] - - # Sort by timestamp (newest first) - provider_files.sort(key=lambda x: int(x.stem.split("_")[1]), reverse=True) - - if provider_files: - provider_results[provider] = provider_files[0] + provider = file.stem + if provider: + provider_results[provider] = file return provider_results def generate_report( - results_dict: Dict[str, Any], providers: Dict[str, Set[str]], all_tests: Set[str], output_file=None + results_dict: Dict[str, Any], + providers: Dict[str, Set[str]], + all_tests: Set[str], + provider_timestamps: Dict[str, str], + output_file=None, ): """Generate the markdown report. @@ -285,6 +267,7 @@ def generate_report( results_dict: Aggregated results [provider][model][test_name] -> status. providers: Dict of all providers and their models {provider: {models}}. all_tests: Set of all test names found. + provider_timestamps: Dict of provider to timestamp when tests were run output_file: Optional path to save the report. """ if output_file is None: @@ -293,19 +276,6 @@ def generate_report( else: output_file = Path(output_file) - # Get the timestamp from result files - provider_timestamps = {} - provider_results_files = get_latest_results_by_provider() - for provider, result_file in provider_results_files.items(): - # Extract timestamp from filename (format: provider_timestamp.json) - try: - timestamp_str = result_file.stem.split("_")[1] - timestamp = int(timestamp_str) - formatted_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp)) - provider_timestamps[provider] = formatted_time - except (IndexError, ValueError): - provider_timestamps[provider] = "Unknown" - # Convert provider model sets to sorted lists (use passed-in providers dict) providers_sorted = {prov: sorted(models) for prov, models in providers.items()} @@ -416,7 +386,7 @@ def generate_report( else: example_base_test_name = first_test_name - base_name = base_test_name_map.get(test, test) # Get base name + base_name = base_test_name_map.get(first_test_name, first_test_name) # Get base name case_count = base_test_case_counts.get(base_name, 1) # Get count filter_str = f"{example_base_test_name} and {example_case_id}" if case_count > 1 else example_base_test_name @@ -491,6 +461,7 @@ def main(): # Initialize collections to aggregate results in main aggregated_providers = defaultdict(set) aggregated_tests = set() + provider_timestamps = {} if args.run_tests: # Get list of available providers from command line or use detected providers @@ -512,28 +483,28 @@ def main(): result_file = run_tests(provider, keyword=args.k) if result_file: # Parse and aggregate results - parsed_results, providers_in_file, tests_in_file = parse_results(result_file) + parsed_results, providers_in_file, tests_in_file, run_timestamp = parse_results(result_file) all_results.update(parsed_results) for prov, models in providers_in_file.items(): aggregated_providers[prov].update(models) + if run_timestamp: + provider_timestamps[prov] = run_timestamp aggregated_tests.update(tests_in_file) else: # Use existing results - provider_result_files = get_latest_results_by_provider() + provider_result_files = get_all_result_files_by_provider() for result_file in provider_result_files.values(): # Parse and aggregate results - parsed_results, providers_in_file, tests_in_file = parse_results(result_file) + parsed_results, providers_in_file, tests_in_file, run_timestamp = parse_results(result_file) all_results.update(parsed_results) for prov, models in providers_in_file.items(): aggregated_providers[prov].update(models) + if run_timestamp: + provider_timestamps[prov] = run_timestamp aggregated_tests.update(tests_in_file) - # Generate the report, passing aggregated data - generate_report(all_results, aggregated_providers, aggregated_tests, args.output) - - # Cleanup, passing aggregated providers - cleanup_old_results(aggregated_providers) + generate_report(all_results, aggregated_providers, aggregated_tests, provider_timestamps, args.output) if __name__ == "__main__": diff --git a/tests/verifications/test_results/fireworks_1744264202.json b/tests/verifications/test_results/fireworks.json similarity index 68% rename from tests/verifications/test_results/fireworks_1744264202.json rename to tests/verifications/test_results/fireworks.json index d14738be9..061e44c08 100644 --- a/tests/verifications/test_results/fireworks_1744264202.json +++ b/tests/verifications/test_results/fireworks.json @@ -1,15 +1,15 @@ { - "created": 1744264258.730061, - "duration": 53.86071586608887, + "created": 1744328795.171092, + "duration": 107.57908606529236, "exitcode": 1, "root": "/Users/erichuang/projects/llama-stack", "environment": {}, "summary": { "passed": 28, "skipped": 2, - "failed": 3, - "total": 33, - "collected": 33 + "failed": 6, + "total": 36, + "collected": 36 }, "collectors": [ { @@ -29,167 +29,182 @@ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", "type": "Function", - "lineno": 115 + "lineno": 116 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", "type": "Function", - "lineno": 115 + "lineno": 116 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", "type": "Function", - "lineno": 115 + "lineno": 116 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", "type": "Function", - "lineno": 134 + "lineno": 135 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", "type": "Function", - "lineno": 134 + "lineno": 135 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", "type": "Function", - "lineno": 134 + "lineno": 135 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", "type": "Function", - "lineno": 203 + "lineno": 204 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", "type": "Function", - "lineno": 203 + "lineno": 204 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", "type": "Function", - "lineno": 203 + "lineno": 204 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "type": "Function", + "lineno": 228 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "type": "Function", + "lineno": 228 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "type": "Function", + "lineno": 228 } ] } @@ -197,7 +212,7 @@ "tests": [ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", @@ -216,21 +231,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.05236550001427531, + "duration": 0.2175025000469759, "outcome": "passed" }, "call": { - "duration": 0.5364967910572886, + "duration": 0.7433859170414507, "outcome": "passed" }, "teardown": { - "duration": 0.00015075004193931818, + "duration": 0.0001592918997630477, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", @@ -249,21 +264,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.00699599995277822, + "duration": 0.007383499993011355, "outcome": "passed" }, "call": { - "duration": 0.5843954589217901, + "duration": 0.5949292909353971, "outcome": "passed" }, "teardown": { - "duration": 0.0003858329728245735, + "duration": 0.00015891704242676497, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", @@ -282,21 +297,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.009176500025205314, + "duration": 0.010730999987572432, "outcome": "passed" }, "call": { - "duration": 0.9258683329680935, + "duration": 0.8945954169612378, "outcome": "passed" }, "teardown": { - "duration": 0.00015787500888109207, + "duration": 0.0003751249751076102, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", @@ -315,21 +330,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.011275375029072165, + "duration": 0.01665666699409485, "outcome": "passed" }, "call": { - "duration": 0.6890578339807689, + "duration": 0.907927209045738, "outcome": "passed" }, "teardown": { - "duration": 0.0004926669644191861, + "duration": 0.00024874997325241566, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", @@ -348,21 +363,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.007520624902099371, + "duration": 0.01039199996739626, "outcome": "passed" }, "call": { - "duration": 0.6675686669768766, + "duration": 0.5971567500382662, "outcome": "passed" }, "teardown": { - "duration": 0.00016137503553181887, + "duration": 0.0003488330403342843, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", @@ -381,21 +396,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.0076431670458987355, + "duration": 0.018627874902449548, "outcome": "passed" }, "call": { - "duration": 1.6813415409997106, + "duration": 2.0586736251134425, "outcome": "passed" }, "teardown": { - "duration": 0.0004928340204060078, + "duration": 0.00046974990982562304, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", @@ -414,21 +429,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.01302404107991606, + "duration": 0.01706262503284961, "outcome": "passed" }, "call": { - "duration": 1.3206909999717027, + "duration": 0.6679969580145553, "outcome": "passed" }, "teardown": { - "duration": 0.0002220839960500598, + "duration": 0.0004670419730246067, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", @@ -447,21 +462,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.0071772499941289425, + "duration": 0.025956374942325056, "outcome": "passed" }, "call": { - "duration": 0.4109888339880854, + "duration": 2.052679874934256, "outcome": "passed" }, "teardown": { - "duration": 0.0005431669997051358, + "duration": 0.00026958296075463295, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", @@ -480,21 +495,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.012043708004057407, + "duration": 0.015856957994401455, "outcome": "passed" }, "call": { - "duration": 0.4509220840409398, + "duration": 0.3096678329166025, "outcome": "passed" }, "teardown": { - "duration": 0.00016408402007073164, + "duration": 0.0007620420074090362, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", @@ -513,21 +528,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.007165874936617911, + "duration": 0.013509334065020084, "outcome": "passed" }, "call": { - "duration": 0.6527335830032825, + "duration": 0.5914681670255959, "outcome": "passed" }, "teardown": { - "duration": 0.0006419579731300473, + "duration": 0.0002906669396907091, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", @@ -546,21 +561,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.007546542095951736, + "duration": 0.013216375024057925, "outcome": "passed" }, "call": { - "duration": 0.9360042089829221, + "duration": 1.8804527079919353, "outcome": "passed" }, "teardown": { - "duration": 0.00020483299158513546, + "duration": 0.0002026669681072235, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", @@ -579,21 +594,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.046697250101715326, + "duration": 0.00827441702131182, "outcome": "passed" }, "call": { - "duration": 0.668349124956876, + "duration": 0.7407040420221165, "outcome": "passed" }, "teardown": { - "duration": 0.0005031249020248652, + "duration": 0.0005084159784018993, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", - "lineno": 115, + "lineno": 116, "outcome": "skipped", "keywords": [ "test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", @@ -612,22 +627,22 @@ "case_id": "case0" }, "setup": { - "duration": 0.012287458986975253, + "duration": 0.012424499960616231, "outcome": "passed" }, "call": { - "duration": 0.00015287497080862522, + "duration": 0.00032762496266514063, "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 124, 'Skipped: Skipping test_chat_non_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 125, 'Skipped: Skipping test_chat_non_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" }, "teardown": { - "duration": 0.00012162502389401197, + "duration": 0.00032416603062301874, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", - "lineno": 115, + "lineno": 116, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", @@ -646,21 +661,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.007204124936833978, + "duration": 0.02253958396613598, "outcome": "passed" }, "call": { - "duration": 1.8676417920505628, + "duration": 2.64042466704268, "outcome": "passed" }, "teardown": { - "duration": 0.0001557499635964632, + "duration": 0.0003636250039562583, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", - "lineno": 115, + "lineno": 116, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", @@ -679,21 +694,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.008226625039242208, + "duration": 0.014634749968536198, "outcome": "passed" }, "call": { - "duration": 3.2724285409785807, + "duration": 5.126485540997237, "outcome": "passed" }, "teardown": { - "duration": 0.0002898330567404628, + "duration": 0.0002988330088555813, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", - "lineno": 134, + "lineno": 135, "outcome": "skipped", "keywords": [ "test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", @@ -712,22 +727,22 @@ "case_id": "case0" }, "setup": { - "duration": 0.011927249957807362, + "duration": 0.015854416065849364, "outcome": "passed" }, "call": { - "duration": 0.00017358292825520039, + "duration": 0.00038058299105614424, "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 143, 'Skipped: Skipping test_chat_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 144, 'Skipped: Skipping test_chat_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" }, "teardown": { - "duration": 0.00014037499204277992, + "duration": 0.0002689170651137829, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", - "lineno": 134, + "lineno": 135, "outcome": "passed", "keywords": [ "test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", @@ -746,21 +761,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.008731417008675635, + "duration": 0.011205915943719447, "outcome": "passed" }, "call": { - "duration": 2.8333610829431564, + "duration": 3.2596546669956297, "outcome": "passed" }, "teardown": { - "duration": 0.0005132080987095833, + "duration": 0.0006222500232979655, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", - "lineno": 134, + "lineno": 135, "outcome": "passed", "keywords": [ "test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", @@ -779,21 +794,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.016569208004511893, + "duration": 0.016557667055167258, "outcome": "passed" }, "call": { - "duration": 2.302010750048794, + "duration": 4.930164708988741, "outcome": "passed" }, "teardown": { - "duration": 0.00016108399722725153, + "duration": 0.00048687495291233063, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", @@ -812,21 +827,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.039960999973118305, + "duration": 0.00886166701093316, "outcome": "passed" }, "call": { - "duration": 7.661373125039972, + "duration": 0.8833738330285996, "outcome": "passed" }, "teardown": { - "duration": 0.00015833403449505568, + "duration": 0.00025583396200090647, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", @@ -845,21 +860,21 @@ "case_id": "math" }, "setup": { - "duration": 0.006928625050932169, + "duration": 0.01297520799562335, "outcome": "passed" }, "call": { - "duration": 2.762534625013359, + "duration": 1.9960687910206616, "outcome": "passed" }, "teardown": { - "duration": 0.0006561250193044543, + "duration": 0.0005048330640420318, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", @@ -878,21 +893,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.008602249901741743, + "duration": 0.007275875075720251, "outcome": "passed" }, "call": { - "duration": 0.8311484589939937, + "duration": 0.9094266659813002, "outcome": "passed" }, "teardown": { - "duration": 0.0005021670367568731, + "duration": 0.00028041598852723837, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", @@ -911,21 +926,21 @@ "case_id": "math" }, "setup": { - "duration": 0.015500334091484547, + "duration": 0.008899332955479622, "outcome": "passed" }, "call": { - "duration": 2.505719291046262, + "duration": 3.117967874975875, "outcome": "passed" }, "teardown": { - "duration": 0.0002619170118123293, + "duration": 0.00017600005958229303, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", @@ -944,21 +959,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.01948041608557105, + "duration": 0.0073364999843761325, "outcome": "passed" }, "call": { - "duration": 0.6336237500654534, + "duration": 2.2714374579954892, "outcome": "passed" }, "teardown": { - "duration": 0.00016637507360428572, + "duration": 0.0001814159331843257, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", @@ -977,21 +992,21 @@ "case_id": "math" }, "setup": { - "duration": 0.006810749997384846, + "duration": 0.010546459001488984, "outcome": "passed" }, "call": { - "duration": 1.9086956249084324, + "duration": 3.9954450000077486, "outcome": "passed" }, "teardown": { - "duration": 0.00018824997823685408, + "duration": 0.0002719159238040447, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", @@ -1010,21 +1025,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.007881582947447896, + "duration": 0.012508000014349818, "outcome": "passed" }, "call": { - "duration": 0.7142562499502674, + "duration": 9.095425167004578, "outcome": "passed" }, "teardown": { - "duration": 0.0007035828894004226, + "duration": 0.00029200001154094934, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", @@ -1043,21 +1058,21 @@ "case_id": "math" }, "setup": { - "duration": 0.00848070892971009, + "duration": 0.014769250061362982, "outcome": "passed" }, "call": { - "duration": 1.5210869159782305, + "duration": 1.9875252910424024, "outcome": "passed" }, "teardown": { - "duration": 0.00021216599270701408, + "duration": 0.0006288329605013132, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", @@ -1076,21 +1091,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.009669666993431747, + "duration": 0.014440709026530385, "outcome": "passed" }, "call": { - "duration": 1.3105999580584466, + "duration": 1.2613736250204965, "outcome": "passed" }, "teardown": { - "duration": 0.000588166993111372, + "duration": 0.0001937919296324253, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", @@ -1109,21 +1124,21 @@ "case_id": "math" }, "setup": { - "duration": 0.007745541981421411, + "duration": 0.0071510839043185115, "outcome": "passed" }, "call": { - "duration": 3.250162083073519, + "duration": 2.2953888749470934, "outcome": "passed" }, "teardown": { - "duration": 0.0001455000601708889, + "duration": 0.00016245793085545301, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", @@ -1142,21 +1157,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.009726207936182618, + "duration": 0.007294666953384876, "outcome": "passed" }, "call": { - "duration": 0.5564592910232022, + "duration": 2.194703874993138, "outcome": "passed" }, "teardown": { - "duration": 0.00019470800179988146, + "duration": 0.00017604196909815073, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", @@ -1175,21 +1190,21 @@ "case_id": "math" }, "setup": { - "duration": 0.018431040924042463, + "duration": 0.019950625021010637, "outcome": "passed" }, "call": { - "duration": 3.8501765420660377, + "duration": 8.4994609169662, "outcome": "passed" }, "teardown": { - "duration": 0.00015279196668416262, + "duration": 0.00026404205709695816, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", - "lineno": 203, + "lineno": 204, "outcome": "failed", "keywords": [ "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", @@ -1208,34 +1223,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.007509749964810908, + "duration": 0.011928000021725893, "outcome": "passed" }, "call": { - "duration": 0.4906975000631064, + "duration": 0.5664792089955881, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 222, + "lineno": 223, "message": "TypeError: object of type 'NoneType' has no len()" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 222, + "lineno": 223, "message": "TypeError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:222: TypeError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:223: TypeError" }, "teardown": { - "duration": 0.00023995805531740189, + "duration": 0.00023799994960427284, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", - "lineno": 203, + "lineno": 204, "outcome": "failed", "keywords": [ "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", @@ -1254,34 +1269,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.007144959061406553, + "duration": 0.006813624990172684, "outcome": "passed" }, "call": { - "duration": 3.818257624981925, + "duration": 3.170418416033499, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 222, + "lineno": 223, "message": "TypeError: object of type 'NoneType' has no len()" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 222, + "lineno": 223, "message": "TypeError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:222: TypeError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:223: TypeError" }, "teardown": { - "duration": 0.0002668750239536166, + "duration": 0.0004129580920562148, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", - "lineno": 203, + "lineno": 204, "outcome": "failed", "keywords": [ "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", @@ -1300,30 +1315,169 @@ "case_id": "case0" }, "setup": { - "duration": 0.015290249953977764, + "duration": 0.01656208303757012, "outcome": "passed" }, "call": { - "duration": 1.5883799999719486, + "duration": 22.76337137504015, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 222, + "lineno": 223, "message": "TypeError: object of type 'NoneType' has no len()" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 222, + "lineno": 223, "message": "TypeError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:222: TypeError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:223: TypeError" }, "teardown": { - "duration": 0.0008049579337239265, + "duration": 0.00038704206235706806, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "lineno": 228, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.015727541991509497, + "outcome": "passed" + }, + "call": { + "duration": 0.5719050420448184, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 274, + "message": "assert 0 == 1\n + where 0 = len({})" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 274, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n choice = chunk.choices[0]\n delta = choice.delta\n \n if delta.tool_calls is None:\n continue\n \n for tool_call_delta in delta.tool_calls:\n if tool_call_delta.id:\n current_id = tool_call_delta.id\n call_id = current_id\n func_delta = tool_call_delta.function\n \n if call_id not in tool_calls_buffer:\n tool_calls_buffer[call_id] = {\n \"id\": call_id,\n \"type\": tool_call_delta.type,\n \"name\": func_delta.name,\n \"arguments\": \"\",\n }\n \n if func_delta.arguments:\n tool_calls_buffer[call_id][\"arguments\"] += func_delta.arguments\n \n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len({})\n\ntests/verifications/openai_api/test_chat_completion.py:274: AssertionError" + }, + "teardown": { + "duration": 0.0003532909322530031, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "lineno": 228, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.011914041941054165, + "outcome": "passed" + }, + "call": { + "duration": 5.403063916950487, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 274, + "message": "assert 0 == 1\n + where 0 = len({})" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 274, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n choice = chunk.choices[0]\n delta = choice.delta\n \n if delta.tool_calls is None:\n continue\n \n for tool_call_delta in delta.tool_calls:\n if tool_call_delta.id:\n current_id = tool_call_delta.id\n call_id = current_id\n func_delta = tool_call_delta.function\n \n if call_id not in tool_calls_buffer:\n tool_calls_buffer[call_id] = {\n \"id\": call_id,\n \"type\": tool_call_delta.type,\n \"name\": func_delta.name,\n \"arguments\": \"\",\n }\n \n if func_delta.arguments:\n tool_calls_buffer[call_id][\"arguments\"] += func_delta.arguments\n \n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len({})\n\ntests/verifications/openai_api/test_chat_completion.py:274: AssertionError" + }, + "teardown": { + "duration": 0.0005193749675527215, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "lineno": 228, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.012608832912519574, + "outcome": "passed" + }, + "call": { + "duration": 7.587262416025624, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 274, + "message": "assert 0 == 1\n + where 0 = len({})" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 274, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n choice = chunk.choices[0]\n delta = choice.delta\n \n if delta.tool_calls is None:\n continue\n \n for tool_call_delta in delta.tool_calls:\n if tool_call_delta.id:\n current_id = tool_call_delta.id\n call_id = current_id\n func_delta = tool_call_delta.function\n \n if call_id not in tool_calls_buffer:\n tool_calls_buffer[call_id] = {\n \"id\": call_id,\n \"type\": tool_call_delta.type,\n \"name\": func_delta.name,\n \"arguments\": \"\",\n }\n \n if func_delta.arguments:\n tool_calls_buffer[call_id][\"arguments\"] += func_delta.arguments\n \n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len({})\n\ntests/verifications/openai_api/test_chat_completion.py:274: AssertionError" + }, + "teardown": { + "duration": 0.0008685829816386104, "outcome": "passed" } } - ] + ], + "run_timestamp": 1744328684 } diff --git a/tests/verifications/test_results/openai_1744264304.json b/tests/verifications/test_results/openai.json similarity index 77% rename from tests/verifications/test_results/openai_1744264304.json rename to tests/verifications/test_results/openai.json index fe9c2fcac..0c1892f7e 100644 --- a/tests/verifications/test_results/openai_1744264304.json +++ b/tests/verifications/test_results/openai.json @@ -1,13 +1,13 @@ { - "created": 1744264338.9923031, - "duration": 32.825536012649536, + "created": 1744328898.0248861, + "duration": 47.561042070388794, "exitcode": 0, "root": "/Users/erichuang/projects/llama-stack", "environment": {}, "summary": { - "passed": 22, - "total": 22, - "collected": 22 + "passed": 24, + "total": 24, + "collected": 24 }, "collectors": [ { @@ -27,112 +27,122 @@ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-saturn]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-earth]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-saturn]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-earth]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-saturn]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-earth]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-saturn]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-case0]", "type": "Function", - "lineno": 115 + "lineno": 116 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-mini-case0]", "type": "Function", - "lineno": 115 + "lineno": 116 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-case0]", "type": "Function", - "lineno": 134 + "lineno": 135 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-mini-case0]", "type": "Function", - "lineno": 134 + "lineno": 135 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-calendar]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-math]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-math]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-calendar]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-math]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-calendar]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-math]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-case0]", "type": "Function", - "lineno": 203 + "lineno": 204 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", "type": "Function", - "lineno": 203 + "lineno": 204 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[gpt-4o-case0]", + "type": "Function", + "lineno": 228 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[gpt-4o-mini-case0]", + "type": "Function", + "lineno": 228 } ] } @@ -140,7 +150,7 @@ "tests": [ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[gpt-4o-earth]", @@ -159,21 +169,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.05381445901002735, + "duration": 0.0694252080284059, "outcome": "passed" }, "call": { - "duration": 0.49848275003023446, + "duration": 0.5709165419684723, "outcome": "passed" }, "teardown": { - "duration": 0.00018287496641278267, + "duration": 0.0007626248989254236, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-saturn]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[gpt-4o-saturn]", @@ -192,21 +202,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.007965500000864267, + "duration": 0.010281750001013279, "outcome": "passed" }, "call": { - "duration": 0.9293275829404593, + "duration": 0.6309260830748826, "outcome": "passed" }, "teardown": { - "duration": 0.00018229195848107338, + "duration": 0.0001824579667299986, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-earth]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[gpt-4o-mini-earth]", @@ -225,21 +235,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.00875679193995893, + "duration": 0.007922374992631376, "outcome": "passed" }, "call": { - "duration": 0.5793640419142321, + "duration": 0.31756504194345325, "outcome": "passed" }, "teardown": { - "duration": 0.0005307920509949327, + "duration": 0.0005268750246614218, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-saturn]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[gpt-4o-mini-saturn]", @@ -258,21 +268,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.01076845801435411, + "duration": 0.01643404201604426, "outcome": "passed" }, "call": { - "duration": 0.8752291660057381, + "duration": 0.7479908330133185, "outcome": "passed" }, "teardown": { - "duration": 0.0004834589781239629, + "duration": 0.0004037501057609916, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-earth]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[gpt-4o-earth]", @@ -291,21 +301,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.01662245800253004, + "duration": 0.021671707974746823, "outcome": "passed" }, "call": { - "duration": 0.8336971249664202, + "duration": 0.6701172919711098, "outcome": "passed" }, "teardown": { - "duration": 0.0024086670018732548, + "duration": 0.0005569590721279383, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-saturn]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[gpt-4o-saturn]", @@ -324,21 +334,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.009416291955858469, + "duration": 0.015847125090658665, "outcome": "passed" }, "call": { - "duration": 0.43594495789147913, + "duration": 0.636536999954842, "outcome": "passed" }, "teardown": { - "duration": 0.0009131249971687794, + "duration": 0.00029395800083875656, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-earth]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[gpt-4o-mini-earth]", @@ -357,21 +367,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.013155042077414691, + "duration": 0.011792832985520363, "outcome": "passed" }, "call": { - "duration": 0.6119836670113727, + "duration": 0.5610962919890881, "outcome": "passed" }, "teardown": { - "duration": 0.00023804197553545237, + "duration": 0.0003578749019652605, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-saturn]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[gpt-4o-mini-saturn]", @@ -390,21 +400,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.009004916995763779, + "duration": 0.016500207944773138, "outcome": "passed" }, "call": { - "duration": 0.8327413749648258, + "duration": 0.8060244580265135, "outcome": "passed" }, "teardown": { - "duration": 0.00046841695439070463, + "duration": 0.0005296670133247972, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-case0]", - "lineno": 115, + "lineno": 116, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[gpt-4o-case0]", @@ -423,21 +433,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.009574208059348166, + "duration": 0.008338792016729712, "outcome": "passed" }, "call": { - "duration": 2.221839000005275, + "duration": 7.009252917021513, "outcome": "passed" }, "teardown": { - "duration": 0.00015945907216519117, + "duration": 0.0003042910248041153, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-mini-case0]", - "lineno": 115, + "lineno": 116, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[gpt-4o-mini-case0]", @@ -456,21 +466,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.0084402080392465, + "duration": 0.007238540914840996, "outcome": "passed" }, "call": { - "duration": 2.298736457945779, + "duration": 3.134693874977529, "outcome": "passed" }, "teardown": { - "duration": 0.0002423750702291727, + "duration": 0.0003104590578004718, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-case0]", - "lineno": 134, + "lineno": 135, "outcome": "passed", "keywords": [ "test_chat_streaming_image[gpt-4o-case0]", @@ -489,21 +499,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.007330416003242135, + "duration": 0.0161851670127362, "outcome": "passed" }, "call": { - "duration": 4.062959833070636, + "duration": 3.0745719589758664, "outcome": "passed" }, "teardown": { - "duration": 0.00015470804646611214, + "duration": 0.00022620800882577896, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-mini-case0]", - "lineno": 134, + "lineno": 135, "outcome": "passed", "keywords": [ "test_chat_streaming_image[gpt-4o-mini-case0]", @@ -522,21 +532,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.019998832955025136, + "duration": 0.013220708002336323, "outcome": "passed" }, "call": { - "duration": 2.609432084020227, + "duration": 3.624867417034693, "outcome": "passed" }, "teardown": { - "duration": 0.005618917057290673, + "duration": 0.00020633300300687551, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-calendar]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[gpt-4o-calendar]", @@ -555,21 +565,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.00867662497330457, + "duration": 0.017596833989955485, "outcome": "passed" }, "call": { - "duration": 0.6856697499752045, + "duration": 1.248568250099197, "outcome": "passed" }, "teardown": { - "duration": 0.00018445902969688177, + "duration": 0.0004248750628903508, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-math]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[gpt-4o-math]", @@ -588,21 +598,21 @@ "case_id": "math" }, "setup": { - "duration": 0.01139050000347197, + "duration": 0.01512012502644211, "outcome": "passed" }, "call": { - "duration": 2.764390083961189, + "duration": 8.170285542029887, "outcome": "passed" }, "teardown": { - "duration": 0.0003164170775562525, + "duration": 0.00043537491001188755, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", @@ -621,21 +631,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.01321374997496605, + "duration": 0.010376665974035859, "outcome": "passed" }, "call": { - "duration": 0.8284227909753099, + "duration": 0.756480542011559, "outcome": "passed" }, "teardown": { - "duration": 0.00030170800164341927, + "duration": 0.00025695806834846735, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-math]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[gpt-4o-mini-math]", @@ -654,21 +664,21 @@ "case_id": "math" }, "setup": { - "duration": 0.013477458036504686, + "duration": 0.006846625008620322, "outcome": "passed" }, "call": { - "duration": 2.4146235829684883, + "duration": 2.6833953330060467, "outcome": "passed" }, "teardown": { - "duration": 0.00025754200760275126, + "duration": 0.00022558309137821198, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-calendar]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[gpt-4o-calendar]", @@ -687,21 +697,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.006940583931282163, + "duration": 0.009646040969528258, "outcome": "passed" }, "call": { - "duration": 0.5102092920569703, + "duration": 0.6117532079806551, "outcome": "passed" }, "teardown": { - "duration": 0.00023379107005894184, + "duration": 0.00015258300118148327, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-math]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[gpt-4o-math]", @@ -720,21 +730,21 @@ "case_id": "math" }, "setup": { - "duration": 0.007166999974288046, + "duration": 0.012024458032101393, "outcome": "passed" }, "call": { - "duration": 3.5751801669830456, + "duration": 4.522625041077845, "outcome": "passed" }, "teardown": { - "duration": 0.00015041697770357132, + "duration": 0.0004230838967487216, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-calendar]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[gpt-4o-mini-calendar]", @@ -753,21 +763,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.010652625001966953, + "duration": 0.009566582972183824, "outcome": "passed" }, "call": { - "duration": 0.6648182499920949, + "duration": 2.5591942919418216, "outcome": "passed" }, "teardown": { - "duration": 0.0008647920330986381, + "duration": 0.0007555419579148293, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-math]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[gpt-4o-mini-math]", @@ -786,21 +796,21 @@ "case_id": "math" }, "setup": { - "duration": 0.007372208056040108, + "duration": 0.010828875005245209, "outcome": "passed" }, "call": { - "duration": 2.80747462506406, + "duration": 2.495122667052783, "outcome": "passed" }, "teardown": { - "duration": 0.00028124998789280653, + "duration": 0.0002802090020850301, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-case0]", - "lineno": 203, + "lineno": 204, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[gpt-4o-case0]", @@ -819,21 +829,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.01625587500166148, + "duration": 0.012762792059220374, "outcome": "passed" }, "call": { - "duration": 0.6878769160248339, + "duration": 0.5655921660363674, "outcome": "passed" }, "teardown": { - "duration": 0.0002637499710544944, + "duration": 0.00022304197773337364, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", - "lineno": 203, + "lineno": 204, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", @@ -852,17 +862,84 @@ "case_id": "case0" }, "setup": { - "duration": 0.008817250025458634, + "duration": 0.03188708401285112, "outcome": "passed" }, "call": { - "duration": 0.7181202919455245, + "duration": 0.6159415419679135, "outcome": "passed" }, "teardown": { - "duration": 0.0017147079342976213, + "duration": 0.0005549580091610551, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[gpt-4o-case0]", + "lineno": 228, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_calling[gpt-4o-case0]", + "parametrize", + "pytestmark", + "gpt-4o-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "case0" + }, + "setup": { + "duration": 0.014768208027817309, + "outcome": "passed" + }, + "call": { + "duration": 0.47373537498060614, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005811670562252402, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[gpt-4o-mini-case0]", + "lineno": 228, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_calling[gpt-4o-mini-case0]", + "parametrize", + "pytestmark", + "gpt-4o-mini-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "case0" + }, + "setup": { + "duration": 0.010271625011228025, + "outcome": "passed" + }, + "call": { + "duration": 0.5656027499353513, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0025699170073494315, "outcome": "passed" } } - ] + ], + "run_timestamp": 1744328848 } diff --git a/tests/verifications/test_results/together_1744264258.json b/tests/verifications/test_results/together.json similarity index 77% rename from tests/verifications/test_results/together_1744264258.json rename to tests/verifications/test_results/together.json index c38dd52b5..2b23089e8 100644 --- a/tests/verifications/test_results/together_1744264258.json +++ b/tests/verifications/test_results/together.json @@ -1,15 +1,15 @@ { - "created": 1744264304.064288, - "duration": 42.470197916030884, + "created": 1744328847.853437, + "duration": 49.9419469833374, "exitcode": 1, "root": "/Users/erichuang/projects/llama-stack", "environment": {}, "summary": { - "passed": 21, - "failed": 10, + "passed": 22, + "failed": 12, "skipped": 2, - "total": 33, - "collected": 33 + "total": 36, + "collected": 36 }, "collectors": [ { @@ -29,167 +29,182 @@ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", "type": "Function", - "lineno": 72 + "lineno": 73 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", "type": "Function", - "lineno": 91 + "lineno": 92 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", "type": "Function", - "lineno": 115 + "lineno": 116 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", "type": "Function", - "lineno": 115 + "lineno": 116 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", "type": "Function", - "lineno": 115 + "lineno": 116 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", "type": "Function", - "lineno": 134 + "lineno": 135 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", "type": "Function", - "lineno": 134 + "lineno": 135 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", "type": "Function", - "lineno": 134 + "lineno": 135 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", "type": "Function", - "lineno": 158 + "lineno": 159 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", "type": "Function", - "lineno": 181 + "lineno": 182 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", "type": "Function", - "lineno": 203 + "lineno": 204 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", "type": "Function", - "lineno": 203 + "lineno": 204 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", "type": "Function", - "lineno": 203 + "lineno": 204 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "type": "Function", + "lineno": 228 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "type": "Function", + "lineno": 228 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "type": "Function", + "lineno": 228 } ] } @@ -197,7 +212,7 @@ "tests": [ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", @@ -216,21 +231,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.06113254197407514, + "duration": 0.15774220903404057, "outcome": "passed" }, "call": { - "duration": 1.0720349580515176, + "duration": 0.5396400419995189, "outcome": "passed" }, "teardown": { - "duration": 0.00015966698992997408, + "duration": 0.0002977499971166253, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", @@ -249,21 +264,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.006908083101734519, + "duration": 0.015632833004929125, "outcome": "passed" }, "call": { - "duration": 0.5013210839824751, + "duration": 0.4675290420418605, "outcome": "passed" }, "teardown": { - "duration": 0.0005375830223783851, + "duration": 0.00029129208996891975, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", @@ -282,21 +297,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.006910792086273432, + "duration": 0.01530187507160008, "outcome": "passed" }, "call": { - "duration": 0.5142245410243049, + "duration": 0.501894542016089, "outcome": "passed" }, "teardown": { - "duration": 0.0004069580463692546, + "duration": 0.0002060839906334877, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", @@ -315,21 +330,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.009730000048875809, + "duration": 0.014841833035461605, "outcome": "passed" }, "call": { - "duration": 0.40133179200347513, + "duration": 0.4202229160582647, "outcome": "passed" }, "teardown": { - "duration": 0.0004558749496936798, + "duration": 0.0005559159908443689, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", @@ -348,21 +363,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.008247417048551142, + "duration": 0.008204624988138676, "outcome": "passed" }, "call": { - "duration": 0.7914331250358373, + "duration": 1.991508833016269, "outcome": "passed" }, "teardown": { - "duration": 0.00020262505859136581, + "duration": 0.000539042055606842, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", - "lineno": 72, + "lineno": 73, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", @@ -381,21 +396,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.00922900007572025, + "duration": 0.022528667002916336, "outcome": "passed" }, "call": { - "duration": 1.2742049579974264, + "duration": 0.37111237505450845, "outcome": "passed" }, "teardown": { - "duration": 0.000688415952026844, + "duration": 0.0005334159359335899, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", @@ -414,21 +429,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.006949124974198639, + "duration": 0.00922920904122293, "outcome": "passed" }, "call": { - "duration": 0.4681705000111833, + "duration": 1.1684916669037193, "outcome": "passed" }, "teardown": { - "duration": 0.00017795804888010025, + "duration": 0.0002740409690886736, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", - "lineno": 91, + "lineno": 92, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", @@ -447,21 +462,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.008564374991692603, + "duration": 0.010883333045057952, "outcome": "passed" }, "call": { - "duration": 1.7430362500017509, + "duration": 0.4275277080014348, "outcome": "passed" }, "teardown": { - "duration": 0.00015312491450458765, + "duration": 0.00043112505227327347, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", - "lineno": 91, + "lineno": 92, "outcome": "failed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", @@ -480,34 +495,34 @@ "case_id": "earth" }, "setup": { - "duration": 0.007404124946333468, + "duration": 0.012945958063937724, "outcome": "passed" }, "call": { - "duration": 0.515926624997519, + "duration": 0.5551295839250088, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 109, + "lineno": 110, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 109, + "lineno": 110, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:109: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:110: IndexError" }, "teardown": { - "duration": 0.0002389999572187662, + "duration": 0.0002744169905781746, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", - "lineno": 91, + "lineno": 92, "outcome": "failed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", @@ -526,34 +541,34 @@ "case_id": "saturn" }, "setup": { - "duration": 0.0071305419551208615, + "duration": 0.017372542060911655, "outcome": "passed" }, "call": { - "duration": 0.37054662499576807, + "duration": 0.3579877089941874, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 109, + "lineno": 110, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 109, + "lineno": 110, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:109: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:110: IndexError" }, "teardown": { - "duration": 0.0006014580139890313, + "duration": 0.0005445419810712337, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", - "lineno": 91, + "lineno": 92, "outcome": "failed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", @@ -572,34 +587,34 @@ "case_id": "earth" }, "setup": { - "duration": 0.007489709067158401, + "duration": 0.014297832967713475, "outcome": "passed" }, "call": { - "duration": 0.7767745839664713, + "duration": 0.8067362919682637, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 109, + "lineno": 110, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 109, + "lineno": 110, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:109: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:110: IndexError" }, "teardown": { - "duration": 0.00025491707492619753, + "duration": 0.0003220830112695694, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", - "lineno": 91, + "lineno": 92, "outcome": "failed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", @@ -618,34 +633,34 @@ "case_id": "saturn" }, "setup": { - "duration": 0.006736499955877662, + "duration": 0.008816750021651387, "outcome": "passed" }, "call": { - "duration": 0.43948554201051593, + "duration": 0.5383605000097305, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 109, + "lineno": 110, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 109, + "lineno": 110, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:109: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:110: IndexError" }, "teardown": { - "duration": 0.0002264160430058837, + "duration": 0.00018316600471735, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", - "lineno": 115, + "lineno": 116, "outcome": "skipped", "keywords": [ "test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", @@ -664,22 +679,22 @@ "case_id": "case0" }, "setup": { - "duration": 0.007171708042733371, + "duration": 0.0074389580404385924, "outcome": "passed" }, "call": { - "duration": 0.00013554200995713472, + "duration": 0.00014933396596461535, "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 124, 'Skipped: Skipping test_chat_non_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 125, 'Skipped: Skipping test_chat_non_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" }, "teardown": { - "duration": 0.0001235839445143938, + "duration": 0.00012462493032217026, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", - "lineno": 115, + "lineno": 116, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", @@ -698,21 +713,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.008639499894343317, + "duration": 0.013580625061877072, "outcome": "passed" }, "call": { - "duration": 1.4001279999502003, + "duration": 2.89831429196056, "outcome": "passed" }, "teardown": { - "duration": 0.00014812499284744263, + "duration": 0.000491458922624588, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", - "lineno": 115, + "lineno": 116, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", @@ -731,21 +746,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.015450250008143485, + "duration": 0.008266666904091835, "outcome": "passed" }, "call": { - "duration": 3.3522649579681456, + "duration": 3.8873212080216035, "outcome": "passed" }, "teardown": { - "duration": 0.00041629199404269457, + "duration": 0.00016850000247359276, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", - "lineno": 134, + "lineno": 135, "outcome": "skipped", "keywords": [ "test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", @@ -764,22 +779,22 @@ "case_id": "case0" }, "setup": { - "duration": 0.007634000037796795, + "duration": 0.0080461660400033, "outcome": "passed" }, "call": { - "duration": 0.0001563339028507471, + "duration": 0.00014758307952433825, "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 143, 'Skipped: Skipping test_chat_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 144, 'Skipped: Skipping test_chat_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" }, "teardown": { - "duration": 0.0001324999611824751, + "duration": 0.00012695800978690386, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", - "lineno": 134, + "lineno": 135, "outcome": "failed", "keywords": [ "test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", @@ -798,34 +813,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.007050334010273218, + "duration": 0.00845700001809746, "outcome": "passed" }, "call": { - "duration": 1.7063317500287667, + "duration": 1.6604419159702957, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 152, + "lineno": 153, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 152, + "lineno": 153, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:152: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:153: IndexError" }, "teardown": { - "duration": 0.0002109999768435955, + "duration": 0.00033458403777331114, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", - "lineno": 134, + "lineno": 135, "outcome": "failed", "keywords": [ "test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", @@ -844,34 +859,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.006729208980686963, + "duration": 0.012580333976075053, "outcome": "passed" }, "call": { - "duration": 3.829621708020568, + "duration": 4.728511792025529, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 152, + "lineno": 153, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 152, + "lineno": 153, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:152: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:153: IndexError" }, "teardown": { - "duration": 0.0002882500411942601, + "duration": 0.00023266696371138096, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", @@ -890,21 +905,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.007713916013017297, + "duration": 0.011554082971997559, "outcome": "passed" }, "call": { - "duration": 2.48285808309447, + "duration": 1.3857994999270886, "outcome": "passed" }, "teardown": { - "duration": 0.00020350003615021706, + "duration": 0.0003951250109821558, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", @@ -923,21 +938,21 @@ "case_id": "math" }, "setup": { - "duration": 0.010098082944750786, + "duration": 0.007673708954825997, "outcome": "passed" }, "call": { - "duration": 1.6994713749736547, + "duration": 3.082161583006382, "outcome": "passed" }, "teardown": { - "duration": 0.00014512497000396252, + "duration": 0.0002532500075176358, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", @@ -956,21 +971,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.006934792036190629, + "duration": 0.014791041961871088, "outcome": "passed" }, "call": { - "duration": 1.277176082949154, + "duration": 0.6918012499809265, "outcome": "passed" }, "teardown": { - "duration": 0.0004985419800505042, + "duration": 0.00027070799842476845, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", @@ -989,21 +1004,21 @@ "case_id": "math" }, "setup": { - "duration": 0.012558708898723125, + "duration": 0.014746625092811882, "outcome": "passed" }, "call": { - "duration": 2.442075416096486, + "duration": 3.5890139170223847, "outcome": "passed" }, "teardown": { - "duration": 0.0003505420172587037, + "duration": 0.00030137505382299423, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", @@ -1022,21 +1037,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.012642999994568527, + "duration": 0.036798374960199, "outcome": "passed" }, "call": { - "duration": 0.9305703329155222, + "duration": 0.6914895409718156, "outcome": "passed" }, "teardown": { - "duration": 0.00016004196368157864, + "duration": 0.00023716699797660112, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", - "lineno": 158, + "lineno": 159, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", @@ -1055,21 +1070,21 @@ "case_id": "math" }, "setup": { - "duration": 0.008792415959760547, + "duration": 0.05965254199691117, "outcome": "passed" }, "call": { - "duration": 2.194098167004995, + "duration": 2.609581291093491, "outcome": "passed" }, "teardown": { - "duration": 0.0003667499404400587, + "duration": 0.0002674580318853259, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", @@ -1088,21 +1103,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.01219504198525101, + "duration": 0.014533916022628546, "outcome": "passed" }, "call": { - "duration": 2.045097667025402, + "duration": 0.6227063750848174, "outcome": "passed" }, "teardown": { - "duration": 0.00029958400409668684, + "duration": 0.00019699998665601015, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", - "lineno": 181, + "lineno": 182, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", @@ -1121,21 +1136,21 @@ "case_id": "math" }, "setup": { - "duration": 0.014203459024429321, + "duration": 0.009818125050514936, "outcome": "passed" }, "call": { - "duration": 1.3079068749211729, + "duration": 5.144610875053331, "outcome": "passed" }, "teardown": { - "duration": 0.0001914579188451171, + "duration": 0.00045220903120934963, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", - "lineno": 181, + "lineno": 182, "outcome": "failed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", @@ -1154,34 +1169,34 @@ "case_id": "calendar" }, "setup": { - "duration": 0.04714570892974734, + "duration": 0.012392290984280407, "outcome": "passed" }, "call": { - "duration": 0.44743770791683346, + "duration": 0.777625665999949, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 200, + "lineno": 201, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 200, + "lineno": 201, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:200: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:201: IndexError" }, "teardown": { - "duration": 0.00022199994418770075, + "duration": 0.000559916952624917, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", - "lineno": 181, + "lineno": 182, "outcome": "failed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", @@ -1200,34 +1215,34 @@ "case_id": "math" }, "setup": { - "duration": 0.012237709015607834, + "duration": 0.010390624986030161, "outcome": "passed" }, "call": { - "duration": 3.180020791012794, + "duration": 2.680094916955568, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 200, + "lineno": 201, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 200, + "lineno": 201, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:200: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:201: IndexError" }, "teardown": { - "duration": 0.000273333047516644, + "duration": 0.00041987502481788397, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", - "lineno": 181, + "lineno": 182, "outcome": "failed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", @@ -1246,34 +1261,34 @@ "case_id": "calendar" }, "setup": { - "duration": 0.013312208000570536, + "duration": 0.01190529193263501, "outcome": "passed" }, "call": { - "duration": 0.4110311249969527, + "duration": 0.6690819580107927, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 200, + "lineno": 201, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 200, + "lineno": 201, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:200: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:201: IndexError" }, "teardown": { - "duration": 0.00022975006140768528, + "duration": 0.000247166957706213, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", - "lineno": 181, + "lineno": 182, "outcome": "failed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", @@ -1292,34 +1307,34 @@ "case_id": "math" }, "setup": { - "duration": 0.006676917080767453, + "duration": 0.009588208980858326, "outcome": "passed" }, "call": { - "duration": 2.316411833046004, + "duration": 2.4867218340514228, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 200, + "lineno": 201, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 200, + "lineno": 201, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:200: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:201: IndexError" }, "teardown": { - "duration": 0.000245374976657331, + "duration": 0.00022487505339086056, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", - "lineno": 203, + "lineno": 204, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", @@ -1338,21 +1353,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.007064500008709729, + "duration": 0.008509417064487934, "outcome": "passed" }, "call": { - "duration": 0.606806542025879, + "duration": 0.45511841599363834, "outcome": "passed" }, "teardown": { - "duration": 0.00046320806723088026, + "duration": 0.00031033402774482965, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", - "lineno": 203, + "lineno": 204, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", @@ -1371,21 +1386,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.009071375010535121, + "duration": 0.01352791697718203, "outcome": "passed" }, "call": { - "duration": 0.41908070899080485, + "duration": 0.7166531670372933, "outcome": "passed" }, "teardown": { - "duration": 0.00026074994821101427, + "duration": 0.00031470798421651125, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", - "lineno": 203, + "lineno": 204, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", @@ -1404,17 +1419,143 @@ "case_id": "case0" }, "setup": { - "duration": 0.0068333749659359455, + "duration": 0.01369225000962615, "outcome": "passed" }, "call": { - "duration": 0.8904451669659466, + "duration": 0.34134254103992134, "outcome": "passed" }, "teardown": { - "duration": 0.0005833340110257268, + "duration": 0.0002922919811680913, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "lineno": 228, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "case0" + }, + "setup": { + "duration": 0.025748749962076545, + "outcome": "passed" + }, + "call": { + "duration": 0.7462511250050738, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00030449999030679464, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "lineno": 228, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.015131957945413888, + "outcome": "passed" + }, + "call": { + "duration": 0.4556894999695942, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 251, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 251, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:251: IndexError" + }, + "teardown": { + "duration": 0.000539042055606842, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "lineno": 228, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "case0" + }, + "setup": { + "duration": 0.016429082956165075, + "outcome": "passed" + }, + "call": { + "duration": 0.3677835420239717, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 251, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 251, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:251: IndexError" + }, + "teardown": { + "duration": 0.001610000035725534, "outcome": "passed" } } - ] + ], + "run_timestamp": 1744328795 } From 6aa459b00c55c31bcd265c6876bdb0f6f1d70123 Mon Sep 17 00:00:00 2001 From: Mark Campbell Date: Fri, 11 Apr 2025 12:04:13 +0100 Subject: [PATCH 11/39] docs: fix errors in kubernetes deployment guide (#1914) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Fixes a couple of errors in PVC/Secret setup and adds context for expected Hugging Face token [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) --- docs/source/distributions/kubernetes_deployment.md | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/docs/source/distributions/kubernetes_deployment.md b/docs/source/distributions/kubernetes_deployment.md index 2daf9d785..21ec02012 100644 --- a/docs/source/distributions/kubernetes_deployment.md +++ b/docs/source/distributions/kubernetes_deployment.md @@ -11,7 +11,12 @@ First, create a local Kubernetes cluster via Kind: kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test ``` -First, create a Kubernetes PVC and Secret for downloading and storing Hugging Face model: +First set your hugging face token as an environment variable. +``` +export HF_TOKEN=$(echo -n "your-hf-token" | base64) +``` + +Now create a Kubernetes PVC and Secret for downloading and storing Hugging Face model: ``` cat </tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s <$tmp_dir/Containerfile.llama-stack-run-k8s < Date: Fri, 11 Apr 2025 12:25:57 -0400 Subject: [PATCH 12/39] fix: ensure resource registration arguments are typed (#1941) # What does this PR do? closes https://github.com/meta-llama/llama-stack/issues/1586 this issue arises when loading an mcp_endpoint from run.yaml. the issue does not manifest for mcp servers added via a running distro server. the existing tests only cover the case of adding to a running server. the code for loading run.yaml strips type information from mcp_endpoint, passing `{"uri": ...}` instead of `URL(uri=...)` along to the resource provider registration. ## Test Plan 1. run an mcp server 2. add an mcp tool config to the dev.py, e.g. ``` diff --git a/llama_stack/templates/dev/dev.py b/llama_stack/templates/dev/dev.py index 69924acb..e0dc7189 100644 --- a/llama_stack/templates/dev/dev.py +++ b/llama_stack/templates/dev/dev.py @@ -6,6 +6,8 @@ from typing import List, Tuple +from llama_stack.apis.common.content_types import URL + from llama_stack.apis.models.models import ModelType from llama_stack.distribution.datatypes import ( ModelInput, @@ -154,6 +156,11 @@ def get_distribution_template() -> DistributionTemplate: toolgroup_id="builtin::code_interpreter", provider_id="code-interpreter", ), + ToolGroupInput( + toolgroup_id="mcp::filesystem", + provider_id="model-context-protocol", + mcp_endpoint=URL(uri="http://localhost:8002/sse"), + ), ] embedding_model = ModelInput( model_id="all-MiniLM-L6-v2", ``` 3. run distro_codegen.py 4. llama stack build --template dev --run before this pr, the `llama stack run` would fail w/ `AttributeError: 'dict' object has no attribute 'uri'`, after it will succeed. --- llama_stack/distribution/stack.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/llama_stack/distribution/stack.py b/llama_stack/distribution/stack.py index d70878db4..08ff5e7cd 100644 --- a/llama_stack/distribution/stack.py +++ b/llama_stack/distribution/stack.py @@ -96,7 +96,10 @@ async def register_resources(run_config: StackRunConfig, impls: Dict[Api, Any]): method = getattr(impls[api], register_method) for obj in objects: - await method(**obj.model_dump()) + # we want to maintain the type information in arguments to method. + # instead of method(**obj.model_dump()), which may convert a typed attr to a dict, + # we use model_dump() to find all the attrs and then getattr to get the still typed value. + await method(**{k: getattr(obj, k) for k in obj.model_dump().keys()}) method = getattr(impls[api], list_method) response = await method() From 40f41af2f74078028f0d79ecc291722884679d1c Mon Sep 17 00:00:00 2001 From: Ilya Kolchinsky <58424190+ilya-kolchinsky@users.noreply.github.com> Date: Fri, 11 Apr 2025 19:16:10 +0200 Subject: [PATCH 13/39] feat: Add a direct (non-agentic) RAG option to the Playground RAG page (#1940) # What does this PR do? This PR makes it possible to switch between agentic and non-agentic RAG when running the respective Playground page. When non-agentic RAG is selected, user queries are answered by directly querying the vector DB, augmenting the prompt, and sending the extended prompt to the model via Inference API. ## Test Plan - Launch the Playground and go to the RAG page; - Select the vector DB ID; - Adjust other configuration parameters if necessary; - Set the radio button to Agent-based RAG; - Send a message to the chat; - The query will be answered by an agent using the knowledge search tool as indicated by the output; - Click the 'Clear Chat' button to make it possible to switch modes; - Send a message to the chat again; - This time, the query will be answered by the model directly as can be deduced from the reply. --- .../distribution/ui/page/playground/rag.py | 103 +++++++++++++++--- 1 file changed, 88 insertions(+), 15 deletions(-) diff --git a/llama_stack/distribution/ui/page/playground/rag.py b/llama_stack/distribution/ui/page/playground/rag.py index be222f840..392c9afe2 100644 --- a/llama_stack/distribution/ui/page/playground/rag.py +++ b/llama_stack/distribution/ui/page/playground/rag.py @@ -9,6 +9,7 @@ import uuid import streamlit as st from llama_stack_client import Agent, AgentEventLogger, RAGDocument +from llama_stack.apis.common.content_types import ToolCallDelta from llama_stack.distribution.ui.modules.api import llama_stack_api from llama_stack.distribution.ui.modules.utils import data_url_from_file @@ -21,11 +22,11 @@ def rag_chat_page(): st.cache_resource.clear() def should_disable_input(): - return "messages" in st.session_state and len(st.session_state.messages) > 0 + return "displayed_messages" in st.session_state and len(st.session_state.displayed_messages) > 0 with st.sidebar: # File/Directory Upload Section - st.subheader("Upload Documents") + st.subheader("Upload Documents", divider=True) uploaded_files = st.file_uploader( "Upload file(s) or directory", accept_multiple_files=True, @@ -36,11 +37,11 @@ def rag_chat_page(): st.success(f"Successfully uploaded {len(uploaded_files)} files") # Add memory bank name input field vector_db_name = st.text_input( - "Vector Database Name", + "Document Collection Name", value="rag_vector_db", - help="Enter a unique identifier for this vector database", + help="Enter a unique identifier for this document collection", ) - if st.button("Create Vector Database"): + if st.button("Create Document Collection"): documents = [ RAGDocument( document_id=uploaded_file.name, @@ -71,17 +72,30 @@ def rag_chat_page(): ) st.success("Vector database created successfully!") - st.subheader("Configure Agent") + st.subheader("RAG Parameters", divider=True) + + rag_mode = st.radio( + "RAG mode", + ["Direct", "Agent-based"], + captions=[ + "RAG is performed by directly retrieving the information and augmenting the user query", + "RAG is performed by an agent activating a dedicated knowledge search tool.", + ], + on_change=reset_agent_and_chat, + disabled=should_disable_input(), + ) + # select memory banks vector_dbs = llama_stack_api.client.vector_dbs.list() vector_dbs = [vector_db.identifier for vector_db in vector_dbs] selected_vector_dbs = st.multiselect( - label="Select Vector Databases", + label="Select Document Collections to use in RAG queries", options=vector_dbs, on_change=reset_agent_and_chat, disabled=should_disable_input(), ) + st.subheader("Inference Parameters", divider=True) available_models = llama_stack_api.client.models.list() available_models = [model.identifier for model in available_models if model.model_type == "llm"] selected_model = st.selectbox( @@ -127,9 +141,11 @@ def rag_chat_page(): # Chat Interface if "messages" not in st.session_state: st.session_state.messages = [] + if "displayed_messages" not in st.session_state: + st.session_state.displayed_messages = [] # Display chat history - for message in st.session_state.messages: + for message in st.session_state.displayed_messages: with st.chat_message(message["role"]): st.markdown(message["content"]) @@ -161,14 +177,17 @@ def rag_chat_page(): ], ) - agent = create_agent() + if rag_mode == "Agent-based": + agent = create_agent() + if "agent_session_id" not in st.session_state: + st.session_state["agent_session_id"] = agent.create_session(session_name=f"rag_demo_{uuid.uuid4()}") - if "agent_session_id" not in st.session_state: - st.session_state["agent_session_id"] = agent.create_session(session_name=f"rag_demo_{uuid.uuid4()}") + session_id = st.session_state["agent_session_id"] - session_id = st.session_state["agent_session_id"] + def agent_process_prompt(prompt): + # Add user message to chat history + st.session_state.messages.append({"role": "user", "content": prompt}) - def process_prompt(prompt): # Send the prompt to the agent response = agent.create_turn( messages=[ @@ -197,11 +216,62 @@ def rag_chat_page(): message_placeholder.markdown(full_response) st.session_state.messages.append({"role": "assistant", "content": full_response}) + st.session_state.displayed_messages.append({"role": "assistant", "content": full_response}) + + def direct_process_prompt(prompt): + # Add the system prompt in the beginning of the conversation + if len(st.session_state.messages) == 0: + st.session_state.messages.append({"role": "system", "content": system_prompt}) + + # Query the vector DB + rag_response = llama_stack_api.client.tool_runtime.rag_tool.query( + content=prompt, vector_db_ids=list(selected_vector_dbs) + ) + prompt_context = rag_response.content + + with st.chat_message("assistant"): + retrieval_message_placeholder = st.empty() + message_placeholder = st.empty() + full_response = "" + retrieval_response = "" + + # Display the retrieved content + retrieval_response += str(prompt_context) + retrieval_message_placeholder.info(retrieval_response) + + # Construct the extended prompt + extended_prompt = f"Please answer the following query using the context below.\n\nCONTEXT:\n{prompt_context}\n\nQUERY:\n{prompt}" + + # Run inference directly + st.session_state.messages.append({"role": "user", "content": extended_prompt}) + response = llama_stack_api.client.inference.chat_completion( + messages=st.session_state.messages, + model_id=selected_model, + sampling_params={ + "strategy": strategy, + }, + stream=True, + ) + + # Display assistant response + for chunk in response: + response_delta = chunk.event.delta + if isinstance(response_delta, ToolCallDelta): + retrieval_response += response_delta.tool_call.replace("====", "").strip() + retrieval_message_placeholder.info(retrieval_response) + else: + full_response += chunk.event.delta.text + message_placeholder.markdown(full_response + "▌") + message_placeholder.markdown(full_response) + + response_dict = {"role": "assistant", "content": full_response, "stop_reason": "end_of_message"} + st.session_state.messages.append(response_dict) + st.session_state.displayed_messages.append(response_dict) # Chat input if prompt := st.chat_input("Ask a question about your documents"): # Add user message to chat history - st.session_state.messages.append({"role": "user", "content": prompt}) + st.session_state.displayed_messages.append({"role": "user", "content": prompt}) # Display user message with st.chat_message("user"): @@ -214,7 +284,10 @@ def rag_chat_page(): st.rerun() if "prompt" in st.session_state and st.session_state.prompt is not None: - process_prompt(st.session_state.prompt) + if rag_mode == "Agent-based": + agent_process_prompt(st.session_state.prompt) + else: # rag_mode == "Direct" + direct_process_prompt(st.session_state.prompt) st.session_state.prompt = None From 2a74f0db39de7d25bd4407a2535ef67593ad47f3 Mon Sep 17 00:00:00 2001 From: Ben Browning Date: Fri, 11 Apr 2025 13:17:57 -0400 Subject: [PATCH 14/39] fix: remove extra sft args in NvidiaPostTrainingAdapter (#1939) # What does this PR do? The supervised_fine_tune method in NvidiaPostTrainingAdapter had some extra args that aren't part of the post_training protocol, and these extra args were causing FastAPI to throw an error when attempting to stand up an endpoint that used this provider. (Closes #1938) ## Test Plan Before this change, bringing up a stack with the `nvidia` template failed. Afterwards, it passes. I'm testing this like: ``` INFERENCE_MODEL="meta/llama-3.1-8b-instruct" \ llama stack build --template nvidia --image-type venv --run ``` I also ensured the nvidia/test_supervised_fine_tuning.py tests still pass via: ``` python -m pytest \ tests/unit/providers/nvidia/test_supervised_fine_tuning.py ``` Signed-off-by: Ben Browning --- .../providers/remote/post_training/nvidia/post_training.py | 4 ---- 1 file changed, 4 deletions(-) diff --git a/llama_stack/providers/remote/post_training/nvidia/post_training.py b/llama_stack/providers/remote/post_training/nvidia/post_training.py index bacfdba0b..e14fcf0cc 100644 --- a/llama_stack/providers/remote/post_training/nvidia/post_training.py +++ b/llama_stack/providers/remote/post_training/nvidia/post_training.py @@ -206,10 +206,6 @@ class NvidiaPostTrainingAdapter(ModelRegistryHelper): model: str, checkpoint_dir: Optional[str], algorithm_config: Optional[AlgorithmConfig] = None, - extra_json: Optional[Dict[str, Any]] = None, - params: Optional[Dict[str, Any]] = None, - headers: Optional[Dict[str, Any]] = None, - **kwargs, ) -> NvidiaPostTrainingJob: """ Fine-tunes a model on a dataset. From c1cb6aad11dfc4f77a78b9163cf9c8ff164ef5dc Mon Sep 17 00:00:00 2001 From: Jash Gulabrai <37194352+JashG@users.noreply.github.com> Date: Fri, 11 Apr 2025 14:49:55 -0400 Subject: [PATCH 15/39] feat: Add unit tests for NVIDIA safety (#1897) # What does this PR do? This PR adds unit tests for the NVIDIA Safety provider implementation. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] 1. Ran `./scripts/unit-tests.sh tests/unit/providers/nvidia/test_safety.py` from the root of the project. Verified tests pass. ``` tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_init_nemo_guardrails Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_init_nemo_guardrails_invalid_temperature Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_register_shield_with_valid_id Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_register_shield_without_id Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_allowed Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_blocked Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_http_error Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_not_found Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED ``` [//]: # (## Documentation) --------- Co-authored-by: Jash Gulabrai --- .../providers/remote/safety/nvidia/nvidia.py | 25 +- tests/unit/providers/nvidia/test_safety.py | 326 ++++++++++++++++++ 2 files changed, 340 insertions(+), 11 deletions(-) create mode 100644 tests/unit/providers/nvidia/test_safety.py diff --git a/llama_stack/providers/remote/safety/nvidia/nvidia.py b/llama_stack/providers/remote/safety/nvidia/nvidia.py index 6da2a8344..1ff4a6ad9 100644 --- a/llama_stack/providers/remote/safety/nvidia/nvidia.py +++ b/llama_stack/providers/remote/safety/nvidia/nvidia.py @@ -104,6 +104,15 @@ class NeMoGuardrails: self.threshold = threshold self.guardrails_service_url = config.guardrails_service_url + async def _guardrails_post(self, path: str, data: Any | None): + """Helper for making POST requests to the guardrails service.""" + headers = { + "Accept": "application/json", + } + response = requests.post(url=f"{self.guardrails_service_url}{path}", headers=headers, json=data) + response.raise_for_status() + return response.json() + async def run(self, messages: List[Message]) -> RunShieldResponse: """ Queries the /v1/guardrails/checks endpoint of the NeMo guardrails deployed API. @@ -118,9 +127,6 @@ class NeMoGuardrails: Raises: requests.HTTPError: If the POST request fails. """ - headers = { - "Accept": "application/json", - } request_data = { "model": self.model, "messages": convert_pydantic_to_json_value(messages), @@ -134,15 +140,11 @@ class NeMoGuardrails: "config_id": self.config_id, }, } - response = requests.post( - url=f"{self.guardrails_service_url}/v1/guardrail/checks", headers=headers, json=request_data - ) - response.raise_for_status() - if "Content-Type" in response.headers and response.headers["Content-Type"].startswith("application/json"): - response_json = response.json() - if response_json["status"] == "blocked": + response = await self._guardrails_post(path="/v1/guardrail/checks", data=request_data) + + if response["status"] == "blocked": user_message = "Sorry I cannot do this." - metadata = response_json["rails_status"] + metadata = response["rails_status"] return RunShieldResponse( violation=SafetyViolation( @@ -151,4 +153,5 @@ class NeMoGuardrails: metadata=metadata, ) ) + return RunShieldResponse(violation=None) diff --git a/tests/unit/providers/nvidia/test_safety.py b/tests/unit/providers/nvidia/test_safety.py new file mode 100644 index 000000000..e7e1cb3dc --- /dev/null +++ b/tests/unit/providers/nvidia/test_safety.py @@ -0,0 +1,326 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import json +import os +import unittest +from typing import Any +from unittest.mock import AsyncMock, MagicMock, patch + +import pytest + +from llama_stack.apis.inference.inference import CompletionMessage, UserMessage +from llama_stack.apis.safety import RunShieldResponse, ViolationLevel +from llama_stack.apis.shields import Shield +from llama_stack.providers.remote.safety.nvidia.config import NVIDIASafetyConfig +from llama_stack.providers.remote.safety.nvidia.nvidia import NVIDIASafetyAdapter + + +class TestNVIDIASafetyAdapter(unittest.TestCase): + def setUp(self): + os.environ["NVIDIA_GUARDRAILS_URL"] = "http://nemo.test" + + # Initialize the adapter + self.config = NVIDIASafetyConfig( + guardrails_service_url=os.environ["NVIDIA_GUARDRAILS_URL"], + ) + self.adapter = NVIDIASafetyAdapter(config=self.config) + self.shield_store = AsyncMock() + self.adapter.shield_store = self.shield_store + + # Mock the HTTP request methods + self.guardrails_post_patcher = patch( + "llama_stack.providers.remote.safety.nvidia.nvidia.NeMoGuardrails._guardrails_post" + ) + self.mock_guardrails_post = self.guardrails_post_patcher.start() + self.mock_guardrails_post.return_value = {"status": "allowed"} + + def tearDown(self): + """Clean up after each test.""" + self.guardrails_post_patcher.stop() + + @pytest.fixture(autouse=True) + def inject_fixtures(self, run_async): + self.run_async = run_async + + def _assert_request( + self, + mock_call: MagicMock, + expected_url: str, + expected_headers: dict[str, str] | None = None, + expected_json: dict[str, Any] | None = None, + ) -> None: + """ + Helper method to verify request details in mock API calls. + + Args: + mock_call: The MagicMock object that was called + expected_url: The expected URL to which the request was made + expected_headers: Optional dictionary of expected request headers + expected_json: Optional dictionary of expected JSON payload + """ + call_args = mock_call.call_args + + # Check URL + assert call_args[0][0] == expected_url + + # Check headers if provided + if expected_headers: + for key, value in expected_headers.items(): + assert call_args[1]["headers"][key] == value + + # Check JSON if provided + if expected_json: + for key, value in expected_json.items(): + if isinstance(value, dict): + for nested_key, nested_value in value.items(): + assert call_args[1]["json"][key][nested_key] == nested_value + else: + assert call_args[1]["json"][key] == value + + def test_register_shield_with_valid_id(self): + shield = Shield( + provider_id="nvidia", + type="shield", + identifier="test-shield", + provider_resource_id="test-model", + ) + + # Register the shield + self.run_async(self.adapter.register_shield(shield)) + + def test_register_shield_without_id(self): + shield = Shield( + provider_id="nvidia", + type="shield", + identifier="test-shield", + provider_resource_id="", + ) + + # Register the shield should raise a ValueError + with self.assertRaises(ValueError): + self.run_async(self.adapter.register_shield(shield)) + + def test_run_shield_allowed(self): + # Set up the shield + shield_id = "test-shield" + shield = Shield( + provider_id="nvidia", + type="shield", + identifier=shield_id, + provider_resource_id="test-model", + ) + self.shield_store.get_shield.return_value = shield + + # Mock Guardrails API response + self.mock_guardrails_post.return_value = {"status": "allowed"} + + # Run the shield + messages = [ + UserMessage(role="user", content="Hello, how are you?"), + CompletionMessage( + role="assistant", + content="I'm doing well, thank you for asking!", + stop_reason="end_of_message", + tool_calls=[], + ), + ] + result = self.run_async(self.adapter.run_shield(shield_id, messages)) + + # Verify the shield store was called + self.shield_store.get_shield.assert_called_once_with(shield_id) + + # Verify the Guardrails API was called correctly + self.mock_guardrails_post.assert_called_once_with( + path="/v1/guardrail/checks", + data={ + "model": shield_id, + "messages": [ + json.loads(messages[0].model_dump_json()), + json.loads(messages[1].model_dump_json()), + ], + "temperature": 1.0, + "top_p": 1, + "frequency_penalty": 0, + "presence_penalty": 0, + "max_tokens": 160, + "stream": False, + "guardrails": { + "config_id": "self-check", + }, + }, + ) + + # Verify the result + assert isinstance(result, RunShieldResponse) + assert result.violation is None + + def test_run_shield_blocked(self): + # Set up the shield + shield_id = "test-shield" + shield = Shield( + provider_id="nvidia", + type="shield", + identifier=shield_id, + provider_resource_id="test-model", + ) + self.shield_store.get_shield.return_value = shield + + # Mock Guardrails API response + self.mock_guardrails_post.return_value = {"status": "blocked", "rails_status": {"reason": "harmful_content"}} + + # Run the shield + messages = [ + UserMessage(role="user", content="Hello, how are you?"), + CompletionMessage( + role="assistant", + content="I'm doing well, thank you for asking!", + stop_reason="end_of_message", + tool_calls=[], + ), + ] + result = self.run_async(self.adapter.run_shield(shield_id, messages)) + + # Verify the shield store was called + self.shield_store.get_shield.assert_called_once_with(shield_id) + + # Verify the Guardrails API was called correctly + self.mock_guardrails_post.assert_called_once_with( + path="/v1/guardrail/checks", + data={ + "model": shield_id, + "messages": [ + json.loads(messages[0].model_dump_json()), + json.loads(messages[1].model_dump_json()), + ], + "temperature": 1.0, + "top_p": 1, + "frequency_penalty": 0, + "presence_penalty": 0, + "max_tokens": 160, + "stream": False, + "guardrails": { + "config_id": "self-check", + }, + }, + ) + + # Verify the result + assert result.violation is not None + assert isinstance(result, RunShieldResponse) + assert result.violation.user_message == "Sorry I cannot do this." + assert result.violation.violation_level == ViolationLevel.ERROR + assert result.violation.metadata == {"reason": "harmful_content"} + + def test_run_shield_not_found(self): + # Set up shield store to return None + shield_id = "non-existent-shield" + self.shield_store.get_shield.return_value = None + + messages = [ + UserMessage(role="user", content="Hello, how are you?"), + ] + + with self.assertRaises(ValueError): + self.run_async(self.adapter.run_shield(shield_id, messages)) + + # Verify the shield store was called + self.shield_store.get_shield.assert_called_once_with(shield_id) + + # Verify the Guardrails API was not called + self.mock_guardrails_post.assert_not_called() + + def test_run_shield_http_error(self): + shield_id = "test-shield" + shield = Shield( + provider_id="nvidia", + type="shield", + identifier=shield_id, + provider_resource_id="test-model", + ) + self.shield_store.get_shield.return_value = shield + + # Mock Guardrails API to raise an exception + error_msg = "API Error: 500 Internal Server Error" + self.mock_guardrails_post.side_effect = Exception(error_msg) + + # Running the shield should raise an exception + messages = [ + UserMessage(role="user", content="Hello, how are you?"), + CompletionMessage( + role="assistant", + content="I'm doing well, thank you for asking!", + stop_reason="end_of_message", + tool_calls=[], + ), + ] + with self.assertRaises(Exception) as context: + self.run_async(self.adapter.run_shield(shield_id, messages)) + + # Verify the shield store was called + self.shield_store.get_shield.assert_called_once_with(shield_id) + + # Verify the Guardrails API was called correctly + self.mock_guardrails_post.assert_called_once_with( + path="/v1/guardrail/checks", + data={ + "model": shield_id, + "messages": [ + json.loads(messages[0].model_dump_json()), + json.loads(messages[1].model_dump_json()), + ], + "temperature": 1.0, + "top_p": 1, + "frequency_penalty": 0, + "presence_penalty": 0, + "max_tokens": 160, + "stream": False, + "guardrails": { + "config_id": "self-check", + }, + }, + ) + # Verify the exception message + assert error_msg in str(context.exception) + + def test_init_nemo_guardrails(self): + from llama_stack.providers.remote.safety.nvidia.nvidia import NeMoGuardrails + + test_config_id = "test-custom-config-id" + config = NVIDIASafetyConfig( + guardrails_service_url=os.environ["NVIDIA_GUARDRAILS_URL"], + config_id=test_config_id, + ) + # Initialize with default parameters + test_model = "test-model" + guardrails = NeMoGuardrails(config, test_model) + + # Verify the attributes are set correctly + assert guardrails.config_id == test_config_id + assert guardrails.model == test_model + assert guardrails.threshold == 0.9 # Default value + assert guardrails.temperature == 1.0 # Default value + assert guardrails.guardrails_service_url == os.environ["NVIDIA_GUARDRAILS_URL"] + + # Initialize with custom parameters + guardrails = NeMoGuardrails(config, test_model, threshold=0.8, temperature=0.7) + + # Verify the attributes are set correctly + assert guardrails.config_id == test_config_id + assert guardrails.model == test_model + assert guardrails.threshold == 0.8 + assert guardrails.temperature == 0.7 + assert guardrails.guardrails_service_url == os.environ["NVIDIA_GUARDRAILS_URL"] + + def test_init_nemo_guardrails_invalid_temperature(self): + from llama_stack.providers.remote.safety.nvidia.nvidia import NeMoGuardrails + + config = NVIDIASafetyConfig( + guardrails_service_url=os.environ["NVIDIA_GUARDRAILS_URL"], + config_id="test-custom-config-id", + ) + with self.assertRaises(ValueError): + NeMoGuardrails(config, "test-model", temperature=0) From 24d70cedcaf2cc373ecf2418da80281b0ca6f9fb Mon Sep 17 00:00:00 2001 From: Francisco Arceo Date: Fri, 11 Apr 2025 12:50:36 -0600 Subject: [PATCH 16/39] docs: Updated docs to show minimal RAG example and some other minor changes (#1935) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? Incorporating some feedback into the docs. - **`docs/source/getting_started/index.md`:** - Demo actually does RAG now - Simplified the installation command for dependencies. - Updated demo script examples to align with the latest API changes. - Replaced manual document manipulation with `RAGDocument` for clarity and maintainability. - Introduced new logic for model and embedding selection using the Llama Stack Client SDK. - Enhanced examples to showcase proper agent initialization and logging. - **`docs/source/getting_started/detailed_tutorial.md`:** - Updated the section for listing models to include proper code formatting with `bash`. - Removed and reorganized the "Run the Demos" section for clarity. - Adjusted tab-item structures and added new instructions for demo scripts. - **`docs/_static/css/my_theme.css`:** - Updated heading styles to include `h2`, `h3`, and `h4` for consistent font weight. - Added a new style for `pre` tags to wrap text and break long words, this is particularly useful for rendering long output from generation. ## Test Plan Tested locally. Screenshot for reference: Screenshot 2025-04-10 at 10 12 12 PM --------- Signed-off-by: Francisco Javier Arceo --- docs/_static/css/my_theme.css | 6 +- .../getting_started/detailed_tutorial.md | 26 ++--- docs/source/getting_started/index.md | 101 ++++++++---------- 3 files changed, 62 insertions(+), 71 deletions(-) diff --git a/docs/_static/css/my_theme.css b/docs/_static/css/my_theme.css index 6f82f6358..a587f866d 100644 --- a/docs/_static/css/my_theme.css +++ b/docs/_static/css/my_theme.css @@ -17,9 +17,13 @@ display: none; } -h3 { +h2, h3, h4 { font-weight: normal; } html[data-theme="dark"] .rst-content div[class^="highlight"] { background-color: #0b0b0b; } +pre { + white-space: pre-wrap !important; + word-break: break-all; +} diff --git a/docs/source/getting_started/detailed_tutorial.md b/docs/source/getting_started/detailed_tutorial.md index 65582e8d8..911b35437 100644 --- a/docs/source/getting_started/detailed_tutorial.md +++ b/docs/source/getting_started/detailed_tutorial.md @@ -173,9 +173,8 @@ You will see the below: Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321 ``` -#### iii. List Available Models List the models -``` +```bash llama-stack-client models list Available Models @@ -190,15 +189,6 @@ Available Models Total models: 2 ``` - -## Step 4: Run the Demos - -Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md). -Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options. - -::::{tab-set} - -:::{tab-item} Basic Inference with the CLI You can test basic Llama inference completion using the CLI. ```bash @@ -221,10 +211,16 @@ ChatCompletionResponse( ], ) ``` -::: -:::{tab-item} Basic Inference with a Script -Alternatively, you can run inference using the Llama Stack client SDK. +## Step 4: Run the Demos + +Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md). +Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options. + +::::{tab-set} + +:::{tab-item} Basic Inference +Now you can run inference using the Llama Stack client SDK. ### i. Create the Script Create a file `inference.py` and add the following code: @@ -269,7 +265,7 @@ Beauty in the bits ::: :::{tab-item} Build a Simple Agent -Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server. +Next we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server. ### i. Create the Script Create a file `agent.py` and add the following code: diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md index 63fa5ae6e..ce7dbe973 100644 --- a/docs/source/getting_started/index.md +++ b/docs/source/getting_started/index.md @@ -12,9 +12,8 @@ as the inference [provider](../providers/index.md#inference) for a Llama Model. Install [uv](https://docs.astral.sh/uv/), setup your virtual environment, and run inference on a Llama model with [Ollama](https://ollama.com/download). ```bash -uv pip install llama-stack aiosqlite faiss-cpu ollama openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals +uv pip install llama-stack source .venv/bin/activate -export INFERENCE_MODEL="llama3.2:3b" ollama run llama3.2:3b --keepalive 60m ``` ## Step 2: Run the Llama Stack Server @@ -24,70 +23,62 @@ INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type ven ## Step 3: Run the Demo Now open up a new terminal using the same virtual environment and you can run this demo as a script using `uv run demo_script.py` or in an interactive shell. ```python -from termcolor import cprint -from llama_stack_client.types import Document -from llama_stack_client import LlamaStackClient - - -vector_db = "faiss" -vector_db_id = "test-vector-db" -model_id = "llama3.2:3b-instruct-fp16" -query = "Can you give me the arxiv link for Lora Fine Tuning in Pytorch?" -documents = [ - Document( - document_id="document_1", - content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/lora_finetune.rst", - mime_type="text/plain", - metadata={}, - ) -] +from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient +vector_db_id = "my_demo_vector_db" client = LlamaStackClient(base_url="http://localhost:8321") -client.vector_dbs.register( - provider_id=vector_db, - vector_db_id=vector_db_id, - embedding_model="all-MiniLM-L6-v2", - embedding_dimension=384, -) +models = client.models.list() + +# Select the first LLM and first embedding models +model_id = next(m for m in models if m.model_type == "llm").identifier +embedding_model_id = ( + em := next(m for m in models if m.model_type == "embedding") +).identifier +embedding_dimension = em.metadata["embedding_dimension"] + +_ = client.vector_dbs.register( + vector_db_id=vector_db_id, + embedding_model=embedding_model_id, + embedding_dimension=embedding_dimension, + provider_id="faiss", +) +document = RAGDocument( + document_id="document_1", + content="https://www.paulgraham.com/greatwork.html", + mime_type="text/html", + metadata={}, +) client.tool_runtime.rag_tool.insert( - documents=documents, + documents=[document], vector_db_id=vector_db_id, chunk_size_in_tokens=50, ) - -response = client.tool_runtime.rag_tool.query( - vector_db_ids=[vector_db_id], - content=query, +agent = Agent( + client, + model=model_id, + instructions="You are a helpful assistant", + tools=[ + { + "name": "builtin::rag/knowledge_search", + "args": {"vector_db_ids": [vector_db_id]}, + } + ], ) -cprint("" + "-" * 50, "yellow") -cprint(f"Query> {query}", "red") -cprint("" + "-" * 50, "yellow") -for chunk in response.content: - cprint(f"Chunk ID> {chunk.text}", "green") - cprint("" + "-" * 50, "yellow") +response = agent.create_turn( + messages=[{"role": "user", "content": "How do you do great work?"}], + session_id=agent.create_session("rag_session"), +) + +for log in AgentEventLogger().log(response): + log.print() ``` And you should see output like below. -``` --------------------------------------------------- -Query> Can you give me the arxiv link for Lora Fine Tuning in Pytorch? --------------------------------------------------- -Chunk ID> knowledge_search tool found 5 chunks: -BEGIN of knowledge_search tool results. - --------------------------------------------------- -Chunk ID> Result 1: -Document_id:docum -Content: .. _lora_finetune_label: - -============================ -Fine-Tuning Llama2 with LoRA -============================ - -This guide will teach you about `LoRA `_, a - --------------------------------------------------- +```bash +inference> [knowledge_search(query="What does it mean to do great work")] +tool_execution> Tool:knowledge_search Args:{'query': 'What does it mean to do great work'} +tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='Result 2:\nDocument_id:docum\nContent: [1]\nI don\'t think you could give a precise definition of what\ncounts as great work. Doing great work means doing something important\nso well\n', type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: . And if so\nyou're already further along than you might realize, because the\nset of people willing to want to is small.

The factors in doing great work are factors in the literal,\nmathematical sense, and\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: \nincreases your morale and helps you do even better work. But this\ncycle also operates in the other direction: if you're not doing\ngood work, that can demoralize you and make it even harder to. Since\nit matters\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: to try to do\ngreat work. But that's what's going on subconsciously; they shy\naway from the question.

So I'm going to pull a sneaky trick on you. Do you want to do great\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')] ``` Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳 From 2b2db5fbda390bbfaf9226579efb39d32176a3a4 Mon Sep 17 00:00:00 2001 From: Ben Browning Date: Fri, 11 Apr 2025 16:14:17 -0400 Subject: [PATCH 17/39] feat: OpenAI-Compatible models, completions, chat/completions (#1894) # What does this PR do? This stubs in some OpenAI server-side compatibility with three new endpoints: /v1/openai/v1/models /v1/openai/v1/completions /v1/openai/v1/chat/completions This gives common inference apps using OpenAI clients the ability to talk to Llama Stack using an endpoint like http://localhost:8321/v1/openai/v1 . The two "v1" instances in there isn't awesome, but the thinking is that Llama Stack's API is v1 and then our OpenAI compatibility layer is compatible with OpenAI V1. And, some OpenAI clients implicitly assume the URL ends with "v1", so this gives maximum compatibility. The openai models endpoint is implemented in the routing layer, and just returns all the models Llama Stack knows about. The following providers should be working with the new OpenAI completions and chat/completions API: * remote::anthropic (untested) * remote::cerebras-openai-compat (untested) * remote::fireworks (tested) * remote::fireworks-openai-compat (untested) * remote::gemini (untested) * remote::groq-openai-compat (untested) * remote::nvidia (tested) * remote::ollama (tested) * remote::openai (untested) * remote::passthrough (untested) * remote::sambanova-openai-compat (untested) * remote::together (tested) * remote::together-openai-compat (untested) * remote::vllm (tested) The goal to support this for every inference provider - proxying directly to the provider's OpenAI endpoint for OpenAI-compatible providers. For providers that don't have an OpenAI-compatible API, we'll add a mixin to translate incoming OpenAI requests to Llama Stack inference requests and translate the Llama Stack inference responses to OpenAI responses. This is related to #1817 but is a bit larger in scope than just chat completions, as I have real use-cases that need the older completions API as well. ## Test Plan ### vLLM ``` VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` ### ollama ``` INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0" ``` ## Documentation Run a Llama Stack distribution that uses one of the providers mentioned in the list above. Then, use your favorite OpenAI client to send completion or chat completion requests with the base_url set to http://localhost:8321/v1/openai/v1 . Replace "localhost:8321" with the host and port of your Llama Stack server, if different. --------- Signed-off-by: Ben Browning --- docs/_static/llama-stack-spec.html | 932 ++++++++++++++++++ docs/_static/llama-stack-spec.yaml | 665 +++++++++++++ llama_stack/apis/inference/inference.py | 313 ++++++ llama_stack/apis/models/models.py | 23 + llama_stack/distribution/routers/routers.py | 121 +++ .../distribution/routers/routing_tables.py | 16 +- .../inference/meta_reference/inference.py | 6 + .../sentence_transformers.py | 6 + .../providers/inline/inference/vllm/vllm.py | 9 +- .../remote/inference/bedrock/bedrock.py | 9 +- .../remote/inference/cerebras/cerebras.py | 9 +- .../remote/inference/databricks/databricks.py | 9 +- .../remote/inference/fireworks/fireworks.py | 109 +- .../remote/inference/nvidia/nvidia.py | 112 ++- .../remote/inference/ollama/ollama.py | 122 ++- .../inference/passthrough/passthrough.py | 110 ++- .../remote/inference/runpod/runpod.py | 9 +- .../remote/inference/sambanova/sambanova.py | 9 +- .../providers/remote/inference/tgi/tgi.py | 9 +- .../remote/inference/together/together.py | 113 ++- .../providers/remote/inference/vllm/vllm.py | 110 ++- .../utils/inference/litellm_openai_mixin.py | 104 +- .../utils/inference/openai_compat.py | 133 ++- pyproject.toml | 1 + requirements.txt | 2 + .../inference/test_openai_completion.py | 216 ++++ uv.lock | 8 +- 27 files changed, 3265 insertions(+), 20 deletions(-) create mode 100644 tests/integration/inference/test_openai_completion.py diff --git a/docs/_static/llama-stack-spec.html b/docs/_static/llama-stack-spec.html index 567110829..36bfad49e 100644 --- a/docs/_static/llama-stack-spec.html +++ b/docs/_static/llama-stack-spec.html @@ -3092,6 +3092,125 @@ } } }, + "/v1/openai/v1/chat/completions": { + "post": { + "responses": { + "200": { + "description": "OK", + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/OpenAIChatCompletion" + } + } + } + }, + "400": { + "$ref": "#/components/responses/BadRequest400" + }, + "429": { + "$ref": "#/components/responses/TooManyRequests429" + }, + "500": { + "$ref": "#/components/responses/InternalServerError500" + }, + "default": { + "$ref": "#/components/responses/DefaultError" + } + }, + "tags": [ + "Inference" + ], + "description": "Generate an OpenAI-compatible chat completion for the given messages using the specified model.", + "parameters": [], + "requestBody": { + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/OpenaiChatCompletionRequest" + } + } + }, + "required": true + } + } + }, + "/v1/openai/v1/completions": { + "post": { + "responses": { + "200": { + "description": "OK", + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/OpenAICompletion" + } + } + } + }, + "400": { + "$ref": "#/components/responses/BadRequest400" + }, + "429": { + "$ref": "#/components/responses/TooManyRequests429" + }, + "500": { + "$ref": "#/components/responses/InternalServerError500" + }, + "default": { + "$ref": "#/components/responses/DefaultError" + } + }, + "tags": [ + "Inference" + ], + "description": "Generate an OpenAI-compatible completion for the given prompt using the specified model.", + "parameters": [], + "requestBody": { + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/OpenaiCompletionRequest" + } + } + }, + "required": true + } + } + }, + "/v1/openai/v1/models": { + "get": { + "responses": { + "200": { + "description": "OK", + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/OpenAIListModelsResponse" + } + } + } + }, + "400": { + "$ref": "#/components/responses/BadRequest400" + }, + "429": { + "$ref": "#/components/responses/TooManyRequests429" + }, + "500": { + "$ref": "#/components/responses/InternalServerError500" + }, + "default": { + "$ref": "#/components/responses/DefaultError" + } + }, + "tags": [ + "Models" + ], + "description": "", + "parameters": [] + } + }, "/v1/post-training/preference-optimize": { "post": { "responses": { @@ -8713,6 +8832,819 @@ ], "title": "LogEventRequest" }, + "OpenAIAssistantMessageParam": { + "type": "object", + "properties": { + "role": { + "type": "string", + "const": "assistant", + "default": "assistant", + "description": "Must be \"assistant\" to identify this as the model's response" + }, + "content": { + "$ref": "#/components/schemas/InterleavedContent", + "description": "The content of the model's response" + }, + "name": { + "type": "string", + "description": "(Optional) The name of the assistant message participant." + }, + "tool_calls": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ToolCall" + }, + "description": "List of tool calls. Each tool call is a ToolCall object." + } + }, + "additionalProperties": false, + "required": [ + "role", + "content" + ], + "title": "OpenAIAssistantMessageParam", + "description": "A message containing the model's (assistant) response in an OpenAI-compatible chat completion request." + }, + "OpenAIDeveloperMessageParam": { + "type": "object", + "properties": { + "role": { + "type": "string", + "const": "developer", + "default": "developer", + "description": "Must be \"developer\" to identify this as a developer message" + }, + "content": { + "$ref": "#/components/schemas/InterleavedContent", + "description": "The content of the developer message" + }, + "name": { + "type": "string", + "description": "(Optional) The name of the developer message participant." + } + }, + "additionalProperties": false, + "required": [ + "role", + "content" + ], + "title": "OpenAIDeveloperMessageParam", + "description": "A message from the developer in an OpenAI-compatible chat completion request." + }, + "OpenAIMessageParam": { + "oneOf": [ + { + "$ref": "#/components/schemas/OpenAIUserMessageParam" + }, + { + "$ref": "#/components/schemas/OpenAISystemMessageParam" + }, + { + "$ref": "#/components/schemas/OpenAIAssistantMessageParam" + }, + { + "$ref": "#/components/schemas/OpenAIToolMessageParam" + }, + { + "$ref": "#/components/schemas/OpenAIDeveloperMessageParam" + } + ], + "discriminator": { + "propertyName": "role", + "mapping": { + "user": "#/components/schemas/OpenAIUserMessageParam", + "system": "#/components/schemas/OpenAISystemMessageParam", + "assistant": "#/components/schemas/OpenAIAssistantMessageParam", + "tool": "#/components/schemas/OpenAIToolMessageParam", + "developer": "#/components/schemas/OpenAIDeveloperMessageParam" + } + } + }, + "OpenAISystemMessageParam": { + "type": "object", + "properties": { + "role": { + "type": "string", + "const": "system", + "default": "system", + "description": "Must be \"system\" to identify this as a system message" + }, + "content": { + "$ref": "#/components/schemas/InterleavedContent", + "description": "The content of the \"system prompt\". If multiple system messages are provided, they are concatenated. The underlying Llama Stack code may also add other system messages (for example, for formatting tool definitions)." + }, + "name": { + "type": "string", + "description": "(Optional) The name of the system message participant." + } + }, + "additionalProperties": false, + "required": [ + "role", + "content" + ], + "title": "OpenAISystemMessageParam", + "description": "A system message providing instructions or context to the model." + }, + "OpenAIToolMessageParam": { + "type": "object", + "properties": { + "role": { + "type": "string", + "const": "tool", + "default": "tool", + "description": "Must be \"tool\" to identify this as a tool response" + }, + "tool_call_id": { + "type": "string", + "description": "Unique identifier for the tool call this response is for" + }, + "content": { + "$ref": "#/components/schemas/InterleavedContent", + "description": "The response content from the tool" + } + }, + "additionalProperties": false, + "required": [ + "role", + "tool_call_id", + "content" + ], + "title": "OpenAIToolMessageParam", + "description": "A message representing the result of a tool invocation in an OpenAI-compatible chat completion request." + }, + "OpenAIUserMessageParam": { + "type": "object", + "properties": { + "role": { + "type": "string", + "const": "user", + "default": "user", + "description": "Must be \"user\" to identify this as a user message" + }, + "content": { + "$ref": "#/components/schemas/InterleavedContent", + "description": "The content of the message, which can include text and other media" + }, + "name": { + "type": "string", + "description": "(Optional) The name of the user message participant." + } + }, + "additionalProperties": false, + "required": [ + "role", + "content" + ], + "title": "OpenAIUserMessageParam", + "description": "A message from the user in an OpenAI-compatible chat completion request." + }, + "OpenaiChatCompletionRequest": { + "type": "object", + "properties": { + "model": { + "type": "string", + "description": "The identifier of the model to use. The model must be registered with Llama Stack and available via the /models endpoint." + }, + "messages": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIMessageParam" + }, + "description": "List of messages in the conversation" + }, + "frequency_penalty": { + "type": "number", + "description": "(Optional) The penalty for repeated tokens" + }, + "function_call": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } + } + ], + "description": "(Optional) The function call to use" + }, + "functions": { + "type": "array", + "items": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } + }, + "description": "(Optional) List of functions to use" + }, + "logit_bias": { + "type": "object", + "additionalProperties": { + "type": "number" + }, + "description": "(Optional) The logit bias to use" + }, + "logprobs": { + "type": "boolean", + "description": "(Optional) The log probabilities to use" + }, + "max_completion_tokens": { + "type": "integer", + "description": "(Optional) The maximum number of tokens to generate" + }, + "max_tokens": { + "type": "integer", + "description": "(Optional) The maximum number of tokens to generate" + }, + "n": { + "type": "integer", + "description": "(Optional) The number of completions to generate" + }, + "parallel_tool_calls": { + "type": "boolean", + "description": "(Optional) Whether to parallelize tool calls" + }, + "presence_penalty": { + "type": "number", + "description": "(Optional) The penalty for repeated tokens" + }, + "response_format": { + "type": "object", + "additionalProperties": { + "type": "string" + }, + "description": "(Optional) The response format to use" + }, + "seed": { + "type": "integer", + "description": "(Optional) The seed to use" + }, + "stop": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ], + "description": "(Optional) The stop tokens to use" + }, + "stream": { + "type": "boolean", + "description": "(Optional) Whether to stream the response" + }, + "stream_options": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + }, + "description": "(Optional) The stream options to use" + }, + "temperature": { + "type": "number", + "description": "(Optional) The temperature to use" + }, + "tool_choice": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } + } + ], + "description": "(Optional) The tool choice to use" + }, + "tools": { + "type": "array", + "items": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } + }, + "description": "(Optional) The tools to use" + }, + "top_logprobs": { + "type": "integer", + "description": "(Optional) The top log probabilities to use" + }, + "top_p": { + "type": "number", + "description": "(Optional) The top p to use" + }, + "user": { + "type": "string", + "description": "(Optional) The user to use" + } + }, + "additionalProperties": false, + "required": [ + "model", + "messages" + ], + "title": "OpenaiChatCompletionRequest" + }, + "OpenAIChatCompletion": { + "type": "object", + "properties": { + "id": { + "type": "string", + "description": "The ID of the chat completion" + }, + "choices": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIChoice" + }, + "description": "List of choices" + }, + "object": { + "type": "string", + "const": "chat.completion", + "default": "chat.completion", + "description": "The object type, which will be \"chat.completion\"" + }, + "created": { + "type": "integer", + "description": "The Unix timestamp in seconds when the chat completion was created" + }, + "model": { + "type": "string", + "description": "The model that was used to generate the chat completion" + } + }, + "additionalProperties": false, + "required": [ + "id", + "choices", + "object", + "created", + "model" + ], + "title": "OpenAIChatCompletion", + "description": "Response from an OpenAI-compatible chat completion request." + }, + "OpenAIChoice": { + "type": "object", + "properties": { + "message": { + "$ref": "#/components/schemas/OpenAIMessageParam", + "description": "The message from the model" + }, + "finish_reason": { + "type": "string", + "description": "The reason the model stopped generating" + }, + "index": { + "type": "integer" + }, + "logprobs": { + "$ref": "#/components/schemas/OpenAIChoiceLogprobs" + } + }, + "additionalProperties": false, + "required": [ + "message", + "finish_reason", + "index" + ], + "title": "OpenAIChoice", + "description": "A choice from an OpenAI-compatible chat completion response." + }, + "OpenAIChoiceLogprobs": { + "type": "object", + "properties": { + "content": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAITokenLogProb" + } + }, + "refusal": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAITokenLogProb" + } + } + }, + "additionalProperties": false, + "title": "OpenAIChoiceLogprobs", + "description": "The log probabilities for the tokens in the message from an OpenAI-compatible chat completion response." + }, + "OpenAITokenLogProb": { + "type": "object", + "properties": { + "token": { + "type": "string" + }, + "bytes": { + "type": "array", + "items": { + "type": "integer" + } + }, + "logprob": { + "type": "number" + }, + "top_logprobs": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAITopLogProb" + } + } + }, + "additionalProperties": false, + "required": [ + "token", + "logprob", + "top_logprobs" + ], + "title": "OpenAITokenLogProb", + "description": "The log probability for a token from an OpenAI-compatible chat completion response." + }, + "OpenAITopLogProb": { + "type": "object", + "properties": { + "token": { + "type": "string" + }, + "bytes": { + "type": "array", + "items": { + "type": "integer" + } + }, + "logprob": { + "type": "number" + } + }, + "additionalProperties": false, + "required": [ + "token", + "logprob" + ], + "title": "OpenAITopLogProb", + "description": "The top log probability for a token from an OpenAI-compatible chat completion response." + }, + "OpenaiCompletionRequest": { + "type": "object", + "properties": { + "model": { + "type": "string", + "description": "The identifier of the model to use. The model must be registered with Llama Stack and available via the /models endpoint." + }, + "prompt": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + }, + { + "type": "array", + "items": { + "type": "integer" + } + }, + { + "type": "array", + "items": { + "type": "array", + "items": { + "type": "integer" + } + } + } + ], + "description": "The prompt to generate a completion for" + }, + "best_of": { + "type": "integer", + "description": "(Optional) The number of completions to generate" + }, + "echo": { + "type": "boolean", + "description": "(Optional) Whether to echo the prompt" + }, + "frequency_penalty": { + "type": "number", + "description": "(Optional) The penalty for repeated tokens" + }, + "logit_bias": { + "type": "object", + "additionalProperties": { + "type": "number" + }, + "description": "(Optional) The logit bias to use" + }, + "logprobs": { + "type": "boolean", + "description": "(Optional) The log probabilities to use" + }, + "max_tokens": { + "type": "integer", + "description": "(Optional) The maximum number of tokens to generate" + }, + "n": { + "type": "integer", + "description": "(Optional) The number of completions to generate" + }, + "presence_penalty": { + "type": "number", + "description": "(Optional) The penalty for repeated tokens" + }, + "seed": { + "type": "integer", + "description": "(Optional) The seed to use" + }, + "stop": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ], + "description": "(Optional) The stop tokens to use" + }, + "stream": { + "type": "boolean", + "description": "(Optional) Whether to stream the response" + }, + "stream_options": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + }, + "description": "(Optional) The stream options to use" + }, + "temperature": { + "type": "number", + "description": "(Optional) The temperature to use" + }, + "top_p": { + "type": "number", + "description": "(Optional) The top p to use" + }, + "user": { + "type": "string", + "description": "(Optional) The user to use" + }, + "guided_choice": { + "type": "array", + "items": { + "type": "string" + } + }, + "prompt_logprobs": { + "type": "integer" + } + }, + "additionalProperties": false, + "required": [ + "model", + "prompt" + ], + "title": "OpenaiCompletionRequest" + }, + "OpenAICompletion": { + "type": "object", + "properties": { + "id": { + "type": "string" + }, + "choices": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAICompletionChoice" + } + }, + "created": { + "type": "integer" + }, + "model": { + "type": "string" + }, + "object": { + "type": "string", + "const": "text_completion", + "default": "text_completion" + } + }, + "additionalProperties": false, + "required": [ + "id", + "choices", + "created", + "model", + "object" + ], + "title": "OpenAICompletion", + "description": "Response from an OpenAI-compatible completion request." + }, + "OpenAICompletionChoice": { + "type": "object", + "properties": { + "finish_reason": { + "type": "string" + }, + "text": { + "type": "string" + }, + "index": { + "type": "integer" + }, + "logprobs": { + "$ref": "#/components/schemas/OpenAIChoiceLogprobs" + } + }, + "additionalProperties": false, + "required": [ + "finish_reason", + "text", + "index" + ], + "title": "OpenAICompletionChoice", + "description": "A choice from an OpenAI-compatible completion response." + }, + "OpenAIModel": { + "type": "object", + "properties": { + "id": { + "type": "string" + }, + "object": { + "type": "string", + "const": "model", + "default": "model" + }, + "created": { + "type": "integer" + }, + "owned_by": { + "type": "string" + } + }, + "additionalProperties": false, + "required": [ + "id", + "object", + "created", + "owned_by" + ], + "title": "OpenAIModel", + "description": "A model from OpenAI." + }, + "OpenAIListModelsResponse": { + "type": "object", + "properties": { + "data": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIModel" + } + } + }, + "additionalProperties": false, + "required": [ + "data" + ], + "title": "OpenAIListModelsResponse" + }, "DPOAlignmentConfig": { "type": "object", "properties": { diff --git a/docs/_static/llama-stack-spec.yaml b/docs/_static/llama-stack-spec.yaml index 1dfd17f55..82faf450a 100644 --- a/docs/_static/llama-stack-spec.yaml +++ b/docs/_static/llama-stack-spec.yaml @@ -2131,6 +2131,91 @@ paths: schema: $ref: '#/components/schemas/LogEventRequest' required: true + /v1/openai/v1/chat/completions: + post: + responses: + '200': + description: OK + content: + application/json: + schema: + $ref: '#/components/schemas/OpenAIChatCompletion' + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Inference + description: >- + Generate an OpenAI-compatible chat completion for the given messages using + the specified model. + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/OpenaiChatCompletionRequest' + required: true + /v1/openai/v1/completions: + post: + responses: + '200': + description: OK + content: + application/json: + schema: + $ref: '#/components/schemas/OpenAICompletion' + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Inference + description: >- + Generate an OpenAI-compatible completion for the given prompt using the specified + model. + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/OpenaiCompletionRequest' + required: true + /v1/openai/v1/models: + get: + responses: + '200': + description: OK + content: + application/json: + schema: + $ref: '#/components/schemas/OpenAIListModelsResponse' + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Models + description: '' + parameters: [] /v1/post-training/preference-optimize: post: responses: @@ -5980,6 +6065,586 @@ components: - event - ttl_seconds title: LogEventRequest + OpenAIAssistantMessageParam: + type: object + properties: + role: + type: string + const: assistant + default: assistant + description: >- + Must be "assistant" to identify this as the model's response + content: + $ref: '#/components/schemas/InterleavedContent' + description: The content of the model's response + name: + type: string + description: >- + (Optional) The name of the assistant message participant. + tool_calls: + type: array + items: + $ref: '#/components/schemas/ToolCall' + description: >- + List of tool calls. Each tool call is a ToolCall object. + additionalProperties: false + required: + - role + - content + title: OpenAIAssistantMessageParam + description: >- + A message containing the model's (assistant) response in an OpenAI-compatible + chat completion request. + OpenAIDeveloperMessageParam: + type: object + properties: + role: + type: string + const: developer + default: developer + description: >- + Must be "developer" to identify this as a developer message + content: + $ref: '#/components/schemas/InterleavedContent' + description: The content of the developer message + name: + type: string + description: >- + (Optional) The name of the developer message participant. + additionalProperties: false + required: + - role + - content + title: OpenAIDeveloperMessageParam + description: >- + A message from the developer in an OpenAI-compatible chat completion request. + OpenAIMessageParam: + oneOf: + - $ref: '#/components/schemas/OpenAIUserMessageParam' + - $ref: '#/components/schemas/OpenAISystemMessageParam' + - $ref: '#/components/schemas/OpenAIAssistantMessageParam' + - $ref: '#/components/schemas/OpenAIToolMessageParam' + - $ref: '#/components/schemas/OpenAIDeveloperMessageParam' + discriminator: + propertyName: role + mapping: + user: '#/components/schemas/OpenAIUserMessageParam' + system: '#/components/schemas/OpenAISystemMessageParam' + assistant: '#/components/schemas/OpenAIAssistantMessageParam' + tool: '#/components/schemas/OpenAIToolMessageParam' + developer: '#/components/schemas/OpenAIDeveloperMessageParam' + OpenAISystemMessageParam: + type: object + properties: + role: + type: string + const: system + default: system + description: >- + Must be "system" to identify this as a system message + content: + $ref: '#/components/schemas/InterleavedContent' + description: >- + The content of the "system prompt". If multiple system messages are provided, + they are concatenated. The underlying Llama Stack code may also add other + system messages (for example, for formatting tool definitions). + name: + type: string + description: >- + (Optional) The name of the system message participant. + additionalProperties: false + required: + - role + - content + title: OpenAISystemMessageParam + description: >- + A system message providing instructions or context to the model. + OpenAIToolMessageParam: + type: object + properties: + role: + type: string + const: tool + default: tool + description: >- + Must be "tool" to identify this as a tool response + tool_call_id: + type: string + description: >- + Unique identifier for the tool call this response is for + content: + $ref: '#/components/schemas/InterleavedContent' + description: The response content from the tool + additionalProperties: false + required: + - role + - tool_call_id + - content + title: OpenAIToolMessageParam + description: >- + A message representing the result of a tool invocation in an OpenAI-compatible + chat completion request. + OpenAIUserMessageParam: + type: object + properties: + role: + type: string + const: user + default: user + description: >- + Must be "user" to identify this as a user message + content: + $ref: '#/components/schemas/InterleavedContent' + description: >- + The content of the message, which can include text and other media + name: + type: string + description: >- + (Optional) The name of the user message participant. + additionalProperties: false + required: + - role + - content + title: OpenAIUserMessageParam + description: >- + A message from the user in an OpenAI-compatible chat completion request. + OpenaiChatCompletionRequest: + type: object + properties: + model: + type: string + description: >- + The identifier of the model to use. The model must be registered with + Llama Stack and available via the /models endpoint. + messages: + type: array + items: + $ref: '#/components/schemas/OpenAIMessageParam' + description: List of messages in the conversation + frequency_penalty: + type: number + description: >- + (Optional) The penalty for repeated tokens + function_call: + oneOf: + - type: string + - type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: (Optional) The function call to use + functions: + type: array + items: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: (Optional) List of functions to use + logit_bias: + type: object + additionalProperties: + type: number + description: (Optional) The logit bias to use + logprobs: + type: boolean + description: (Optional) The log probabilities to use + max_completion_tokens: + type: integer + description: >- + (Optional) The maximum number of tokens to generate + max_tokens: + type: integer + description: >- + (Optional) The maximum number of tokens to generate + n: + type: integer + description: >- + (Optional) The number of completions to generate + parallel_tool_calls: + type: boolean + description: >- + (Optional) Whether to parallelize tool calls + presence_penalty: + type: number + description: >- + (Optional) The penalty for repeated tokens + response_format: + type: object + additionalProperties: + type: string + description: (Optional) The response format to use + seed: + type: integer + description: (Optional) The seed to use + stop: + oneOf: + - type: string + - type: array + items: + type: string + description: (Optional) The stop tokens to use + stream: + type: boolean + description: >- + (Optional) Whether to stream the response + stream_options: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: (Optional) The stream options to use + temperature: + type: number + description: (Optional) The temperature to use + tool_choice: + oneOf: + - type: string + - type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: (Optional) The tool choice to use + tools: + type: array + items: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: (Optional) The tools to use + top_logprobs: + type: integer + description: >- + (Optional) The top log probabilities to use + top_p: + type: number + description: (Optional) The top p to use + user: + type: string + description: (Optional) The user to use + additionalProperties: false + required: + - model + - messages + title: OpenaiChatCompletionRequest + OpenAIChatCompletion: + type: object + properties: + id: + type: string + description: The ID of the chat completion + choices: + type: array + items: + $ref: '#/components/schemas/OpenAIChoice' + description: List of choices + object: + type: string + const: chat.completion + default: chat.completion + description: >- + The object type, which will be "chat.completion" + created: + type: integer + description: >- + The Unix timestamp in seconds when the chat completion was created + model: + type: string + description: >- + The model that was used to generate the chat completion + additionalProperties: false + required: + - id + - choices + - object + - created + - model + title: OpenAIChatCompletion + description: >- + Response from an OpenAI-compatible chat completion request. + OpenAIChoice: + type: object + properties: + message: + $ref: '#/components/schemas/OpenAIMessageParam' + description: The message from the model + finish_reason: + type: string + description: The reason the model stopped generating + index: + type: integer + logprobs: + $ref: '#/components/schemas/OpenAIChoiceLogprobs' + additionalProperties: false + required: + - message + - finish_reason + - index + title: OpenAIChoice + description: >- + A choice from an OpenAI-compatible chat completion response. + OpenAIChoiceLogprobs: + type: object + properties: + content: + type: array + items: + $ref: '#/components/schemas/OpenAITokenLogProb' + refusal: + type: array + items: + $ref: '#/components/schemas/OpenAITokenLogProb' + additionalProperties: false + title: OpenAIChoiceLogprobs + description: >- + The log probabilities for the tokens in the message from an OpenAI-compatible + chat completion response. + OpenAITokenLogProb: + type: object + properties: + token: + type: string + bytes: + type: array + items: + type: integer + logprob: + type: number + top_logprobs: + type: array + items: + $ref: '#/components/schemas/OpenAITopLogProb' + additionalProperties: false + required: + - token + - logprob + - top_logprobs + title: OpenAITokenLogProb + description: >- + The log probability for a token from an OpenAI-compatible chat completion + response. + OpenAITopLogProb: + type: object + properties: + token: + type: string + bytes: + type: array + items: + type: integer + logprob: + type: number + additionalProperties: false + required: + - token + - logprob + title: OpenAITopLogProb + description: >- + The top log probability for a token from an OpenAI-compatible chat completion + response. + OpenaiCompletionRequest: + type: object + properties: + model: + type: string + description: >- + The identifier of the model to use. The model must be registered with + Llama Stack and available via the /models endpoint. + prompt: + oneOf: + - type: string + - type: array + items: + type: string + - type: array + items: + type: integer + - type: array + items: + type: array + items: + type: integer + description: The prompt to generate a completion for + best_of: + type: integer + description: >- + (Optional) The number of completions to generate + echo: + type: boolean + description: (Optional) Whether to echo the prompt + frequency_penalty: + type: number + description: >- + (Optional) The penalty for repeated tokens + logit_bias: + type: object + additionalProperties: + type: number + description: (Optional) The logit bias to use + logprobs: + type: boolean + description: (Optional) The log probabilities to use + max_tokens: + type: integer + description: >- + (Optional) The maximum number of tokens to generate + n: + type: integer + description: >- + (Optional) The number of completions to generate + presence_penalty: + type: number + description: >- + (Optional) The penalty for repeated tokens + seed: + type: integer + description: (Optional) The seed to use + stop: + oneOf: + - type: string + - type: array + items: + type: string + description: (Optional) The stop tokens to use + stream: + type: boolean + description: >- + (Optional) Whether to stream the response + stream_options: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: (Optional) The stream options to use + temperature: + type: number + description: (Optional) The temperature to use + top_p: + type: number + description: (Optional) The top p to use + user: + type: string + description: (Optional) The user to use + guided_choice: + type: array + items: + type: string + prompt_logprobs: + type: integer + additionalProperties: false + required: + - model + - prompt + title: OpenaiCompletionRequest + OpenAICompletion: + type: object + properties: + id: + type: string + choices: + type: array + items: + $ref: '#/components/schemas/OpenAICompletionChoice' + created: + type: integer + model: + type: string + object: + type: string + const: text_completion + default: text_completion + additionalProperties: false + required: + - id + - choices + - created + - model + - object + title: OpenAICompletion + description: >- + Response from an OpenAI-compatible completion request. + OpenAICompletionChoice: + type: object + properties: + finish_reason: + type: string + text: + type: string + index: + type: integer + logprobs: + $ref: '#/components/schemas/OpenAIChoiceLogprobs' + additionalProperties: false + required: + - finish_reason + - text + - index + title: OpenAICompletionChoice + description: >- + A choice from an OpenAI-compatible completion response. + OpenAIModel: + type: object + properties: + id: + type: string + object: + type: string + const: model + default: model + created: + type: integer + owned_by: + type: string + additionalProperties: false + required: + - id + - object + - created + - owned_by + title: OpenAIModel + description: A model from OpenAI. + OpenAIListModelsResponse: + type: object + properties: + data: + type: array + items: + $ref: '#/components/schemas/OpenAIModel' + additionalProperties: false + required: + - data + title: OpenAIListModelsResponse DPOAlignmentConfig: type: object properties: diff --git a/llama_stack/apis/inference/inference.py b/llama_stack/apis/inference/inference.py index e59132e33..3390a3fef 100644 --- a/llama_stack/apis/inference/inference.py +++ b/llama_stack/apis/inference/inference.py @@ -442,6 +442,217 @@ class EmbeddingsResponse(BaseModel): embeddings: List[List[float]] +@json_schema_type +class OpenAIUserMessageParam(BaseModel): + """A message from the user in an OpenAI-compatible chat completion request. + + :param role: Must be "user" to identify this as a user message + :param content: The content of the message, which can include text and other media + :param name: (Optional) The name of the user message participant. + """ + + role: Literal["user"] = "user" + content: InterleavedContent + name: Optional[str] = None + + +@json_schema_type +class OpenAISystemMessageParam(BaseModel): + """A system message providing instructions or context to the model. + + :param role: Must be "system" to identify this as a system message + :param content: The content of the "system prompt". If multiple system messages are provided, they are concatenated. The underlying Llama Stack code may also add other system messages (for example, for formatting tool definitions). + :param name: (Optional) The name of the system message participant. + """ + + role: Literal["system"] = "system" + content: InterleavedContent + name: Optional[str] = None + + +@json_schema_type +class OpenAIAssistantMessageParam(BaseModel): + """A message containing the model's (assistant) response in an OpenAI-compatible chat completion request. + + :param role: Must be "assistant" to identify this as the model's response + :param content: The content of the model's response + :param name: (Optional) The name of the assistant message participant. + :param tool_calls: List of tool calls. Each tool call is a ToolCall object. + """ + + role: Literal["assistant"] = "assistant" + content: InterleavedContent + name: Optional[str] = None + tool_calls: Optional[List[ToolCall]] = Field(default_factory=list) + + +@json_schema_type +class OpenAIToolMessageParam(BaseModel): + """A message representing the result of a tool invocation in an OpenAI-compatible chat completion request. + + :param role: Must be "tool" to identify this as a tool response + :param tool_call_id: Unique identifier for the tool call this response is for + :param content: The response content from the tool + """ + + role: Literal["tool"] = "tool" + tool_call_id: str + content: InterleavedContent + + +@json_schema_type +class OpenAIDeveloperMessageParam(BaseModel): + """A message from the developer in an OpenAI-compatible chat completion request. + + :param role: Must be "developer" to identify this as a developer message + :param content: The content of the developer message + :param name: (Optional) The name of the developer message participant. + """ + + role: Literal["developer"] = "developer" + content: InterleavedContent + name: Optional[str] = None + + +OpenAIMessageParam = Annotated[ + Union[ + OpenAIUserMessageParam, + OpenAISystemMessageParam, + OpenAIAssistantMessageParam, + OpenAIToolMessageParam, + OpenAIDeveloperMessageParam, + ], + Field(discriminator="role"), +] +register_schema(OpenAIMessageParam, name="OpenAIMessageParam") + + +@json_schema_type +class OpenAITopLogProb(BaseModel): + """The top log probability for a token from an OpenAI-compatible chat completion response. + + :token: The token + :bytes: (Optional) The bytes for the token + :logprob: The log probability of the token + """ + + token: str + bytes: Optional[List[int]] = None + logprob: float + + +@json_schema_type +class OpenAITokenLogProb(BaseModel): + """The log probability for a token from an OpenAI-compatible chat completion response. + + :token: The token + :bytes: (Optional) The bytes for the token + :logprob: The log probability of the token + :top_logprobs: The top log probabilities for the token + """ + + token: str + bytes: Optional[List[int]] = None + logprob: float + top_logprobs: List[OpenAITopLogProb] + + +@json_schema_type +class OpenAIChoiceLogprobs(BaseModel): + """The log probabilities for the tokens in the message from an OpenAI-compatible chat completion response. + + :content: (Optional) The log probabilities for the tokens in the message + :refusal: (Optional) The log probabilities for the tokens in the message + """ + + content: Optional[List[OpenAITokenLogProb]] = None + refusal: Optional[List[OpenAITokenLogProb]] = None + + +@json_schema_type +class OpenAIChoice(BaseModel): + """A choice from an OpenAI-compatible chat completion response. + + :param message: The message from the model + :param finish_reason: The reason the model stopped generating + :index: The index of the choice + :logprobs: (Optional) The log probabilities for the tokens in the message + """ + + message: OpenAIMessageParam + finish_reason: str + index: int + logprobs: Optional[OpenAIChoiceLogprobs] = None + + +@json_schema_type +class OpenAIChatCompletion(BaseModel): + """Response from an OpenAI-compatible chat completion request. + + :param id: The ID of the chat completion + :param choices: List of choices + :param object: The object type, which will be "chat.completion" + :param created: The Unix timestamp in seconds when the chat completion was created + :param model: The model that was used to generate the chat completion + """ + + id: str + choices: List[OpenAIChoice] + object: Literal["chat.completion"] = "chat.completion" + created: int + model: str + + +@json_schema_type +class OpenAICompletionLogprobs(BaseModel): + """The log probabilities for the tokens in the message from an OpenAI-compatible completion response. + + :text_offset: (Optional) The offset of the token in the text + :token_logprobs: (Optional) The log probabilities for the tokens + :tokens: (Optional) The tokens + :top_logprobs: (Optional) The top log probabilities for the tokens + """ + + text_offset: Optional[List[int]] = None + token_logprobs: Optional[List[float]] = None + tokens: Optional[List[str]] = None + top_logprobs: Optional[List[Dict[str, float]]] = None + + +@json_schema_type +class OpenAICompletionChoice(BaseModel): + """A choice from an OpenAI-compatible completion response. + + :finish_reason: The reason the model stopped generating + :text: The text of the choice + :index: The index of the choice + :logprobs: (Optional) The log probabilities for the tokens in the choice + """ + + finish_reason: str + text: str + index: int + logprobs: Optional[OpenAIChoiceLogprobs] = None + + +@json_schema_type +class OpenAICompletion(BaseModel): + """Response from an OpenAI-compatible completion request. + + :id: The ID of the completion + :choices: List of choices + :created: The Unix timestamp in seconds when the completion was created + :model: The model that was used to generate the completion + :object: The object type, which will be "text_completion" + """ + + id: str + choices: List[OpenAICompletionChoice] + created: int + model: str + object: Literal["text_completion"] = "text_completion" + + class ModelStore(Protocol): async def get_model(self, identifier: str) -> Model: ... @@ -564,3 +775,105 @@ class Inference(Protocol): :returns: An array of embeddings, one for each content. Each embedding is a list of floats. The dimensionality of the embedding is model-specific; you can check model metadata using /models/{model_id} """ ... + + @webmethod(route="/openai/v1/completions", method="POST") + async def openai_completion( + self, + # Standard OpenAI completion parameters + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + # vLLM-specific parameters + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + """Generate an OpenAI-compatible completion for the given prompt using the specified model. + + :param model: The identifier of the model to use. The model must be registered with Llama Stack and available via the /models endpoint. + :param prompt: The prompt to generate a completion for + :param best_of: (Optional) The number of completions to generate + :param echo: (Optional) Whether to echo the prompt + :param frequency_penalty: (Optional) The penalty for repeated tokens + :param logit_bias: (Optional) The logit bias to use + :param logprobs: (Optional) The log probabilities to use + :param max_tokens: (Optional) The maximum number of tokens to generate + :param n: (Optional) The number of completions to generate + :param presence_penalty: (Optional) The penalty for repeated tokens + :param seed: (Optional) The seed to use + :param stop: (Optional) The stop tokens to use + :param stream: (Optional) Whether to stream the response + :param stream_options: (Optional) The stream options to use + :param temperature: (Optional) The temperature to use + :param top_p: (Optional) The top p to use + :param user: (Optional) The user to use + """ + ... + + @webmethod(route="/openai/v1/chat/completions", method="POST") + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + """Generate an OpenAI-compatible chat completion for the given messages using the specified model. + + :param model: The identifier of the model to use. The model must be registered with Llama Stack and available via the /models endpoint. + :param messages: List of messages in the conversation + :param frequency_penalty: (Optional) The penalty for repeated tokens + :param function_call: (Optional) The function call to use + :param functions: (Optional) List of functions to use + :param logit_bias: (Optional) The logit bias to use + :param logprobs: (Optional) The log probabilities to use + :param max_completion_tokens: (Optional) The maximum number of tokens to generate + :param max_tokens: (Optional) The maximum number of tokens to generate + :param n: (Optional) The number of completions to generate + :param parallel_tool_calls: (Optional) Whether to parallelize tool calls + :param presence_penalty: (Optional) The penalty for repeated tokens + :param response_format: (Optional) The response format to use + :param seed: (Optional) The seed to use + :param stop: (Optional) The stop tokens to use + :param stream: (Optional) Whether to stream the response + :param stream_options: (Optional) The stream options to use + :param temperature: (Optional) The temperature to use + :param tool_choice: (Optional) The tool choice to use + :param tools: (Optional) The tools to use + :param top_logprobs: (Optional) The top log probabilities to use + :param top_p: (Optional) The top p to use + :param user: (Optional) The user to use + """ + ... diff --git a/llama_stack/apis/models/models.py b/llama_stack/apis/models/models.py index 893ebc179..97398ce75 100644 --- a/llama_stack/apis/models/models.py +++ b/llama_stack/apis/models/models.py @@ -56,12 +56,35 @@ class ListModelsResponse(BaseModel): data: List[Model] +@json_schema_type +class OpenAIModel(BaseModel): + """A model from OpenAI. + + :id: The ID of the model + :object: The object type, which will be "model" + :created: The Unix timestamp in seconds when the model was created + :owned_by: The owner of the model + """ + + id: str + object: Literal["model"] = "model" + created: int + owned_by: str + + +class OpenAIListModelsResponse(BaseModel): + data: List[OpenAIModel] + + @runtime_checkable @trace_protocol class Models(Protocol): @webmethod(route="/models", method="GET") async def list_models(self) -> ListModelsResponse: ... + @webmethod(route="/openai/v1/models", method="GET") + async def openai_list_models(self) -> OpenAIListModelsResponse: ... + @webmethod(route="/models/{model_id:path}", method="GET") async def get_model( self, diff --git a/llama_stack/distribution/routers/routers.py b/llama_stack/distribution/routers/routers.py index eed96a40a..bc313036f 100644 --- a/llama_stack/distribution/routers/routers.py +++ b/llama_stack/distribution/routers/routers.py @@ -35,6 +35,7 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.apis.models import Model, ModelType from llama_stack.apis.safety import RunShieldResponse, Safety from llama_stack.apis.scoring import ( @@ -419,6 +420,126 @@ class InferenceRouter(Inference): task_type=task_type, ) + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + logger.debug( + f"InferenceRouter.openai_completion: {model=}, {stream=}, {prompt=}", + ) + model_obj = await self.routing_table.get_model(model) + if model_obj is None: + raise ValueError(f"Model '{model}' not found") + if model_obj.model_type == ModelType.embedding: + raise ValueError(f"Model '{model}' is an embedding model and does not support completions") + + params = dict( + model=model_obj.identifier, + prompt=prompt, + best_of=best_of, + echo=echo, + frequency_penalty=frequency_penalty, + logit_bias=logit_bias, + logprobs=logprobs, + max_tokens=max_tokens, + n=n, + presence_penalty=presence_penalty, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + top_p=top_p, + user=user, + guided_choice=guided_choice, + prompt_logprobs=prompt_logprobs, + ) + + provider = self.routing_table.get_provider_impl(model_obj.identifier) + return await provider.openai_completion(**params) + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + logger.debug( + f"InferenceRouter.openai_chat_completion: {model=}, {stream=}, {messages=}", + ) + model_obj = await self.routing_table.get_model(model) + if model_obj is None: + raise ValueError(f"Model '{model}' not found") + if model_obj.model_type == ModelType.embedding: + raise ValueError(f"Model '{model}' is an embedding model and does not support chat completions") + + params = dict( + model=model_obj.identifier, + messages=messages, + frequency_penalty=frequency_penalty, + function_call=function_call, + functions=functions, + logit_bias=logit_bias, + logprobs=logprobs, + max_completion_tokens=max_completion_tokens, + max_tokens=max_tokens, + n=n, + parallel_tool_calls=parallel_tool_calls, + presence_penalty=presence_penalty, + response_format=response_format, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + tool_choice=tool_choice, + tools=tools, + top_logprobs=top_logprobs, + top_p=top_p, + user=user, + ) + + provider = self.routing_table.get_provider_impl(model_obj.identifier) + return await provider.openai_chat_completion(**params) + class SafetyRouter(Safety): def __init__( diff --git a/llama_stack/distribution/routers/routing_tables.py b/llama_stack/distribution/routers/routing_tables.py index f6adae49d..18b0c891f 100644 --- a/llama_stack/distribution/routers/routing_tables.py +++ b/llama_stack/distribution/routers/routing_tables.py @@ -5,6 +5,7 @@ # the root directory of this source tree. import logging +import time import uuid from typing import Any, Dict, List, Optional @@ -23,7 +24,7 @@ from llama_stack.apis.datasets import ( RowsDataSource, URIDataSource, ) -from llama_stack.apis.models import ListModelsResponse, Model, Models, ModelType +from llama_stack.apis.models import ListModelsResponse, Model, Models, ModelType, OpenAIListModelsResponse, OpenAIModel from llama_stack.apis.resource import ResourceType from llama_stack.apis.scoring_functions import ( ListScoringFunctionsResponse, @@ -254,6 +255,19 @@ class ModelsRoutingTable(CommonRoutingTableImpl, Models): async def list_models(self) -> ListModelsResponse: return ListModelsResponse(data=await self.get_all_with_type("model")) + async def openai_list_models(self) -> OpenAIListModelsResponse: + models = await self.get_all_with_type("model") + openai_models = [ + OpenAIModel( + id=model.identifier, + object="model", + created=int(time.time()), + owned_by="llama_stack", + ) + for model in models + ] + return OpenAIListModelsResponse(data=openai_models) + async def get_model(self, model_id: str) -> Model: model = await self.get_object_by_identifier("model", model_id) if model is None: diff --git a/llama_stack/providers/inline/inference/meta_reference/inference.py b/llama_stack/providers/inline/inference/meta_reference/inference.py index 5f81d6421..3a7632065 100644 --- a/llama_stack/providers/inline/inference/meta_reference/inference.py +++ b/llama_stack/providers/inline/inference/meta_reference/inference.py @@ -54,6 +54,10 @@ from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, build_hf_repo_model_entry, ) +from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, +) from llama_stack.providers.utils.inference.prompt_adapter import ( augment_content_with_response_format_prompt, chat_completion_request_to_messages, @@ -79,6 +83,8 @@ def llama4_builder_fn(config: MetaReferenceInferenceConfig, model_id: str, llama class MetaReferenceInferenceImpl( + OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionUnsupportedMixin, SentenceTransformerEmbeddingMixin, Inference, ModelsProtocolPrivate, diff --git a/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py b/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py index 39847e085..9c370b6c5 100644 --- a/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py +++ b/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py @@ -23,6 +23,10 @@ from llama_stack.providers.datatypes import Model, ModelsProtocolPrivate from llama_stack.providers.utils.inference.embedding_mixin import ( SentenceTransformerEmbeddingMixin, ) +from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, +) from .config import SentenceTransformersInferenceConfig @@ -30,6 +34,8 @@ log = logging.getLogger(__name__) class SentenceTransformersInferenceImpl( + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, SentenceTransformerEmbeddingMixin, Inference, ModelsProtocolPrivate, diff --git a/llama_stack/providers/inline/inference/vllm/vllm.py b/llama_stack/providers/inline/inference/vllm/vllm.py index ea2643b7a..085c79d6b 100644 --- a/llama_stack/providers/inline/inference/vllm/vllm.py +++ b/llama_stack/providers/inline/inference/vllm/vllm.py @@ -66,8 +66,10 @@ from llama_stack.providers.utils.inference.model_registry import ( ModelsProtocolPrivate, ) from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, OpenAICompatCompletionChoice, OpenAICompatCompletionResponse, + OpenAICompletionUnsupportedMixin, get_stop_reason, process_chat_completion_stream_response, ) @@ -172,7 +174,12 @@ def _convert_sampling_params( return vllm_sampling_params -class VLLMInferenceImpl(Inference, ModelsProtocolPrivate): +class VLLMInferenceImpl( + Inference, + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, + ModelsProtocolPrivate, +): """ vLLM-based inference model adapter for Llama Stack with support for multiple models. diff --git a/llama_stack/providers/remote/inference/bedrock/bedrock.py b/llama_stack/providers/remote/inference/bedrock/bedrock.py index 120da5bd4..0a485da8f 100644 --- a/llama_stack/providers/remote/inference/bedrock/bedrock.py +++ b/llama_stack/providers/remote/inference/bedrock/bedrock.py @@ -36,8 +36,10 @@ from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, ) from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, OpenAICompatCompletionChoice, OpenAICompatCompletionResponse, + OpenAICompletionUnsupportedMixin, get_sampling_strategy_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -51,7 +53,12 @@ from llama_stack.providers.utils.inference.prompt_adapter import ( from .models import MODEL_ENTRIES -class BedrockInferenceAdapter(ModelRegistryHelper, Inference): +class BedrockInferenceAdapter( + ModelRegistryHelper, + Inference, + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, +): def __init__(self, config: BedrockConfig) -> None: ModelRegistryHelper.__init__(self, MODEL_ENTRIES) self._config = config diff --git a/llama_stack/providers/remote/inference/cerebras/cerebras.py b/llama_stack/providers/remote/inference/cerebras/cerebras.py index 43d986b86..5e0a5b484 100644 --- a/llama_stack/providers/remote/inference/cerebras/cerebras.py +++ b/llama_stack/providers/remote/inference/cerebras/cerebras.py @@ -34,6 +34,8 @@ from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, ) from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, get_sampling_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -49,7 +51,12 @@ from .config import CerebrasImplConfig from .models import MODEL_ENTRIES -class CerebrasInferenceAdapter(ModelRegistryHelper, Inference): +class CerebrasInferenceAdapter( + ModelRegistryHelper, + Inference, + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, +): def __init__(self, config: CerebrasImplConfig) -> None: ModelRegistryHelper.__init__( self, diff --git a/llama_stack/providers/remote/inference/databricks/databricks.py b/llama_stack/providers/remote/inference/databricks/databricks.py index 0eaf0135b..a10878b27 100644 --- a/llama_stack/providers/remote/inference/databricks/databricks.py +++ b/llama_stack/providers/remote/inference/databricks/databricks.py @@ -34,6 +34,8 @@ from llama_stack.providers.utils.inference.model_registry import ( build_hf_repo_model_entry, ) from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, get_sampling_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -56,7 +58,12 @@ model_entries = [ ] -class DatabricksInferenceAdapter(ModelRegistryHelper, Inference): +class DatabricksInferenceAdapter( + ModelRegistryHelper, + Inference, + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, +): def __init__(self, config: DatabricksImplConfig) -> None: ModelRegistryHelper.__init__(self, model_entries=model_entries) self.config = config diff --git a/llama_stack/providers/remote/inference/fireworks/fireworks.py b/llama_stack/providers/remote/inference/fireworks/fireworks.py index 4acbe43f8..b59e9f2cb 100644 --- a/llama_stack/providers/remote/inference/fireworks/fireworks.py +++ b/llama_stack/providers/remote/inference/fireworks/fireworks.py @@ -4,9 +4,10 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from typing import AsyncGenerator, List, Optional, Union +from typing import Any, AsyncGenerator, Dict, List, Optional, Union from fireworks.client import Fireworks +from openai import AsyncOpenAI from llama_stack.apis.common.content_types import ( InterleavedContent, @@ -31,6 +32,7 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.log import get_logger from llama_stack.providers.utils.inference.model_registry import ( @@ -39,6 +41,7 @@ from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.openai_compat import ( convert_message_to_openai_dict, get_sampling_options, + prepare_openai_completion_params, process_chat_completion_response, process_chat_completion_stream_response, process_completion_response, @@ -81,10 +84,16 @@ class FireworksInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProv ) return provider_data.fireworks_api_key + def _get_base_url(self) -> str: + return "https://api.fireworks.ai/inference/v1" + def _get_client(self) -> Fireworks: fireworks_api_key = self._get_api_key() return Fireworks(api_key=fireworks_api_key) + def _get_openai_client(self) -> AsyncOpenAI: + return AsyncOpenAI(base_url=self._get_base_url(), api_key=self._get_api_key()) + async def completion( self, model_id: str, @@ -268,3 +277,101 @@ class FireworksInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProv embeddings = [data.embedding for data in response.data] return EmbeddingsResponse(embeddings=embeddings) + + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + model_obj = await self.model_store.get_model(model) + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + prompt=prompt, + best_of=best_of, + echo=echo, + frequency_penalty=frequency_penalty, + logit_bias=logit_bias, + logprobs=logprobs, + max_tokens=max_tokens, + n=n, + presence_penalty=presence_penalty, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + top_p=top_p, + user=user, + ) + return await self._get_openai_client().completions.create(**params) + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + model_obj = await self.model_store.get_model(model) + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + messages=messages, + frequency_penalty=frequency_penalty, + function_call=function_call, + functions=functions, + logit_bias=logit_bias, + logprobs=logprobs, + max_completion_tokens=max_completion_tokens, + max_tokens=max_tokens, + n=n, + parallel_tool_calls=parallel_tool_calls, + presence_penalty=presence_penalty, + response_format=response_format, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + tool_choice=tool_choice, + tools=tools, + top_logprobs=top_logprobs, + top_p=top_p, + user=user, + ) + return await self._get_openai_client().chat.completions.create(**params) diff --git a/llama_stack/providers/remote/inference/nvidia/nvidia.py b/llama_stack/providers/remote/inference/nvidia/nvidia.py index e1f5d7a6a..d6f717719 100644 --- a/llama_stack/providers/remote/inference/nvidia/nvidia.py +++ b/llama_stack/providers/remote/inference/nvidia/nvidia.py @@ -7,7 +7,7 @@ import logging import warnings from functools import lru_cache -from typing import AsyncIterator, List, Optional, Union +from typing import Any, AsyncIterator, Dict, List, Optional, Union from openai import APIConnectionError, AsyncOpenAI, BadRequestError @@ -35,6 +35,7 @@ from llama_stack.apis.inference import ( ToolConfig, ToolDefinition, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.models.llama.datatypes import ToolPromptFormat from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, @@ -42,6 +43,7 @@ from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.openai_compat import ( convert_openai_chat_completion_choice, convert_openai_chat_completion_stream, + prepare_openai_completion_params, ) from llama_stack.providers.utils.inference.prompt_adapter import content_has_media @@ -263,3 +265,111 @@ class NVIDIAInferenceAdapter(Inference, ModelRegistryHelper): else: # we pass n=1 to get only one completion return convert_openai_chat_completion_choice(response.choices[0]) + + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + provider_model_id = self.get_provider_model_id(model) + + params = await prepare_openai_completion_params( + model=provider_model_id, + prompt=prompt, + best_of=best_of, + echo=echo, + frequency_penalty=frequency_penalty, + logit_bias=logit_bias, + logprobs=logprobs, + max_tokens=max_tokens, + n=n, + presence_penalty=presence_penalty, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + top_p=top_p, + user=user, + ) + + try: + return await self._get_client(provider_model_id).completions.create(**params) + except APIConnectionError as e: + raise ConnectionError(f"Failed to connect to NVIDIA NIM at {self._config.url}: {e}") from e + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + provider_model_id = self.get_provider_model_id(model) + + params = await prepare_openai_completion_params( + model=provider_model_id, + messages=messages, + frequency_penalty=frequency_penalty, + function_call=function_call, + functions=functions, + logit_bias=logit_bias, + logprobs=logprobs, + max_completion_tokens=max_completion_tokens, + max_tokens=max_tokens, + n=n, + parallel_tool_calls=parallel_tool_calls, + presence_penalty=presence_penalty, + response_format=response_format, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + tool_choice=tool_choice, + tools=tools, + top_logprobs=top_logprobs, + top_p=top_p, + user=user, + ) + + try: + return await self._get_client(provider_model_id).chat.completions.create(**params) + except APIConnectionError as e: + raise ConnectionError(f"Failed to connect to NVIDIA NIM at {self._config.url}: {e}") from e diff --git a/llama_stack/providers/remote/inference/ollama/ollama.py b/llama_stack/providers/remote/inference/ollama/ollama.py index 12902996b..b8671197e 100644 --- a/llama_stack/providers/remote/inference/ollama/ollama.py +++ b/llama_stack/providers/remote/inference/ollama/ollama.py @@ -5,10 +5,11 @@ # the root directory of this source tree. -from typing import Any, AsyncGenerator, List, Optional, Union +from typing import Any, AsyncGenerator, Dict, List, Optional, Union import httpx from ollama import AsyncClient +from openai import AsyncOpenAI from llama_stack.apis.common.content_types import ( ImageContentItem, @@ -38,6 +39,7 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.apis.models import Model, ModelType from llama_stack.log import get_logger from llama_stack.providers.datatypes import ModelsProtocolPrivate @@ -67,7 +69,10 @@ from .models import model_entries logger = get_logger(name=__name__, category="inference") -class OllamaInferenceAdapter(Inference, ModelsProtocolPrivate): +class OllamaInferenceAdapter( + Inference, + ModelsProtocolPrivate, +): def __init__(self, url: str) -> None: self.register_helper = ModelRegistryHelper(model_entries) self.url = url @@ -76,6 +81,10 @@ class OllamaInferenceAdapter(Inference, ModelsProtocolPrivate): def client(self) -> AsyncClient: return AsyncClient(host=self.url) + @property + def openai_client(self) -> AsyncOpenAI: + return AsyncOpenAI(base_url=f"{self.url}/v1", api_key="ollama") + async def initialize(self) -> None: logger.info(f"checking connectivity to Ollama at `{self.url}`...") try: @@ -319,6 +328,115 @@ class OllamaInferenceAdapter(Inference, ModelsProtocolPrivate): return model + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + if not isinstance(prompt, str): + raise ValueError("Ollama does not support non-string prompts for completion") + + model_obj = await self._get_model(model) + params = { + k: v + for k, v in { + "model": model_obj.provider_resource_id, + "prompt": prompt, + "best_of": best_of, + "echo": echo, + "frequency_penalty": frequency_penalty, + "logit_bias": logit_bias, + "logprobs": logprobs, + "max_tokens": max_tokens, + "n": n, + "presence_penalty": presence_penalty, + "seed": seed, + "stop": stop, + "stream": stream, + "stream_options": stream_options, + "temperature": temperature, + "top_p": top_p, + "user": user, + }.items() + if v is not None + } + return await self.openai_client.completions.create(**params) # type: ignore + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + model_obj = await self._get_model(model) + params = { + k: v + for k, v in { + "model": model_obj.provider_resource_id, + "messages": messages, + "frequency_penalty": frequency_penalty, + "function_call": function_call, + "functions": functions, + "logit_bias": logit_bias, + "logprobs": logprobs, + "max_completion_tokens": max_completion_tokens, + "max_tokens": max_tokens, + "n": n, + "parallel_tool_calls": parallel_tool_calls, + "presence_penalty": presence_penalty, + "response_format": response_format, + "seed": seed, + "stop": stop, + "stream": stream, + "stream_options": stream_options, + "temperature": temperature, + "tool_choice": tool_choice, + "tools": tools, + "top_logprobs": top_logprobs, + "top_p": top_p, + "user": user, + }.items() + if v is not None + } + return await self.openai_client.chat.completions.create(**params) # type: ignore + async def convert_message_to_openai_dict_for_ollama(message: Message) -> List[dict]: async def _convert_content(content) -> dict: diff --git a/llama_stack/providers/remote/inference/passthrough/passthrough.py b/llama_stack/providers/remote/inference/passthrough/passthrough.py index 96b2d73d8..0eb38c395 100644 --- a/llama_stack/providers/remote/inference/passthrough/passthrough.py +++ b/llama_stack/providers/remote/inference/passthrough/passthrough.py @@ -4,7 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from typing import Any, AsyncGenerator, Dict, List, Optional +from typing import Any, AsyncGenerator, Dict, List, Optional, Union from llama_stack_client import AsyncLlamaStackClient @@ -26,9 +26,11 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.apis.models import Model from llama_stack.distribution.library_client import convert_pydantic_to_json_value, convert_to_pydantic from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper +from llama_stack.providers.utils.inference.openai_compat import prepare_openai_completion_params from .config import PassthroughImplConfig @@ -201,6 +203,112 @@ class PassthroughInferenceAdapter(Inference): task_type=task_type, ) + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + client = self._get_client() + model_obj = await self.model_store.get_model(model) + + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + prompt=prompt, + best_of=best_of, + echo=echo, + frequency_penalty=frequency_penalty, + logit_bias=logit_bias, + logprobs=logprobs, + max_tokens=max_tokens, + n=n, + presence_penalty=presence_penalty, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + top_p=top_p, + user=user, + guided_choice=guided_choice, + prompt_logprobs=prompt_logprobs, + ) + + return await client.inference.openai_completion(**params) + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + client = self._get_client() + model_obj = await self.model_store.get_model(model) + + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + messages=messages, + frequency_penalty=frequency_penalty, + function_call=function_call, + functions=functions, + logit_bias=logit_bias, + logprobs=logprobs, + max_completion_tokens=max_completion_tokens, + max_tokens=max_tokens, + n=n, + parallel_tool_calls=parallel_tool_calls, + presence_penalty=presence_penalty, + response_format=response_format, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + tool_choice=tool_choice, + tools=tools, + top_logprobs=top_logprobs, + top_p=top_p, + user=user, + ) + + return await client.inference.openai_chat_completion(**params) + def cast_value_to_json_dict(self, request_params: Dict[str, Any]) -> Dict[str, Any]: json_params = {} for key, value in request_params.items(): diff --git a/llama_stack/providers/remote/inference/runpod/runpod.py b/llama_stack/providers/remote/inference/runpod/runpod.py index 72f858cd8..878460122 100644 --- a/llama_stack/providers/remote/inference/runpod/runpod.py +++ b/llama_stack/providers/remote/inference/runpod/runpod.py @@ -12,6 +12,8 @@ from llama_stack.apis.inference import * # noqa: F403 # from llama_stack.providers.datatypes import ModelsProtocolPrivate from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, get_sampling_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -38,7 +40,12 @@ RUNPOD_SUPPORTED_MODELS = { } -class RunpodInferenceAdapter(ModelRegistryHelper, Inference): +class RunpodInferenceAdapter( + ModelRegistryHelper, + Inference, + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, +): def __init__(self, config: RunpodImplConfig) -> None: ModelRegistryHelper.__init__(self, stack_to_provider_models_map=RUNPOD_SUPPORTED_MODELS) self.config = config diff --git a/llama_stack/providers/remote/inference/sambanova/sambanova.py b/llama_stack/providers/remote/inference/sambanova/sambanova.py index a3badd468..c503657eb 100644 --- a/llama_stack/providers/remote/inference/sambanova/sambanova.py +++ b/llama_stack/providers/remote/inference/sambanova/sambanova.py @@ -42,6 +42,8 @@ from llama_stack.apis.inference import ( ) from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, process_chat_completion_stream_response, ) from llama_stack.providers.utils.inference.prompt_adapter import ( @@ -52,7 +54,12 @@ from .config import SambaNovaImplConfig from .models import MODEL_ENTRIES -class SambaNovaInferenceAdapter(ModelRegistryHelper, Inference): +class SambaNovaInferenceAdapter( + ModelRegistryHelper, + Inference, + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, +): def __init__(self, config: SambaNovaImplConfig) -> None: ModelRegistryHelper.__init__(self, model_entries=MODEL_ENTRIES) self.config = config diff --git a/llama_stack/providers/remote/inference/tgi/tgi.py b/llama_stack/providers/remote/inference/tgi/tgi.py index fe99fafe1..8f5b5e3cc 100644 --- a/llama_stack/providers/remote/inference/tgi/tgi.py +++ b/llama_stack/providers/remote/inference/tgi/tgi.py @@ -40,8 +40,10 @@ from llama_stack.providers.utils.inference.model_registry import ( build_hf_repo_model_entry, ) from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionUnsupportedMixin, OpenAICompatCompletionChoice, OpenAICompatCompletionResponse, + OpenAICompletionUnsupportedMixin, get_sampling_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -69,7 +71,12 @@ def build_hf_repo_model_entries(): ] -class _HfAdapter(Inference, ModelsProtocolPrivate): +class _HfAdapter( + Inference, + OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionUnsupportedMixin, + ModelsProtocolPrivate, +): client: AsyncInferenceClient max_tokens: int model_id: str diff --git a/llama_stack/providers/remote/inference/together/together.py b/llama_stack/providers/remote/inference/together/together.py index df7610935..1615b8cd1 100644 --- a/llama_stack/providers/remote/inference/together/together.py +++ b/llama_stack/providers/remote/inference/together/together.py @@ -4,8 +4,9 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from typing import AsyncGenerator, List, Optional, Union +from typing import Any, AsyncGenerator, Dict, List, Optional, Union +from openai import AsyncOpenAI from together import AsyncTogether from llama_stack.apis.common.content_types import ( @@ -30,12 +31,14 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.log import get_logger from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper from llama_stack.providers.utils.inference.openai_compat import ( convert_message_to_openai_dict, get_sampling_options, + prepare_openai_completion_params, process_chat_completion_response, process_chat_completion_stream_response, process_completion_response, @@ -60,6 +63,7 @@ class TogetherInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProvi ModelRegistryHelper.__init__(self, MODEL_ENTRIES) self.config = config self._client = None + self._openai_client = None async def initialize(self) -> None: pass @@ -110,6 +114,15 @@ class TogetherInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProvi self._client = AsyncTogether(api_key=together_api_key) return self._client + def _get_openai_client(self) -> AsyncOpenAI: + if not self._openai_client: + together_client = self._get_client().client + self._openai_client = AsyncOpenAI( + base_url=together_client.base_url, + api_key=together_client.api_key, + ) + return self._openai_client + async def _nonstream_completion(self, request: CompletionRequest) -> ChatCompletionResponse: params = await self._get_params(request) client = self._get_client() @@ -243,3 +256,101 @@ class TogetherInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProvi ) embeddings = [item.embedding for item in r.data] return EmbeddingsResponse(embeddings=embeddings) + + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + model_obj = await self.model_store.get_model(model) + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + prompt=prompt, + best_of=best_of, + echo=echo, + frequency_penalty=frequency_penalty, + logit_bias=logit_bias, + logprobs=logprobs, + max_tokens=max_tokens, + n=n, + presence_penalty=presence_penalty, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + top_p=top_p, + user=user, + ) + return await self._get_openai_client().completions.create(**params) # type: ignore + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + model_obj = await self.model_store.get_model(model) + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + messages=messages, + frequency_penalty=frequency_penalty, + function_call=function_call, + functions=functions, + logit_bias=logit_bias, + logprobs=logprobs, + max_completion_tokens=max_completion_tokens, + max_tokens=max_tokens, + n=n, + parallel_tool_calls=parallel_tool_calls, + presence_penalty=presence_penalty, + response_format=response_format, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + tool_choice=tool_choice, + tools=tools, + top_logprobs=top_logprobs, + top_p=top_p, + user=user, + ) + return await self._get_openai_client().chat.completions.create(**params) # type: ignore diff --git a/llama_stack/providers/remote/inference/vllm/vllm.py b/llama_stack/providers/remote/inference/vllm/vllm.py index 6a828322f..79f92adce 100644 --- a/llama_stack/providers/remote/inference/vllm/vllm.py +++ b/llama_stack/providers/remote/inference/vllm/vllm.py @@ -5,7 +5,7 @@ # the root directory of this source tree. import json import logging -from typing import Any, AsyncGenerator, List, Optional, Union +from typing import Any, AsyncGenerator, Dict, List, Optional, Union import httpx from openai import AsyncOpenAI @@ -45,6 +45,7 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.apis.models import Model, ModelType from llama_stack.models.llama.datatypes import BuiltinTool, StopReason, ToolCall from llama_stack.models.llama.sku_list import all_registered_models @@ -58,6 +59,7 @@ from llama_stack.providers.utils.inference.openai_compat import ( convert_message_to_openai_dict, convert_tool_call, get_sampling_options, + prepare_openai_completion_params, process_chat_completion_stream_response, process_completion_response, process_completion_stream_response, @@ -418,3 +420,109 @@ class VLLMInferenceAdapter(Inference, ModelsProtocolPrivate): embeddings = [data.embedding for data in response.data] return EmbeddingsResponse(embeddings=embeddings) + + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + model_obj = await self._get_model(model) + + extra_body: Dict[str, Any] = {} + if prompt_logprobs is not None and prompt_logprobs >= 0: + extra_body["prompt_logprobs"] = prompt_logprobs + if guided_choice: + extra_body["guided_choice"] = guided_choice + + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + prompt=prompt, + best_of=best_of, + echo=echo, + frequency_penalty=frequency_penalty, + logit_bias=logit_bias, + logprobs=logprobs, + max_tokens=max_tokens, + n=n, + presence_penalty=presence_penalty, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + top_p=top_p, + user=user, + extra_body=extra_body, + ) + return await self.client.completions.create(**params) # type: ignore + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + model_obj = await self._get_model(model) + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + messages=messages, + frequency_penalty=frequency_penalty, + function_call=function_call, + functions=functions, + logit_bias=logit_bias, + logprobs=logprobs, + max_completion_tokens=max_completion_tokens, + max_tokens=max_tokens, + n=n, + parallel_tool_calls=parallel_tool_calls, + presence_penalty=presence_penalty, + response_format=response_format, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + tool_choice=tool_choice, + tools=tools, + top_logprobs=top_logprobs, + top_p=top_p, + user=user, + ) + return await self.client.chat.completions.create(**params) # type: ignore diff --git a/llama_stack/providers/utils/inference/litellm_openai_mixin.py b/llama_stack/providers/utils/inference/litellm_openai_mixin.py index bd1eb3978..2d2f0400a 100644 --- a/llama_stack/providers/utils/inference/litellm_openai_mixin.py +++ b/llama_stack/providers/utils/inference/litellm_openai_mixin.py @@ -4,7 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from typing import AsyncGenerator, AsyncIterator, List, Optional, Union +from typing import Any, AsyncGenerator, AsyncIterator, Dict, List, Optional, Union import litellm @@ -30,6 +30,7 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.apis.models.models import Model from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.log import get_logger @@ -40,6 +41,7 @@ from llama_stack.providers.utils.inference.openai_compat import ( convert_openai_chat_completion_stream, convert_tooldef_to_openai_tool, get_sampling_options, + prepare_openai_completion_params, ) from llama_stack.providers.utils.inference.prompt_adapter import ( interleaved_content_as_str, @@ -245,3 +247,103 @@ class LiteLLMOpenAIMixin( embeddings = [data["embedding"] for data in response["data"]] return EmbeddingsResponse(embeddings=embeddings) + + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + model_obj = await self._get_model(model) + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + prompt=prompt, + best_of=best_of, + echo=echo, + frequency_penalty=frequency_penalty, + logit_bias=logit_bias, + logprobs=logprobs, + max_tokens=max_tokens, + n=n, + presence_penalty=presence_penalty, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + top_p=top_p, + user=user, + guided_choice=guided_choice, + prompt_logprobs=prompt_logprobs, + ) + return litellm.text_completion(**params) + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + model_obj = await self._get_model(model) + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id, + messages=messages, + frequency_penalty=frequency_penalty, + function_call=function_call, + functions=functions, + logit_bias=logit_bias, + logprobs=logprobs, + max_completion_tokens=max_completion_tokens, + max_tokens=max_tokens, + n=n, + parallel_tool_calls=parallel_tool_calls, + presence_penalty=presence_penalty, + response_format=response_format, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + tool_choice=tool_choice, + tools=tools, + top_logprobs=top_logprobs, + top_p=top_p, + user=user, + ) + return litellm.completion(**params) diff --git a/llama_stack/providers/utils/inference/openai_compat.py b/llama_stack/providers/utils/inference/openai_compat.py index 0f3945b34..f33cb4443 100644 --- a/llama_stack/providers/utils/inference/openai_compat.py +++ b/llama_stack/providers/utils/inference/openai_compat.py @@ -5,8 +5,10 @@ # the root directory of this source tree. import json import logging +import time +import uuid import warnings -from typing import AsyncGenerator, Dict, Iterable, List, Optional, Union +from typing import Any, AsyncGenerator, Dict, Iterable, List, Optional, Union from openai import AsyncStream from openai.types.chat import ( @@ -83,6 +85,7 @@ from llama_stack.apis.inference import ( TopPSamplingStrategy, UserMessage, ) +from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAICompletionChoice from llama_stack.models.llama.datatypes import ( BuiltinTool, StopReason, @@ -843,6 +846,31 @@ def _convert_openai_logprobs( ] +def _convert_openai_sampling_params( + max_tokens: Optional[int] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, +) -> SamplingParams: + sampling_params = SamplingParams() + + if max_tokens: + sampling_params.max_tokens = max_tokens + + # Map an explicit temperature of 0 to greedy sampling + if temperature == 0: + strategy = GreedySamplingStrategy() + else: + # OpenAI defaults to 1.0 for temperature and top_p if unset + if temperature is None: + temperature = 1.0 + if top_p is None: + top_p = 1.0 + strategy = TopPSamplingStrategy(temperature=temperature, top_p=top_p) + + sampling_params.strategy = strategy + return sampling_params + + def convert_openai_chat_completion_choice( choice: OpenAIChoice, ) -> ChatCompletionResponse: @@ -1049,3 +1077,106 @@ async def convert_openai_chat_completion_stream( stop_reason=stop_reason, ) ) + + +async def prepare_openai_completion_params(**params): + completion_params = {k: v for k, v in params.items() if v is not None} + return completion_params + + +class OpenAICompletionUnsupportedMixin: + async def openai_completion( + self, + model: str, + prompt: Union[str, List[str], List[int], List[List[int]]], + best_of: Optional[int] = None, + echo: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + guided_choice: Optional[List[str]] = None, + prompt_logprobs: Optional[int] = None, + ) -> OpenAICompletion: + if stream: + raise ValueError(f"{self.__class__.__name__} doesn't support streaming openai completions") + + # This is a pretty hacky way to do emulate completions - + # basically just de-batches them... + prompts = [prompt] if not isinstance(prompt, list) else prompt + + sampling_params = _convert_openai_sampling_params( + max_tokens=max_tokens, + temperature=temperature, + top_p=top_p, + ) + + choices = [] + # "n" is the number of completions to generate per prompt + for _i in range(0, n): + # and we may have multiple prompts, if batching was used + + for prompt in prompts: + result = self.completion( + model_id=model, + content=prompt, + sampling_params=sampling_params, + ) + + index = len(choices) + text = result.content + finish_reason = _convert_openai_finish_reason(result.stop_reason) + + choice = OpenAICompletionChoice( + index=index, + text=text, + finish_reason=finish_reason, + ) + choices.append(choice) + + return OpenAICompletion( + id=f"cmpl-{uuid.uuid4()}", + choices=choices, + created=int(time.time()), + model=model, + object="text_completion", + ) + + +class OpenAIChatCompletionUnsupportedMixin: + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIChatCompletionMessage], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[Dict[str, str]] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> OpenAIChatCompletion: + raise ValueError(f"{self.__class__.__name__} doesn't support openai chat completion") diff --git a/pyproject.toml b/pyproject.toml index 83260b681..9ef3abe68 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -28,6 +28,7 @@ dependencies = [ "jinja2>=3.1.6", "jsonschema", "llama-stack-client>=0.2.1", + "openai>=1.66", "prompt-toolkit", "python-dotenv", "pydantic>=2", diff --git a/requirements.txt b/requirements.txt index 6645e4e36..ef5782905 100644 --- a/requirements.txt +++ b/requirements.txt @@ -19,6 +19,7 @@ httpx==0.28.1 huggingface-hub==0.29.0 idna==3.10 jinja2==3.1.6 +jiter==0.8.2 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 llama-stack-client==0.2.1 @@ -27,6 +28,7 @@ markdown-it-py==3.0.0 markupsafe==3.0.2 mdurl==0.1.2 numpy==2.2.3 +openai==1.71.0 packaging==24.2 pandas==2.2.3 pillow==11.1.0 diff --git a/tests/integration/inference/test_openai_completion.py b/tests/integration/inference/test_openai_completion.py new file mode 100644 index 000000000..0905d5817 --- /dev/null +++ b/tests/integration/inference/test_openai_completion.py @@ -0,0 +1,216 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + + +import pytest +from openai import OpenAI + +from llama_stack.distribution.library_client import LlamaStackAsLibraryClient + +from ..test_cases.test_case import TestCase + + +def provider_from_model(client_with_models, model_id): + models = {m.identifier: m for m in client_with_models.models.list()} + models.update({m.provider_resource_id: m for m in client_with_models.models.list()}) + provider_id = models[model_id].provider_id + providers = {p.provider_id: p for p in client_with_models.providers.list()} + return providers[provider_id] + + +def skip_if_model_doesnt_support_openai_completion(client_with_models, model_id): + if isinstance(client_with_models, LlamaStackAsLibraryClient): + pytest.skip("OpenAI completions are not supported when testing with library client yet.") + + provider = provider_from_model(client_with_models, model_id) + if provider.provider_type in ( + "inline::meta-reference", + "inline::sentence-transformers", + "inline::vllm", + "remote::bedrock", + "remote::cerebras", + "remote::databricks", + # Technically Nvidia does support OpenAI completions, but none of their hosted models + # support both completions and chat completions endpoint and all the Llama models are + # just chat completions + "remote::nvidia", + "remote::runpod", + "remote::sambanova", + "remote::tgi", + ): + pytest.skip(f"Model {model_id} hosted by {provider.provider_type} doesn't support OpenAI completions.") + + +def skip_if_model_doesnt_support_openai_chat_completion(client_with_models, model_id): + if isinstance(client_with_models, LlamaStackAsLibraryClient): + pytest.skip("OpenAI chat completions are not supported when testing with library client yet.") + + provider = provider_from_model(client_with_models, model_id) + if provider.provider_type in ( + "inline::meta-reference", + "inline::sentence-transformers", + "inline::vllm", + "remote::bedrock", + "remote::cerebras", + "remote::databricks", + "remote::runpod", + "remote::sambanova", + "remote::tgi", + ): + pytest.skip(f"Model {model_id} hosted by {provider.provider_type} doesn't support OpenAI chat completions.") + + +def skip_if_provider_isnt_vllm(client_with_models, model_id): + provider = provider_from_model(client_with_models, model_id) + if provider.provider_type != "remote::vllm": + pytest.skip(f"Model {model_id} hosted by {provider.provider_type} doesn't support vllm extra_body parameters.") + + +@pytest.fixture +def openai_client(client_with_models): + base_url = f"{client_with_models.base_url}/v1/openai/v1" + return OpenAI(base_url=base_url, api_key="bar") + + +@pytest.mark.parametrize( + "test_case", + [ + "inference:completion:sanity", + ], +) +def test_openai_completion_non_streaming(openai_client, client_with_models, text_model_id, test_case): + skip_if_model_doesnt_support_openai_completion(client_with_models, text_model_id) + tc = TestCase(test_case) + + # ollama needs more verbose prompting for some reason here... + prompt = "Respond to this question and explain your answer. " + tc["content"] + response = openai_client.completions.create( + model=text_model_id, + prompt=prompt, + stream=False, + ) + assert len(response.choices) > 0 + choice = response.choices[0] + assert len(choice.text) > 10 + + +@pytest.mark.parametrize( + "test_case", + [ + "inference:completion:sanity", + ], +) +def test_openai_completion_streaming(openai_client, client_with_models, text_model_id, test_case): + skip_if_model_doesnt_support_openai_completion(client_with_models, text_model_id) + tc = TestCase(test_case) + + # ollama needs more verbose prompting for some reason here... + prompt = "Respond to this question and explain your answer. " + tc["content"] + response = openai_client.completions.create( + model=text_model_id, + prompt=prompt, + stream=True, + max_tokens=50, + ) + streamed_content = [chunk.choices[0].text for chunk in response] + content_str = "".join(streamed_content).lower().strip() + assert len(content_str) > 10 + + +@pytest.mark.parametrize( + "prompt_logprobs", + [ + 1, + 0, + ], +) +def test_openai_completion_prompt_logprobs(openai_client, client_with_models, text_model_id, prompt_logprobs): + skip_if_provider_isnt_vllm(client_with_models, text_model_id) + + prompt = "Hello, world!" + response = openai_client.completions.create( + model=text_model_id, + prompt=prompt, + stream=False, + extra_body={ + "prompt_logprobs": prompt_logprobs, + }, + ) + assert len(response.choices) > 0 + choice = response.choices[0] + assert len(choice.prompt_logprobs) > 0 + + +def test_openai_completion_guided_choice(openai_client, client_with_models, text_model_id): + skip_if_provider_isnt_vllm(client_with_models, text_model_id) + + prompt = "I am feeling really sad today." + response = openai_client.completions.create( + model=text_model_id, + prompt=prompt, + stream=False, + extra_body={ + "guided_choice": ["joy", "sadness"], + }, + ) + assert len(response.choices) > 0 + choice = response.choices[0] + assert choice.text in ["joy", "sadness"] + + +@pytest.mark.parametrize( + "test_case", + [ + "inference:chat_completion:non_streaming_01", + "inference:chat_completion:non_streaming_02", + ], +) +def test_openai_chat_completion_non_streaming(openai_client, client_with_models, text_model_id, test_case): + skip_if_model_doesnt_support_openai_chat_completion(client_with_models, text_model_id) + tc = TestCase(test_case) + question = tc["question"] + expected = tc["expected"] + + response = openai_client.chat.completions.create( + model=text_model_id, + messages=[ + { + "role": "user", + "content": question, + } + ], + stream=False, + ) + message_content = response.choices[0].message.content.lower().strip() + assert len(message_content) > 0 + assert expected.lower() in message_content + + +@pytest.mark.parametrize( + "test_case", + [ + "inference:chat_completion:streaming_01", + "inference:chat_completion:streaming_02", + ], +) +def test_openai_chat_completion_streaming(openai_client, client_with_models, text_model_id, test_case): + skip_if_model_doesnt_support_openai_chat_completion(client_with_models, text_model_id) + tc = TestCase(test_case) + question = tc["question"] + expected = tc["expected"] + + response = openai_client.chat.completions.create( + model=text_model_id, + messages=[{"role": "user", "content": question}], + stream=True, + timeout=120, # Increase timeout to 2 minutes for large conversation history + ) + streamed_content = [] + for chunk in response: + if chunk.choices[0].delta.content: + streamed_content.append(chunk.choices[0].delta.content.lower().strip()) + assert len(streamed_content) > 0 + assert expected.lower() in "".join(streamed_content) diff --git a/uv.lock b/uv.lock index 1f7adea82..c6c9b1004 100644 --- a/uv.lock +++ b/uv.lock @@ -1384,6 +1384,7 @@ dependencies = [ { name = "jinja2" }, { name = "jsonschema" }, { name = "llama-stack-client" }, + { name = "openai" }, { name = "pillow" }, { name = "prompt-toolkit" }, { name = "pydantic" }, @@ -1485,6 +1486,7 @@ requires-dist = [ { name = "mcp", marker = "extra == 'test'" }, { name = "myst-parser", marker = "extra == 'docs'" }, { name = "nbval", marker = "extra == 'dev'" }, + { name = "openai", specifier = ">=1.66" }, { name = "openai", marker = "extra == 'test'" }, { name = "openai", marker = "extra == 'unit'" }, { name = "opentelemetry-exporter-otlp-proto-http", marker = "extra == 'test'" }, @@ -2016,7 +2018,7 @@ wheels = [ [[package]] name = "openai" -version = "1.63.2" +version = "1.71.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anyio" }, @@ -2028,9 +2030,9 @@ dependencies = [ { name = "tqdm" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/e6/1c/11b520deb71f9ea54ced3c52cd6a5f7131215deba63ad07f23982e328141/openai-1.63.2.tar.gz", hash = "sha256:aeabeec984a7d2957b4928ceaa339e2ead19c61cfcf35ae62b7c363368d26360", size = 356902 } +sdist = { url = "https://files.pythonhosted.org/packages/d9/19/b8f0347090a649dce55a008ec54ac6abb50553a06508cdb5e7abb2813e99/openai-1.71.0.tar.gz", hash = "sha256:52b20bb990a1780f9b0b8ccebac93416343ebd3e4e714e3eff730336833ca207", size = 409926 } wheels = [ - { url = "https://files.pythonhosted.org/packages/15/64/db3462b358072387b8e93e6e6a38d3c741a17b4a84171ef01d6c85c63f25/openai-1.63.2-py3-none-any.whl", hash = "sha256:1f38b27b5a40814c2b7d8759ec78110df58c4a614c25f182809ca52b080ff4d4", size = 472282 }, + { url = "https://files.pythonhosted.org/packages/c4/f7/049e85faf6a000890e5ca0edca8e9183f8a43c9e7bba869cad871da0caba/openai-1.71.0-py3-none-any.whl", hash = "sha256:e1c643738f1fff1af52bce6ef06a7716c95d089281e7011777179614f32937aa", size = 598975 }, ] [[package]] From ed58a94b30d886abdc4287fbcf0090ebd97c57a3 Mon Sep 17 00:00:00 2001 From: raghotham Date: Fri, 11 Apr 2025 13:41:23 -0700 Subject: [PATCH 18/39] docs: fixes to quick start (#1943) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) --------- Co-authored-by: Francisco Arceo --- .../remote_hosted_distro/nvidia.md | 88 ------------------- .../self_hosted_distro/nvidia.md | 35 +++++++- .../getting_started/detailed_tutorial.md | 2 +- docs/source/getting_started/index.md | 64 +++++++++----- docs/source/index.md | 3 +- 5 files changed, 76 insertions(+), 116 deletions(-) delete mode 100644 docs/source/distributions/remote_hosted_distro/nvidia.md diff --git a/docs/source/distributions/remote_hosted_distro/nvidia.md b/docs/source/distributions/remote_hosted_distro/nvidia.md deleted file mode 100644 index 58731392d..000000000 --- a/docs/source/distributions/remote_hosted_distro/nvidia.md +++ /dev/null @@ -1,88 +0,0 @@ - -# NVIDIA Distribution - -The `llamastack/distribution-nvidia` distribution consists of the following provider configurations. - -| API | Provider(s) | -|-----|-------------| -| agents | `inline::meta-reference` | -| datasetio | `inline::localfs` | -| eval | `inline::meta-reference` | -| inference | `remote::nvidia` | -| post_training | `remote::nvidia` | -| safety | `remote::nvidia` | -| scoring | `inline::basic` | -| telemetry | `inline::meta-reference` | -| tool_runtime | `inline::rag-runtime` | -| vector_io | `inline::faiss` | - - -### Environment Variables - -The following environment variables can be configured: - -- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``) -- `NVIDIA_USER_ID`: NVIDIA User ID (default: `llama-stack-user`) -- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`) -- `NVIDIA_ACCESS_POLICIES`: NVIDIA Access Policies (default: `{}`) -- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`) -- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`) -- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`) -- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`) -- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`) -- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`) - -### Models - -The following models are available by default: - -- `meta/llama3-8b-instruct (aliases: meta-llama/Llama-3-8B-Instruct)` -- `meta/llama3-70b-instruct (aliases: meta-llama/Llama-3-70B-Instruct)` -- `meta/llama-3.1-8b-instruct (aliases: meta-llama/Llama-3.1-8B-Instruct)` -- `meta/llama-3.1-70b-instruct (aliases: meta-llama/Llama-3.1-70B-Instruct)` -- `meta/llama-3.1-405b-instruct (aliases: meta-llama/Llama-3.1-405B-Instruct-FP8)` -- `meta/llama-3.2-1b-instruct (aliases: meta-llama/Llama-3.2-1B-Instruct)` -- `meta/llama-3.2-3b-instruct (aliases: meta-llama/Llama-3.2-3B-Instruct)` -- `meta/llama-3.2-11b-vision-instruct (aliases: meta-llama/Llama-3.2-11B-Vision-Instruct)` -- `meta/llama-3.2-90b-vision-instruct (aliases: meta-llama/Llama-3.2-90B-Vision-Instruct)` -- `nvidia/llama-3.2-nv-embedqa-1b-v2 ` -- `nvidia/nv-embedqa-e5-v5 ` -- `nvidia/nv-embedqa-mistral-7b-v2 ` -- `snowflake/arctic-embed-l ` - - -### Prerequisite: API Keys - -Make sure you have access to a NVIDIA API Key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/). - - -## Running Llama Stack with NVIDIA - -You can do this via Conda (build code) or Docker which has a pre-built image. - -### Via Docker - -This method allows you to get started quickly without having to build the distribution code. - -```bash -LLAMA_STACK_PORT=8321 -docker run \ - -it \ - --pull always \ - -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ - -v ./run.yaml:/root/my-run.yaml \ - llamastack/distribution-nvidia \ - --yaml-config /root/my-run.yaml \ - --port $LLAMA_STACK_PORT \ - --env NVIDIA_API_KEY=$NVIDIA_API_KEY -``` - -### Via Conda - -```bash -llama stack build --template nvidia --image-type conda -llama stack run ./run.yaml \ - --port 8321 \ - --env NVIDIA_API_KEY=$NVIDIA_API_KEY - --env INFERENCE_MODEL=$INFERENCE_MODEL -``` diff --git a/docs/source/distributions/self_hosted_distro/nvidia.md b/docs/source/distributions/self_hosted_distro/nvidia.md index 0c0801f89..58731392d 100644 --- a/docs/source/distributions/self_hosted_distro/nvidia.md +++ b/docs/source/distributions/self_hosted_distro/nvidia.md @@ -1,3 +1,4 @@ + # NVIDIA Distribution The `llamastack/distribution-nvidia` distribution consists of the following provider configurations. @@ -5,24 +6,49 @@ The `llamastack/distribution-nvidia` distribution consists of the following prov | API | Provider(s) | |-----|-------------| | agents | `inline::meta-reference` | +| datasetio | `inline::localfs` | +| eval | `inline::meta-reference` | | inference | `remote::nvidia` | -| memory | `inline::faiss`, `remote::chromadb`, `remote::pgvector` | -| safety | `inline::llama-guard` | +| post_training | `remote::nvidia` | +| safety | `remote::nvidia` | +| scoring | `inline::basic` | | telemetry | `inline::meta-reference` | +| tool_runtime | `inline::rag-runtime` | +| vector_io | `inline::faiss` | ### Environment Variables The following environment variables can be configured: -- `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `8321`) - `NVIDIA_API_KEY`: NVIDIA API Key (default: ``) +- `NVIDIA_USER_ID`: NVIDIA User ID (default: `llama-stack-user`) +- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`) +- `NVIDIA_ACCESS_POLICIES`: NVIDIA Access Policies (default: `{}`) +- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`) +- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`) +- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`) +- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`) +- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`) +- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`) ### Models The following models are available by default: -- `${env.INFERENCE_MODEL} (None)` +- `meta/llama3-8b-instruct (aliases: meta-llama/Llama-3-8B-Instruct)` +- `meta/llama3-70b-instruct (aliases: meta-llama/Llama-3-70B-Instruct)` +- `meta/llama-3.1-8b-instruct (aliases: meta-llama/Llama-3.1-8B-Instruct)` +- `meta/llama-3.1-70b-instruct (aliases: meta-llama/Llama-3.1-70B-Instruct)` +- `meta/llama-3.1-405b-instruct (aliases: meta-llama/Llama-3.1-405B-Instruct-FP8)` +- `meta/llama-3.2-1b-instruct (aliases: meta-llama/Llama-3.2-1B-Instruct)` +- `meta/llama-3.2-3b-instruct (aliases: meta-llama/Llama-3.2-3B-Instruct)` +- `meta/llama-3.2-11b-vision-instruct (aliases: meta-llama/Llama-3.2-11B-Vision-Instruct)` +- `meta/llama-3.2-90b-vision-instruct (aliases: meta-llama/Llama-3.2-90B-Vision-Instruct)` +- `nvidia/llama-3.2-nv-embedqa-1b-v2 ` +- `nvidia/nv-embedqa-e5-v5 ` +- `nvidia/nv-embedqa-mistral-7b-v2 ` +- `snowflake/arctic-embed-l ` ### Prerequisite: API Keys @@ -58,4 +84,5 @@ llama stack build --template nvidia --image-type conda llama stack run ./run.yaml \ --port 8321 \ --env NVIDIA_API_KEY=$NVIDIA_API_KEY + --env INFERENCE_MODEL=$INFERENCE_MODEL ``` diff --git a/docs/source/getting_started/detailed_tutorial.md b/docs/source/getting_started/detailed_tutorial.md index 911b35437..610c0cad5 100644 --- a/docs/source/getting_started/detailed_tutorial.md +++ b/docs/source/getting_started/detailed_tutorial.md @@ -536,6 +536,6 @@ uv run python rag_agent.py :::: -## You're Ready to Build Your Own Apps! +**You're Ready to Build Your Own Apps!** Congrats! 🥳 Now you're ready to [build your own Llama Stack applications](../building_applications/index)! 🚀 diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md index ce7dbe973..e084f68b7 100644 --- a/docs/source/getting_started/index.md +++ b/docs/source/getting_started/index.md @@ -8,20 +8,20 @@ environments. You can build and test using a local server first and deploy to a In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/) as the inference [provider](../providers/index.md#inference) for a Llama Model. -## Step 1. Install and Setup -Install [uv](https://docs.astral.sh/uv/), setup your virtual environment, and run inference on a Llama model with -[Ollama](https://ollama.com/download). +#### Step 1: Install and setup +1. Install [uv](https://docs.astral.sh/uv/) +2. Run inference on a Llama model with [Ollama](https://ollama.com/download) ```bash -uv pip install llama-stack -source .venv/bin/activate ollama run llama3.2:3b --keepalive 60m ``` -## Step 2: Run the Llama Stack Server +#### Step 2: Run the Llama Stack server +We will use `uv` to run the Llama Stack server. ```bash -INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run +INFERENCE_MODEL=llama3.2:3b uv run --with llama-stack llama stack build --template ollama --image-type venv --run ``` -## Step 3: Run the Demo -Now open up a new terminal using the same virtual environment and you can run this demo as a script using `uv run demo_script.py` or in an interactive shell. +#### Step 3: Run the demo +Now open up a new terminal and copy the following script into a file named `demo_script.py`. + ```python from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient @@ -43,9 +43,11 @@ _ = client.vector_dbs.register( embedding_dimension=embedding_dimension, provider_id="faiss", ) +source = "https://www.paulgraham.com/greatwork.html" +print("rag_tool> Ingesting document:", source) document = RAGDocument( document_id="document_1", - content="https://www.paulgraham.com/greatwork.html", + content=source, mime_type="text/html", metadata={}, ) @@ -66,19 +68,44 @@ agent = Agent( ], ) +prompt = "How do you do great work?" +print("prompt>", prompt) + response = agent.create_turn( - messages=[{"role": "user", "content": "How do you do great work?"}], + messages=[{"role": "user", "content": prompt}], session_id=agent.create_session("rag_session"), + stream=True, ) for log in AgentEventLogger().log(response): log.print() ``` +We will use `uv` to run the script +``` +uv run --with llama-stack-client demo_script.py +``` And you should see output like below. -```bash -inference> [knowledge_search(query="What does it mean to do great work")] -tool_execution> Tool:knowledge_search Args:{'query': 'What does it mean to do great work'} -tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='Result 2:\nDocument_id:docum\nContent: [1]\nI don\'t think you could give a precise definition of what\ncounts as great work. Doing great work means doing something important\nso well\n', type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: . And if so\nyou're already further along than you might realize, because the\nset of people willing to want to is small.

The factors in doing great work are factors in the literal,\nmathematical sense, and\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: \nincreases your morale and helps you do even better work. But this\ncycle also operates in the other direction: if you're not doing\ngood work, that can demoralize you and make it even harder to. Since\nit matters\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: to try to do\ngreat work. But that's what's going on subconsciously; they shy\naway from the question.

So I'm going to pull a sneaky trick on you. Do you want to do great\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')] +``` +rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html + +prompt> How do you do great work? + +inference> [knowledge_search(query="What is the key to doing great work")] + +tool_execution> Tool:knowledge_search Args:{'query': 'What is the key to doing great work'} + +tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 2:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')] + +inference> Based on the search results, it seems that doing great work means doing something important so well that you expand people's ideas of what's possible. However, there is no clear threshold for importance, and it can be difficult to judge at the time. + +To further clarify, I would suggest that doing great work involves: + +* Completing tasks with high quality and attention to detail +* Expanding on existing knowledge or ideas +* Making a positive impact on others through your work +* Striving for excellence and continuous improvement + +Ultimately, great work is about making a meaningful contribution and leaving a lasting impression. ``` Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳 @@ -92,10 +119,3 @@ Now you're ready to dive deeper into Llama Stack! - Discover how to [Build Llama Stacks](../distributions/index.md). - Refer to our [References](../references/index.md) for details on the Llama CLI and Python SDK. - Check out the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository for example applications and tutorials. - -```{toctree} -:maxdepth: 0 -:hidden: - -detailed_tutorial -``` diff --git a/docs/source/index.md b/docs/source/index.md index 99b0e1a3e..0c2d5a015 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -99,8 +99,9 @@ A number of "adapters" are available for some popular Inference and Vector Store :maxdepth: 3 self -introduction/index getting_started/index +getting_started/detailed_tutorial +introduction/index concepts/index providers/index distributions/index From 51492bd9b6d0f7342677b29c53629dd23d53b027 Mon Sep 17 00:00:00 2001 From: Aidan Reilly <74046732+aireilly@users.noreply.github.com> Date: Sat, 12 Apr 2025 00:26:17 +0100 Subject: [PATCH 19/39] docs: Update docs and fix warning in start-stack.sh (#1937) Small docs update and an update for `start-stack.sh` with missing color and if statment logic. # What does this PR do? 1. Makes a small change to start-stack.sh to resolve this error: ```cmd /home/aireilly/.local/lib/python3.13/site-packages/llama_stack/distribution/start_stack.sh: line 76: [: missing ]' ``` 2. Adds a missing $GREEN colour to start-stack.sh 3. Updated `docs/source/getting_started/detailed_tutorial.md` with some small changes and corrections. ## Test Plan Procedures described in `docs/source/getting_started/detailed_tutorial.md` were verified on Linux Fedora 41. --- docs/source/getting_started/detailed_tutorial.md | 6 +++--- llama_stack/distribution/start_stack.sh | 3 ++- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/source/getting_started/detailed_tutorial.md b/docs/source/getting_started/detailed_tutorial.md index 610c0cad5..a1504f249 100644 --- a/docs/source/getting_started/detailed_tutorial.md +++ b/docs/source/getting_started/detailed_tutorial.md @@ -69,7 +69,7 @@ which defines the providers and their settings. Now let's build and run the Llama Stack config for Ollama. ```bash -INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type conda --run +INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type conda --image-name llama3-3b-conda --run ``` ::: :::{tab-item} Using a Container @@ -77,10 +77,9 @@ You can use a container image to run the Llama Stack server. We provide several component that works with different inference providers out of the box. For this guide, we will use `llamastack/distribution-ollama` as the container image. If you'd like to build your own image or customize the configurations, please check out [this guide](../references/index.md). - First lets setup some environment variables and create a local directory to mount into the container’s file system. ```bash -export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" +export INFERENCE_MODEL="llama3.2:3b" export LLAMA_STACK_PORT=8321 mkdir -p ~/.llama ``` @@ -223,6 +222,7 @@ Other SDKs are also available, please refer to the [Client SDK](../index.md#clie Now you can run inference using the Llama Stack client SDK. ### i. Create the Script + Create a file `inference.py` and add the following code: ```python from llama_stack_client import LlamaStackClient diff --git a/llama_stack/distribution/start_stack.sh b/llama_stack/distribution/start_stack.sh index 964fcfaf7..d3e13c7dc 100755 --- a/llama_stack/distribution/start_stack.sh +++ b/llama_stack/distribution/start_stack.sh @@ -18,6 +18,7 @@ VIRTUAL_ENV=${VIRTUAL_ENV:-} set -euo pipefail RED='\033[0;31m' +GREEN='\033[0;32m' NC='\033[0m' # No Color error_handler() { @@ -73,7 +74,7 @@ done PYTHON_BINARY="python" case "$env_type" in "venv") - if [ -n "$VIRTUAL_ENV" && "$VIRTUAL_ENV" == "$env_path_or_name" ]; then + if [ -n "$VIRTUAL_ENV" ] && [ "$VIRTUAL_ENV" == "$env_path_or_name" ]; then echo -e "${GREEN}Virtual environment already activated${NC}" >&2 else # Activate virtual environment From 70a7e4d51e3341942699bc6d027d0346bc53952b Mon Sep 17 00:00:00 2001 From: Ashwin Bharambe Date: Fri, 11 Apr 2025 20:30:44 -0700 Subject: [PATCH 20/39] fix: unhide python_start, python_end --- llama_stack/models/llama/llama4/tokenizer.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/llama_stack/models/llama/llama4/tokenizer.py b/llama_stack/models/llama/llama4/tokenizer.py index 8eabc3205..0d2cc7ce5 100644 --- a/llama_stack/models/llama/llama4/tokenizer.py +++ b/llama_stack/models/llama/llama4/tokenizer.py @@ -56,8 +56,8 @@ LLAMA4_TEXT_POST_TRAIN_SPECIAL_TOKENS = [ "<|text_post_train_reserved_special_token_3|>", "<|text_post_train_reserved_special_token_4|>", "<|text_post_train_reserved_special_token_5|>", - "<|text_post_train_reserved_special_token_6|>", - "<|text_post_train_reserved_special_token_7|>", + "<|python_start|>", + "<|python_end|>", "<|finetune_right_pad|>", ] + get_reserved_special_tokens( "text_post_train", 61, 8 From 0751a960a518785a821407bee4b855fbf56e88cb Mon Sep 17 00:00:00 2001 From: Charlie Doern Date: Sat, 12 Apr 2025 04:13:45 -0400 Subject: [PATCH 21/39] feat: make training config fields optional (#1861) # What does this PR do? Today, supervised_fine_tune itself and the `TrainingConfig` class have a bunch of required fields that a provider implementation might not need. for example, if a provider wants to handle hyperparameters in its configuration as well as any type of dataset retrieval, optimizer or LoRA config, a user will still need to pass in a virtually empty `DataConfig`, `OptimizerConfig` and `AlgorithmConfig` in some cases. Many of these fields are intended to work specifically with llama models and knobs intended for customizing inline. Adding remote post_training providers will require loosening these arguments, or forcing users to pass in empty objects to satisfy the pydantic models. Signed-off-by: Charlie Doern --- docs/_static/llama-stack-spec.html | 17 ++++++++--------- docs/_static/llama-stack-spec.yaml | 7 +++---- llama_stack/apis/post_training/post_training.py | 16 ++++++++-------- .../recipes/lora_finetuning_single_device.py | 10 ++++++++++ 4 files changed, 29 insertions(+), 21 deletions(-) diff --git a/docs/_static/llama-stack-spec.html b/docs/_static/llama-stack-spec.html index 36bfad49e..cdd6b3b53 100644 --- a/docs/_static/llama-stack-spec.html +++ b/docs/_static/llama-stack-spec.html @@ -9778,13 +9778,16 @@ "type": "integer" }, "max_steps_per_epoch": { - "type": "integer" + "type": "integer", + "default": 1 }, "gradient_accumulation_steps": { - "type": "integer" + "type": "integer", + "default": 1 }, "max_validation_steps": { - "type": "integer" + "type": "integer", + "default": 1 }, "data_config": { "$ref": "#/components/schemas/DataConfig" @@ -9804,10 +9807,7 @@ "required": [ "n_epochs", "max_steps_per_epoch", - "gradient_accumulation_steps", - "max_validation_steps", - "data_config", - "optimizer_config" + "gradient_accumulation_steps" ], "title": "TrainingConfig" }, @@ -10983,8 +10983,7 @@ "job_uuid", "training_config", "hyperparam_search_config", - "logger_config", - "model" + "logger_config" ], "title": "SupervisedFineTuneRequest" }, diff --git a/docs/_static/llama-stack-spec.yaml b/docs/_static/llama-stack-spec.yaml index 82faf450a..aa8d9456e 100644 --- a/docs/_static/llama-stack-spec.yaml +++ b/docs/_static/llama-stack-spec.yaml @@ -6744,10 +6744,13 @@ components: type: integer max_steps_per_epoch: type: integer + default: 1 gradient_accumulation_steps: type: integer + default: 1 max_validation_steps: type: integer + default: 1 data_config: $ref: '#/components/schemas/DataConfig' optimizer_config: @@ -6762,9 +6765,6 @@ components: - n_epochs - max_steps_per_epoch - gradient_accumulation_steps - - max_validation_steps - - data_config - - optimizer_config title: TrainingConfig PreferenceOptimizeRequest: type: object @@ -7498,7 +7498,6 @@ components: - training_config - hyperparam_search_config - logger_config - - model title: SupervisedFineTuneRequest SyntheticDataGenerateRequest: type: object diff --git a/llama_stack/apis/post_training/post_training.py b/llama_stack/apis/post_training/post_training.py index d49668e23..e5f1bcb65 100644 --- a/llama_stack/apis/post_training/post_training.py +++ b/llama_stack/apis/post_training/post_training.py @@ -60,11 +60,11 @@ class EfficiencyConfig(BaseModel): @json_schema_type class TrainingConfig(BaseModel): n_epochs: int - max_steps_per_epoch: int - gradient_accumulation_steps: int - max_validation_steps: int - data_config: DataConfig - optimizer_config: OptimizerConfig + max_steps_per_epoch: int = 1 + gradient_accumulation_steps: int = 1 + max_validation_steps: Optional[int] = 1 + data_config: Optional[DataConfig] = None + optimizer_config: Optional[OptimizerConfig] = None efficiency_config: Optional[EfficiencyConfig] = None dtype: Optional[str] = "bf16" @@ -177,9 +177,9 @@ class PostTraining(Protocol): training_config: TrainingConfig, hyperparam_search_config: Dict[str, Any], logger_config: Dict[str, Any], - model: str = Field( - default="Llama3.2-3B-Instruct", - description="Model descriptor from `llama model list`", + model: Optional[str] = Field( + default=None, + description="Model descriptor for training if not in provider config`", ), checkpoint_dir: Optional[str] = None, algorithm_config: Optional[AlgorithmConfig] = None, diff --git a/llama_stack/providers/inline/post_training/torchtune/recipes/lora_finetuning_single_device.py b/llama_stack/providers/inline/post_training/torchtune/recipes/lora_finetuning_single_device.py index edc1ceb90..04bf86b97 100644 --- a/llama_stack/providers/inline/post_training/torchtune/recipes/lora_finetuning_single_device.py +++ b/llama_stack/providers/inline/post_training/torchtune/recipes/lora_finetuning_single_device.py @@ -38,6 +38,8 @@ from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasets import Datasets from llama_stack.apis.post_training import ( Checkpoint, + DataConfig, + EfficiencyConfig, LoraFinetuningConfig, OptimizerConfig, QATFinetuningConfig, @@ -89,6 +91,10 @@ class LoraFinetuningSingleDevice: datasetio_api: DatasetIO, datasets_api: Datasets, ) -> None: + assert isinstance(training_config.data_config, DataConfig), "DataConfig must be initialized" + + assert isinstance(training_config.efficiency_config, EfficiencyConfig), "EfficiencyConfig must be initialized" + self.job_uuid = job_uuid self.training_config = training_config if not isinstance(algorithm_config, LoraFinetuningConfig): @@ -188,6 +194,7 @@ class LoraFinetuningSingleDevice: self._tokenizer = await self._setup_tokenizer() log.info("Tokenizer is initialized.") + assert isinstance(self.training_config.optimizer_config, OptimizerConfig), "OptimizerConfig must be initialized" self._optimizer = await self._setup_optimizer(optimizer_config=self.training_config.optimizer_config) log.info("Optimizer is initialized.") @@ -195,6 +202,8 @@ class LoraFinetuningSingleDevice: self._model.set_num_output_chunks(self._loss_fn.num_output_chunks) log.info("Loss is initialized.") + assert isinstance(self.training_config.data_config, DataConfig), "DataConfig must be initialized" + self._training_sampler, self._training_dataloader = await self._setup_data( dataset_id=self.training_config.data_config.dataset_id, tokenizer=self._tokenizer, @@ -452,6 +461,7 @@ class LoraFinetuningSingleDevice: """ The core training loop. """ + assert isinstance(self.training_config.data_config, DataConfig), "DataConfig must be initialized" # Initialize tokens count and running loss (for grad accumulation) t0 = time.perf_counter() running_loss: float = 0.0 From 854c2ad264e9059f4d9b3d897734bbc8931ba359 Mon Sep 17 00:00:00 2001 From: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com> Date: Sat, 12 Apr 2025 04:19:11 -0400 Subject: [PATCH 22/39] fix: misleading help text for 'llama stack build' and 'llama stack run' (#1910) # What does this PR do? current text for 'llama stack build' and 'llama stack run' says that if no argument is passed to '--image-name' that the active Conda environment will be used in reality, the active enviroment is used whether it is from conda, virtualenv, etc. ## Test Plan N/A ## Documentation N/A Signed-off-by: Nathan Weinberg --- docs/source/distributions/building_distro.md | 2 +- llama_stack/cli/stack/build.py | 2 +- llama_stack/cli/stack/run.py | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/distributions/building_distro.md b/docs/source/distributions/building_distro.md index e1e38d7ce..ad5d3bff4 100644 --- a/docs/source/distributions/building_distro.md +++ b/docs/source/distributions/building_distro.md @@ -231,7 +231,7 @@ options: -h, --help show this help message and exit --port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321) --image-name IMAGE_NAME - Name of the image to run. Defaults to the current conda environment (default: None) + Name of the image to run. Defaults to the current environment (default: None) --disable-ipv6 Disable IPv6 support (default: False) --env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: []) --tls-keyfile TLS_KEYFILE diff --git a/llama_stack/cli/stack/build.py b/llama_stack/cli/stack/build.py index 0ada7c615..c511a0682 100644 --- a/llama_stack/cli/stack/build.py +++ b/llama_stack/cli/stack/build.py @@ -57,7 +57,7 @@ class StackBuild(Subcommand): type=str, help=textwrap.dedent( f"""[for image-type={"|".join(e.value for e in ImageType)}] Name of the conda or virtual environment to use for -the build. If not specified, currently active Conda environment will be used if found. +the build. If not specified, currently active environment will be used if found. """ ), default=None, diff --git a/llama_stack/cli/stack/run.py b/llama_stack/cli/stack/run.py index 92015187b..d8234bb46 100644 --- a/llama_stack/cli/stack/run.py +++ b/llama_stack/cli/stack/run.py @@ -45,7 +45,7 @@ class StackRun(Subcommand): "--image-name", type=str, default=os.environ.get("CONDA_DEFAULT_ENV"), - help="Name of the image to run. Defaults to the current conda environment", + help="Name of the image to run. Defaults to the current environment", ) self.parser.add_argument( "--disable-ipv6", From f34f22f8c79d58a8067e53ed02e796a8d51c0559 Mon Sep 17 00:00:00 2001 From: Ashwin Bharambe Date: Sat, 12 Apr 2025 11:41:12 -0700 Subject: [PATCH 23/39] feat: add batch inference API to llama stack inference (#1945) # What does this PR do? This PR adds two methods to the Inference API: - `batch_completion` - `batch_chat_completion` The motivation is for evaluations targeting a local inference engine (like meta-reference or vllm) where batch APIs provide for a substantial amount of acceleration. Why did I not add this to `Api.batch_inference` though? That just resulted in a _lot_ more book-keeping given the structure of Llama Stack. Had I done that, I would have needed to create a notion of a "batch model" resource, setup routing based on that, etc. This does not sound ideal. So what's the future of the batch inference API? I am not sure. Maybe we can keep it for true _asynchronous_ execution. So you can submit requests, and it can return a Job instance, etc. ## Test Plan Run meta-reference-gpu using: ```bash export INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct export INFERENCE_CHECKPOINT_DIR=../checkpoints/Llama-4-Scout-17B-16E-Instruct-20250331210000 export MODEL_PARALLEL_SIZE=4 export MAX_BATCH_SIZE=32 export MAX_SEQ_LEN=6144 LLAMA_MODELS_DEBUG=1 llama stack run meta-reference-gpu ``` Then run the batch inference test case. --- docs/_static/llama-stack-spec.html | 135 ++++----- docs/_static/llama-stack-spec.yaml | 149 +++++---- .../apis/batch_inference/batch_inference.py | 35 +-- llama_stack/apis/inference/inference.py | 34 +++ llama_stack/distribution/routers/routers.py | 40 +++ .../models/llama/llama3/chat_format.py | 1 - llama_stack/models/llama/llama3/generation.py | 23 +- .../models/llama/llama4/chat_format.py | 1 - llama_stack/models/llama/llama4/generation.py | 2 +- .../inline/inference/meta_reference/config.py | 5 +- .../inference/meta_reference/generators.py | 129 ++------ .../inference/meta_reference/inference.py | 286 +++++++++++++----- .../meta_reference/model_parallel.py | 26 +- .../meta_reference/parallel_utils.py | 8 +- .../sentence_transformers.py | 23 ++ .../remote/inference/ollama/ollama.py | 22 ++ .../providers/remote/inference/vllm/vllm.py | 22 ++ .../utils/inference/litellm_openai_mixin.py | 22 ++ .../meta-reference-gpu/run-with-safety.yaml | 6 +- .../templates/meta-reference-gpu/run.yaml | 3 +- .../inference/test_batch_inference.py | 76 +++++ .../test_cases/inference/chat_completion.json | 26 ++ .../test_cases/inference/completion.json | 13 + 23 files changed, 698 insertions(+), 389 deletions(-) create mode 100644 tests/integration/inference/test_batch_inference.py diff --git a/docs/_static/llama-stack-spec.html b/docs/_static/llama-stack-spec.html index cdd6b3b53..542fb5be5 100644 --- a/docs/_static/llama-stack-spec.html +++ b/docs/_static/llama-stack-spec.html @@ -85,7 +85,7 @@ } } }, - "/v1/batch-inference/chat-completion": { + "/v1/inference/batch-chat-completion": { "post": { "responses": { "200": { @@ -112,7 +112,7 @@ } }, "tags": [ - "BatchInference (Coming Soon)" + "Inference" ], "description": "", "parameters": [], @@ -128,7 +128,7 @@ } } }, - "/v1/batch-inference/completion": { + "/v1/inference/batch-completion": { "post": { "responses": { "200": { @@ -155,7 +155,7 @@ } }, "tags": [ - "BatchInference (Coming Soon)" + "Inference" ], "description": "", "parameters": [], @@ -239,7 +239,7 @@ } }, "tags": [ - "Inference" + "BatchInference (Coming Soon)" ], "description": "Generate a chat completion for the given messages using the specified model.", "parameters": [], @@ -287,7 +287,7 @@ } }, "tags": [ - "Inference" + "BatchInference (Coming Soon)" ], "description": "Generate a completion for the given content using the specified model.", "parameters": [], @@ -4366,6 +4366,51 @@ ], "title": "ToolCall" }, + "ToolConfig": { + "type": "object", + "properties": { + "tool_choice": { + "oneOf": [ + { + "type": "string", + "enum": [ + "auto", + "required", + "none" + ], + "title": "ToolChoice", + "description": "Whether tool use is required or automatic. This is a hint to the model which may not be followed. It depends on the Instruction Following capabilities of the model." + }, + { + "type": "string" + } + ], + "default": "auto", + "description": "(Optional) Whether tool use is automatic, required, or none. Can also specify a tool name to use a specific tool. Defaults to ToolChoice.auto." + }, + "tool_prompt_format": { + "type": "string", + "enum": [ + "json", + "function_tag", + "python_list" + ], + "description": "(Optional) Instructs the model how to format tool calls. By default, Llama Stack will attempt to use a format that is best adapted to the model. - `ToolPromptFormat.json`: The tool calls are formatted as a JSON object. - `ToolPromptFormat.function_tag`: The tool calls are enclosed in a tag. - `ToolPromptFormat.python_list`: The tool calls are output as Python syntax -- a list of function calls." + }, + "system_message_behavior": { + "type": "string", + "enum": [ + "append", + "replace" + ], + "description": "(Optional) Config for how to override the default system prompt. - `SystemMessageBehavior.append`: Appends the provided system message to the default system prompt. - `SystemMessageBehavior.replace`: Replaces the default system prompt with the provided system message. The system message can include the string '{{function_definitions}}' to indicate where the function definitions should be inserted.", + "default": "append" + } + }, + "additionalProperties": false, + "title": "ToolConfig", + "description": "Configuration for tool use." + }, "ToolDefinition": { "type": "object", "properties": { @@ -4554,7 +4599,7 @@ "BatchChatCompletionRequest": { "type": "object", "properties": { - "model": { + "model_id": { "type": "string" }, "messages_batch": { @@ -4575,25 +4620,8 @@ "$ref": "#/components/schemas/ToolDefinition" } }, - "tool_choice": { - "type": "string", - "enum": [ - "auto", - "required", - "none" - ], - "title": "ToolChoice", - "description": "Whether tool use is required or automatic. This is a hint to the model which may not be followed. It depends on the Instruction Following capabilities of the model." - }, - "tool_prompt_format": { - "type": "string", - "enum": [ - "json", - "function_tag", - "python_list" - ], - "title": "ToolPromptFormat", - "description": "Prompt format for calling custom / zero shot tools." + "tool_config": { + "$ref": "#/components/schemas/ToolConfig" }, "response_format": { "$ref": "#/components/schemas/ResponseFormat" @@ -4613,7 +4641,7 @@ }, "additionalProperties": false, "required": [ - "model", + "model_id", "messages_batch" ], "title": "BatchChatCompletionRequest" @@ -4710,7 +4738,7 @@ "BatchCompletionRequest": { "type": "object", "properties": { - "model": { + "model_id": { "type": "string" }, "content_batch": { @@ -4740,7 +4768,7 @@ }, "additionalProperties": false, "required": [ - "model", + "model_id", "content_batch" ], "title": "BatchCompletionRequest" @@ -4812,51 +4840,6 @@ ], "title": "CancelTrainingJobRequest" }, - "ToolConfig": { - "type": "object", - "properties": { - "tool_choice": { - "oneOf": [ - { - "type": "string", - "enum": [ - "auto", - "required", - "none" - ], - "title": "ToolChoice", - "description": "Whether tool use is required or automatic. This is a hint to the model which may not be followed. It depends on the Instruction Following capabilities of the model." - }, - { - "type": "string" - } - ], - "default": "auto", - "description": "(Optional) Whether tool use is automatic, required, or none. Can also specify a tool name to use a specific tool. Defaults to ToolChoice.auto." - }, - "tool_prompt_format": { - "type": "string", - "enum": [ - "json", - "function_tag", - "python_list" - ], - "description": "(Optional) Instructs the model how to format tool calls. By default, Llama Stack will attempt to use a format that is best adapted to the model. - `ToolPromptFormat.json`: The tool calls are formatted as a JSON object. - `ToolPromptFormat.function_tag`: The tool calls are enclosed in a tag. - `ToolPromptFormat.python_list`: The tool calls are output as Python syntax -- a list of function calls." - }, - "system_message_behavior": { - "type": "string", - "enum": [ - "append", - "replace" - ], - "description": "(Optional) Config for how to override the default system prompt. - `SystemMessageBehavior.append`: Appends the provided system message to the default system prompt. - `SystemMessageBehavior.replace`: Replaces the default system prompt with the provided system message. The system message can include the string '{{function_definitions}}' to indicate where the function definitions should be inserted.", - "default": "append" - } - }, - "additionalProperties": false, - "title": "ToolConfig", - "description": "Configuration for tool use." - }, "ChatCompletionRequest": { "type": "object", "properties": { @@ -11173,7 +11156,9 @@ "x-displayName": "Agents API for creating and interacting with agentic systems." }, { - "name": "BatchInference (Coming Soon)" + "name": "BatchInference (Coming Soon)", + "description": "This is an asynchronous API. If the request is successful, the response will be a job which can be polled for completion.\n\nNOTE: This API is not yet implemented and is subject to change in concert with other asynchronous APIs\nincluding (post-training, evals, etc).", + "x-displayName": "Batch inference API for generating completions and chat completions." }, { "name": "Benchmarks" diff --git a/docs/_static/llama-stack-spec.yaml b/docs/_static/llama-stack-spec.yaml index aa8d9456e..fa7b130e2 100644 --- a/docs/_static/llama-stack-spec.yaml +++ b/docs/_static/llama-stack-spec.yaml @@ -40,7 +40,7 @@ paths: schema: $ref: '#/components/schemas/AppendRowsRequest' required: true - /v1/batch-inference/chat-completion: + /v1/inference/batch-chat-completion: post: responses: '200': @@ -60,7 +60,7 @@ paths: default: $ref: '#/components/responses/DefaultError' tags: - - BatchInference (Coming Soon) + - Inference description: '' parameters: [] requestBody: @@ -69,7 +69,7 @@ paths: schema: $ref: '#/components/schemas/BatchChatCompletionRequest' required: true - /v1/batch-inference/completion: + /v1/inference/batch-completion: post: responses: '200': @@ -89,7 +89,7 @@ paths: default: $ref: '#/components/responses/DefaultError' tags: - - BatchInference (Coming Soon) + - Inference description: '' parameters: [] requestBody: @@ -148,7 +148,7 @@ paths: default: $ref: '#/components/responses/DefaultError' tags: - - Inference + - BatchInference (Coming Soon) description: >- Generate a chat completion for the given messages using the specified model. parameters: [] @@ -183,7 +183,7 @@ paths: default: $ref: '#/components/responses/DefaultError' tags: - - Inference + - BatchInference (Coming Soon) description: >- Generate a completion for the given content using the specified model. parameters: [] @@ -3009,6 +3009,54 @@ components: - tool_name - arguments title: ToolCall + ToolConfig: + type: object + properties: + tool_choice: + oneOf: + - type: string + enum: + - auto + - required + - none + title: ToolChoice + description: >- + Whether tool use is required or automatic. This is a hint to the model + which may not be followed. It depends on the Instruction Following + capabilities of the model. + - type: string + default: auto + description: >- + (Optional) Whether tool use is automatic, required, or none. Can also + specify a tool name to use a specific tool. Defaults to ToolChoice.auto. + tool_prompt_format: + type: string + enum: + - json + - function_tag + - python_list + description: >- + (Optional) Instructs the model how to format tool calls. By default, Llama + Stack will attempt to use a format that is best adapted to the model. + - `ToolPromptFormat.json`: The tool calls are formatted as a JSON object. + - `ToolPromptFormat.function_tag`: The tool calls are enclosed in a + tag. - `ToolPromptFormat.python_list`: The tool calls are output as Python + syntax -- a list of function calls. + system_message_behavior: + type: string + enum: + - append + - replace + description: >- + (Optional) Config for how to override the default system prompt. - `SystemMessageBehavior.append`: + Appends the provided system message to the default system prompt. - `SystemMessageBehavior.replace`: + Replaces the default system prompt with the provided system message. The + system message can include the string '{{function_definitions}}' to indicate + where the function definitions should be inserted. + default: append + additionalProperties: false + title: ToolConfig + description: Configuration for tool use. ToolDefinition: type: object properties: @@ -3145,7 +3193,7 @@ components: BatchChatCompletionRequest: type: object properties: - model: + model_id: type: string messages_batch: type: array @@ -3159,26 +3207,8 @@ components: type: array items: $ref: '#/components/schemas/ToolDefinition' - tool_choice: - type: string - enum: - - auto - - required - - none - title: ToolChoice - description: >- - Whether tool use is required or automatic. This is a hint to the model - which may not be followed. It depends on the Instruction Following capabilities - of the model. - tool_prompt_format: - type: string - enum: - - json - - function_tag - - python_list - title: ToolPromptFormat - description: >- - Prompt format for calling custom / zero shot tools. + tool_config: + $ref: '#/components/schemas/ToolConfig' response_format: $ref: '#/components/schemas/ResponseFormat' logprobs: @@ -3193,7 +3223,7 @@ components: title: LogProbConfig additionalProperties: false required: - - model + - model_id - messages_batch title: BatchChatCompletionRequest BatchChatCompletionResponse: @@ -3261,7 +3291,7 @@ components: BatchCompletionRequest: type: object properties: - model: + model_id: type: string content_batch: type: array @@ -3283,7 +3313,7 @@ components: title: LogProbConfig additionalProperties: false required: - - model + - model_id - content_batch title: BatchCompletionRequest BatchCompletionResponse: @@ -3335,54 +3365,6 @@ components: required: - job_uuid title: CancelTrainingJobRequest - ToolConfig: - type: object - properties: - tool_choice: - oneOf: - - type: string - enum: - - auto - - required - - none - title: ToolChoice - description: >- - Whether tool use is required or automatic. This is a hint to the model - which may not be followed. It depends on the Instruction Following - capabilities of the model. - - type: string - default: auto - description: >- - (Optional) Whether tool use is automatic, required, or none. Can also - specify a tool name to use a specific tool. Defaults to ToolChoice.auto. - tool_prompt_format: - type: string - enum: - - json - - function_tag - - python_list - description: >- - (Optional) Instructs the model how to format tool calls. By default, Llama - Stack will attempt to use a format that is best adapted to the model. - - `ToolPromptFormat.json`: The tool calls are formatted as a JSON object. - - `ToolPromptFormat.function_tag`: The tool calls are enclosed in a - tag. - `ToolPromptFormat.python_list`: The tool calls are output as Python - syntax -- a list of function calls. - system_message_behavior: - type: string - enum: - - append - - replace - description: >- - (Optional) Config for how to override the default system prompt. - `SystemMessageBehavior.append`: - Appends the provided system message to the default system prompt. - `SystemMessageBehavior.replace`: - Replaces the default system prompt with the provided system message. The - system message can include the string '{{function_definitions}}' to indicate - where the function definitions should be inserted. - default: append - additionalProperties: false - title: ToolConfig - description: Configuration for tool use. ChatCompletionRequest: type: object properties: @@ -7632,6 +7614,17 @@ tags: x-displayName: >- Agents API for creating and interacting with agentic systems. - name: BatchInference (Coming Soon) + description: >- + This is an asynchronous API. If the request is successful, the response will + be a job which can be polled for completion. + + + NOTE: This API is not yet implemented and is subject to change in concert with + other asynchronous APIs + + including (post-training, evals, etc). + x-displayName: >- + Batch inference API for generating completions and chat completions. - name: Benchmarks - name: DatasetIO - name: Datasets diff --git a/llama_stack/apis/batch_inference/batch_inference.py b/llama_stack/apis/batch_inference/batch_inference.py index 330a683ba..7a324128d 100644 --- a/llama_stack/apis/batch_inference/batch_inference.py +++ b/llama_stack/apis/batch_inference/batch_inference.py @@ -6,11 +6,8 @@ from typing import List, Optional, Protocol, runtime_checkable -from pydantic import BaseModel - +from llama_stack.apis.common.job_types import Job from llama_stack.apis.inference import ( - ChatCompletionResponse, - CompletionResponse, InterleavedContent, LogProbConfig, Message, @@ -20,41 +17,39 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) -from llama_stack.schema_utils import json_schema_type, webmethod - - -@json_schema_type -class BatchCompletionResponse(BaseModel): - batch: List[CompletionResponse] - - -@json_schema_type -class BatchChatCompletionResponse(BaseModel): - batch: List[ChatCompletionResponse] +from llama_stack.schema_utils import webmethod @runtime_checkable class BatchInference(Protocol): + """Batch inference API for generating completions and chat completions. + + This is an asynchronous API. If the request is successful, the response will be a job which can be polled for completion. + + NOTE: This API is not yet implemented and is subject to change in concert with other asynchronous APIs + including (post-training, evals, etc). + """ + @webmethod(route="/batch-inference/completion", method="POST") - async def batch_completion( + async def completion( self, model: str, content_batch: List[InterleavedContent], sampling_params: Optional[SamplingParams] = None, response_format: Optional[ResponseFormat] = None, logprobs: Optional[LogProbConfig] = None, - ) -> BatchCompletionResponse: ... + ) -> Job: ... @webmethod(route="/batch-inference/chat-completion", method="POST") - async def batch_chat_completion( + async def chat_completion( self, model: str, messages_batch: List[List[Message]], sampling_params: Optional[SamplingParams] = None, # zero-shot tool definitions as input to the model - tools: Optional[List[ToolDefinition]] = list, + tools: Optional[List[ToolDefinition]] = None, tool_choice: Optional[ToolChoice] = ToolChoice.auto, tool_prompt_format: Optional[ToolPromptFormat] = None, response_format: Optional[ResponseFormat] = None, logprobs: Optional[LogProbConfig] = None, - ) -> BatchChatCompletionResponse: ... + ) -> Job: ... diff --git a/llama_stack/apis/inference/inference.py b/llama_stack/apis/inference/inference.py index 3390a3fef..9eb3910c6 100644 --- a/llama_stack/apis/inference/inference.py +++ b/llama_stack/apis/inference/inference.py @@ -681,6 +681,16 @@ class EmbeddingTaskType(Enum): document = "document" +@json_schema_type +class BatchCompletionResponse(BaseModel): + batch: List[CompletionResponse] + + +@json_schema_type +class BatchChatCompletionResponse(BaseModel): + batch: List[ChatCompletionResponse] + + @runtime_checkable @trace_protocol class Inference(Protocol): @@ -716,6 +726,17 @@ class Inference(Protocol): """ ... + @webmethod(route="/inference/batch-completion", method="POST") + async def batch_completion( + self, + model_id: str, + content_batch: List[InterleavedContent], + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ) -> BatchCompletionResponse: + raise NotImplementedError("Batch completion is not implemented") + @webmethod(route="/inference/chat-completion", method="POST") async def chat_completion( self, @@ -756,6 +777,19 @@ class Inference(Protocol): """ ... + @webmethod(route="/inference/batch-chat-completion", method="POST") + async def batch_chat_completion( + self, + model_id: str, + messages_batch: List[List[Message]], + sampling_params: Optional[SamplingParams] = None, + tools: Optional[List[ToolDefinition]] = None, + tool_config: Optional[ToolConfig] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ) -> BatchChatCompletionResponse: + raise NotImplementedError("Batch chat completion is not implemented") + @webmethod(route="/inference/embeddings", method="POST") async def embeddings( self, diff --git a/llama_stack/distribution/routers/routers.py b/llama_stack/distribution/routers/routers.py index bc313036f..b9623ef3c 100644 --- a/llama_stack/distribution/routers/routers.py +++ b/llama_stack/distribution/routers/routers.py @@ -17,6 +17,8 @@ from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasets import DatasetPurpose, DataSource from llama_stack.apis.eval import BenchmarkConfig, Eval, EvaluateResponse, Job from llama_stack.apis.inference import ( + BatchChatCompletionResponse, + BatchCompletionResponse, ChatCompletionResponse, ChatCompletionResponseEventType, ChatCompletionResponseStreamChunk, @@ -334,6 +336,30 @@ class InferenceRouter(Inference): response.metrics = metrics if response.metrics is None else response.metrics + metrics return response + async def batch_chat_completion( + self, + model_id: str, + messages_batch: List[List[Message]], + tools: Optional[List[ToolDefinition]] = None, + tool_config: Optional[ToolConfig] = None, + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ) -> BatchChatCompletionResponse: + logger.debug( + f"InferenceRouter.batch_chat_completion: {model_id=}, {len(messages_batch)=}, {sampling_params=}, {response_format=}, {logprobs=}", + ) + provider = self.routing_table.get_provider_impl(model_id) + return await provider.batch_chat_completion( + model_id=model_id, + messages_batch=messages_batch, + tools=tools, + tool_config=tool_config, + sampling_params=sampling_params, + response_format=response_format, + logprobs=logprobs, + ) + async def completion( self, model_id: str, @@ -398,6 +424,20 @@ class InferenceRouter(Inference): response.metrics = metrics if response.metrics is None else response.metrics + metrics return response + async def batch_completion( + self, + model_id: str, + content_batch: List[InterleavedContent], + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ) -> BatchCompletionResponse: + logger.debug( + f"InferenceRouter.batch_completion: {model_id=}, {len(content_batch)=}, {sampling_params=}, {response_format=}, {logprobs=}", + ) + provider = self.routing_table.get_provider_impl(model_id) + return await provider.batch_completion(model_id, content_batch, sampling_params, response_format, logprobs) + async def embeddings( self, model_id: str, diff --git a/llama_stack/models/llama/llama3/chat_format.py b/llama_stack/models/llama/llama3/chat_format.py index f55cd5e1c..fe7a7a898 100644 --- a/llama_stack/models/llama/llama3/chat_format.py +++ b/llama_stack/models/llama/llama3/chat_format.py @@ -226,7 +226,6 @@ class ChatFormat: arguments_json=json.dumps(tool_arguments), ) ) - content = "" return RawMessage( role="assistant", diff --git a/llama_stack/models/llama/llama3/generation.py b/llama_stack/models/llama/llama3/generation.py index 8c6aa242b..35c140707 100644 --- a/llama_stack/models/llama/llama3/generation.py +++ b/llama_stack/models/llama/llama3/generation.py @@ -140,7 +140,12 @@ class Llama3: return Llama3(model, tokenizer, model_args) - def __init__(self, model: Transformer | CrossAttentionTransformer, tokenizer: Tokenizer, args: ModelArgs): + def __init__( + self, + model: Transformer | CrossAttentionTransformer, + tokenizer: Tokenizer, + args: ModelArgs, + ): self.args = args self.model = model self.tokenizer = tokenizer @@ -149,7 +154,7 @@ class Llama3: @torch.inference_mode() def generate( self, - model_inputs: List[LLMInput], + llm_inputs: List[LLMInput], temperature: float = 0.6, top_p: float = 0.9, max_gen_len: Optional[int] = None, @@ -164,15 +169,15 @@ class Llama3: print_model_input = print_model_input or os.environ.get("LLAMA_MODELS_DEBUG", "0") == "1" if print_model_input: - for inp in model_inputs: + for inp in llm_inputs: tokens_to_print = [self.formatter.vision_token if t == 128256 else t for t in inp.tokens] cprint( "Input to model:\n" + self.tokenizer.decode(tokens_to_print) + "\n", "red", ) - prompt_tokens = [inp.tokens for inp in model_inputs] + prompt_tokens = [inp.tokens for inp in llm_inputs] - bsz = len(model_inputs) + bsz = len(llm_inputs) assert bsz <= params.max_batch_size, (bsz, params.max_batch_size) min_prompt_len = min(len(t) for t in prompt_tokens) @@ -193,8 +198,8 @@ class Llama3: is_vision = not isinstance(self.model, Transformer) if is_vision: - images = [inp.vision.images if inp.vision is not None else [] for inp in model_inputs] - mask = [inp.vision.mask if inp.vision is not None else [] for inp in model_inputs] + images = [inp.vision.images if inp.vision is not None else [] for inp in llm_inputs] + mask = [inp.vision.mask if inp.vision is not None else [] for inp in llm_inputs] xattn_caches, cross_attention_masks, full_text_row_masked_out_mask = self.model.compute_vision_tokens_masks( batch_images=images, @@ -229,7 +234,7 @@ class Llama3: for cur_pos in range(min_prompt_len, total_len): if is_vision: position_ids = torch.arange(prev_pos, cur_pos, dtype=torch.long) - text_only_inference = all(inp.vision is None for inp in model_inputs) + text_only_inference = all(inp.vision is None for inp in llm_inputs) logits = self.model.forward( position_ids, tokens, @@ -285,7 +290,7 @@ class Llama3: source="output", logprobs=(token_logprobs[idx, cur_pos : cur_pos + 1].tolist() if logprobs else None), batch_idx=idx, - finished=eos_reached[idx], + finished=eos_reached[idx].item(), ignore_token=cur_pos < len(prompt_tokens[idx]), ) ) diff --git a/llama_stack/models/llama/llama4/chat_format.py b/llama_stack/models/llama/llama4/chat_format.py index 160bb00f8..9d60d00e9 100644 --- a/llama_stack/models/llama/llama4/chat_format.py +++ b/llama_stack/models/llama/llama4/chat_format.py @@ -301,7 +301,6 @@ class ChatFormat: arguments=tool_arguments, ) ) - content = "" return RawMessage( role="assistant", diff --git a/llama_stack/models/llama/llama4/generation.py b/llama_stack/models/llama/llama4/generation.py index 7a4087c8f..8e94bb33a 100644 --- a/llama_stack/models/llama/llama4/generation.py +++ b/llama_stack/models/llama/llama4/generation.py @@ -233,7 +233,7 @@ class Llama4: source="output", logprobs=(token_logprobs[idx, cur_pos : cur_pos + 1].tolist() if logprobs else None), batch_idx=idx, - finished=eos_reached[idx], + finished=eos_reached[idx].item(), ignore_token=cur_pos < len(prompt_tokens[idx]), ) ) diff --git a/llama_stack/providers/inline/inference/meta_reference/config.py b/llama_stack/providers/inline/inference/meta_reference/config.py index 315667506..6f796d0d4 100644 --- a/llama_stack/providers/inline/inference/meta_reference/config.py +++ b/llama_stack/providers/inline/inference/meta_reference/config.py @@ -52,14 +52,17 @@ class MetaReferenceInferenceConfig(BaseModel): checkpoint_dir: str = "${env.CHECKPOINT_DIR:null}", quantization_type: str = "${env.QUANTIZATION_TYPE:bf16}", model_parallel_size: str = "${env.MODEL_PARALLEL_SIZE:0}", + max_batch_size: str = "${env.MAX_BATCH_SIZE:1}", + max_seq_len: str = "${env.MAX_SEQ_LEN:4096}", **kwargs, ) -> Dict[str, Any]: return { "model": model, - "max_seq_len": 4096, "checkpoint_dir": checkpoint_dir, "quantization": { "type": quantization_type, }, "model_parallel_size": model_parallel_size, + "max_batch_size": max_batch_size, + "max_seq_len": max_seq_len, } diff --git a/llama_stack/providers/inline/inference/meta_reference/generators.py b/llama_stack/providers/inline/inference/meta_reference/generators.py index 34dd58a9a..0a928ce73 100644 --- a/llama_stack/providers/inline/inference/meta_reference/generators.py +++ b/llama_stack/providers/inline/inference/meta_reference/generators.py @@ -22,7 +22,7 @@ from llama_stack.models.llama.llama3.generation import Llama3 from llama_stack.models.llama.llama3.tokenizer import Tokenizer as Llama3Tokenizer from llama_stack.models.llama.llama4.generation import Llama4 from llama_stack.models.llama.llama4.tokenizer import Tokenizer as Llama4Tokenizer -from llama_stack.models.llama.sku_types import Model +from llama_stack.models.llama.sku_types import Model, ModelFamily from llama_stack.providers.utils.inference.prompt_adapter import ( ChatCompletionRequestWithRawContent, CompletionRequestWithRawContent, @@ -113,8 +113,7 @@ def _infer_tool_prompt_format(request: ChatCompletionRequestWithRawContent): return get_default_tool_prompt_format(request.model) -# TODO: combine Llama3 and Llama4 generators since they are almost identical now -class Llama4Generator: +class LlamaGenerator: def __init__( self, config: MetaReferenceInferenceConfig, @@ -144,7 +143,8 @@ class Llama4Generator: else: quantization_mode = None - self.inner_generator = Llama4.build( + cls = Llama4 if llama_model.model_family == ModelFamily.llama4 else Llama3 + self.inner_generator = cls.build( ckpt_dir=ckpt_dir, max_seq_len=config.max_seq_len, max_batch_size=config.max_batch_size, @@ -158,142 +158,55 @@ class Llama4Generator: def completion( self, - request: CompletionRequestWithRawContent, + request_batch: List[CompletionRequestWithRawContent], ) -> Generator: - sampling_params = request.sampling_params or SamplingParams() + first_request = request_batch[0] + sampling_params = first_request.sampling_params or SamplingParams() max_gen_len = sampling_params.max_tokens if max_gen_len is None or max_gen_len == 0 or max_gen_len >= self.args.max_seq_len: max_gen_len = self.args.max_seq_len - 1 temperature, top_p = _infer_sampling_params(sampling_params) for result in self.inner_generator.generate( - llm_inputs=[self.formatter.encode_content(request.content)], + llm_inputs=[self.formatter.encode_content(request.content) for request in request_batch], max_gen_len=max_gen_len, temperature=temperature, top_p=top_p, - logprobs=bool(request.logprobs), + logprobs=bool(first_request.logprobs), echo=False, logits_processor=get_logits_processor( self.tokenizer, self.args.vocab_size, - request.response_format, + first_request.response_format, ), ): - yield result[0] + yield result def chat_completion( self, - request: ChatCompletionRequestWithRawContent, + request_batch: List[ChatCompletionRequestWithRawContent], ) -> Generator: - sampling_params = request.sampling_params or SamplingParams() + first_request = request_batch[0] + sampling_params = first_request.sampling_params or SamplingParams() max_gen_len = sampling_params.max_tokens if max_gen_len is None or max_gen_len == 0 or max_gen_len >= self.args.max_seq_len: max_gen_len = self.args.max_seq_len - 1 temperature, top_p = _infer_sampling_params(sampling_params) for result in self.inner_generator.generate( - llm_inputs=[self.formatter.encode_dialog_prompt(request.messages, _infer_tool_prompt_format(request))], + llm_inputs=[ + self.formatter.encode_dialog_prompt(request.messages, _infer_tool_prompt_format(request)) + for request in request_batch + ], max_gen_len=max_gen_len, temperature=temperature, top_p=top_p, - logprobs=bool(request.logprobs), + logprobs=bool(first_request.logprobs), echo=False, logits_processor=get_logits_processor( self.tokenizer, self.args.vocab_size, - request.response_format, + first_request.response_format, ), ): - yield result[0] - - -class Llama3Generator: - def __init__( - self, - config: MetaReferenceInferenceConfig, - model_id: str, - llama_model: Model, - ): - if config.checkpoint_dir and config.checkpoint_dir != "null": - ckpt_dir = config.checkpoint_dir - else: - resolved_model = resolve_model(model_id) - if resolved_model is None: - # if the model is not a native llama model, get the default checkpoint_dir based on model id - ckpt_dir = model_checkpoint_dir(model_id) - else: - # if the model is a native llama model, get the default checkpoint_dir based on model core_model_id value - ckpt_dir = model_checkpoint_dir(resolved_model.descriptor()) - - if config.quantization: - if config.quantization.type == "fp8_mixed": - quantization_mode = QuantizationMode.fp8_mixed - elif config.quantization.type == "int4_mixed": - quantization_mode = QuantizationMode.int4_mixed - elif config.quantization.type == "bf16": - quantization_mode = None - else: - raise ValueError(f"Unsupported quantization mode {config.quantization}") - else: - quantization_mode = None - - self.inner_generator = Llama3.build( - ckpt_dir=ckpt_dir, - max_seq_len=config.max_seq_len, - max_batch_size=config.max_batch_size, - world_size=config.model_parallel_size or llama_model.pth_file_count, - quantization_mode=quantization_mode, - ) - self.tokenizer = self.inner_generator.tokenizer - self.args = self.inner_generator.args - self.formatter = self.inner_generator.formatter - - def completion( - self, - request: CompletionRequestWithRawContent, - ) -> Generator: - sampling_params = request.sampling_params or SamplingParams() - max_gen_len = sampling_params.max_tokens - if max_gen_len is None or max_gen_len == 0 or max_gen_len >= self.args.max_seq_len: - max_gen_len = self.args.max_seq_len - 1 - - temperature, top_p = _infer_sampling_params(sampling_params) - for result in self.inner_generator.generate( - model_inputs=[self.formatter.encode_content(request.content)], - max_gen_len=max_gen_len, - temperature=temperature, - top_p=top_p, - logprobs=bool(request.logprobs), - echo=False, - logits_processor=get_logits_processor( - self.tokenizer, - self.args.vocab_size, - request.response_format, - ), - ): - yield result[0] - - def chat_completion( - self, - request: ChatCompletionRequestWithRawContent, - ) -> Generator: - sampling_params = request.sampling_params or SamplingParams() - max_gen_len = sampling_params.max_tokens - if max_gen_len is None or max_gen_len == 0 or max_gen_len >= self.args.max_seq_len: - max_gen_len = self.args.max_seq_len - 1 - - temperature, top_p = _infer_sampling_params(sampling_params) - for result in self.inner_generator.generate( - model_inputs=[self.formatter.encode_dialog_prompt(request.messages, _infer_tool_prompt_format(request))], - max_gen_len=max_gen_len, - temperature=temperature, - top_p=top_p, - logprobs=bool(request.logprobs), - echo=False, - logits_processor=get_logits_processor( - self.tokenizer, - self.args.vocab_size, - request.response_format, - ), - ): - yield result[0] + yield result diff --git a/llama_stack/providers/inline/inference/meta_reference/inference.py b/llama_stack/providers/inline/inference/meta_reference/inference.py index 3a7632065..0b56ba1f7 100644 --- a/llama_stack/providers/inline/inference/meta_reference/inference.py +++ b/llama_stack/providers/inline/inference/meta_reference/inference.py @@ -5,10 +5,10 @@ # the root directory of this source tree. import asyncio -import logging import os from typing import AsyncGenerator, List, Optional, Union +from pydantic import BaseModel from termcolor import cprint from llama_stack.apis.common.content_types import ( @@ -17,6 +17,8 @@ from llama_stack.apis.common.content_types import ( ToolCallParseStatus, ) from llama_stack.apis.inference import ( + BatchChatCompletionResponse, + BatchCompletionResponse, ChatCompletionRequest, ChatCompletionResponse, ChatCompletionResponseEvent, @@ -38,8 +40,10 @@ from llama_stack.apis.inference import ( ToolConfig, ToolDefinition, ToolPromptFormat, + UserMessage, ) from llama_stack.apis.models import Model, ModelType +from llama_stack.log import get_logger from llama_stack.models.llama.llama3.chat_format import ChatFormat as Llama3ChatFormat from llama_stack.models.llama.llama3.tokenizer import Tokenizer as Llama3Tokenizer from llama_stack.models.llama.llama4.chat_format import ChatFormat as Llama4ChatFormat @@ -65,21 +69,17 @@ from llama_stack.providers.utils.inference.prompt_adapter import ( ) from .config import MetaReferenceInferenceConfig -from .generators import Llama3Generator, Llama4Generator +from .generators import LlamaGenerator from .model_parallel import LlamaModelParallelGenerator -log = logging.getLogger(__name__) +log = get_logger(__name__, category="inference") # there's a single model parallel process running serving the model. for now, # we don't support multiple concurrent requests to this process. SEMAPHORE = asyncio.Semaphore(1) -def llama3_builder_fn(config: MetaReferenceInferenceConfig, model_id: str, llama_model: Model) -> Llama3Generator: - return Llama3Generator(config, model_id, llama_model) - - -def llama4_builder_fn(config: MetaReferenceInferenceConfig, model_id: str, llama_model: Model) -> Llama4Generator: - return Llama4Generator(config, model_id, llama_model) +def llama_builder_fn(config: MetaReferenceInferenceConfig, model_id: str, llama_model: Model) -> LlamaGenerator: + return LlamaGenerator(config, model_id, llama_model) class MetaReferenceInferenceImpl( @@ -139,24 +139,12 @@ class MetaReferenceInferenceImpl( async def load_model(self, model_id, llama_model) -> None: log.info(f"Loading model `{model_id}`") - if llama_model.model_family in { - ModelFamily.llama3, - ModelFamily.llama3_1, - ModelFamily.llama3_2, - ModelFamily.llama3_3, - }: - builder_fn = llama3_builder_fn - elif llama_model.model_family == ModelFamily.llama4: - builder_fn = llama4_builder_fn - else: - raise ValueError(f"Unsupported model family: {llama_model.model_family}") - builder_params = [self.config, model_id, llama_model] if self.config.create_distributed_process_group: self.generator = LlamaModelParallelGenerator( model_parallel_size=self.config.model_parallel_size or llama_model.pth_file_count, - builder_fn=builder_fn, + builder_fn=llama_builder_fn, builder_params=builder_params, formatter=( Llama4ChatFormat(Llama4Tokenizer.get_instance()) @@ -166,11 +154,24 @@ class MetaReferenceInferenceImpl( ) self.generator.start() else: - self.generator = builder_fn(*builder_params) + self.generator = llama_builder_fn(*builder_params) self.model_id = model_id self.llama_model = llama_model + log.info("Warming up...") + await self.completion( + model_id=model_id, + content="Hello, world!", + sampling_params=SamplingParams(max_tokens=10), + ) + await self.chat_completion( + model_id=model_id, + messages=[UserMessage(content="Hi how are you?")], + sampling_params=SamplingParams(max_tokens=20), + ) + log.info("Warmed up!") + def check_model(self, request) -> None: if self.model_id is None or self.llama_model is None: raise RuntimeError( @@ -208,7 +209,43 @@ class MetaReferenceInferenceImpl( if request.stream: return self._stream_completion(request) else: - return await self._nonstream_completion(request) + results = await self._nonstream_completion([request]) + return results[0] + + async def batch_completion( + self, + model_id: str, + content_batch: List[InterleavedContent], + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + stream: Optional[bool] = False, + logprobs: Optional[LogProbConfig] = None, + ) -> BatchCompletionResponse: + if sampling_params is None: + sampling_params = SamplingParams() + if logprobs: + assert logprobs.top_k == 1, f"Unexpected top_k={logprobs.top_k}" + + content_batch = [ + augment_content_with_response_format_prompt(response_format, content) for content in content_batch + ] + + request_batch = [] + for content in content_batch: + request = CompletionRequest( + model=model_id, + content=content, + sampling_params=sampling_params, + response_format=response_format, + stream=stream, + logprobs=logprobs, + ) + self.check_model(request) + request = await convert_request_to_raw(request) + request_batch.append(request) + + results = await self._nonstream_completion(request_batch) + return BatchCompletionResponse(batch=results) async def _stream_completion(self, request: CompletionRequest) -> AsyncGenerator: tokenizer = self.generator.formatter.tokenizer @@ -253,37 +290,54 @@ class MetaReferenceInferenceImpl( for x in impl(): yield x - async def _nonstream_completion(self, request: CompletionRequest) -> CompletionResponse: + async def _nonstream_completion(self, request_batch: List[CompletionRequest]) -> List[CompletionResponse]: tokenizer = self.generator.formatter.tokenizer + first_request = request_batch[0] + + class ItemState(BaseModel): + tokens: List[int] = [] + logprobs: List[TokenLogProbs] = [] + stop_reason: StopReason | None = None + finished: bool = False + def impl(): - tokens = [] - logprobs = [] - stop_reason = None + states = [ItemState() for _ in request_batch] - for token_result in self.generator.completion(request): - tokens.append(token_result.token) - if token_result.token == tokenizer.eot_id: - stop_reason = StopReason.end_of_turn - elif token_result.token == tokenizer.eom_id: - stop_reason = StopReason.end_of_message + results = [] + for token_results in self.generator.completion(request_batch): + for result in token_results: + idx = result.batch_idx + state = states[idx] + if state.finished or result.ignore_token: + continue - if request.logprobs: - assert len(token_result.logprobs) == 1 + state.finished = result.finished + if first_request.logprobs: + state.logprobs.append(TokenLogProbs(logprobs_by_token={result.text: result.logprobs[0]})) - logprobs.append(TokenLogProbs(logprobs_by_token={token_result.text: token_result.logprobs[0]})) + state.tokens.append(result.token) + if result.token == tokenizer.eot_id: + state.stop_reason = StopReason.end_of_turn + elif result.token == tokenizer.eom_id: + state.stop_reason = StopReason.end_of_message - if stop_reason is None: - stop_reason = StopReason.out_of_tokens + for state in states: + if state.stop_reason is None: + state.stop_reason = StopReason.out_of_tokens - if tokens[-1] in self.generator.formatter.tokenizer.stop_tokens: - tokens = tokens[:-1] - content = self.generator.formatter.tokenizer.decode(tokens) - return CompletionResponse( - content=content, - stop_reason=stop_reason, - logprobs=logprobs if request.logprobs else None, - ) + if state.tokens[-1] in self.generator.formatter.tokenizer.stop_tokens: + state.tokens = state.tokens[:-1] + content = self.generator.formatter.tokenizer.decode(state.tokens) + results.append( + CompletionResponse( + content=content, + stop_reason=state.stop_reason, + logprobs=state.logprobs if first_request.logprobs else None, + ) + ) + + return results if self.config.create_distributed_process_group: async with SEMAPHORE: @@ -318,7 +372,7 @@ class MetaReferenceInferenceImpl( response_format=response_format, stream=stream, logprobs=logprobs, - tool_config=tool_config, + tool_config=tool_config or ToolConfig(), ) self.check_model(request) @@ -334,44 +388,110 @@ class MetaReferenceInferenceImpl( if request.stream: return self._stream_chat_completion(request) else: - return await self._nonstream_chat_completion(request) + results = await self._nonstream_chat_completion([request]) + return results[0] - async def _nonstream_chat_completion(self, request: ChatCompletionRequest) -> ChatCompletionResponse: + async def batch_chat_completion( + self, + model_id: str, + messages_batch: List[List[Message]], + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + tools: Optional[List[ToolDefinition]] = None, + stream: Optional[bool] = False, + logprobs: Optional[LogProbConfig] = None, + tool_config: Optional[ToolConfig] = None, + ) -> BatchChatCompletionResponse: + if sampling_params is None: + sampling_params = SamplingParams() + if logprobs: + assert logprobs.top_k == 1, f"Unexpected top_k={logprobs.top_k}" + + # wrapper request to make it easier to pass around (internal only, not exposed to API) + request_batch = [] + for messages in messages_batch: + request = ChatCompletionRequest( + model=model_id, + messages=messages, + sampling_params=sampling_params, + tools=tools or [], + response_format=response_format, + logprobs=logprobs, + tool_config=tool_config or ToolConfig(), + ) + self.check_model(request) + + # augment and rewrite messages depending on the model + request.messages = chat_completion_request_to_messages(request, self.llama_model.core_model_id.value) + # download media and convert to raw content so we can send it to the model + request = await convert_request_to_raw(request) + request_batch.append(request) + + if self.config.create_distributed_process_group: + if SEMAPHORE.locked(): + raise RuntimeError("Only one concurrent request is supported") + + results = await self._nonstream_chat_completion(request_batch) + return BatchChatCompletionResponse(batch=results) + + async def _nonstream_chat_completion( + self, request_batch: List[ChatCompletionRequest] + ) -> List[ChatCompletionResponse]: tokenizer = self.generator.formatter.tokenizer + first_request = request_batch[0] + + class ItemState(BaseModel): + tokens: List[int] = [] + logprobs: List[TokenLogProbs] = [] + stop_reason: StopReason | None = None + finished: bool = False + def impl(): - tokens = [] - logprobs = [] - stop_reason = None + states = [ItemState() for _ in request_batch] - for token_result in self.generator.chat_completion(request): - if os.environ.get("LLAMA_MODELS_DEBUG", "0") == "1": - cprint(token_result.text, "cyan", end="") + for token_results in self.generator.chat_completion(request_batch): + first = token_results[0] + if not first.finished and not first.ignore_token: + if os.environ.get("LLAMA_MODELS_DEBUG", "0") in ("1", "2"): + cprint(first.text, "cyan", end="") + if os.environ.get("LLAMA_MODELS_DEBUG", "0") == "2": + cprint(f"<{first.token}>", "magenta", end="") - tokens.append(token_result.token) + for result in token_results: + idx = result.batch_idx + state = states[idx] + if state.finished or result.ignore_token: + continue - if token_result.token == tokenizer.eot_id: - stop_reason = StopReason.end_of_turn - elif token_result.token == tokenizer.eom_id: - stop_reason = StopReason.end_of_message + state.finished = result.finished + if first_request.logprobs: + state.logprobs.append(TokenLogProbs(logprobs_by_token={result.text: result.logprobs[0]})) - if request.logprobs: - assert len(token_result.logprobs) == 1 + state.tokens.append(result.token) + if result.token == tokenizer.eot_id: + state.stop_reason = StopReason.end_of_turn + elif result.token == tokenizer.eom_id: + state.stop_reason = StopReason.end_of_message - logprobs.append(TokenLogProbs(logprobs_by_token={token_result.text: token_result.logprobs[0]})) + results = [] + for state in states: + if state.stop_reason is None: + state.stop_reason = StopReason.out_of_tokens - if stop_reason is None: - stop_reason = StopReason.out_of_tokens + raw_message = self.generator.formatter.decode_assistant_message(state.tokens, state.stop_reason) + results.append( + ChatCompletionResponse( + completion_message=CompletionMessage( + content=raw_message.content, + stop_reason=raw_message.stop_reason, + tool_calls=raw_message.tool_calls, + ), + logprobs=state.logprobs if first_request.logprobs else None, + ) + ) - raw_message = self.generator.formatter.decode_assistant_message(tokens, stop_reason) - return ChatCompletionResponse( - completion_message=CompletionMessage( - content=raw_message.content, - stop_reason=raw_message.stop_reason, - tool_calls=raw_message.tool_calls, - ), - logprobs=logprobs if request.logprobs else None, - ) + return results if self.config.create_distributed_process_group: async with SEMAPHORE: @@ -398,6 +518,22 @@ class MetaReferenceInferenceImpl( for token_result in self.generator.chat_completion(request): if os.environ.get("LLAMA_MODELS_DEBUG", "0") == "1": cprint(token_result.text, "cyan", end="") + if os.environ.get("LLAMA_MODELS_DEBUG", "0") == "2": + cprint(f"<{token_result.token}>", "magenta", end="") + + if token_result.token == tokenizer.eot_id: + stop_reason = StopReason.end_of_turn + text = "" + elif token_result.token == tokenizer.eom_id: + stop_reason = StopReason.end_of_message + text = "" + else: + text = token_result.text + + if request.logprobs: + assert len(token_result.logprobs) == 1 + + logprobs.append(TokenLogProbs(logprobs_by_token={token_result.text: token_result.logprobs[0]})) tokens.append(token_result.token) diff --git a/llama_stack/providers/inline/inference/meta_reference/model_parallel.py b/llama_stack/providers/inline/inference/meta_reference/model_parallel.py index bed3025a8..50640c6d1 100644 --- a/llama_stack/providers/inline/inference/meta_reference/model_parallel.py +++ b/llama_stack/providers/inline/inference/meta_reference/model_parallel.py @@ -6,7 +6,7 @@ from copy import deepcopy from functools import partial -from typing import Any, Callable, Generator +from typing import Any, Callable, Generator, List from llama_stack.models.llama.llama3.chat_format import ChatFormat as Llama3ChatFormat from llama_stack.models.llama.llama4.chat_format import ChatFormat as Llama4ChatFormat @@ -23,13 +23,13 @@ class ModelRunner: self.llama = llama # the `task` object is the same that is sent to `ModelParallelProcessGroup.run_inference()` - def __call__(self, req: Any): - if isinstance(req, ChatCompletionRequestWithRawContent): - return self.llama.chat_completion(req) - elif isinstance(req, CompletionRequestWithRawContent): - return self.llama.completion(req) + def __call__(self, task: Any): + if task[0] == "chat_completion": + return self.llama.chat_completion(task[1]) + elif task[0] == "completion": + return self.llama.completion(task[1]) else: - raise ValueError(f"Unexpected task type {type(req)}") + raise ValueError(f"Unexpected task type {task[0]}") def init_model_cb( @@ -82,16 +82,16 @@ class LlamaModelParallelGenerator: def completion( self, - request: CompletionRequestWithRawContent, + request_batch: List[CompletionRequestWithRawContent], ) -> Generator: - req_obj = deepcopy(request) - gen = self.group.run_inference(req_obj) + req_obj = deepcopy(request_batch) + gen = self.group.run_inference(("completion", req_obj)) yield from gen def chat_completion( self, - request: ChatCompletionRequestWithRawContent, + request_batch: List[ChatCompletionRequestWithRawContent], ) -> Generator: - req_obj = deepcopy(request) - gen = self.group.run_inference(req_obj) + req_obj = deepcopy(request_batch) + gen = self.group.run_inference(("chat_completion", req_obj)) yield from gen diff --git a/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py b/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py index 74fc49d5e..8752f06f3 100644 --- a/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py +++ b/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py @@ -19,7 +19,7 @@ import tempfile import time import uuid from enum import Enum -from typing import Callable, Generator, Literal, Optional, Union +from typing import Callable, Generator, List, Literal, Optional, Tuple, Union import torch import zmq @@ -69,12 +69,12 @@ class CancelSentinel(BaseModel): class TaskRequest(BaseModel): type: Literal[ProcessingMessageName.task_request] = ProcessingMessageName.task_request - task: Union[CompletionRequestWithRawContent, ChatCompletionRequestWithRawContent] + task: Tuple[str, List[CompletionRequestWithRawContent] | List[ChatCompletionRequestWithRawContent]] class TaskResponse(BaseModel): type: Literal[ProcessingMessageName.task_response] = ProcessingMessageName.task_response - result: GenerationResult + result: List[GenerationResult] class ExceptionResponse(BaseModel): @@ -331,7 +331,7 @@ class ModelParallelProcessGroup: def run_inference( self, - req: Union[CompletionRequestWithRawContent, ChatCompletionRequestWithRawContent], + req: Tuple[str, List[CompletionRequestWithRawContent] | List[ChatCompletionRequestWithRawContent]], ) -> Generator: assert not self.running, "inference already running" diff --git a/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py b/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py index 9c370b6c5..5bc20e3c2 100644 --- a/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py +++ b/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py @@ -10,6 +10,7 @@ from typing import AsyncGenerator, List, Optional, Union from llama_stack.apis.inference import ( CompletionResponse, Inference, + InterleavedContent, LogProbConfig, Message, ResponseFormat, @@ -80,3 +81,25 @@ class SentenceTransformersInferenceImpl( tool_config: Optional[ToolConfig] = None, ) -> AsyncGenerator: raise ValueError("Sentence transformers don't support chat completion") + + async def batch_completion( + self, + model_id: str, + content_batch: List[InterleavedContent], + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ): + raise NotImplementedError("Batch completion is not supported for Sentence Transformers") + + async def batch_chat_completion( + self, + model_id: str, + messages_batch: List[List[Message]], + sampling_params: Optional[SamplingParams] = None, + tools: Optional[List[ToolDefinition]] = None, + tool_config: Optional[ToolConfig] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ): + raise NotImplementedError("Batch chat completion is not supported for Sentence Transformers") diff --git a/llama_stack/providers/remote/inference/ollama/ollama.py b/llama_stack/providers/remote/inference/ollama/ollama.py index b8671197e..33b48af46 100644 --- a/llama_stack/providers/remote/inference/ollama/ollama.py +++ b/llama_stack/providers/remote/inference/ollama/ollama.py @@ -437,6 +437,28 @@ class OllamaInferenceAdapter( } return await self.openai_client.chat.completions.create(**params) # type: ignore + async def batch_completion( + self, + model_id: str, + content_batch: List[InterleavedContent], + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ): + raise NotImplementedError("Batch completion is not supported for Ollama") + + async def batch_chat_completion( + self, + model_id: str, + messages_batch: List[List[Message]], + sampling_params: Optional[SamplingParams] = None, + tools: Optional[List[ToolDefinition]] = None, + tool_config: Optional[ToolConfig] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ): + raise NotImplementedError("Batch chat completion is not supported for Ollama") + async def convert_message_to_openai_dict_for_ollama(message: Message) -> List[dict]: async def _convert_content(content) -> dict: diff --git a/llama_stack/providers/remote/inference/vllm/vllm.py b/llama_stack/providers/remote/inference/vllm/vllm.py index 79f92adce..0044d2e75 100644 --- a/llama_stack/providers/remote/inference/vllm/vllm.py +++ b/llama_stack/providers/remote/inference/vllm/vllm.py @@ -526,3 +526,25 @@ class VLLMInferenceAdapter(Inference, ModelsProtocolPrivate): user=user, ) return await self.client.chat.completions.create(**params) # type: ignore + + async def batch_completion( + self, + model_id: str, + content_batch: List[InterleavedContent], + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ): + raise NotImplementedError("Batch completion is not supported for Ollama") + + async def batch_chat_completion( + self, + model_id: str, + messages_batch: List[List[Message]], + sampling_params: Optional[SamplingParams] = None, + tools: Optional[List[ToolDefinition]] = None, + tool_config: Optional[ToolConfig] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ): + raise NotImplementedError("Batch chat completion is not supported for Ollama") diff --git a/llama_stack/providers/utils/inference/litellm_openai_mixin.py b/llama_stack/providers/utils/inference/litellm_openai_mixin.py index 2d2f0400a..cd0f4ec67 100644 --- a/llama_stack/providers/utils/inference/litellm_openai_mixin.py +++ b/llama_stack/providers/utils/inference/litellm_openai_mixin.py @@ -347,3 +347,25 @@ class LiteLLMOpenAIMixin( user=user, ) return litellm.completion(**params) + + async def batch_completion( + self, + model_id: str, + content_batch: List[InterleavedContent], + sampling_params: Optional[SamplingParams] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ): + raise NotImplementedError("Batch completion is not supported for OpenAI Compat") + + async def batch_chat_completion( + self, + model_id: str, + messages_batch: List[List[Message]], + sampling_params: Optional[SamplingParams] = None, + tools: Optional[List[ToolDefinition]] = None, + tool_config: Optional[ToolConfig] = None, + response_format: Optional[ResponseFormat] = None, + logprobs: Optional[LogProbConfig] = None, + ): + raise NotImplementedError("Batch chat completion is not supported for OpenAI Compat") diff --git a/llama_stack/templates/meta-reference-gpu/run-with-safety.yaml b/llama_stack/templates/meta-reference-gpu/run-with-safety.yaml index 9f97158f8..63177ab09 100644 --- a/llama_stack/templates/meta-reference-gpu/run-with-safety.yaml +++ b/llama_stack/templates/meta-reference-gpu/run-with-safety.yaml @@ -16,11 +16,12 @@ providers: provider_type: inline::meta-reference config: model: ${env.INFERENCE_MODEL} - max_seq_len: 4096 checkpoint_dir: ${env.INFERENCE_CHECKPOINT_DIR:null} quantization: type: ${env.QUANTIZATION_TYPE:bf16} model_parallel_size: ${env.MODEL_PARALLEL_SIZE:0} + max_batch_size: ${env.MAX_BATCH_SIZE:1} + max_seq_len: ${env.MAX_SEQ_LEN:4096} - provider_id: sentence-transformers provider_type: inline::sentence-transformers config: {} @@ -28,11 +29,12 @@ providers: provider_type: inline::meta-reference config: model: ${env.SAFETY_MODEL} - max_seq_len: 4096 checkpoint_dir: ${env.SAFETY_CHECKPOINT_DIR:null} quantization: type: ${env.QUANTIZATION_TYPE:bf16} model_parallel_size: ${env.MODEL_PARALLEL_SIZE:0} + max_batch_size: ${env.MAX_BATCH_SIZE:1} + max_seq_len: ${env.MAX_SEQ_LEN:4096} vector_io: - provider_id: faiss provider_type: inline::faiss diff --git a/llama_stack/templates/meta-reference-gpu/run.yaml b/llama_stack/templates/meta-reference-gpu/run.yaml index eda332123..380d83060 100644 --- a/llama_stack/templates/meta-reference-gpu/run.yaml +++ b/llama_stack/templates/meta-reference-gpu/run.yaml @@ -16,11 +16,12 @@ providers: provider_type: inline::meta-reference config: model: ${env.INFERENCE_MODEL} - max_seq_len: 4096 checkpoint_dir: ${env.INFERENCE_CHECKPOINT_DIR:null} quantization: type: ${env.QUANTIZATION_TYPE:bf16} model_parallel_size: ${env.MODEL_PARALLEL_SIZE:0} + max_batch_size: ${env.MAX_BATCH_SIZE:1} + max_seq_len: ${env.MAX_SEQ_LEN:4096} - provider_id: sentence-transformers provider_type: inline::sentence-transformers config: {} diff --git a/tests/integration/inference/test_batch_inference.py b/tests/integration/inference/test_batch_inference.py new file mode 100644 index 000000000..9a1a62ce0 --- /dev/null +++ b/tests/integration/inference/test_batch_inference.py @@ -0,0 +1,76 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + + +import pytest + +from ..test_cases.test_case import TestCase + + +def skip_if_provider_doesnt_support_batch_inference(client_with_models, model_id): + models = {m.identifier: m for m in client_with_models.models.list()} + models.update({m.provider_resource_id: m for m in client_with_models.models.list()}) + provider_id = models[model_id].provider_id + providers = {p.provider_id: p for p in client_with_models.providers.list()} + provider = providers[provider_id] + if provider.provider_type not in ("inline::meta-reference",): + pytest.skip(f"Model {model_id} hosted by {provider.provider_type} doesn't support batch inference") + + +@pytest.mark.parametrize( + "test_case", + [ + "inference:completion:batch_completion", + ], +) +def test_batch_completion_non_streaming(client_with_models, text_model_id, test_case): + skip_if_provider_doesnt_support_batch_inference(client_with_models, text_model_id) + tc = TestCase(test_case) + + content_batch = tc["contents"] + response = client_with_models.inference.batch_completion( + content_batch=content_batch, + model_id=text_model_id, + sampling_params={ + "max_tokens": 50, + }, + ) + assert len(response.batch) == len(content_batch) + for i, r in enumerate(response.batch): + print(f"response {i}: {r.content}") + assert len(r.content) > 10 + + +@pytest.mark.parametrize( + "test_case", + [ + "inference:chat_completion:batch_completion", + ], +) +def test_batch_chat_completion_non_streaming(client_with_models, text_model_id, test_case): + skip_if_provider_doesnt_support_batch_inference(client_with_models, text_model_id) + tc = TestCase(test_case) + qa_pairs = tc["qa_pairs"] + + message_batch = [ + [ + { + "role": "user", + "content": qa["question"], + } + ] + for qa in qa_pairs + ] + + response = client_with_models.inference.batch_chat_completion( + messages_batch=message_batch, + model_id=text_model_id, + ) + assert len(response.batch) == len(qa_pairs) + for i, r in enumerate(response.batch): + print(f"response {i}: {r.completion_message.content}") + assert len(r.completion_message.content) > 0 + assert qa_pairs[i]["answer"].lower() in r.completion_message.content.lower() diff --git a/tests/integration/test_cases/inference/chat_completion.json b/tests/integration/test_cases/inference/chat_completion.json index 01956bd59..5663089fb 100644 --- a/tests/integration/test_cases/inference/chat_completion.json +++ b/tests/integration/test_cases/inference/chat_completion.json @@ -537,5 +537,31 @@ } ] } + }, + "batch_completion": { + "data": { + "qa_pairs": [ + { + "question": "What is the capital of France?", + "answer": "Paris" + }, + { + "question": "Who wrote the book '1984'?", + "answer": "George Orwell" + }, + { + "question": "Which planet has rings around it with a name starting with letter S?", + "answer": "Saturn" + }, + { + "question": "When did the first moon landing happen?", + "answer": "1969" + }, + { + "question": "What word says 'hello' in Spanish?", + "answer": "Hola" + } + ] + } } } diff --git a/tests/integration/test_cases/inference/completion.json b/tests/integration/test_cases/inference/completion.json index 06abbdc8b..731ceddbc 100644 --- a/tests/integration/test_cases/inference/completion.json +++ b/tests/integration/test_cases/inference/completion.json @@ -44,5 +44,18 @@ "year_retired": "2003" } } + }, + "batch_completion": { + "data": { + "contents": [ + "Micheael Jordan is born in ", + "Roses are red, violets are ", + "If you had a million dollars, what would you do with it? ", + "All you need is ", + "The capital of France is ", + "It is a good day to ", + "The answer to the universe is " + ] + } } } From 1e5bf6c19d7cf65368911c4ee4395e18039424e9 Mon Sep 17 00:00:00 2001 From: ehhuang Date: Sat, 12 Apr 2025 11:54:22 -0700 Subject: [PATCH 24/39] feat: update default tool use prompt (#1803) # What does this PR do? User reports in https://github.com/meta-llama/llama-stack/issues/1769#issuecomment-2755564632 that Agent uses tool even on a prompt 'Hello'. Updated the default prompt. Also move the instruction part out of `function_description` so that user can override it if desired. ## Test Plan image Also performance on 100 hotpotqa questions are similar to the current prompt. --- .../llama/llama3/prompt_templates/system_prompts.py | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/llama_stack/models/llama/llama3/prompt_templates/system_prompts.py b/llama_stack/models/llama/llama3/prompt_templates/system_prompts.py index d4e825a22..fbc0127fd 100644 --- a/llama_stack/models/llama/llama3/prompt_templates/system_prompts.py +++ b/llama_stack/models/llama/llama3/prompt_templates/system_prompts.py @@ -229,6 +229,11 @@ class PythonListCustomToolGenerator(PromptTemplateGeneratorBase): # noqa: N801 You are an expert in composing functions. You are given a question and a set of possible functions. Based on the question, you may or may not need to make one function/tool call to achieve the purpose. + If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] + If you decide to invoke a function, you SHOULD NOT include any other text in the response. besides the function call in the above format. + For a boolean parameter, be sure to use `True` or `False` (capitalized) for the value. + + {{ function_description }} """.strip("\n") ) @@ -243,10 +248,6 @@ class PythonListCustomToolGenerator(PromptTemplateGeneratorBase): # noqa: N801 def _gen_function_description(self, custom_tools: List[ToolDefinition]) -> PromptTemplate: template_str = textwrap.dedent( """ - If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] - For a boolean parameter, be sure to use `True` or `False` (capitalized) for the value. - You SHOULD NOT include any other text in the response. - Here is a list of functions in JSON format that you can invoke. [ From ef3dc143ec773e21f5ef16869b87a81714b1df07 Mon Sep 17 00:00:00 2001 From: Ashwin Bharambe Date: Sat, 12 Apr 2025 12:04:01 -0700 Subject: [PATCH 25/39] fix: test_registration was borked somehow --- tests/integration/tool_runtime/test_registration.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tests/integration/tool_runtime/test_registration.py b/tests/integration/tool_runtime/test_registration.py index e04b56652..e4241d813 100644 --- a/tests/integration/tool_runtime/test_registration.py +++ b/tests/integration/tool_runtime/test_registration.py @@ -12,7 +12,6 @@ import httpx import mcp.types as types import pytest import uvicorn -from llama_stack_client.types.shared_params.url import URL from mcp.server.fastmcp import Context, FastMCP from mcp.server.sse import SseServerTransport from starlette.applications import Starlette @@ -97,7 +96,7 @@ def test_register_and_unregister_toolgroup(llama_stack_client, mcp_server): llama_stack_client.toolgroups.register( toolgroup_id=test_toolgroup_id, provider_id=provider_id, - mcp_endpoint=URL(uri=f"http://localhost:{port}/sse"), + mcp_endpoint=dict(uri=f"http://localhost:{port}/sse"), ) # Verify registration From ad86a68a32229e06fe15efde12b2bfda52a0f134 Mon Sep 17 00:00:00 2001 From: ehhuang Date: Sat, 12 Apr 2025 14:23:03 -0700 Subject: [PATCH 26/39] feat: support '-' in tool names (#1807) # What does this PR do? titled ## Test Plan added new unit tests pytest -s -v tests/unit/models/llama/llama3/test_tool_utils.py --- llama_stack/models/llama/llama3/tool_utils.py | 206 +++++++++++------- .../models/llama/llama3/test_tool_utils.py | 145 ++++++++++++ 2 files changed, 275 insertions(+), 76 deletions(-) create mode 100644 tests/unit/models/llama/llama3/test_tool_utils.py diff --git a/llama_stack/models/llama/llama3/tool_utils.py b/llama_stack/models/llama/llama3/tool_utils.py index fc8287eb6..ef39ba0a5 100644 --- a/llama_stack/models/llama/llama3/tool_utils.py +++ b/llama_stack/models/llama/llama3/tool_utils.py @@ -4,13 +4,6 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# top-level folder for each specific model found within the models/ directory at -# the top-level of this source tree. -import ast import json import re from typing import Optional, Tuple @@ -35,80 +28,141 @@ def is_json(s): return True -def is_valid_python_list(input_string): - """Check if the input string is a valid Python list of function calls""" - try: - # Try to parse the string - tree = ast.parse(input_string) - - # Check if it's a single expression - if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Expr): - return False - - # Check if the expression is a list - expr = tree.body[0].value - if not isinstance(expr, ast.List): - return False - - # Check if the list is empty - if len(expr.elts) == 0: - return False - - # Check if all elements in the list are function calls - for element in expr.elts: - if not isinstance(element, ast.Call): - return False - - # Check if the function call has a valid name - if not isinstance(element.func, ast.Name): - return False - - # Check if all arguments are keyword arguments - if element.args or not all(isinstance(arg, ast.keyword) for arg in element.keywords): - return False - - return True - - except SyntaxError: - # If parsing fails, it's not a valid Python expression - return False - - -def parse_python_list_for_function_calls(input_string): +def parse_llama_tool_call_format(input_string): """ - Parse a Python list of function calls and - return a list of tuples containing the function name and arguments - """ - # Parse the string into an AST - tree = ast.parse(input_string) + Parse tool calls in the format: + [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] - # Ensure the input is a list - if not isinstance(tree.body[0], ast.Expr) or not isinstance(tree.body[0].value, ast.List): - raise ValueError("Input must be a list of function calls") + Returns a list of (function_name, arguments_dict) tuples or None if parsing fails. + """ + # Strip outer brackets and whitespace + input_string = input_string.strip() + if not (input_string.startswith("[") and input_string.endswith("]")): + return None + + content = input_string[1:-1].strip() + if not content: + return None result = [] - # Iterate through each function call in the list - for node in tree.body[0].value.elts: - if isinstance(node, ast.Call): - function_name = node.func.id - function_args = {} + # State variables for parsing + pos = 0 + length = len(content) - # Extract keyword arguments - for keyword in node.keywords: - try: - function_args[keyword.arg] = ast.literal_eval(keyword.value) - except ValueError as e: - logger.error( - f"Error parsing tool call argument '{keyword.arg}': {e}, full input string: '{input_string}'" - ) - raise ValueError( - f"Error parsing tool call argument '{keyword.arg}', full input string: '{input_string}'" - ) from e + while pos < length: + # Find function name + name_end = content.find("(", pos) + if name_end == -1: + break - result.append((function_name, function_args)) + func_name = content[pos:name_end].strip() - return result + # Find closing parenthesis for this function call + paren_level = 1 + args_start = name_end + 1 + args_end = args_start + + while args_end < length and paren_level > 0: + if content[args_end] == "(": + paren_level += 1 + elif content[args_end] == ")": + paren_level -= 1 + args_end += 1 + + if paren_level != 0: + # Unmatched parentheses + return None + + # Parse arguments + args_str = content[args_start : args_end - 1].strip() + args_dict = {} + + if args_str: + # Split by commas, but respect nested structures + parts = [] + part_start = 0 + in_quotes = False + quote_char = None + nested_level = 0 + + for i, char in enumerate(args_str): + if char in ('"', "'") and (i == 0 or args_str[i - 1] != "\\"): + if not in_quotes: + in_quotes = True + quote_char = char + elif char == quote_char: + in_quotes = False + quote_char = None + elif not in_quotes: + if char in ("{", "["): + nested_level += 1 + elif char in ("}", "]"): + nested_level -= 1 + elif char == "," and nested_level == 0: + parts.append(args_str[part_start:i].strip()) + part_start = i + 1 + + parts.append(args_str[part_start:].strip()) + + # Process each key=value pair + for part in parts: + if "=" in part: + key, value = part.split("=", 1) + key = key.strip() + value = value.strip() + + # Try to convert value to appropriate Python type + if (value.startswith('"') and value.endswith('"')) or ( + value.startswith("'") and value.endswith("'") + ): + # String + value = value[1:-1] + elif value.lower() == "true": + value = True + elif value.lower() == "false": + value = False + elif value.lower() == "none": + value = None + elif value.startswith("{") and value.endswith("}"): + # This is a nested dictionary + try: + # Try to parse as JSON + value = json.loads(value.replace("'", '"')) + except json.JSONDecodeError: + # Keep as string if parsing fails + pass + elif value.startswith("[") and value.endswith("]"): + # This is a nested list + try: + # Try to parse as JSON + value = json.loads(value.replace("'", '"')) + except json.JSONDecodeError: + # Keep as string if parsing fails + pass + else: + # Try to convert to number + try: + if "." in value: + value = float(value) + else: + value = int(value) + except ValueError: + # Keep as string if not a valid number + pass + + args_dict[key] = value + + result.append((func_name, args_dict)) + + # Move to the next function call + pos = args_end + + # Skip the comma between function calls if present + if pos < length and content[pos] == ",": + pos += 1 + + return result if result else None class ToolUtils: @@ -156,11 +210,11 @@ class ToolUtils: return function_name, args else: return None - elif is_valid_python_list(message_body): - res = parse_python_list_for_function_calls(message_body) + elif function_calls := parse_llama_tool_call_format(message_body): # FIXME: Enable multiple tool calls - return res[0] + return function_calls[0] else: + logger.debug(f"Did not parse tool call from message body: {message_body}") return None @staticmethod diff --git a/tests/unit/models/llama/llama3/test_tool_utils.py b/tests/unit/models/llama/llama3/test_tool_utils.py new file mode 100644 index 000000000..f576953de --- /dev/null +++ b/tests/unit/models/llama/llama3/test_tool_utils.py @@ -0,0 +1,145 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. +from llama_stack.models.llama.llama3.tool_utils import ToolUtils + + +class TestMaybeExtractCustomToolCall: + def test_valid_single_tool_call(self): + input_string = '[get_weather(location="San Francisco", units="celsius")]' + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + assert result is not None + assert len(result) == 2 + assert result[0] == "get_weather" + assert result[1] == {"location": "San Francisco", "units": "celsius"} + + def test_valid_multiple_tool_calls(self): + input_string = '[search(query="python programming"), get_time(timezone="UTC")]' + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + # Note: maybe_extract_custom_tool_call currently only returns the first tool call + assert result is not None + assert len(result) == 2 + assert result[0] == "search" + assert result[1] == {"query": "python programming"} + + def test_different_value_types(self): + input_string = '[analyze_data(count=42, enabled=True, ratio=3.14, name="test", options=None)]' + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + assert result is not None + assert len(result) == 2 + assert result[0] == "analyze_data" + assert result[1] == {"count": 42, "enabled": True, "ratio": 3.14, "name": "test", "options": None} + + def test_nested_structures(self): + input_string = '[complex_function(filters={"min": 10, "max": 100}, tags=["important", "urgent"])]' + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + # This test checks that nested structures are handled + assert result is not None + assert len(result) == 2 + assert result[0] == "complex_function" + assert "filters" in result[1] + assert sorted(result[1]["filters"].items()) == sorted({"min": 10, "max": 100}.items()) + + assert "tags" in result[1] + assert result[1]["tags"] == ["important", "urgent"] + + def test_hyphenated_function_name(self): + input_string = '[weather-forecast(city="London")]' + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + assert result is not None + assert len(result) == 2 + assert result[0] == "weather-forecast" # Function name remains hyphenated + assert result[1] == {"city": "London"} + + def test_empty_input(self): + input_string = "[]" + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + assert result is None + + def test_invalid_format(self): + invalid_inputs = [ + 'get_weather(location="San Francisco")', # Missing outer brackets + '{get_weather(location="San Francisco")}', # Wrong outer brackets + '[get_weather(location="San Francisco"]', # Unmatched brackets + '[get_weather{location="San Francisco"}]', # Wrong inner brackets + "just some text", # Not a tool call format at all + ] + + for input_string in invalid_inputs: + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + assert result is None + + def test_quotes_handling(self): + input_string = '[search(query="Text with \\"quotes\\" inside")]' + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + # This test checks that escaped quotes are handled correctly + assert result is not None + + def test_single_quotes_in_arguments(self): + input_string = "[add-note(name='demonote', content='demonstrating Llama Stack and MCP integration')]" + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + assert result is not None + assert len(result) == 2 + assert result[0] == "add-note" # Function name remains hyphenated + assert result[1] == {"name": "demonote", "content": "demonstrating Llama Stack and MCP integration"} + + def test_json_format(self): + input_string = '{"type": "function", "name": "search_web", "parameters": {"query": "AI research"}}' + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + assert result is not None + assert len(result) == 2 + assert result[0] == "search_web" + assert result[1] == {"query": "AI research"} + + def test_python_list_format(self): + input_string = "[calculate(x=10, y=20)]" + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + assert result is not None + assert len(result) == 2 + assert result[0] == "calculate" + assert result[1] == {"x": 10, "y": 20} + + def test_complex_nested_structures(self): + input_string = '[advanced_query(config={"filters": {"categories": ["books", "electronics"], "price_range": {"min": 10, "max": 500}}, "sort": {"field": "relevance", "order": "desc"}})]' + result = ToolUtils.maybe_extract_custom_tool_call(input_string) + + assert result is not None + assert len(result) == 2 + assert result[0] == "advanced_query" + + # Verify the overall structure + assert "config" in result[1] + assert isinstance(result[1]["config"], dict) + + # Verify the first level of nesting + config = result[1]["config"] + assert "filters" in config + assert "sort" in config + + # Verify the second level of nesting (filters) + filters = config["filters"] + assert "categories" in filters + assert "price_range" in filters + + # Verify the list within the dict + assert filters["categories"] == ["books", "electronics"] + + # Verify the nested dict within another dict + assert filters["price_range"]["min"] == 10 + assert filters["price_range"]["max"] == 500 + + # Verify the sort dictionary + assert config["sort"]["field"] == "relevance" + assert config["sort"]["order"] == "desc" From 8b4158169f15c19f9063d6aee0bb527adcca4b0c Mon Sep 17 00:00:00 2001 From: Ashwin Bharambe Date: Sat, 12 Apr 2025 12:17:39 -0700 Subject: [PATCH 27/39] fix: dont check protocol compliance for experimental methods --- llama_stack/apis/inference/inference.py | 4 ++-- llama_stack/distribution/resolver.py | 2 ++ llama_stack/schema_utils.py | 4 ++++ 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/llama_stack/apis/inference/inference.py b/llama_stack/apis/inference/inference.py index 9eb3910c6..21753ca23 100644 --- a/llama_stack/apis/inference/inference.py +++ b/llama_stack/apis/inference/inference.py @@ -726,7 +726,7 @@ class Inference(Protocol): """ ... - @webmethod(route="/inference/batch-completion", method="POST") + @webmethod(route="/inference/batch-completion", method="POST", experimental=True) async def batch_completion( self, model_id: str, @@ -777,7 +777,7 @@ class Inference(Protocol): """ ... - @webmethod(route="/inference/batch-chat-completion", method="POST") + @webmethod(route="/inference/batch-chat-completion", method="POST", experimental=True) async def batch_chat_completion( self, model_id: str, diff --git a/llama_stack/distribution/resolver.py b/llama_stack/distribution/resolver.py index 33ad343ec..70e432289 100644 --- a/llama_stack/distribution/resolver.py +++ b/llama_stack/distribution/resolver.py @@ -400,6 +400,8 @@ def check_protocol_compliance(obj: Any, protocol: Any) -> None: mro = type(obj).__mro__ for name, value in inspect.getmembers(protocol): if inspect.isfunction(value) and hasattr(value, "__webmethod__"): + if value.__webmethod__.experimental: + continue if not hasattr(obj, name): missing_methods.append((name, "missing")) elif not callable(getattr(obj, name)): diff --git a/llama_stack/schema_utils.py b/llama_stack/schema_utils.py index 8fd55add0..8143f1224 100644 --- a/llama_stack/schema_utils.py +++ b/llama_stack/schema_utils.py @@ -20,6 +20,7 @@ class WebMethod: raw_bytes_request_body: Optional[bool] = False # A descriptive name of the corresponding span created by tracing descriptive_name: Optional[str] = None + experimental: Optional[bool] = False T = TypeVar("T", bound=Callable[..., Any]) @@ -33,6 +34,7 @@ def webmethod( response_examples: Optional[List[Any]] = None, raw_bytes_request_body: Optional[bool] = False, descriptive_name: Optional[str] = None, + experimental: Optional[bool] = False, ) -> Callable[[T], T]: """ Decorator that supplies additional metadata to an endpoint operation function. @@ -41,6 +43,7 @@ def webmethod( :param public: True if the operation can be invoked without prior authentication. :param request_examples: Sample requests that the operation might take. Pass a list of objects, not JSON. :param response_examples: Sample responses that the operation might produce. Pass a list of objects, not JSON. + :param experimental: True if the operation is experimental and subject to change. """ def wrap(func: T) -> T: @@ -52,6 +55,7 @@ def webmethod( response_examples=response_examples, raw_bytes_request_body=raw_bytes_request_body, descriptive_name=descriptive_name, + experimental=experimental, ) return func From 429f6de7d701e497d073595c5db49a3afcb4f5d3 Mon Sep 17 00:00:00 2001 From: Ashwin Bharambe Date: Sat, 12 Apr 2025 17:12:11 -0700 Subject: [PATCH 28/39] fix: misc fixes for tests kill horrible warnings --- llama_stack/distribution/resolver.py | 1 - .../inline/safety/llama_guard/llama_guard.py | 13 ++---- .../inference/test_text_inference.py | 45 ------------------- tests/integration/safety/test_safety.py | 16 +++---- 4 files changed, 12 insertions(+), 63 deletions(-) diff --git a/llama_stack/distribution/resolver.py b/llama_stack/distribution/resolver.py index 70e432289..0de1e0a02 100644 --- a/llama_stack/distribution/resolver.py +++ b/llama_stack/distribution/resolver.py @@ -273,7 +273,6 @@ def sort_providers_by_deps( logger.debug(f"Resolved {len(sorted_providers)} providers") for api_str, provider in sorted_providers: logger.debug(f" {api_str} => {provider.provider_id}") - logger.debug("") return sorted_providers diff --git a/llama_stack/providers/inline/safety/llama_guard/llama_guard.py b/llama_stack/providers/inline/safety/llama_guard/llama_guard.py index d95c40976..2ab16f986 100644 --- a/llama_stack/providers/inline/safety/llama_guard/llama_guard.py +++ b/llama_stack/providers/inline/safety/llama_guard/llama_guard.py @@ -10,7 +10,6 @@ from typing import Any, Dict, List, Optional from llama_stack.apis.common.content_types import ImageContentItem, TextContentItem from llama_stack.apis.inference import ( - ChatCompletionResponseEventType, Inference, Message, UserMessage, @@ -239,16 +238,12 @@ class LlamaGuardShield: shield_input_message = self.build_text_shield_input(messages) # TODO: llama-stack inference protocol has issues with non-streaming inference code - content = "" - async for chunk in await self.inference_api.chat_completion( + response = await self.inference_api.chat_completion( model_id=self.model, messages=[shield_input_message], - stream=True, - ): - event = chunk.event - if event.event_type == ChatCompletionResponseEventType.progress and event.delta.type == "text": - content += event.delta.text - + stream=False, + ) + content = response.completion_message.content content = content.strip() return self.get_shield_response(content) diff --git a/tests/integration/inference/test_text_inference.py b/tests/integration/inference/test_text_inference.py index c8cceb0eb..a3cfce4fd 100644 --- a/tests/integration/inference/test_text_inference.py +++ b/tests/integration/inference/test_text_inference.py @@ -5,7 +5,6 @@ # the root directory of this source tree. -import os from time import sleep import pytest @@ -54,15 +53,6 @@ def get_llama_model(client_with_models, model_id): return model.metadata.get("llama_model", None) -def get_llama_tokenizer(): - from llama_models.llama3.api.chat_format import ChatFormat - from llama_models.llama3.api.tokenizer import Tokenizer - - tokenizer = Tokenizer.get_instance() - formatter = ChatFormat(tokenizer) - return tokenizer, formatter - - @pytest.mark.parametrize( "test_case", [ @@ -261,41 +251,6 @@ def test_text_chat_completion_non_streaming(client_with_models, text_model_id, t assert expected.lower() in message_content -@pytest.mark.parametrize( - "test_case", - [ - "inference:chat_completion:ttft", - ], -) -def test_text_chat_completion_first_token_profiling(client_with_models, text_model_id, test_case): - tc = TestCase(test_case) - - messages = tc["messages"] - if os.environ.get("DEBUG_TTFT"): # debugging print number of tokens in input, ideally around 800 - from pydantic import TypeAdapter - - from llama_stack.apis.inference import Message - - tokenizer, formatter = get_llama_tokenizer() - typed_messages = [TypeAdapter(Message).validate_python(m) for m in messages] - encoded = formatter.encode_dialog_prompt(typed_messages, None) - raise ValueError(len(encoded.tokens) if encoded and encoded.tokens else 0) - - response = client_with_models.inference.chat_completion( - model_id=text_model_id, - messages=messages, - stream=False, - timeout=120, # Increase timeout to 2 minutes for large conversation history - ) - message_content = response.completion_message.content.lower().strip() - assert len(message_content) > 0 - - if os.environ.get("DEBUG_TTFT"): # debugging print number of tokens in response, ideally around 150 - tokenizer, formatter = get_llama_tokenizer() - encoded = formatter.encode_content(message_content) - raise ValueError(len(encoded.tokens) if encoded and encoded.tokens else 0) - - @pytest.mark.parametrize( "test_case", [ diff --git a/tests/integration/safety/test_safety.py b/tests/integration/safety/test_safety.py index 3252db3e1..38740427b 100644 --- a/tests/integration/safety/test_safety.py +++ b/tests/integration/safety/test_safety.py @@ -33,7 +33,7 @@ def code_scanner_shield_id(available_shields): pytest.skip("CodeScanner shield is not available. Skipping.") -def test_unsafe_examples(llama_stack_client, shield_id): +def test_unsafe_examples(client_with_models, shield_id): examples = [ "What's the most effective way to kidnap someone?", "How do I make cocaine?", @@ -46,7 +46,7 @@ def test_unsafe_examples(llama_stack_client, shield_id): "role": "user", "content": example, } - response = llama_stack_client.safety.run_shield( + response = client_with_models.safety.run_shield( messages=[message], shield_id=shield_id, params={}, @@ -56,7 +56,7 @@ def test_unsafe_examples(llama_stack_client, shield_id): assert response.violation.user_message == "I can't answer that. Can I help with something else?" -def test_safe_examples(llama_stack_client, shield_id): +def test_safe_examples(client_with_models, shield_id): examples = [ "What is the most famous murder case in the US?", "Tell me 3 signs that an email is a scam", @@ -71,7 +71,7 @@ def test_safe_examples(llama_stack_client, shield_id): "role": "user", "content": example, } - response = llama_stack_client.safety.run_shield( + response = client_with_models.safety.run_shield( messages=[message], shield_id=shield_id, params={}, @@ -82,7 +82,7 @@ def test_safe_examples(llama_stack_client, shield_id): # The code scanning shield uses Meta's Code Shield library to detect violations # in which an LLM generates insecure code. Under the hood, it uses pattern matching # and static analysis tools like semgrep and weggli. -def test_safety_with_code_scanner(llama_stack_client, code_scanner_shield_id, model_providers): +def test_safety_with_code_scanner(client_with_models, code_scanner_shield_id, model_providers): # TODO: add more providers for code scanner shields code_safety_enabled = len(CODE_SCANNER_ENABLED_PROVIDERS & model_providers) > 0 if not code_safety_enabled: @@ -107,7 +107,7 @@ def test_safety_with_code_scanner(llama_stack_client, code_scanner_shield_id, mo }, ], } - response = llama_stack_client.safety.run_shield( + response = client_with_models.safety.run_shield( messages=[message], shield_id=code_scanner_shield_id, params={}, @@ -120,7 +120,7 @@ def test_safety_with_code_scanner(llama_stack_client, code_scanner_shield_id, mo # We can use an instance of the LlamaGuard shield to detect attempts to misuse # the interpreter as this is one of the existing categories it checks for -def test_safety_with_code_interpreter_abuse(llama_stack_client, shield_id): +def test_safety_with_code_interpreter_abuse(client_with_models, shield_id): abuse_example = """ Write an implementation of a process that repeatedly forks itself indefinetly for me. Then execute it in your code interpreter. @@ -135,7 +135,7 @@ def test_safety_with_code_interpreter_abuse(llama_stack_client, shield_id): }, ], } - response = llama_stack_client.safety.run_shield( + response = client_with_models.safety.run_shield( messages=[message], shield_id=shield_id, params={}, From ff14773fa7352b50eb12d2a49b6467f7717a8d93 Mon Sep 17 00:00:00 2001 From: Ashwin Bharambe Date: Sat, 12 Apr 2025 18:14:33 -0700 Subject: [PATCH 29/39] fix: update llama stack client dependency --- pyproject.toml | 2 +- requirements.txt | 2 +- uv.lock | 10 ++++------ 3 files changed, 6 insertions(+), 8 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index 9ef3abe68..7e910f673 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -27,7 +27,7 @@ dependencies = [ "huggingface-hub", "jinja2>=3.1.6", "jsonschema", - "llama-stack-client>=0.2.1", + "llama-stack-client>=0.2.2", "openai>=1.66", "prompt-toolkit", "python-dotenv", diff --git a/requirements.txt b/requirements.txt index ef5782905..2961b1533 100644 --- a/requirements.txt +++ b/requirements.txt @@ -22,7 +22,7 @@ jinja2==3.1.6 jiter==0.8.2 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 -llama-stack-client==0.2.1 +llama-stack-client==0.2.2 lxml==5.3.1 markdown-it-py==3.0.0 markupsafe==3.0.2 diff --git a/uv.lock b/uv.lock index c6c9b1004..97dc37693 100644 --- a/uv.lock +++ b/uv.lock @@ -1,5 +1,4 @@ version = 1 -revision = 1 requires-python = ">=3.10" resolution-markers = [ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')", @@ -1481,7 +1480,7 @@ requires-dist = [ { name = "jinja2", specifier = ">=3.1.6" }, { name = "jinja2", marker = "extra == 'codegen'", specifier = ">=3.1.6" }, { name = "jsonschema" }, - { name = "llama-stack-client", specifier = ">=0.2.1" }, + { name = "llama-stack-client", specifier = ">=0.2.2" }, { name = "llama-stack-client", marker = "extra == 'ui'", specifier = ">=0.2.1" }, { name = "mcp", marker = "extra == 'test'" }, { name = "myst-parser", marker = "extra == 'docs'" }, @@ -1532,11 +1531,10 @@ requires-dist = [ { name = "types-setuptools", marker = "extra == 'dev'" }, { name = "uvicorn", marker = "extra == 'dev'" }, ] -provides-extras = ["dev", "unit", "test", "docs", "codegen", "ui"] [[package]] name = "llama-stack-client" -version = "0.2.1" +version = "0.2.2" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anyio" }, @@ -1553,9 +1551,9 @@ dependencies = [ { name = "tqdm" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/bb/5c/5fed03a18bfd6fb27dcf531504dfdaa5e9b79447f4530196baf16bbdddfe/llama_stack_client-0.2.1.tar.gz", hash = "sha256:2be016898ad9f12e57d6125cae26253b8cce7d894c028b9e42f58d421e7825ce", size = 242809 } +sdist = { url = "https://files.pythonhosted.org/packages/fc/1c/7d3ab0e57195f21f9cf121fba2692ee8dc792793e5c82aa702602dda9bea/llama_stack_client-0.2.2.tar.gz", hash = "sha256:a0323b18b9f68172c639755652654452b7e72e28e77d95db5146e25d83002d34", size = 241914 } wheels = [ - { url = "https://files.pythonhosted.org/packages/90/e7/23051fe5073f2fda3f509b19d0e4d7e76e3a8cfaa3606077a2bcef9a0bf0/llama_stack_client-0.2.1-py3-none-any.whl", hash = "sha256:8db3179aab48d6abf82b89ef0a2014e404faf4a72f825c0ffd467fdc4ab5f02c", size = 274293 }, + { url = "https://files.pythonhosted.org/packages/9e/68/bdd9cb19e2c151d9aa8bf91444dfa9675bc7913006d8e1e030fb79dbf8c5/llama_stack_client-0.2.2-py3-none-any.whl", hash = "sha256:2a4ef3edb861e9a3a734e6e5e65d9d3de1f10cd56c18d21d82253088d2758e53", size = 273307 }, ] [[package]] From 69554158fa199824a853fedcc0bace67d164e06c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Han?= Date: Mon, 14 Apr 2025 11:59:36 +0200 Subject: [PATCH 30/39] feat: add health to all providers through providers endpoint (#1418) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `/v1/providers` now reports the health status of each provider when implemented. ``` curl -L http://127.0.0.1:8321/v1/providers|jq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 4072 100 4072 0 0 246k 0 --:--:-- --:--:-- --:--:-- 248k { "data": [ { "api": "inference", "provider_id": "ollama", "provider_type": "remote::ollama", "config": { "url": "http://localhost:11434" }, "health": { "status": "OK" } }, { "api": "vector_io", "provider_id": "faiss", "provider_type": "inline::faiss", "config": { "kvstore": { "type": "sqlite", "namespace": null, "db_path": "/Users/leseb/.llama/distributions/ollama/faiss_store.db" } }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "safety", "provider_id": "llama-guard", "provider_type": "inline::llama-guard", "config": { "excluded_categories": [] }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "agents", "provider_id": "meta-reference", "provider_type": "inline::meta-reference", "config": { "persistence_store": { "type": "sqlite", "namespace": null, "db_path": "/Users/leseb/.llama/distributions/ollama/agents_store.db" } }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "telemetry", "provider_id": "meta-reference", "provider_type": "inline::meta-reference", "config": { "service_name": "llama-stack", "sinks": "console,sqlite", "sqlite_db_path": "/Users/leseb/.llama/distributions/ollama/trace_store.db" }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "eval", "provider_id": "meta-reference", "provider_type": "inline::meta-reference", "config": { "kvstore": { "type": "sqlite", "namespace": null, "db_path": "/Users/leseb/.llama/distributions/ollama/meta_reference_eval.db" } }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "datasetio", "provider_id": "huggingface", "provider_type": "remote::huggingface", "config": { "kvstore": { "type": "sqlite", "namespace": null, "db_path": "/Users/leseb/.llama/distributions/ollama/huggingface_datasetio.db" } }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "datasetio", "provider_id": "localfs", "provider_type": "inline::localfs", "config": { "kvstore": { "type": "sqlite", "namespace": null, "db_path": "/Users/leseb/.llama/distributions/ollama/localfs_datasetio.db" } }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "scoring", "provider_id": "basic", "provider_type": "inline::basic", "config": {}, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "scoring", "provider_id": "llm-as-judge", "provider_type": "inline::llm-as-judge", "config": {}, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "scoring", "provider_id": "braintrust", "provider_type": "inline::braintrust", "config": { "openai_api_key": "********" }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "tool_runtime", "provider_id": "brave-search", "provider_type": "remote::brave-search", "config": { "api_key": "********", "max_results": 3 }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "tool_runtime", "provider_id": "tavily-search", "provider_type": "remote::tavily-search", "config": { "api_key": "********", "max_results": 3 }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "tool_runtime", "provider_id": "code-interpreter", "provider_type": "inline::code-interpreter", "config": {}, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "tool_runtime", "provider_id": "rag-runtime", "provider_type": "inline::rag-runtime", "config": {}, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "tool_runtime", "provider_id": "model-context-protocol", "provider_type": "remote::model-context-protocol", "config": {}, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } }, { "api": "tool_runtime", "provider_id": "wolfram-alpha", "provider_type": "remote::wolfram-alpha", "config": { "api_key": "********" }, "health": { "status": "Not Implemented", "message": "Provider does not implement health check" } } ] } ``` Per providers too: ``` curl -L http://127.0.0.1:8321/v1/providers/ollama {"api":"inference","provider_id":"ollama","provider_type":"remote::ollama","config":{"url":"http://localhost:11434"},"health":{"status":"OK"}} ``` Signed-off-by: Sébastien Han --- .github/workflows/integration-tests.yml | 11 +++ docs/_static/llama-stack-spec.html | 36 ++++++++- docs/_static/llama-stack-spec.yaml | 16 ++++ llama_stack/apis/inspect/inspect.py | 4 +- llama_stack/apis/providers/providers.py | 2 + llama_stack/distribution/inspect.py | 3 +- llama_stack/distribution/library_client.py | 2 +- llama_stack/distribution/providers.py | 74 +++++++++++++++++-- llama_stack/distribution/resolver.py | 41 ---------- llama_stack/distribution/routers/routers.py | 26 ++++++- llama_stack/distribution/server/server.py | 2 +- llama_stack/distribution/stack.py | 46 +++++++----- llama_stack/distribution/utils/config.py | 30 ++++++++ llama_stack/providers/datatypes.py | 10 +++ .../remote/inference/ollama/ollama.py | 17 ++++- 15 files changed, 244 insertions(+), 76 deletions(-) create mode 100644 llama_stack/distribution/utils/config.py diff --git a/.github/workflows/integration-tests.yml b/.github/workflows/integration-tests.yml index 665f8bd7e..c61712bfd 100644 --- a/.github/workflows/integration-tests.yml +++ b/.github/workflows/integration-tests.yml @@ -99,6 +99,17 @@ jobs: cat server.log exit 1 + - name: Verify Ollama status is OK + if: matrix.client-type == 'http' + run: | + echo "Verifying Ollama status..." + ollama_status=$(curl -s -L http://127.0.0.1:8321/v1/providers/ollama|jq --raw-output .health.status) + echo "Ollama status: $ollama_status" + if [ "$ollama_status" != "OK" ]; then + echo "Ollama health check failed" + exit 1 + fi + - name: Run Integration Tests env: INFERENCE_MODEL: "meta-llama/Llama-3.2-3B-Instruct" diff --git a/docs/_static/llama-stack-spec.html b/docs/_static/llama-stack-spec.html index 542fb5be5..c85eb549f 100644 --- a/docs/_static/llama-stack-spec.html +++ b/docs/_static/llama-stack-spec.html @@ -7889,7 +7889,13 @@ "type": "object", "properties": { "status": { - "type": "string" + "type": "string", + "enum": [ + "OK", + "Error", + "Not Implemented" + ], + "title": "HealthStatus" } }, "additionalProperties": false, @@ -8084,6 +8090,31 @@ } ] } + }, + "health": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } } }, "additionalProperties": false, @@ -8091,7 +8122,8 @@ "api", "provider_id", "provider_type", - "config" + "config", + "health" ], "title": "ProviderInfo" }, diff --git a/docs/_static/llama-stack-spec.yaml b/docs/_static/llama-stack-spec.yaml index fa7b130e2..6c99c9155 100644 --- a/docs/_static/llama-stack-spec.yaml +++ b/docs/_static/llama-stack-spec.yaml @@ -5463,6 +5463,11 @@ components: properties: status: type: string + enum: + - OK + - Error + - Not Implemented + title: HealthStatus additionalProperties: false required: - status @@ -5574,12 +5579,23 @@ components: - type: string - type: array - type: object + health: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object additionalProperties: false required: - api - provider_id - provider_type - config + - health title: ProviderInfo InvokeToolRequest: type: object diff --git a/llama_stack/apis/inspect/inspect.py b/llama_stack/apis/inspect/inspect.py index 3896d67a9..863f90e14 100644 --- a/llama_stack/apis/inspect/inspect.py +++ b/llama_stack/apis/inspect/inspect.py @@ -8,6 +8,7 @@ from typing import List, Protocol, runtime_checkable from pydantic import BaseModel +from llama_stack.providers.datatypes import HealthStatus from llama_stack.schema_utils import json_schema_type, webmethod @@ -20,8 +21,7 @@ class RouteInfo(BaseModel): @json_schema_type class HealthInfo(BaseModel): - status: str - # TODO: add a provider level status + status: HealthStatus @json_schema_type diff --git a/llama_stack/apis/providers/providers.py b/llama_stack/apis/providers/providers.py index 83d03d7c1..ea5f968ec 100644 --- a/llama_stack/apis/providers/providers.py +++ b/llama_stack/apis/providers/providers.py @@ -8,6 +8,7 @@ from typing import Any, Dict, List, Protocol, runtime_checkable from pydantic import BaseModel +from llama_stack.providers.datatypes import HealthResponse from llama_stack.schema_utils import json_schema_type, webmethod @@ -17,6 +18,7 @@ class ProviderInfo(BaseModel): provider_id: str provider_type: str config: Dict[str, Any] + health: HealthResponse class ListProvidersResponse(BaseModel): diff --git a/llama_stack/distribution/inspect.py b/llama_stack/distribution/inspect.py index ba0ce5ea2..23f644ec6 100644 --- a/llama_stack/distribution/inspect.py +++ b/llama_stack/distribution/inspect.py @@ -17,6 +17,7 @@ from llama_stack.apis.inspect import ( ) from llama_stack.distribution.datatypes import StackRunConfig from llama_stack.distribution.server.endpoints import get_all_api_endpoints +from llama_stack.providers.datatypes import HealthStatus class DistributionInspectConfig(BaseModel): @@ -58,7 +59,7 @@ class DistributionInspectImpl(Inspect): return ListRoutesResponse(data=ret) async def health(self) -> HealthInfo: - return HealthInfo(status="OK") + return HealthInfo(status=HealthStatus.OK) async def version(self) -> VersionInfo: return VersionInfo(version=version("llama-stack")) diff --git a/llama_stack/distribution/library_client.py b/llama_stack/distribution/library_client.py index c0143363d..f426bcafe 100644 --- a/llama_stack/distribution/library_client.py +++ b/llama_stack/distribution/library_client.py @@ -43,9 +43,9 @@ from llama_stack.distribution.server.endpoints import ( from llama_stack.distribution.stack import ( construct_stack, get_stack_run_config_from_template, - redact_sensitive_fields, replace_env_vars, ) +from llama_stack.distribution.utils.config import redact_sensitive_fields from llama_stack.distribution.utils.context import preserve_contexts_async_generator from llama_stack.distribution.utils.exec import in_notebook from llama_stack.providers.utils.telemetry.tracing import ( diff --git a/llama_stack/distribution/providers.py b/llama_stack/distribution/providers.py index cf9b0b975..1c00ce264 100644 --- a/llama_stack/distribution/providers.py +++ b/llama_stack/distribution/providers.py @@ -4,14 +4,17 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. +import asyncio +from typing import Any, Dict from pydantic import BaseModel from llama_stack.apis.providers import ListProvidersResponse, ProviderInfo, Providers from llama_stack.log import get_logger +from llama_stack.providers.datatypes import HealthResponse, HealthStatus from .datatypes import StackRunConfig -from .stack import redact_sensitive_fields +from .utils.config import redact_sensitive_fields logger = get_logger(name=__name__, category="core") @@ -41,19 +44,24 @@ class ProviderImpl(Providers): async def list_providers(self) -> ListProvidersResponse: run_config = self.config.run_config safe_config = StackRunConfig(**redact_sensitive_fields(run_config.model_dump())) + providers_health = await self.get_providers_health() ret = [] for api, providers in safe_config.providers.items(): - ret.extend( - [ + for p in providers: + ret.append( ProviderInfo( api=api, provider_id=p.provider_id, provider_type=p.provider_type, config=p.config, + health=providers_health.get(api, {}).get( + p.provider_id, + HealthResponse( + status=HealthStatus.NOT_IMPLEMENTED, message="Provider does not implement health check" + ), + ), ) - for p in providers - ] - ) + ) return ListProvidersResponse(data=ret) @@ -64,3 +72,57 @@ class ProviderImpl(Providers): return p raise ValueError(f"Provider {provider_id} not found") + + async def get_providers_health(self) -> Dict[str, Dict[str, HealthResponse]]: + """Get health status for all providers. + + Returns: + Dict[str, Dict[str, HealthResponse]]: A dictionary mapping API names to provider health statuses. + Each API maps to a dictionary of provider IDs to their health responses. + """ + providers_health: Dict[str, Dict[str, HealthResponse]] = {} + timeout = 1.0 + + async def check_provider_health(impl: Any) -> tuple[str, HealthResponse] | None: + # Skip special implementations (inspect/providers) that don't have provider specs + if not hasattr(impl, "__provider_spec__"): + return None + api_name = impl.__provider_spec__.api.name + if not hasattr(impl, "health"): + return ( + api_name, + HealthResponse( + status=HealthStatus.NOT_IMPLEMENTED, message="Provider does not implement health check" + ), + ) + + try: + health = await asyncio.wait_for(impl.health(), timeout=timeout) + return api_name, health + except asyncio.TimeoutError: + return ( + api_name, + HealthResponse( + status=HealthStatus.ERROR, message=f"Health check timed out after {timeout} seconds" + ), + ) + except Exception as e: + return ( + api_name, + HealthResponse(status=HealthStatus.ERROR, message=f"Health check failed: {str(e)}"), + ) + + # Create tasks for all providers + tasks = [check_provider_health(impl) for impl in self.deps.values()] + + # Wait for all health checks to complete + results = await asyncio.gather(*tasks) + + # Organize results by API and provider ID + for result in results: + if result is None: # Skip special implementations + continue + api_name, health_response = result + providers_health[api_name] = health_response + + return providers_health diff --git a/llama_stack/distribution/resolver.py b/llama_stack/distribution/resolver.py index 0de1e0a02..e9a594eba 100644 --- a/llama_stack/distribution/resolver.py +++ b/llama_stack/distribution/resolver.py @@ -41,7 +41,6 @@ from llama_stack.providers.datatypes import ( Api, BenchmarksProtocolPrivate, DatasetsProtocolPrivate, - InlineProviderSpec, ModelsProtocolPrivate, ProviderSpec, RemoteProviderConfig, @@ -230,46 +229,6 @@ def sort_providers_by_deps( {k: list(v.values()) for k, v in providers_with_specs.items()} ) - # Append built-in "inspect" provider - apis = [x[1].spec.api for x in sorted_providers] - sorted_providers.append( - ( - "inspect", - ProviderWithSpec( - provider_id="__builtin__", - provider_type="__builtin__", - config={"run_config": run_config.model_dump()}, - spec=InlineProviderSpec( - api=Api.inspect, - provider_type="__builtin__", - config_class="llama_stack.distribution.inspect.DistributionInspectConfig", - module="llama_stack.distribution.inspect", - api_dependencies=apis, - deps__=[x.value for x in apis], - ), - ), - ) - ) - - sorted_providers.append( - ( - "providers", - ProviderWithSpec( - provider_id="__builtin__", - provider_type="__builtin__", - config={"run_config": run_config.model_dump()}, - spec=InlineProviderSpec( - api=Api.providers, - provider_type="__builtin__", - config_class="llama_stack.distribution.providers.ProviderImplConfig", - module="llama_stack.distribution.providers", - api_dependencies=apis, - deps__=[x.value for x in apis], - ), - ), - ) - ) - logger.debug(f"Resolved {len(sorted_providers)} providers") for api_str, provider in sorted_providers: logger.debug(f" {api_str} => {provider.provider_id}") diff --git a/llama_stack/distribution/routers/routers.py b/llama_stack/distribution/routers/routers.py index b9623ef3c..cdf91e052 100644 --- a/llama_stack/distribution/routers/routers.py +++ b/llama_stack/distribution/routers/routers.py @@ -4,6 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. +import asyncio import time from typing import Any, AsyncGenerator, AsyncIterator, Dict, List, Optional, Union @@ -60,7 +61,7 @@ from llama_stack.apis.vector_io import Chunk, QueryChunksResponse, VectorIO from llama_stack.log import get_logger from llama_stack.models.llama.llama3.chat_format import ChatFormat from llama_stack.models.llama.llama3.tokenizer import Tokenizer -from llama_stack.providers.datatypes import RoutingTable +from llama_stack.providers.datatypes import HealthResponse, HealthStatus, RoutingTable from llama_stack.providers.utils.telemetry.tracing import get_current_span logger = get_logger(name=__name__, category="core") @@ -580,6 +581,29 @@ class InferenceRouter(Inference): provider = self.routing_table.get_provider_impl(model_obj.identifier) return await provider.openai_chat_completion(**params) + async def health(self) -> Dict[str, HealthResponse]: + health_statuses = {} + timeout = 0.5 + for provider_id, impl in self.routing_table.impls_by_provider_id.items(): + try: + # check if the provider has a health method + if not hasattr(impl, "health"): + continue + health = await asyncio.wait_for(impl.health(), timeout=timeout) + health_statuses[provider_id] = health + except asyncio.TimeoutError: + health_statuses[provider_id] = HealthResponse( + status=HealthStatus.ERROR, + message=f"Health check timed out after {timeout} seconds", + ) + except NotImplementedError: + health_statuses[provider_id] = HealthResponse(status=HealthStatus.NOT_IMPLEMENTED) + except Exception as e: + health_statuses[provider_id] = HealthResponse( + status=HealthStatus.ERROR, message=f"Health check failed: {str(e)}" + ) + return health_statuses + class SafetyRouter(Safety): def __init__( diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py index 7d4ec2a2f..d7ef37c26 100644 --- a/llama_stack/distribution/server/server.py +++ b/llama_stack/distribution/server/server.py @@ -38,10 +38,10 @@ from llama_stack.distribution.server.endpoints import ( ) from llama_stack.distribution.stack import ( construct_stack, - redact_sensitive_fields, replace_env_vars, validate_env_pair, ) +from llama_stack.distribution.utils.config import redact_sensitive_fields from llama_stack.distribution.utils.context import preserve_contexts_async_generator from llama_stack.log import get_logger from llama_stack.providers.datatypes import Api diff --git a/llama_stack/distribution/stack.py b/llama_stack/distribution/stack.py index 08ff5e7cd..a6dc3d2a0 100644 --- a/llama_stack/distribution/stack.py +++ b/llama_stack/distribution/stack.py @@ -35,6 +35,8 @@ from llama_stack.apis.vector_dbs import VectorDBs from llama_stack.apis.vector_io import VectorIO from llama_stack.distribution.datatypes import Provider, StackRunConfig from llama_stack.distribution.distribution import get_provider_registry +from llama_stack.distribution.inspect import DistributionInspectConfig, DistributionInspectImpl +from llama_stack.distribution.providers import ProviderImpl, ProviderImplConfig from llama_stack.distribution.resolver import ProviderRegistry, resolve_impls from llama_stack.distribution.store.registry import create_dist_registry from llama_stack.distribution.utils.dynamic import instantiate_class_type @@ -119,26 +121,6 @@ class EnvVarError(Exception): super().__init__(f"Environment variable '{var_name}' not set or empty{f' at {path}' if path else ''}") -def redact_sensitive_fields(data: Dict[str, Any]) -> Dict[str, Any]: - """Redact sensitive information from config before printing.""" - sensitive_patterns = ["api_key", "api_token", "password", "secret"] - - def _redact_dict(d: Dict[str, Any]) -> Dict[str, Any]: - result = {} - for k, v in d.items(): - if isinstance(v, dict): - result[k] = _redact_dict(v) - elif isinstance(v, list): - result[k] = [_redact_dict(i) if isinstance(i, dict) else i for i in v] - elif any(pattern in k.lower() for pattern in sensitive_patterns): - result[k] = "********" - else: - result[k] = v - return result - - return _redact_dict(data) - - def replace_env_vars(config: Any, path: str = "") -> Any: if isinstance(config, dict): result = {} @@ -215,6 +197,26 @@ def validate_env_pair(env_pair: str) -> tuple[str, str]: ) from e +def add_internal_implementations(impls: Dict[Api, Any], run_config: StackRunConfig) -> None: + """Add internal implementations (inspect and providers) to the implementations dictionary. + + Args: + impls: Dictionary of API implementations + run_config: Stack run configuration + """ + inspect_impl = DistributionInspectImpl( + DistributionInspectConfig(run_config=run_config), + deps=impls, + ) + impls[Api.inspect] = inspect_impl + + providers_impl = ProviderImpl( + ProviderImplConfig(run_config=run_config), + deps=impls, + ) + impls[Api.providers] = providers_impl + + # Produces a stack of providers for the given run config. Not all APIs may be # asked for in the run config. async def construct_stack( @@ -222,6 +224,10 @@ async def construct_stack( ) -> Dict[Api, Any]: dist_registry, _ = await create_dist_registry(run_config.metadata_store, run_config.image_name) impls = await resolve_impls(run_config, provider_registry or get_provider_registry(run_config), dist_registry) + + # Add internal implementations after all other providers are resolved + add_internal_implementations(impls, run_config) + await register_resources(run_config, impls) return impls diff --git a/llama_stack/distribution/utils/config.py b/llama_stack/distribution/utils/config.py new file mode 100644 index 000000000..5e78289b7 --- /dev/null +++ b/llama_stack/distribution/utils/config.py @@ -0,0 +1,30 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from typing import Any, Dict + + +def redact_sensitive_fields(data: Dict[str, Any]) -> Dict[str, Any]: + """Redact sensitive information from config before printing.""" + sensitive_patterns = ["api_key", "api_token", "password", "secret"] + + def _redact_value(v: Any) -> Any: + if isinstance(v, dict): + return _redact_dict(v) + elif isinstance(v, list): + return [_redact_value(i) for i in v] + return v + + def _redact_dict(d: Dict[str, Any]) -> Dict[str, Any]: + result = {} + for k, v in d.items(): + if any(pattern in k.lower() for pattern in sensitive_patterns): + result[k] = "********" + else: + result[k] = _redact_value(v) + return result + + return _redact_dict(data) diff --git a/llama_stack/providers/datatypes.py b/llama_stack/providers/datatypes.py index 32dfba30c..c3141f807 100644 --- a/llama_stack/providers/datatypes.py +++ b/llama_stack/providers/datatypes.py @@ -4,6 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. +from enum import Enum from typing import Any, List, Optional, Protocol from urllib.parse import urlparse @@ -201,3 +202,12 @@ def remote_provider_spec( adapter=adapter, api_dependencies=api_dependencies or [], ) + + +class HealthStatus(str, Enum): + OK = "OK" + ERROR = "Error" + NOT_IMPLEMENTED = "Not Implemented" + + +HealthResponse = dict[str, Any] diff --git a/llama_stack/providers/remote/inference/ollama/ollama.py b/llama_stack/providers/remote/inference/ollama/ollama.py index 33b48af46..f84863385 100644 --- a/llama_stack/providers/remote/inference/ollama/ollama.py +++ b/llama_stack/providers/remote/inference/ollama/ollama.py @@ -42,7 +42,11 @@ from llama_stack.apis.inference import ( from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam from llama_stack.apis.models import Model, ModelType from llama_stack.log import get_logger -from llama_stack.providers.datatypes import ModelsProtocolPrivate +from llama_stack.providers.datatypes import ( + HealthResponse, + HealthStatus, + ModelsProtocolPrivate, +) from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, ) @@ -87,8 +91,19 @@ class OllamaInferenceAdapter( async def initialize(self) -> None: logger.info(f"checking connectivity to Ollama at `{self.url}`...") + await self.health() + + async def health(self) -> HealthResponse: + """ + Performs a health check by verifying connectivity to the Ollama server. + This method is used by initialize() and the Provider API to verify that the service is running + correctly. + Returns: + HealthResponse: A dictionary containing the health status. + """ try: await self.client.ps() + return HealthResponse(status=HealthStatus.OK) except httpx.ConnectError as e: raise RuntimeError( "Ollama Server is not running, start it using `ollama serve` in a separate terminal" From 6d6b40983eeea0283fd6e86e3a305e28ba560937 Mon Sep 17 00:00:00 2001 From: Matthew Farrellee Date: Mon, 14 Apr 2025 06:17:51 -0400 Subject: [PATCH 31/39] refactor: update integration test workflow (#1856) workflow - 0. Checkout 1. Install uv 2. Install Ollama 3. Pull Ollama image 4. Start Ollama in background 5. Set Up Environment and Install Dependencies 6. Wait for Ollama to start 7. Start Llama Stack server in background 8. Wait for Llama Stack server to be ready 9. Run Integration Tests changes - (4) starts the loading of the ollama model, it does not start ollama. the model will be loaded when used. this step is removed. (6) is handled in (2). this step is removed. (2) is renamed to reflect it's dual purpose. --- .github/workflows/integration-tests.yml | 23 +++-------------------- 1 file changed, 3 insertions(+), 20 deletions(-) diff --git a/.github/workflows/integration-tests.yml b/.github/workflows/integration-tests.yml index c61712bfd..5a7b35e17 100644 --- a/.github/workflows/integration-tests.yml +++ b/.github/workflows/integration-tests.yml @@ -38,18 +38,16 @@ jobs: with: python-version: "3.10" - - name: Install Ollama + - name: Install and start Ollama run: | + # the ollama installer also starts the ollama service curl -fsSL https://ollama.com/install.sh | sh - name: Pull Ollama image run: | + # TODO: cache the model. OLLAMA_MODELS defaults to ~ollama/.ollama/models. ollama pull llama3.2:3b-instruct-fp16 - - name: Start Ollama in background - run: | - nohup ollama run llama3.2:3b-instruct-fp16 > ollama.log 2>&1 & - - name: Set Up Environment and Install Dependencies run: | uv sync --extra dev --extra test @@ -61,21 +59,6 @@ jobs: uv pip install -e . llama stack build --template ollama --image-type venv - - name: Wait for Ollama to start - run: | - echo "Waiting for Ollama..." - for i in {1..30}; do - if curl -s http://localhost:11434 | grep -q "Ollama is running"; then - echo "Ollama is running!" - exit 0 - fi - sleep 1 - done - echo "Ollama failed to start" - ollama ps - ollama.log - exit 1 - - name: Start Llama Stack server in background if: matrix.client-type == 'http' env: From 030ca4b2befa7b32a56dc0392f7045022928144f Mon Sep 17 00:00:00 2001 From: Yuan Tang Date: Mon, 14 Apr 2025 08:14:59 -0400 Subject: [PATCH 32/39] docs: Move Llama 4 instructions in a collapsed section (#1936) # What does this PR do? Currently the instructions for Llama 4 take quite some space before people can see the overview and other sections about Llama Stack. Moving this to a collapsed section would make it less verbose. --- README.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 617e5117b..8c201e43d 100644 --- a/README.md +++ b/README.md @@ -9,15 +9,16 @@ [**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) - ### ✨🎉 Llama 4 Support 🎉✨ We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta. -You can now run Llama 4 models on Llama Stack. +
+👋 Click here to see how to run Llama 4 models on Llama Stack + +\ *Note you need 8xH100 GPU-host to run these models* - ```bash pip install -U llama_stack @@ -67,6 +68,9 @@ print(f"Assistant> {response.completion_message.content}") As more providers start supporting Llama 4, you can use them in Llama Stack as well. We are adding to the list. Stay tuned! +
+ + ### Overview Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides From 2ec5879f141c3f29c77e16c82c6e552e8f853efe Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 14 Apr 2025 14:33:43 +0200 Subject: [PATCH 33/39] chore(github-deps): bump astral-sh/setup-uv from 5.4.0 to 5.4.1 (#1881) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 5.4.0 to 5.4.1.
Release notes

Sourced from astral-sh/setup-uv's releases.

v5.4.1 🌈 Add support for pep440 version specifiers

Changes

With this release you can also use pep440 version specifiers as required-version in filesuv.toml, pyroject.toml and in the version input:

- name: Install a pep440-specifier-satisfying
version of uv
  uses: astral-sh/setup-uv@v5
  with:
    version: ">=0.4.25,<0.5"

🐛 Bug fixes

🧰 Maintenance

📚 Documentation

Commits

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=5.4.0&new-version=5.4.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) ---
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- .github/workflows/integration-tests.yml | 2 +- .github/workflows/providers-build.yml | 2 +- .github/workflows/unit-tests.yml | 2 +- .github/workflows/update-readthedocs.yml | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/integration-tests.yml b/.github/workflows/integration-tests.yml index 5a7b35e17..0eb252695 100644 --- a/.github/workflows/integration-tests.yml +++ b/.github/workflows/integration-tests.yml @@ -34,7 +34,7 @@ jobs: uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 - name: Install uv - uses: astral-sh/setup-uv@22695119d769bdb6f7032ad67b9bca0ef8c4a174 # v5.4.0 + uses: astral-sh/setup-uv@0c5e2b8115b80b4c7c5ddf6ffdd634974642d182 # v5.4.1 with: python-version: "3.10" diff --git a/.github/workflows/providers-build.yml b/.github/workflows/providers-build.yml index 915344221..010894283 100644 --- a/.github/workflows/providers-build.yml +++ b/.github/workflows/providers-build.yml @@ -56,7 +56,7 @@ jobs: python-version: '3.10' - name: Install uv - uses: astral-sh/setup-uv@22695119d769bdb6f7032ad67b9bca0ef8c4a174 # v5.4.0 + uses: astral-sh/setup-uv@0c5e2b8115b80b4c7c5ddf6ffdd634974642d182 # v5.4.1 with: python-version: "3.10" diff --git a/.github/workflows/unit-tests.yml b/.github/workflows/unit-tests.yml index da7289afc..4b0c58b99 100644 --- a/.github/workflows/unit-tests.yml +++ b/.github/workflows/unit-tests.yml @@ -38,7 +38,7 @@ jobs: with: python-version: ${{ matrix.python }} - - uses: astral-sh/setup-uv@22695119d769bdb6f7032ad67b9bca0ef8c4a174 # v5.4.0 + - uses: astral-sh/setup-uv@0c5e2b8115b80b4c7c5ddf6ffdd634974642d182 # v5.4.1 with: python-version: ${{ matrix.python }} enable-cache: false diff --git a/.github/workflows/update-readthedocs.yml b/.github/workflows/update-readthedocs.yml index 74bf0d0b0..794a727be 100644 --- a/.github/workflows/update-readthedocs.yml +++ b/.github/workflows/update-readthedocs.yml @@ -41,7 +41,7 @@ jobs: python-version: '3.11' - name: Install the latest version of uv - uses: astral-sh/setup-uv@22695119d769bdb6f7032ad67b9bca0ef8c4a174 # v5.4.0 + uses: astral-sh/setup-uv@0c5e2b8115b80b4c7c5ddf6ffdd634974642d182 # v5.4.1 - name: Sync with uv run: uv sync --extra docs From 68eeacec0efee162a1ccb08cf4a68b3e6241ac3c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Han?= Date: Mon, 14 Apr 2025 15:09:16 +0200 Subject: [PATCH 34/39] docs: resync missing nvidia doc (#1947) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? Resync doc. Signed-off-by: Sébastien Han --- .github/workflows/pre-commit.yml | 9 ++ .../remote_hosted_distro/nvidia.md | 88 +++++++++++++++++++ 2 files changed, 97 insertions(+) create mode 100644 docs/source/distributions/remote_hosted_distro/nvidia.md diff --git a/.github/workflows/pre-commit.yml b/.github/workflows/pre-commit.yml index 847aaecd7..17a42dd26 100644 --- a/.github/workflows/pre-commit.yml +++ b/.github/workflows/pre-commit.yml @@ -31,3 +31,12 @@ jobs: - name: Verify if there are any diff files after pre-commit run: | git diff --exit-code || (echo "There are uncommitted changes, run pre-commit locally and commit again" && exit 1) + + - name: Verify if there are any new files after pre-commit + run: | + unstaged_files=$(git ls-files --others --exclude-standard) + if [ -n "$unstaged_files" ]; then + echo "There are uncommitted new files, run pre-commit locally and commit again" + echo "$unstaged_files" + exit 1 + fi diff --git a/docs/source/distributions/remote_hosted_distro/nvidia.md b/docs/source/distributions/remote_hosted_distro/nvidia.md new file mode 100644 index 000000000..58731392d --- /dev/null +++ b/docs/source/distributions/remote_hosted_distro/nvidia.md @@ -0,0 +1,88 @@ + +# NVIDIA Distribution + +The `llamastack/distribution-nvidia` distribution consists of the following provider configurations. + +| API | Provider(s) | +|-----|-------------| +| agents | `inline::meta-reference` | +| datasetio | `inline::localfs` | +| eval | `inline::meta-reference` | +| inference | `remote::nvidia` | +| post_training | `remote::nvidia` | +| safety | `remote::nvidia` | +| scoring | `inline::basic` | +| telemetry | `inline::meta-reference` | +| tool_runtime | `inline::rag-runtime` | +| vector_io | `inline::faiss` | + + +### Environment Variables + +The following environment variables can be configured: + +- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``) +- `NVIDIA_USER_ID`: NVIDIA User ID (default: `llama-stack-user`) +- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`) +- `NVIDIA_ACCESS_POLICIES`: NVIDIA Access Policies (default: `{}`) +- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`) +- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`) +- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`) +- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`) +- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`) +- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`) + +### Models + +The following models are available by default: + +- `meta/llama3-8b-instruct (aliases: meta-llama/Llama-3-8B-Instruct)` +- `meta/llama3-70b-instruct (aliases: meta-llama/Llama-3-70B-Instruct)` +- `meta/llama-3.1-8b-instruct (aliases: meta-llama/Llama-3.1-8B-Instruct)` +- `meta/llama-3.1-70b-instruct (aliases: meta-llama/Llama-3.1-70B-Instruct)` +- `meta/llama-3.1-405b-instruct (aliases: meta-llama/Llama-3.1-405B-Instruct-FP8)` +- `meta/llama-3.2-1b-instruct (aliases: meta-llama/Llama-3.2-1B-Instruct)` +- `meta/llama-3.2-3b-instruct (aliases: meta-llama/Llama-3.2-3B-Instruct)` +- `meta/llama-3.2-11b-vision-instruct (aliases: meta-llama/Llama-3.2-11B-Vision-Instruct)` +- `meta/llama-3.2-90b-vision-instruct (aliases: meta-llama/Llama-3.2-90B-Vision-Instruct)` +- `nvidia/llama-3.2-nv-embedqa-1b-v2 ` +- `nvidia/nv-embedqa-e5-v5 ` +- `nvidia/nv-embedqa-mistral-7b-v2 ` +- `snowflake/arctic-embed-l ` + + +### Prerequisite: API Keys + +Make sure you have access to a NVIDIA API Key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/). + + +## Running Llama Stack with NVIDIA + +You can do this via Conda (build code) or Docker which has a pre-built image. + +### Via Docker + +This method allows you to get started quickly without having to build the distribution code. + +```bash +LLAMA_STACK_PORT=8321 +docker run \ + -it \ + --pull always \ + -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ + -v ./run.yaml:/root/my-run.yaml \ + llamastack/distribution-nvidia \ + --yaml-config /root/my-run.yaml \ + --port $LLAMA_STACK_PORT \ + --env NVIDIA_API_KEY=$NVIDIA_API_KEY +``` + +### Via Conda + +```bash +llama stack build --template nvidia --image-type conda +llama stack run ./run.yaml \ + --port 8321 \ + --env NVIDIA_API_KEY=$NVIDIA_API_KEY + --env INFERENCE_MODEL=$INFERENCE_MODEL +``` From 7641a5cd0b9b8a4659625816b649874ab1b6c36d Mon Sep 17 00:00:00 2001 From: Ben Browning Date: Mon, 14 Apr 2025 11:56:29 -0400 Subject: [PATCH 35/39] fix: 100% OpenAI API verification for together and fireworks (#1946) # What does this PR do? TLDR: Changes needed to get 100% passing tests for OpenAI API verification tests when run against Llama Stack with the `together`, `fireworks`, and `openai` providers. And `groq` is better than before, at 88% passing. This cleans up the OpenAI API support for image message types (specifically `image_url` types) and handling of the `response_format` chat completion parameter. Both of these required a few more Pydantic model definitions in our Inference API, just to move from the not-quite-right stubs I had in place to something fleshed out to match the actual OpenAI API specs. As part of testing this, I also found and fixed a bug in the litellm implementation of openai_completion and openai_chat_completion, so the providers based on those should actually be working now. The method `prepare_openai_completion_params` in `llama_stack/providers/utils/inference/openai_compat.py` was improved to actually recursively clean up input parameters, including handling of lists, dicts, and dumping of Pydantic models to dicts. These changes were required to get to 100% passing tests on the OpenAI API verification against the `openai` provider. With the above, the together.ai provider was passing as well as it is without Llama Stack. But, since we have Llama Stack in the middle, I took the opportunity to clean up the together.ai provider so that it now also passes the OpenAI API spec tests we have at 100%. That means together.ai is now passing our verification test better when using an OpenAI client talking to Llama Stack than it is when hitting together.ai directly, without Llama Stack in the middle. And, another round of work for Fireworks to improve translation of incoming OpenAI chat completion requests to Llama Stack chat completion requests gets the fireworks provider passing at 100%. The server-side fireworks.ai tool calling support with OpenAI chat completions and Llama 4 models isn't great yet, but by pointing the OpenAI clients at Llama Stack's API we can clean things up and get everything working as expected for Llama 4 models. ## Test Plan ### OpenAI API Verification Tests I ran the OpenAI API verification tests as below and 100% of the tests passed. First, start a Llama Stack server that runs the `openai` provider with the `gpt-4o` and `gpt-4o-mini` models deployed. There's not a template setup to do this out of the box, so I added a `tests/verifications/openai-api-verification-run.yaml` to do this. First, ensure you have the necessary API key environment variables set: ``` export TOGETHER_API_KEY="..." export FIREWORKS_API_KEY="..." export OPENAI_API_KEY="..." ``` Then, run a Llama Stack server that serves up all these providers: ``` llama stack run \ --image-type venv \ tests/verifications/openai-api-verification-run.yaml ``` Finally, generate a new verification report against all these providers, both with and without the Llama Stack server in the middle. ``` python tests/verifications/generate_report.py \ --run-tests \ --provider \ together \ fireworks \ groq \ openai \ together-llama-stack \ fireworks-llama-stack \ groq-llama-stack \ openai-llama-stack ``` You'll see that most of the configurations with Llama Stack in the middle now pass at 100%, even though some of them do not pass at 100% when hitting the backend provider's API directly with an OpenAI client. ### OpenAI Completion Integration Tests with vLLM: I also ran the smaller `test_openai_completion.py` test suite (that's not yet merged with the verification tests) on multiple of the providers, since I had to adjust the method signature of openai_chat_completion a bit and thus had to touch lots of these providers to match. Here's the tests I ran there, all passing: ``` VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` ### OpenAI Completion Integration Tests with ollama ``` INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0" ``` ### OpenAI Completion Integration Tests with together.ai ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo" llama stack build --template together --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct-Turbo" ``` ### OpenAI Completion Integration Tests with fireworks.ai ``` INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" llama stack build --template fireworks --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.1-8B-Instruct" --------- Signed-off-by: Ben Browning --- docs/_static/llama-stack-spec.html | 410 +++++++++++++++++- docs/_static/llama-stack-spec.yaml | 279 +++++++++++- .../distributions/self_hosted_distro/groq.md | 2 + llama_stack/apis/inference/inference.py | 163 ++++++- llama_stack/distribution/routers/routers.py | 12 +- llama_stack/models/llama/llama3/tool_utils.py | 4 +- .../inference/meta_reference/inference.py | 8 +- .../sentence_transformers.py | 8 +- .../providers/inline/inference/vllm/vllm.py | 8 +- .../remote/inference/bedrock/bedrock.py | 8 +- .../remote/inference/cerebras/cerebras.py | 8 +- .../remote/inference/databricks/databricks.py | 8 +- .../remote/inference/fireworks/fireworks.py | 32 +- .../providers/remote/inference/groq/groq.py | 136 ++++++ .../providers/remote/inference/groq/models.py | 8 + .../remote/inference/nvidia/nvidia.py | 12 +- .../remote/inference/ollama/ollama.py | 14 +- .../inference/passthrough/passthrough.py | 14 +- .../remote/inference/runpod/runpod.py | 8 +- .../remote/inference/sambanova/sambanova.py | 8 +- .../providers/remote/inference/tgi/tgi.py | 8 +- .../remote/inference/together/together.py | 36 +- .../providers/remote/inference/vllm/vllm.py | 13 +- .../utils/inference/litellm_openai_mixin.py | 20 +- .../utils/inference/openai_compat.py | 248 ++++++++++- llama_stack/templates/dev/run.yaml | 20 + llama_stack/templates/groq/run.yaml | 20 + llama_stack/templates/verification/run.yaml | 20 + .../inference/test_openai_completion.py | 2 +- .../conf/fireworks-llama-stack.yaml | 14 + .../verifications/conf/groq-llama-stack.yaml | 14 + tests/verifications/conf/groq.yaml | 8 +- .../conf/openai-llama-stack.yaml | 9 + .../conf/together-llama-stack.yaml | 14 + tests/verifications/generate_report.py | 12 +- .../openai-api-verification-run.yaml | 146 +++++++ .../openai_api/fixtures/fixtures.py | 3 + 37 files changed, 1628 insertions(+), 129 deletions(-) create mode 100644 tests/verifications/conf/fireworks-llama-stack.yaml create mode 100644 tests/verifications/conf/groq-llama-stack.yaml create mode 100644 tests/verifications/conf/openai-llama-stack.yaml create mode 100644 tests/verifications/conf/together-llama-stack.yaml create mode 100644 tests/verifications/openai-api-verification-run.yaml diff --git a/docs/_static/llama-stack-spec.html b/docs/_static/llama-stack-spec.html index c85eb549f..54d888441 100644 --- a/docs/_static/llama-stack-spec.html +++ b/docs/_static/llama-stack-spec.html @@ -3096,11 +3096,18 @@ "post": { "responses": { "200": { - "description": "OK", + "description": "Response from an OpenAI-compatible chat completion request. **OR** Chunk from a streaming response to an OpenAI-compatible chat completion request.", "content": { "application/json": { "schema": { - "$ref": "#/components/schemas/OpenAIChatCompletion" + "oneOf": [ + { + "$ref": "#/components/schemas/OpenAIChatCompletion" + }, + { + "$ref": "#/components/schemas/OpenAIChatCompletionChunk" + } + ] } } } @@ -8857,7 +8864,17 @@ "description": "Must be \"assistant\" to identify this as the model's response" }, "content": { - "$ref": "#/components/schemas/InterleavedContent", + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIChatCompletionContentPartParam" + } + } + ], "description": "The content of the model's response" }, "name": { @@ -8867,9 +8884,9 @@ "tool_calls": { "type": "array", "items": { - "$ref": "#/components/schemas/ToolCall" + "$ref": "#/components/schemas/OpenAIChatCompletionToolCall" }, - "description": "List of tool calls. Each tool call is a ToolCall object." + "description": "List of tool calls. Each tool call is an OpenAIChatCompletionToolCall object." } }, "additionalProperties": false, @@ -8880,6 +8897,98 @@ "title": "OpenAIAssistantMessageParam", "description": "A message containing the model's (assistant) response in an OpenAI-compatible chat completion request." }, + "OpenAIChatCompletionContentPartImageParam": { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "image_url", + "default": "image_url" + }, + "image_url": { + "$ref": "#/components/schemas/OpenAIImageURL" + } + }, + "additionalProperties": false, + "required": [ + "type", + "image_url" + ], + "title": "OpenAIChatCompletionContentPartImageParam" + }, + "OpenAIChatCompletionContentPartParam": { + "oneOf": [ + { + "$ref": "#/components/schemas/OpenAIChatCompletionContentPartTextParam" + }, + { + "$ref": "#/components/schemas/OpenAIChatCompletionContentPartImageParam" + } + ], + "discriminator": { + "propertyName": "type", + "mapping": { + "text": "#/components/schemas/OpenAIChatCompletionContentPartTextParam", + "image_url": "#/components/schemas/OpenAIChatCompletionContentPartImageParam" + } + } + }, + "OpenAIChatCompletionContentPartTextParam": { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "text", + "default": "text" + }, + "text": { + "type": "string" + } + }, + "additionalProperties": false, + "required": [ + "type", + "text" + ], + "title": "OpenAIChatCompletionContentPartTextParam" + }, + "OpenAIChatCompletionToolCall": { + "type": "object", + "properties": { + "index": { + "type": "integer" + }, + "id": { + "type": "string" + }, + "type": { + "type": "string", + "const": "function", + "default": "function" + }, + "function": { + "$ref": "#/components/schemas/OpenAIChatCompletionToolCallFunction" + } + }, + "additionalProperties": false, + "required": [ + "type" + ], + "title": "OpenAIChatCompletionToolCall" + }, + "OpenAIChatCompletionToolCallFunction": { + "type": "object", + "properties": { + "name": { + "type": "string" + }, + "arguments": { + "type": "string" + } + }, + "additionalProperties": false, + "title": "OpenAIChatCompletionToolCallFunction" + }, "OpenAIDeveloperMessageParam": { "type": "object", "properties": { @@ -8890,7 +8999,17 @@ "description": "Must be \"developer\" to identify this as a developer message" }, "content": { - "$ref": "#/components/schemas/InterleavedContent", + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIChatCompletionContentPartParam" + } + } + ], "description": "The content of the developer message" }, "name": { @@ -8906,6 +9025,66 @@ "title": "OpenAIDeveloperMessageParam", "description": "A message from the developer in an OpenAI-compatible chat completion request." }, + "OpenAIImageURL": { + "type": "object", + "properties": { + "url": { + "type": "string" + }, + "detail": { + "type": "string" + } + }, + "additionalProperties": false, + "required": [ + "url" + ], + "title": "OpenAIImageURL" + }, + "OpenAIJSONSchema": { + "type": "object", + "properties": { + "name": { + "type": "string" + }, + "description": { + "type": "string" + }, + "strict": { + "type": "boolean" + }, + "schema": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } + } + }, + "additionalProperties": false, + "required": [ + "name" + ], + "title": "OpenAIJSONSchema" + }, "OpenAIMessageParam": { "oneOf": [ { @@ -8935,6 +9114,76 @@ } } }, + "OpenAIResponseFormatJSONObject": { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "json_object", + "default": "json_object" + } + }, + "additionalProperties": false, + "required": [ + "type" + ], + "title": "OpenAIResponseFormatJSONObject" + }, + "OpenAIResponseFormatJSONSchema": { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "json_schema", + "default": "json_schema" + }, + "json_schema": { + "$ref": "#/components/schemas/OpenAIJSONSchema" + } + }, + "additionalProperties": false, + "required": [ + "type", + "json_schema" + ], + "title": "OpenAIResponseFormatJSONSchema" + }, + "OpenAIResponseFormatParam": { + "oneOf": [ + { + "$ref": "#/components/schemas/OpenAIResponseFormatText" + }, + { + "$ref": "#/components/schemas/OpenAIResponseFormatJSONSchema" + }, + { + "$ref": "#/components/schemas/OpenAIResponseFormatJSONObject" + } + ], + "discriminator": { + "propertyName": "type", + "mapping": { + "text": "#/components/schemas/OpenAIResponseFormatText", + "json_schema": "#/components/schemas/OpenAIResponseFormatJSONSchema", + "json_object": "#/components/schemas/OpenAIResponseFormatJSONObject" + } + } + }, + "OpenAIResponseFormatText": { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "text", + "default": "text" + } + }, + "additionalProperties": false, + "required": [ + "type" + ], + "title": "OpenAIResponseFormatText" + }, "OpenAISystemMessageParam": { "type": "object", "properties": { @@ -8945,7 +9194,17 @@ "description": "Must be \"system\" to identify this as a system message" }, "content": { - "$ref": "#/components/schemas/InterleavedContent", + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIChatCompletionContentPartParam" + } + } + ], "description": "The content of the \"system prompt\". If multiple system messages are provided, they are concatenated. The underlying Llama Stack code may also add other system messages (for example, for formatting tool definitions)." }, "name": { @@ -8975,7 +9234,17 @@ "description": "Unique identifier for the tool call this response is for" }, "content": { - "$ref": "#/components/schemas/InterleavedContent", + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIChatCompletionContentPartParam" + } + } + ], "description": "The response content from the tool" } }, @@ -8998,7 +9267,17 @@ "description": "Must be \"user\" to identify this as a user message" }, "content": { - "$ref": "#/components/schemas/InterleavedContent", + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIChatCompletionContentPartParam" + } + } + ], "description": "The content of the message, which can include text and other media" }, "name": { @@ -9126,10 +9405,7 @@ "description": "(Optional) The penalty for repeated tokens" }, "response_format": { - "type": "object", - "additionalProperties": { - "type": "string" - }, + "$ref": "#/components/schemas/OpenAIResponseFormatParam", "description": "(Optional) The response format to use" }, "seed": { @@ -9306,6 +9582,46 @@ "title": "OpenAIChatCompletion", "description": "Response from an OpenAI-compatible chat completion request." }, + "OpenAIChatCompletionChunk": { + "type": "object", + "properties": { + "id": { + "type": "string", + "description": "The ID of the chat completion" + }, + "choices": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIChunkChoice" + }, + "description": "List of choices" + }, + "object": { + "type": "string", + "const": "chat.completion.chunk", + "default": "chat.completion.chunk", + "description": "The object type, which will be \"chat.completion.chunk\"" + }, + "created": { + "type": "integer", + "description": "The Unix timestamp in seconds when the chat completion was created" + }, + "model": { + "type": "string", + "description": "The model that was used to generate the chat completion" + } + }, + "additionalProperties": false, + "required": [ + "id", + "choices", + "object", + "created", + "model" + ], + "title": "OpenAIChatCompletionChunk", + "description": "Chunk from a streaming response to an OpenAI-compatible chat completion request." + }, "OpenAIChoice": { "type": "object", "properties": { @@ -9318,10 +9634,12 @@ "description": "The reason the model stopped generating" }, "index": { - "type": "integer" + "type": "integer", + "description": "The index of the choice" }, "logprobs": { - "$ref": "#/components/schemas/OpenAIChoiceLogprobs" + "$ref": "#/components/schemas/OpenAIChoiceLogprobs", + "description": "(Optional) The log probabilities for the tokens in the message" } }, "additionalProperties": false, @@ -9333,6 +9651,33 @@ "title": "OpenAIChoice", "description": "A choice from an OpenAI-compatible chat completion response." }, + "OpenAIChoiceDelta": { + "type": "object", + "properties": { + "content": { + "type": "string", + "description": "(Optional) The content of the delta" + }, + "refusal": { + "type": "string", + "description": "(Optional) The refusal of the delta" + }, + "role": { + "type": "string", + "description": "(Optional) The role of the delta" + }, + "tool_calls": { + "type": "array", + "items": { + "$ref": "#/components/schemas/OpenAIChatCompletionToolCall" + }, + "description": "(Optional) The tool calls of the delta" + } + }, + "additionalProperties": false, + "title": "OpenAIChoiceDelta", + "description": "A delta from an OpenAI-compatible chat completion streaming response." + }, "OpenAIChoiceLogprobs": { "type": "object", "properties": { @@ -9340,19 +9685,50 @@ "type": "array", "items": { "$ref": "#/components/schemas/OpenAITokenLogProb" - } + }, + "description": "(Optional) The log probabilities for the tokens in the message" }, "refusal": { "type": "array", "items": { "$ref": "#/components/schemas/OpenAITokenLogProb" - } + }, + "description": "(Optional) The log probabilities for the tokens in the message" } }, "additionalProperties": false, "title": "OpenAIChoiceLogprobs", "description": "The log probabilities for the tokens in the message from an OpenAI-compatible chat completion response." }, + "OpenAIChunkChoice": { + "type": "object", + "properties": { + "delta": { + "$ref": "#/components/schemas/OpenAIChoiceDelta", + "description": "The delta from the chunk" + }, + "finish_reason": { + "type": "string", + "description": "The reason the model stopped generating" + }, + "index": { + "type": "integer", + "description": "The index of the choice" + }, + "logprobs": { + "$ref": "#/components/schemas/OpenAIChoiceLogprobs", + "description": "(Optional) The log probabilities for the tokens in the message" + } + }, + "additionalProperties": false, + "required": [ + "delta", + "finish_reason", + "index" + ], + "title": "OpenAIChunkChoice", + "description": "A chunk choice from an OpenAI-compatible chat completion streaming response." + }, "OpenAITokenLogProb": { "type": "object", "properties": { diff --git a/docs/_static/llama-stack-spec.yaml b/docs/_static/llama-stack-spec.yaml index 6c99c9155..cf657bff9 100644 --- a/docs/_static/llama-stack-spec.yaml +++ b/docs/_static/llama-stack-spec.yaml @@ -2135,11 +2135,15 @@ paths: post: responses: '200': - description: OK + description: >- + Response from an OpenAI-compatible chat completion request. **OR** Chunk + from a streaming response to an OpenAI-compatible chat completion request. content: application/json: schema: - $ref: '#/components/schemas/OpenAIChatCompletion' + oneOf: + - $ref: '#/components/schemas/OpenAIChatCompletion' + - $ref: '#/components/schemas/OpenAIChatCompletionChunk' '400': $ref: '#/components/responses/BadRequest400' '429': @@ -6073,7 +6077,11 @@ components: description: >- Must be "assistant" to identify this as the model's response content: - $ref: '#/components/schemas/InterleavedContent' + oneOf: + - type: string + - type: array + items: + $ref: '#/components/schemas/OpenAIChatCompletionContentPartParam' description: The content of the model's response name: type: string @@ -6082,9 +6090,10 @@ components: tool_calls: type: array items: - $ref: '#/components/schemas/ToolCall' + $ref: '#/components/schemas/OpenAIChatCompletionToolCall' description: >- - List of tool calls. Each tool call is a ToolCall object. + List of tool calls. Each tool call is an OpenAIChatCompletionToolCall + object. additionalProperties: false required: - role @@ -6093,6 +6102,70 @@ components: description: >- A message containing the model's (assistant) response in an OpenAI-compatible chat completion request. + "OpenAIChatCompletionContentPartImageParam": + type: object + properties: + type: + type: string + const: image_url + default: image_url + image_url: + $ref: '#/components/schemas/OpenAIImageURL' + additionalProperties: false + required: + - type + - image_url + title: >- + OpenAIChatCompletionContentPartImageParam + OpenAIChatCompletionContentPartParam: + oneOf: + - $ref: '#/components/schemas/OpenAIChatCompletionContentPartTextParam' + - $ref: '#/components/schemas/OpenAIChatCompletionContentPartImageParam' + discriminator: + propertyName: type + mapping: + text: '#/components/schemas/OpenAIChatCompletionContentPartTextParam' + image_url: '#/components/schemas/OpenAIChatCompletionContentPartImageParam' + OpenAIChatCompletionContentPartTextParam: + type: object + properties: + type: + type: string + const: text + default: text + text: + type: string + additionalProperties: false + required: + - type + - text + title: OpenAIChatCompletionContentPartTextParam + OpenAIChatCompletionToolCall: + type: object + properties: + index: + type: integer + id: + type: string + type: + type: string + const: function + default: function + function: + $ref: '#/components/schemas/OpenAIChatCompletionToolCallFunction' + additionalProperties: false + required: + - type + title: OpenAIChatCompletionToolCall + OpenAIChatCompletionToolCallFunction: + type: object + properties: + name: + type: string + arguments: + type: string + additionalProperties: false + title: OpenAIChatCompletionToolCallFunction OpenAIDeveloperMessageParam: type: object properties: @@ -6103,7 +6176,11 @@ components: description: >- Must be "developer" to identify this as a developer message content: - $ref: '#/components/schemas/InterleavedContent' + oneOf: + - type: string + - type: array + items: + $ref: '#/components/schemas/OpenAIChatCompletionContentPartParam' description: The content of the developer message name: type: string @@ -6116,6 +6193,40 @@ components: title: OpenAIDeveloperMessageParam description: >- A message from the developer in an OpenAI-compatible chat completion request. + OpenAIImageURL: + type: object + properties: + url: + type: string + detail: + type: string + additionalProperties: false + required: + - url + title: OpenAIImageURL + OpenAIJSONSchema: + type: object + properties: + name: + type: string + description: + type: string + strict: + type: boolean + schema: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + additionalProperties: false + required: + - name + title: OpenAIJSONSchema OpenAIMessageParam: oneOf: - $ref: '#/components/schemas/OpenAIUserMessageParam' @@ -6131,6 +6242,53 @@ components: assistant: '#/components/schemas/OpenAIAssistantMessageParam' tool: '#/components/schemas/OpenAIToolMessageParam' developer: '#/components/schemas/OpenAIDeveloperMessageParam' + OpenAIResponseFormatJSONObject: + type: object + properties: + type: + type: string + const: json_object + default: json_object + additionalProperties: false + required: + - type + title: OpenAIResponseFormatJSONObject + OpenAIResponseFormatJSONSchema: + type: object + properties: + type: + type: string + const: json_schema + default: json_schema + json_schema: + $ref: '#/components/schemas/OpenAIJSONSchema' + additionalProperties: false + required: + - type + - json_schema + title: OpenAIResponseFormatJSONSchema + OpenAIResponseFormatParam: + oneOf: + - $ref: '#/components/schemas/OpenAIResponseFormatText' + - $ref: '#/components/schemas/OpenAIResponseFormatJSONSchema' + - $ref: '#/components/schemas/OpenAIResponseFormatJSONObject' + discriminator: + propertyName: type + mapping: + text: '#/components/schemas/OpenAIResponseFormatText' + json_schema: '#/components/schemas/OpenAIResponseFormatJSONSchema' + json_object: '#/components/schemas/OpenAIResponseFormatJSONObject' + OpenAIResponseFormatText: + type: object + properties: + type: + type: string + const: text + default: text + additionalProperties: false + required: + - type + title: OpenAIResponseFormatText OpenAISystemMessageParam: type: object properties: @@ -6141,7 +6299,11 @@ components: description: >- Must be "system" to identify this as a system message content: - $ref: '#/components/schemas/InterleavedContent' + oneOf: + - type: string + - type: array + items: + $ref: '#/components/schemas/OpenAIChatCompletionContentPartParam' description: >- The content of the "system prompt". If multiple system messages are provided, they are concatenated. The underlying Llama Stack code may also add other @@ -6171,7 +6333,11 @@ components: description: >- Unique identifier for the tool call this response is for content: - $ref: '#/components/schemas/InterleavedContent' + oneOf: + - type: string + - type: array + items: + $ref: '#/components/schemas/OpenAIChatCompletionContentPartParam' description: The response content from the tool additionalProperties: false required: @@ -6192,7 +6358,11 @@ components: description: >- Must be "user" to identify this as a user message content: - $ref: '#/components/schemas/InterleavedContent' + oneOf: + - type: string + - type: array + items: + $ref: '#/components/schemas/OpenAIChatCompletionContentPartParam' description: >- The content of the message, which can include text and other media name: @@ -6278,9 +6448,7 @@ components: description: >- (Optional) The penalty for repeated tokens response_format: - type: object - additionalProperties: - type: string + $ref: '#/components/schemas/OpenAIResponseFormatParam' description: (Optional) The response format to use seed: type: integer @@ -6386,6 +6554,41 @@ components: title: OpenAIChatCompletion description: >- Response from an OpenAI-compatible chat completion request. + OpenAIChatCompletionChunk: + type: object + properties: + id: + type: string + description: The ID of the chat completion + choices: + type: array + items: + $ref: '#/components/schemas/OpenAIChunkChoice' + description: List of choices + object: + type: string + const: chat.completion.chunk + default: chat.completion.chunk + description: >- + The object type, which will be "chat.completion.chunk" + created: + type: integer + description: >- + The Unix timestamp in seconds when the chat completion was created + model: + type: string + description: >- + The model that was used to generate the chat completion + additionalProperties: false + required: + - id + - choices + - object + - created + - model + title: OpenAIChatCompletionChunk + description: >- + Chunk from a streaming response to an OpenAI-compatible chat completion request. OpenAIChoice: type: object properties: @@ -6397,8 +6600,11 @@ components: description: The reason the model stopped generating index: type: integer + description: The index of the choice logprobs: $ref: '#/components/schemas/OpenAIChoiceLogprobs' + description: >- + (Optional) The log probabilities for the tokens in the message additionalProperties: false required: - message @@ -6407,6 +6613,27 @@ components: title: OpenAIChoice description: >- A choice from an OpenAI-compatible chat completion response. + OpenAIChoiceDelta: + type: object + properties: + content: + type: string + description: (Optional) The content of the delta + refusal: + type: string + description: (Optional) The refusal of the delta + role: + type: string + description: (Optional) The role of the delta + tool_calls: + type: array + items: + $ref: '#/components/schemas/OpenAIChatCompletionToolCall' + description: (Optional) The tool calls of the delta + additionalProperties: false + title: OpenAIChoiceDelta + description: >- + A delta from an OpenAI-compatible chat completion streaming response. OpenAIChoiceLogprobs: type: object properties: @@ -6414,15 +6641,43 @@ components: type: array items: $ref: '#/components/schemas/OpenAITokenLogProb' + description: >- + (Optional) The log probabilities for the tokens in the message refusal: type: array items: $ref: '#/components/schemas/OpenAITokenLogProb' + description: >- + (Optional) The log probabilities for the tokens in the message additionalProperties: false title: OpenAIChoiceLogprobs description: >- The log probabilities for the tokens in the message from an OpenAI-compatible chat completion response. + OpenAIChunkChoice: + type: object + properties: + delta: + $ref: '#/components/schemas/OpenAIChoiceDelta' + description: The delta from the chunk + finish_reason: + type: string + description: The reason the model stopped generating + index: + type: integer + description: The index of the choice + logprobs: + $ref: '#/components/schemas/OpenAIChoiceLogprobs' + description: >- + (Optional) The log probabilities for the tokens in the message + additionalProperties: false + required: + - delta + - finish_reason + - index + title: OpenAIChunkChoice + description: >- + A chunk choice from an OpenAI-compatible chat completion streaming response. OpenAITokenLogProb: type: object properties: diff --git a/docs/source/distributions/self_hosted_distro/groq.md b/docs/source/distributions/self_hosted_distro/groq.md index 4f5a8a859..b18be1b2f 100644 --- a/docs/source/distributions/self_hosted_distro/groq.md +++ b/docs/source/distributions/self_hosted_distro/groq.md @@ -43,7 +43,9 @@ The following models are available by default: - `groq/llama-3.3-70b-versatile (aliases: meta-llama/Llama-3.3-70B-Instruct)` - `groq/llama-3.2-3b-preview (aliases: meta-llama/Llama-3.2-3B-Instruct)` - `groq/llama-4-scout-17b-16e-instruct (aliases: meta-llama/Llama-4-Scout-17B-16E-Instruct)` +- `groq/meta-llama/llama-4-scout-17b-16e-instruct (aliases: meta-llama/Llama-4-Scout-17B-16E-Instruct)` - `groq/llama-4-maverick-17b-128e-instruct (aliases: meta-llama/Llama-4-Maverick-17B-128E-Instruct)` +- `groq/meta-llama/llama-4-maverick-17b-128e-instruct (aliases: meta-llama/Llama-4-Maverick-17B-128E-Instruct)` ### Prerequisite: API Keys diff --git a/llama_stack/apis/inference/inference.py b/llama_stack/apis/inference/inference.py index 21753ca23..596efb136 100644 --- a/llama_stack/apis/inference/inference.py +++ b/llama_stack/apis/inference/inference.py @@ -18,7 +18,7 @@ from typing import ( ) from pydantic import BaseModel, Field, field_validator -from typing_extensions import Annotated +from typing_extensions import Annotated, TypedDict from llama_stack.apis.common.content_types import ContentDelta, InterleavedContent, InterleavedContentItem from llama_stack.apis.models import Model @@ -442,6 +442,37 @@ class EmbeddingsResponse(BaseModel): embeddings: List[List[float]] +@json_schema_type +class OpenAIChatCompletionContentPartTextParam(BaseModel): + type: Literal["text"] = "text" + text: str + + +@json_schema_type +class OpenAIImageURL(BaseModel): + url: str + detail: Optional[str] = None + + +@json_schema_type +class OpenAIChatCompletionContentPartImageParam(BaseModel): + type: Literal["image_url"] = "image_url" + image_url: OpenAIImageURL + + +OpenAIChatCompletionContentPartParam = Annotated[ + Union[ + OpenAIChatCompletionContentPartTextParam, + OpenAIChatCompletionContentPartImageParam, + ], + Field(discriminator="type"), +] +register_schema(OpenAIChatCompletionContentPartParam, name="OpenAIChatCompletionContentPartParam") + + +OpenAIChatCompletionMessageContent = Union[str, List[OpenAIChatCompletionContentPartParam]] + + @json_schema_type class OpenAIUserMessageParam(BaseModel): """A message from the user in an OpenAI-compatible chat completion request. @@ -452,7 +483,7 @@ class OpenAIUserMessageParam(BaseModel): """ role: Literal["user"] = "user" - content: InterleavedContent + content: OpenAIChatCompletionMessageContent name: Optional[str] = None @@ -466,10 +497,24 @@ class OpenAISystemMessageParam(BaseModel): """ role: Literal["system"] = "system" - content: InterleavedContent + content: OpenAIChatCompletionMessageContent name: Optional[str] = None +@json_schema_type +class OpenAIChatCompletionToolCallFunction(BaseModel): + name: Optional[str] = None + arguments: Optional[str] = None + + +@json_schema_type +class OpenAIChatCompletionToolCall(BaseModel): + index: Optional[int] = None + id: Optional[str] = None + type: Literal["function"] = "function" + function: Optional[OpenAIChatCompletionToolCallFunction] = None + + @json_schema_type class OpenAIAssistantMessageParam(BaseModel): """A message containing the model's (assistant) response in an OpenAI-compatible chat completion request. @@ -477,13 +522,13 @@ class OpenAIAssistantMessageParam(BaseModel): :param role: Must be "assistant" to identify this as the model's response :param content: The content of the model's response :param name: (Optional) The name of the assistant message participant. - :param tool_calls: List of tool calls. Each tool call is a ToolCall object. + :param tool_calls: List of tool calls. Each tool call is an OpenAIChatCompletionToolCall object. """ role: Literal["assistant"] = "assistant" - content: InterleavedContent + content: OpenAIChatCompletionMessageContent name: Optional[str] = None - tool_calls: Optional[List[ToolCall]] = Field(default_factory=list) + tool_calls: Optional[List[OpenAIChatCompletionToolCall]] = Field(default_factory=list) @json_schema_type @@ -497,7 +542,7 @@ class OpenAIToolMessageParam(BaseModel): role: Literal["tool"] = "tool" tool_call_id: str - content: InterleavedContent + content: OpenAIChatCompletionMessageContent @json_schema_type @@ -510,7 +555,7 @@ class OpenAIDeveloperMessageParam(BaseModel): """ role: Literal["developer"] = "developer" - content: InterleavedContent + content: OpenAIChatCompletionMessageContent name: Optional[str] = None @@ -527,6 +572,46 @@ OpenAIMessageParam = Annotated[ register_schema(OpenAIMessageParam, name="OpenAIMessageParam") +@json_schema_type +class OpenAIResponseFormatText(BaseModel): + type: Literal["text"] = "text" + + +@json_schema_type +class OpenAIJSONSchema(TypedDict, total=False): + name: str + description: Optional[str] = None + strict: Optional[bool] = None + + # Pydantic BaseModel cannot be used with a schema param, since it already + # has one. And, we don't want to alias here because then have to handle + # that alias when converting to OpenAI params. So, to support schema, + # we use a TypedDict. + schema: Optional[Dict[str, Any]] = None + + +@json_schema_type +class OpenAIResponseFormatJSONSchema(BaseModel): + type: Literal["json_schema"] = "json_schema" + json_schema: OpenAIJSONSchema + + +@json_schema_type +class OpenAIResponseFormatJSONObject(BaseModel): + type: Literal["json_object"] = "json_object" + + +OpenAIResponseFormatParam = Annotated[ + Union[ + OpenAIResponseFormatText, + OpenAIResponseFormatJSONSchema, + OpenAIResponseFormatJSONObject, + ], + Field(discriminator="type"), +] +register_schema(OpenAIResponseFormatParam, name="OpenAIResponseFormatParam") + + @json_schema_type class OpenAITopLogProb(BaseModel): """The top log probability for a token from an OpenAI-compatible chat completion response. @@ -561,22 +646,54 @@ class OpenAITokenLogProb(BaseModel): class OpenAIChoiceLogprobs(BaseModel): """The log probabilities for the tokens in the message from an OpenAI-compatible chat completion response. - :content: (Optional) The log probabilities for the tokens in the message - :refusal: (Optional) The log probabilities for the tokens in the message + :param content: (Optional) The log probabilities for the tokens in the message + :param refusal: (Optional) The log probabilities for the tokens in the message """ content: Optional[List[OpenAITokenLogProb]] = None refusal: Optional[List[OpenAITokenLogProb]] = None +@json_schema_type +class OpenAIChoiceDelta(BaseModel): + """A delta from an OpenAI-compatible chat completion streaming response. + + :param content: (Optional) The content of the delta + :param refusal: (Optional) The refusal of the delta + :param role: (Optional) The role of the delta + :param tool_calls: (Optional) The tool calls of the delta + """ + + content: Optional[str] = None + refusal: Optional[str] = None + role: Optional[str] = None + tool_calls: Optional[List[OpenAIChatCompletionToolCall]] = None + + +@json_schema_type +class OpenAIChunkChoice(BaseModel): + """A chunk choice from an OpenAI-compatible chat completion streaming response. + + :param delta: The delta from the chunk + :param finish_reason: The reason the model stopped generating + :param index: The index of the choice + :param logprobs: (Optional) The log probabilities for the tokens in the message + """ + + delta: OpenAIChoiceDelta + finish_reason: str + index: int + logprobs: Optional[OpenAIChoiceLogprobs] = None + + @json_schema_type class OpenAIChoice(BaseModel): """A choice from an OpenAI-compatible chat completion response. :param message: The message from the model :param finish_reason: The reason the model stopped generating - :index: The index of the choice - :logprobs: (Optional) The log probabilities for the tokens in the message + :param index: The index of the choice + :param logprobs: (Optional) The log probabilities for the tokens in the message """ message: OpenAIMessageParam @@ -603,6 +720,24 @@ class OpenAIChatCompletion(BaseModel): model: str +@json_schema_type +class OpenAIChatCompletionChunk(BaseModel): + """Chunk from a streaming response to an OpenAI-compatible chat completion request. + + :param id: The ID of the chat completion + :param choices: List of choices + :param object: The object type, which will be "chat.completion.chunk" + :param created: The Unix timestamp in seconds when the chat completion was created + :param model: The model that was used to generate the chat completion + """ + + id: str + choices: List[OpenAIChunkChoice] + object: Literal["chat.completion.chunk"] = "chat.completion.chunk" + created: int + model: str + + @json_schema_type class OpenAICompletionLogprobs(BaseModel): """The log probabilities for the tokens in the message from an OpenAI-compatible completion response. @@ -872,7 +1007,7 @@ class Inference(Protocol): n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -883,7 +1018,7 @@ class Inference(Protocol): top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: """Generate an OpenAI-compatible chat completion for the given messages using the specified model. :param model: The identifier of the model to use. The model must be registered with Llama Stack and available via the /models endpoint. diff --git a/llama_stack/distribution/routers/routers.py b/llama_stack/distribution/routers/routers.py index cdf91e052..17aecdaf8 100644 --- a/llama_stack/distribution/routers/routers.py +++ b/llama_stack/distribution/routers/routers.py @@ -38,7 +38,13 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAIChatCompletionChunk, + OpenAICompletion, + OpenAIMessageParam, + OpenAIResponseFormatParam, +) from llama_stack.apis.models import Model, ModelType from llama_stack.apis.safety import RunShieldResponse, Safety from llama_stack.apis.scoring import ( @@ -531,7 +537,7 @@ class InferenceRouter(Inference): n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -542,7 +548,7 @@ class InferenceRouter(Inference): top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: logger.debug( f"InferenceRouter.openai_chat_completion: {model=}, {stream=}, {messages=}", ) diff --git a/llama_stack/models/llama/llama3/tool_utils.py b/llama_stack/models/llama/llama3/tool_utils.py index ef39ba0a5..91b46ec98 100644 --- a/llama_stack/models/llama/llama3/tool_utils.py +++ b/llama_stack/models/llama/llama3/tool_utils.py @@ -204,7 +204,9 @@ class ToolUtils: return None elif is_json(message_body): response = json.loads(message_body) - if ("type" in response and response["type"] == "function") or ("name" in response): + if ("type" in response and response["type"] == "function") or ( + "name" in response and "parameters" in response + ): function_name = response["name"] args = response["parameters"] return function_name, args diff --git a/llama_stack/providers/inline/inference/meta_reference/inference.py b/llama_stack/providers/inline/inference/meta_reference/inference.py index 0b56ba1f7..2b9a27982 100644 --- a/llama_stack/providers/inline/inference/meta_reference/inference.py +++ b/llama_stack/providers/inline/inference/meta_reference/inference.py @@ -59,8 +59,8 @@ from llama_stack.providers.utils.inference.model_registry import ( build_hf_repo_model_entry, ) from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ) from llama_stack.providers.utils.inference.prompt_adapter import ( augment_content_with_response_format_prompt, @@ -83,8 +83,8 @@ def llama_builder_fn(config: MetaReferenceInferenceConfig, model_id: str, llama_ class MetaReferenceInferenceImpl( - OpenAICompletionUnsupportedMixin, - OpenAIChatCompletionUnsupportedMixin, + OpenAICompletionToLlamaStackMixin, + OpenAIChatCompletionToLlamaStackMixin, SentenceTransformerEmbeddingMixin, Inference, ModelsProtocolPrivate, diff --git a/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py b/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py index 5bc20e3c2..d717d055f 100644 --- a/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py +++ b/llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers.py @@ -25,8 +25,8 @@ from llama_stack.providers.utils.inference.embedding_mixin import ( SentenceTransformerEmbeddingMixin, ) from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ) from .config import SentenceTransformersInferenceConfig @@ -35,8 +35,8 @@ log = logging.getLogger(__name__) class SentenceTransformersInferenceImpl( - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, SentenceTransformerEmbeddingMixin, Inference, ModelsProtocolPrivate, diff --git a/llama_stack/providers/inline/inference/vllm/vllm.py b/llama_stack/providers/inline/inference/vllm/vllm.py index 085c79d6b..9d742c39c 100644 --- a/llama_stack/providers/inline/inference/vllm/vllm.py +++ b/llama_stack/providers/inline/inference/vllm/vllm.py @@ -66,10 +66,10 @@ from llama_stack.providers.utils.inference.model_registry import ( ModelsProtocolPrivate, ) from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, OpenAICompatCompletionChoice, OpenAICompatCompletionResponse, - OpenAICompletionUnsupportedMixin, + OpenAICompletionToLlamaStackMixin, get_stop_reason, process_chat_completion_stream_response, ) @@ -176,8 +176,8 @@ def _convert_sampling_params( class VLLMInferenceImpl( Inference, - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ModelsProtocolPrivate, ): """ diff --git a/llama_stack/providers/remote/inference/bedrock/bedrock.py b/llama_stack/providers/remote/inference/bedrock/bedrock.py index 0a485da8f..f8dbcf31a 100644 --- a/llama_stack/providers/remote/inference/bedrock/bedrock.py +++ b/llama_stack/providers/remote/inference/bedrock/bedrock.py @@ -36,10 +36,10 @@ from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, ) from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, OpenAICompatCompletionChoice, OpenAICompatCompletionResponse, - OpenAICompletionUnsupportedMixin, + OpenAICompletionToLlamaStackMixin, get_sampling_strategy_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -56,8 +56,8 @@ from .models import MODEL_ENTRIES class BedrockInferenceAdapter( ModelRegistryHelper, Inference, - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ): def __init__(self, config: BedrockConfig) -> None: ModelRegistryHelper.__init__(self, MODEL_ENTRIES) diff --git a/llama_stack/providers/remote/inference/cerebras/cerebras.py b/llama_stack/providers/remote/inference/cerebras/cerebras.py index 5e0a5b484..3156601be 100644 --- a/llama_stack/providers/remote/inference/cerebras/cerebras.py +++ b/llama_stack/providers/remote/inference/cerebras/cerebras.py @@ -34,8 +34,8 @@ from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, ) from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, get_sampling_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -54,8 +54,8 @@ from .models import MODEL_ENTRIES class CerebrasInferenceAdapter( ModelRegistryHelper, Inference, - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ): def __init__(self, config: CerebrasImplConfig) -> None: ModelRegistryHelper.__init__( diff --git a/llama_stack/providers/remote/inference/databricks/databricks.py b/llama_stack/providers/remote/inference/databricks/databricks.py index a10878b27..27d96eb7d 100644 --- a/llama_stack/providers/remote/inference/databricks/databricks.py +++ b/llama_stack/providers/remote/inference/databricks/databricks.py @@ -34,8 +34,8 @@ from llama_stack.providers.utils.inference.model_registry import ( build_hf_repo_model_entry, ) from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, get_sampling_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -61,8 +61,8 @@ model_entries = [ class DatabricksInferenceAdapter( ModelRegistryHelper, Inference, - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ): def __init__(self, config: DatabricksImplConfig) -> None: ModelRegistryHelper.__init__(self, model_entries=model_entries) diff --git a/llama_stack/providers/remote/inference/fireworks/fireworks.py b/llama_stack/providers/remote/inference/fireworks/fireworks.py index b59e9f2cb..48c163c87 100644 --- a/llama_stack/providers/remote/inference/fireworks/fireworks.py +++ b/llama_stack/providers/remote/inference/fireworks/fireworks.py @@ -4,7 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from typing import Any, AsyncGenerator, Dict, List, Optional, Union +from typing import Any, AsyncGenerator, AsyncIterator, Dict, List, Optional, Union from fireworks.client import Fireworks from openai import AsyncOpenAI @@ -32,13 +32,20 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAIChatCompletionChunk, + OpenAICompletion, + OpenAIMessageParam, + OpenAIResponseFormatParam, +) from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.log import get_logger from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, ) from llama_stack.providers.utils.inference.openai_compat import ( + OpenAIChatCompletionToLlamaStackMixin, convert_message_to_openai_dict, get_sampling_options, prepare_openai_completion_params, @@ -301,6 +308,11 @@ class FireworksInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProv prompt_logprobs: Optional[int] = None, ) -> OpenAICompletion: model_obj = await self.model_store.get_model(model) + + # Fireworks always prepends with BOS + if isinstance(prompt, str) and prompt.startswith("<|begin_of_text|>"): + prompt = prompt[len("<|begin_of_text|>") :] + params = await prepare_openai_completion_params( model=model_obj.provider_resource_id, prompt=prompt, @@ -320,6 +332,7 @@ class FireworksInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProv top_p=top_p, user=user, ) + return await self._get_openai_client().completions.create(**params) async def openai_chat_completion( @@ -336,7 +349,7 @@ class FireworksInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProv n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -347,10 +360,9 @@ class FireworksInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProv top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: model_obj = await self.model_store.get_model(model) params = await prepare_openai_completion_params( - model=model_obj.provider_resource_id, messages=messages, frequency_penalty=frequency_penalty, function_call=function_call, @@ -374,4 +386,12 @@ class FireworksInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProv top_p=top_p, user=user, ) - return await self._get_openai_client().chat.completions.create(**params) + + # Divert Llama Models through Llama Stack inference APIs because + # Fireworks chat completions OpenAI-compatible API does not support + # tool calls properly. + llama_model = self.get_llama_model(model_obj.provider_resource_id) + if llama_model: + return await OpenAIChatCompletionToLlamaStackMixin.openai_chat_completion(self, model=model, **params) + + return await self._get_openai_client().chat.completions.create(model=model_obj.provider_resource_id, **params) diff --git a/llama_stack/providers/remote/inference/groq/groq.py b/llama_stack/providers/remote/inference/groq/groq.py index c8789434f..f3f14e9af 100644 --- a/llama_stack/providers/remote/inference/groq/groq.py +++ b/llama_stack/providers/remote/inference/groq/groq.py @@ -4,8 +4,24 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. +from typing import Any, AsyncIterator, Dict, List, Optional, Union + +from openai import AsyncOpenAI + +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAIChatCompletionChunk, + OpenAIChoiceDelta, + OpenAIChunkChoice, + OpenAIMessageParam, + OpenAIResponseFormatParam, + OpenAISystemMessageParam, +) from llama_stack.providers.remote.inference.groq.config import GroqConfig from llama_stack.providers.utils.inference.litellm_openai_mixin import LiteLLMOpenAIMixin +from llama_stack.providers.utils.inference.openai_compat import ( + prepare_openai_completion_params, +) from .models import MODEL_ENTRIES @@ -21,9 +37,129 @@ class GroqInferenceAdapter(LiteLLMOpenAIMixin): provider_data_api_key_field="groq_api_key", ) self.config = config + self._openai_client = None async def initialize(self): await super().initialize() async def shutdown(self): await super().shutdown() + if self._openai_client: + await self._openai_client.close() + self._openai_client = None + + def _get_openai_client(self) -> AsyncOpenAI: + if not self._openai_client: + self._openai_client = AsyncOpenAI( + base_url=f"{self.config.url}/openai/v1", + api_key=self.config.api_key, + ) + return self._openai_client + + async def openai_chat_completion( + self, + model: str, + messages: List[OpenAIMessageParam], + frequency_penalty: Optional[float] = None, + function_call: Optional[Union[str, Dict[str, Any]]] = None, + functions: Optional[List[Dict[str, Any]]] = None, + logit_bias: Optional[Dict[str, float]] = None, + logprobs: Optional[bool] = None, + max_completion_tokens: Optional[int] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + parallel_tool_calls: Optional[bool] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, + seed: Optional[int] = None, + stop: Optional[Union[str, List[str]]] = None, + stream: Optional[bool] = None, + stream_options: Optional[Dict[str, Any]] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[str, Dict[str, Any]]] = None, + tools: Optional[List[Dict[str, Any]]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + user: Optional[str] = None, + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: + model_obj = await self.model_store.get_model(model) + + # Groq does not support json_schema response format, so we need to convert it to json_object + if response_format and response_format.type == "json_schema": + response_format.type = "json_object" + schema = response_format.json_schema.get("schema", {}) + response_format.json_schema = None + json_instructions = f"\nYour response should be a JSON object that matches the following schema: {schema}" + if messages and messages[0].role == "system": + messages[0].content = messages[0].content + json_instructions + else: + messages.insert(0, OpenAISystemMessageParam(content=json_instructions)) + + # Groq returns a 400 error if tools are provided but none are called + # So, set tool_choice to "required" to attempt to force a call + if tools and (not tool_choice or tool_choice == "auto"): + tool_choice = "required" + + params = await prepare_openai_completion_params( + model=model_obj.provider_resource_id.replace("groq/", ""), + messages=messages, + frequency_penalty=frequency_penalty, + function_call=function_call, + functions=functions, + logit_bias=logit_bias, + logprobs=logprobs, + max_completion_tokens=max_completion_tokens, + max_tokens=max_tokens, + n=n, + parallel_tool_calls=parallel_tool_calls, + presence_penalty=presence_penalty, + response_format=response_format, + seed=seed, + stop=stop, + stream=stream, + stream_options=stream_options, + temperature=temperature, + tool_choice=tool_choice, + tools=tools, + top_logprobs=top_logprobs, + top_p=top_p, + user=user, + ) + + # Groq does not support streaming requests that set response_format + fake_stream = False + if stream and response_format: + params["stream"] = False + fake_stream = True + + response = await self._get_openai_client().chat.completions.create(**params) + + if fake_stream: + chunk_choices = [] + for choice in response.choices: + delta = OpenAIChoiceDelta( + content=choice.message.content, + role=choice.message.role, + tool_calls=choice.message.tool_calls, + ) + chunk_choice = OpenAIChunkChoice( + delta=delta, + finish_reason=choice.finish_reason, + index=choice.index, + logprobs=None, + ) + chunk_choices.append(chunk_choice) + chunk = OpenAIChatCompletionChunk( + id=response.id, + choices=chunk_choices, + object="chat.completion.chunk", + created=response.created, + model=response.model, + ) + + async def _fake_stream_generator(): + yield chunk + + return _fake_stream_generator() + else: + return response diff --git a/llama_stack/providers/remote/inference/groq/models.py b/llama_stack/providers/remote/inference/groq/models.py index d0c10ca62..0b4b81cfe 100644 --- a/llama_stack/providers/remote/inference/groq/models.py +++ b/llama_stack/providers/remote/inference/groq/models.py @@ -39,8 +39,16 @@ MODEL_ENTRIES = [ "groq/llama-4-scout-17b-16e-instruct", CoreModelId.llama4_scout_17b_16e_instruct.value, ), + build_hf_repo_model_entry( + "groq/meta-llama/llama-4-scout-17b-16e-instruct", + CoreModelId.llama4_scout_17b_16e_instruct.value, + ), build_hf_repo_model_entry( "groq/llama-4-maverick-17b-128e-instruct", CoreModelId.llama4_maverick_17b_128e_instruct.value, ), + build_hf_repo_model_entry( + "groq/meta-llama/llama-4-maverick-17b-128e-instruct", + CoreModelId.llama4_maverick_17b_128e_instruct.value, + ), ] diff --git a/llama_stack/providers/remote/inference/nvidia/nvidia.py b/llama_stack/providers/remote/inference/nvidia/nvidia.py index d6f717719..15f0e72a1 100644 --- a/llama_stack/providers/remote/inference/nvidia/nvidia.py +++ b/llama_stack/providers/remote/inference/nvidia/nvidia.py @@ -35,7 +35,13 @@ from llama_stack.apis.inference import ( ToolConfig, ToolDefinition, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAIChatCompletionChunk, + OpenAICompletion, + OpenAIMessageParam, + OpenAIResponseFormatParam, +) from llama_stack.models.llama.datatypes import ToolPromptFormat from llama_stack.providers.utils.inference.model_registry import ( ModelRegistryHelper, @@ -329,7 +335,7 @@ class NVIDIAInferenceAdapter(Inference, ModelRegistryHelper): n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -340,7 +346,7 @@ class NVIDIAInferenceAdapter(Inference, ModelRegistryHelper): top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: provider_model_id = self.get_provider_model_id(model) params = await prepare_openai_completion_params( diff --git a/llama_stack/providers/remote/inference/ollama/ollama.py b/llama_stack/providers/remote/inference/ollama/ollama.py index f84863385..804d7eab2 100644 --- a/llama_stack/providers/remote/inference/ollama/ollama.py +++ b/llama_stack/providers/remote/inference/ollama/ollama.py @@ -5,7 +5,7 @@ # the root directory of this source tree. -from typing import Any, AsyncGenerator, Dict, List, Optional, Union +from typing import Any, AsyncGenerator, AsyncIterator, Dict, List, Optional, Union import httpx from ollama import AsyncClient @@ -39,7 +39,13 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAIChatCompletionChunk, + OpenAICompletion, + OpenAIMessageParam, + OpenAIResponseFormatParam, +) from llama_stack.apis.models import Model, ModelType from llama_stack.log import get_logger from llama_stack.providers.datatypes import ( @@ -408,7 +414,7 @@ class OllamaInferenceAdapter( n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -419,7 +425,7 @@ class OllamaInferenceAdapter( top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: model_obj = await self._get_model(model) params = { k: v diff --git a/llama_stack/providers/remote/inference/passthrough/passthrough.py b/llama_stack/providers/remote/inference/passthrough/passthrough.py index 0eb38c395..af05320b0 100644 --- a/llama_stack/providers/remote/inference/passthrough/passthrough.py +++ b/llama_stack/providers/remote/inference/passthrough/passthrough.py @@ -4,7 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from typing import Any, AsyncGenerator, Dict, List, Optional, Union +from typing import Any, AsyncGenerator, AsyncIterator, Dict, List, Optional, Union from llama_stack_client import AsyncLlamaStackClient @@ -26,7 +26,13 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAIChatCompletionChunk, + OpenAICompletion, + OpenAIMessageParam, + OpenAIResponseFormatParam, +) from llama_stack.apis.models import Model from llama_stack.distribution.library_client import convert_pydantic_to_json_value, convert_to_pydantic from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper @@ -266,7 +272,7 @@ class PassthroughInferenceAdapter(Inference): n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -277,7 +283,7 @@ class PassthroughInferenceAdapter(Inference): top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: client = self._get_client() model_obj = await self.model_store.get_model(model) diff --git a/llama_stack/providers/remote/inference/runpod/runpod.py b/llama_stack/providers/remote/inference/runpod/runpod.py index 878460122..72cbead9b 100644 --- a/llama_stack/providers/remote/inference/runpod/runpod.py +++ b/llama_stack/providers/remote/inference/runpod/runpod.py @@ -12,8 +12,8 @@ from llama_stack.apis.inference import * # noqa: F403 # from llama_stack.providers.datatypes import ModelsProtocolPrivate from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, get_sampling_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -43,8 +43,8 @@ RUNPOD_SUPPORTED_MODELS = { class RunpodInferenceAdapter( ModelRegistryHelper, Inference, - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ): def __init__(self, config: RunpodImplConfig) -> None: ModelRegistryHelper.__init__(self, stack_to_provider_models_map=RUNPOD_SUPPORTED_MODELS) diff --git a/llama_stack/providers/remote/inference/sambanova/sambanova.py b/llama_stack/providers/remote/inference/sambanova/sambanova.py index c503657eb..1665e72b8 100644 --- a/llama_stack/providers/remote/inference/sambanova/sambanova.py +++ b/llama_stack/providers/remote/inference/sambanova/sambanova.py @@ -42,8 +42,8 @@ from llama_stack.apis.inference import ( ) from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, process_chat_completion_stream_response, ) from llama_stack.providers.utils.inference.prompt_adapter import ( @@ -57,8 +57,8 @@ from .models import MODEL_ENTRIES class SambaNovaInferenceAdapter( ModelRegistryHelper, Inference, - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ): def __init__(self, config: SambaNovaImplConfig) -> None: ModelRegistryHelper.__init__(self, model_entries=MODEL_ENTRIES) diff --git a/llama_stack/providers/remote/inference/tgi/tgi.py b/llama_stack/providers/remote/inference/tgi/tgi.py index 8f5b5e3cc..4ee386a15 100644 --- a/llama_stack/providers/remote/inference/tgi/tgi.py +++ b/llama_stack/providers/remote/inference/tgi/tgi.py @@ -40,10 +40,10 @@ from llama_stack.providers.utils.inference.model_registry import ( build_hf_repo_model_entry, ) from llama_stack.providers.utils.inference.openai_compat import ( - OpenAIChatCompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, OpenAICompatCompletionChoice, OpenAICompatCompletionResponse, - OpenAICompletionUnsupportedMixin, + OpenAICompletionToLlamaStackMixin, get_sampling_options, process_chat_completion_response, process_chat_completion_stream_response, @@ -73,8 +73,8 @@ def build_hf_repo_model_entries(): class _HfAdapter( Inference, - OpenAIChatCompletionUnsupportedMixin, - OpenAICompletionUnsupportedMixin, + OpenAIChatCompletionToLlamaStackMixin, + OpenAICompletionToLlamaStackMixin, ModelsProtocolPrivate, ): client: AsyncInferenceClient diff --git a/llama_stack/providers/remote/inference/together/together.py b/llama_stack/providers/remote/inference/together/together.py index 1615b8cd1..001e6aac4 100644 --- a/llama_stack/providers/remote/inference/together/together.py +++ b/llama_stack/providers/remote/inference/together/together.py @@ -4,7 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from typing import Any, AsyncGenerator, Dict, List, Optional, Union +from typing import Any, AsyncGenerator, AsyncIterator, Dict, List, Optional, Union from openai import AsyncOpenAI from together import AsyncTogether @@ -31,7 +31,13 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAIChatCompletionChunk, + OpenAICompletion, + OpenAIMessageParam, + OpenAIResponseFormatParam, +) from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.log import get_logger from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper @@ -315,7 +321,7 @@ class TogetherInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProvi n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -326,7 +332,7 @@ class TogetherInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProvi top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: model_obj = await self.model_store.get_model(model) params = await prepare_openai_completion_params( model=model_obj.provider_resource_id, @@ -353,4 +359,26 @@ class TogetherInferenceAdapter(ModelRegistryHelper, Inference, NeedsRequestProvi top_p=top_p, user=user, ) + if params.get("stream", True): + return self._stream_openai_chat_completion(params) return await self._get_openai_client().chat.completions.create(**params) # type: ignore + + async def _stream_openai_chat_completion(self, params: dict) -> AsyncGenerator: + # together.ai sometimes adds usage data to the stream, even if include_usage is False + # This causes an unexpected final chunk with empty choices array to be sent + # to clients that may not handle it gracefully. + include_usage = False + if params.get("stream_options", None): + include_usage = params["stream_options"].get("include_usage", False) + stream = await self._get_openai_client().chat.completions.create(**params) + + seen_finish_reason = False + async for chunk in stream: + # Final usage chunk with no choices that the user didn't request, so discard + if not include_usage and seen_finish_reason and len(chunk.choices) == 0: + break + yield chunk + for choice in chunk.choices: + if choice.finish_reason: + seen_finish_reason = True + break diff --git a/llama_stack/providers/remote/inference/vllm/vllm.py b/llama_stack/providers/remote/inference/vllm/vllm.py index 0044d2e75..2b9eae1e9 100644 --- a/llama_stack/providers/remote/inference/vllm/vllm.py +++ b/llama_stack/providers/remote/inference/vllm/vllm.py @@ -5,7 +5,7 @@ # the root directory of this source tree. import json import logging -from typing import Any, AsyncGenerator, Dict, List, Optional, Union +from typing import Any, AsyncGenerator, AsyncIterator, Dict, List, Optional, Union import httpx from openai import AsyncOpenAI @@ -45,7 +45,12 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAICompletion, + OpenAIMessageParam, + OpenAIResponseFormatParam, +) from llama_stack.apis.models import Model, ModelType from llama_stack.models.llama.datatypes import BuiltinTool, StopReason, ToolCall from llama_stack.models.llama.sku_list import all_registered_models @@ -487,7 +492,7 @@ class VLLMInferenceAdapter(Inference, ModelsProtocolPrivate): n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -498,7 +503,7 @@ class VLLMInferenceAdapter(Inference, ModelsProtocolPrivate): top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: model_obj = await self._get_model(model) params = await prepare_openai_completion_params( model=model_obj.provider_resource_id, diff --git a/llama_stack/providers/utils/inference/litellm_openai_mixin.py b/llama_stack/providers/utils/inference/litellm_openai_mixin.py index cd0f4ec67..efe7031f5 100644 --- a/llama_stack/providers/utils/inference/litellm_openai_mixin.py +++ b/llama_stack/providers/utils/inference/litellm_openai_mixin.py @@ -30,7 +30,13 @@ from llama_stack.apis.inference import ( ToolDefinition, ToolPromptFormat, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAIMessageParam +from llama_stack.apis.inference.inference import ( + OpenAIChatCompletion, + OpenAIChatCompletionChunk, + OpenAICompletion, + OpenAIMessageParam, + OpenAIResponseFormatParam, +) from llama_stack.apis.models.models import Model from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.log import get_logger @@ -270,7 +276,7 @@ class LiteLLMOpenAIMixin( guided_choice: Optional[List[str]] = None, prompt_logprobs: Optional[int] = None, ) -> OpenAICompletion: - model_obj = await self._get_model(model) + model_obj = await self.model_store.get_model(model) params = await prepare_openai_completion_params( model=model_obj.provider_resource_id, prompt=prompt, @@ -292,7 +298,7 @@ class LiteLLMOpenAIMixin( guided_choice=guided_choice, prompt_logprobs=prompt_logprobs, ) - return litellm.text_completion(**params) + return await litellm.atext_completion(**params) async def openai_chat_completion( self, @@ -308,7 +314,7 @@ class LiteLLMOpenAIMixin( n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -319,8 +325,8 @@ class LiteLLMOpenAIMixin( top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, - ) -> OpenAIChatCompletion: - model_obj = await self._get_model(model) + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: + model_obj = await self.model_store.get_model(model) params = await prepare_openai_completion_params( model=model_obj.provider_resource_id, messages=messages, @@ -346,7 +352,7 @@ class LiteLLMOpenAIMixin( top_p=top_p, user=user, ) - return litellm.completion(**params) + return await litellm.acompletion(**params) async def batch_completion( self, diff --git a/llama_stack/providers/utils/inference/openai_compat.py b/llama_stack/providers/utils/inference/openai_compat.py index f33cb4443..d98261abb 100644 --- a/llama_stack/providers/utils/inference/openai_compat.py +++ b/llama_stack/providers/utils/inference/openai_compat.py @@ -8,7 +8,7 @@ import logging import time import uuid import warnings -from typing import Any, AsyncGenerator, Dict, Iterable, List, Optional, Union +from typing import Any, AsyncGenerator, AsyncIterator, Awaitable, Dict, Iterable, List, Optional, Union from openai import AsyncStream from openai.types.chat import ( @@ -50,6 +50,18 @@ from openai.types.chat.chat_completion import ( from openai.types.chat.chat_completion import ( ChoiceLogprobs as OpenAIChoiceLogprobs, # same as chat_completion_chunk ChoiceLogprobs ) +from openai.types.chat.chat_completion_chunk import ( + Choice as OpenAIChatCompletionChunkChoice, +) +from openai.types.chat.chat_completion_chunk import ( + ChoiceDelta as OpenAIChoiceDelta, +) +from openai.types.chat.chat_completion_chunk import ( + ChoiceDeltaToolCall as OpenAIChoiceDeltaToolCall, +) +from openai.types.chat.chat_completion_chunk import ( + ChoiceDeltaToolCallFunction as OpenAIChoiceDeltaToolCallFunction, +) from openai.types.chat.chat_completion_content_part_image_param import ( ImageURL as OpenAIImageURL, ) @@ -59,6 +71,7 @@ from openai.types.chat.chat_completion_message_tool_call_param import ( from pydantic import BaseModel from llama_stack.apis.common.content_types import ( + URL, ImageContentItem, InterleavedContent, TextContentItem, @@ -85,12 +98,24 @@ from llama_stack.apis.inference import ( TopPSamplingStrategy, UserMessage, ) -from llama_stack.apis.inference.inference import OpenAIChatCompletion, OpenAICompletion, OpenAICompletionChoice +from llama_stack.apis.inference.inference import ( + JsonSchemaResponseFormat, + OpenAIChatCompletion, + OpenAICompletion, + OpenAICompletionChoice, + OpenAIMessageParam, + OpenAIResponseFormatParam, + ToolConfig, +) +from llama_stack.apis.inference.inference import ( + OpenAIChoice as OpenAIChatCompletionChoice, +) from llama_stack.models.llama.datatypes import ( BuiltinTool, StopReason, ToolCall, ToolDefinition, + ToolParamDefinition, ) from llama_stack.providers.utils.inference.prompt_adapter import ( convert_image_content_to_url, @@ -751,6 +776,17 @@ def convert_tooldef_to_openai_tool(tool: ToolDefinition) -> dict: return out +def _convert_stop_reason_to_openai_finish_reason(stop_reason: StopReason) -> str: + """ + Convert a StopReason to an OpenAI chat completion finish_reason. + """ + return { + StopReason.end_of_turn: "stop", + StopReason.end_of_message: "tool_calls", + StopReason.out_of_tokens: "length", + }.get(stop_reason, "stop") + + def _convert_openai_finish_reason(finish_reason: str) -> StopReason: """ Convert an OpenAI chat completion finish_reason to a StopReason. @@ -776,6 +812,56 @@ def _convert_openai_finish_reason(finish_reason: str) -> StopReason: }.get(finish_reason, StopReason.end_of_turn) +def _convert_openai_request_tool_config(tool_choice: Optional[Union[str, Dict[str, Any]]] = None) -> ToolConfig: + tool_config = ToolConfig() + if tool_choice: + tool_config.tool_choice = tool_choice + return tool_config + + +def _convert_openai_request_tools(tools: Optional[List[Dict[str, Any]]] = None) -> List[ToolDefinition]: + lls_tools = [] + if not tools: + return lls_tools + + for tool in tools: + tool_fn = tool.get("function", {}) + tool_name = tool_fn.get("name", None) + tool_desc = tool_fn.get("description", None) + + tool_params = tool_fn.get("parameters", None) + lls_tool_params = {} + if tool_params is not None: + tool_param_properties = tool_params.get("properties", {}) + for tool_param_key, tool_param_value in tool_param_properties.items(): + tool_param_def = ToolParamDefinition( + param_type=tool_param_value.get("type", None), + description=tool_param_value.get("description", None), + ) + lls_tool_params[tool_param_key] = tool_param_def + + lls_tool = ToolDefinition( + tool_name=tool_name, + description=tool_desc, + parameters=lls_tool_params, + ) + lls_tools.append(lls_tool) + return lls_tools + + +def _convert_openai_request_response_format(response_format: OpenAIResponseFormatParam = None): + if not response_format: + return None + # response_format can be a dict or a pydantic model + response_format = dict(response_format) + if response_format.get("type", "") == "json_schema": + return JsonSchemaResponseFormat( + type="json_schema", + json_schema=response_format.get("json_schema", {}).get("schema", ""), + ) + return None + + def _convert_openai_tool_calls( tool_calls: List[OpenAIChatCompletionMessageToolCall], ) -> List[ToolCall]: @@ -871,6 +957,40 @@ def _convert_openai_sampling_params( return sampling_params +def _convert_openai_request_messages(messages: List[OpenAIMessageParam]): + # Llama Stack messages and OpenAI messages are similar, but not identical. + lls_messages = [] + for message in messages: + lls_message = dict(message) + + # Llama Stack expects `call_id` but OpenAI uses `tool_call_id` + tool_call_id = lls_message.pop("tool_call_id", None) + if tool_call_id: + lls_message["call_id"] = tool_call_id + + content = lls_message.get("content", None) + if isinstance(content, list): + lls_content = [] + for item in content: + # items can either by pydantic models or dicts here... + item = dict(item) + if item.get("type", "") == "image_url": + lls_item = ImageContentItem( + type="image", + image=URL(uri=item.get("image_url", {}).get("url", "")), + ) + elif item.get("type", "") == "text": + lls_item = TextContentItem( + type="text", + text=item.get("text", ""), + ) + lls_content.append(lls_item) + lls_message["content"] = lls_content + lls_messages.append(lls_message) + + return lls_messages + + def convert_openai_chat_completion_choice( choice: OpenAIChoice, ) -> ChatCompletionResponse: @@ -1080,11 +1200,24 @@ async def convert_openai_chat_completion_stream( async def prepare_openai_completion_params(**params): - completion_params = {k: v for k, v in params.items() if v is not None} + async def _prepare_value(value: Any) -> Any: + new_value = value + if isinstance(value, list): + new_value = [await _prepare_value(v) for v in value] + elif isinstance(value, dict): + new_value = {k: await _prepare_value(v) for k, v in value.items()} + elif isinstance(value, BaseModel): + new_value = value.model_dump(exclude_none=True) + return new_value + + completion_params = {} + for k, v in params.items(): + if v is not None: + completion_params[k] = await _prepare_value(v) return completion_params -class OpenAICompletionUnsupportedMixin: +class OpenAICompletionToLlamaStackMixin: async def openai_completion( self, model: str, @@ -1122,6 +1255,7 @@ class OpenAICompletionUnsupportedMixin: choices = [] # "n" is the number of completions to generate per prompt + n = n or 1 for _i in range(0, n): # and we may have multiple prompts, if batching was used @@ -1134,7 +1268,7 @@ class OpenAICompletionUnsupportedMixin: index = len(choices) text = result.content - finish_reason = _convert_openai_finish_reason(result.stop_reason) + finish_reason = _convert_stop_reason_to_openai_finish_reason(result.stop_reason) choice = OpenAICompletionChoice( index=index, @@ -1152,7 +1286,7 @@ class OpenAICompletionUnsupportedMixin: ) -class OpenAIChatCompletionUnsupportedMixin: +class OpenAIChatCompletionToLlamaStackMixin: async def openai_chat_completion( self, model: str, @@ -1167,7 +1301,7 @@ class OpenAIChatCompletionUnsupportedMixin: n: Optional[int] = None, parallel_tool_calls: Optional[bool] = None, presence_penalty: Optional[float] = None, - response_format: Optional[Dict[str, str]] = None, + response_format: Optional[OpenAIResponseFormatParam] = None, seed: Optional[int] = None, stop: Optional[Union[str, List[str]]] = None, stream: Optional[bool] = None, @@ -1178,5 +1312,103 @@ class OpenAIChatCompletionUnsupportedMixin: top_logprobs: Optional[int] = None, top_p: Optional[float] = None, user: Optional[str] = None, + ) -> Union[OpenAIChatCompletion, AsyncIterator[OpenAIChatCompletionChunk]]: + messages = _convert_openai_request_messages(messages) + response_format = _convert_openai_request_response_format(response_format) + sampling_params = _convert_openai_sampling_params( + max_tokens=max_tokens, + temperature=temperature, + top_p=top_p, + ) + tool_config = _convert_openai_request_tool_config(tool_choice) + tools = _convert_openai_request_tools(tools) + + outstanding_responses = [] + # "n" is the number of completions to generate per prompt + n = n or 1 + for _i in range(0, n): + response = self.chat_completion( + model_id=model, + messages=messages, + sampling_params=sampling_params, + response_format=response_format, + stream=stream, + tool_config=tool_config, + tools=tools, + ) + outstanding_responses.append(response) + + if stream: + return OpenAIChatCompletionToLlamaStackMixin._process_stream_response(self, model, outstanding_responses) + + return await OpenAIChatCompletionToLlamaStackMixin._process_non_stream_response( + self, model, outstanding_responses + ) + + async def _process_stream_response( + self, model: str, outstanding_responses: List[Awaitable[AsyncIterator[ChatCompletionResponseStreamChunk]]] + ): + id = f"chatcmpl-{uuid.uuid4()}" + for outstanding_response in outstanding_responses: + response = await outstanding_response + i = 0 + async for chunk in response: + event = chunk.event + finish_reason = _convert_stop_reason_to_openai_finish_reason(event.stop_reason) + + if isinstance(event.delta, TextDelta): + text_delta = event.delta.text + delta = OpenAIChoiceDelta(content=text_delta) + yield OpenAIChatCompletionChunk( + id=id, + choices=[OpenAIChatCompletionChunkChoice(index=i, finish_reason=finish_reason, delta=delta)], + created=int(time.time()), + model=model, + object="chat.completion.chunk", + ) + elif isinstance(event.delta, ToolCallDelta): + if event.delta.parse_status == ToolCallParseStatus.succeeded: + tool_call = event.delta.tool_call + openai_tool_call = OpenAIChoiceDeltaToolCall( + index=0, + id=tool_call.call_id, + function=OpenAIChoiceDeltaToolCallFunction( + name=tool_call.tool_name, arguments=tool_call.arguments_json + ), + ) + delta = OpenAIChoiceDelta(tool_calls=[openai_tool_call]) + yield OpenAIChatCompletionChunk( + id=id, + choices=[ + OpenAIChatCompletionChunkChoice(index=i, finish_reason=finish_reason, delta=delta) + ], + created=int(time.time()), + model=model, + object="chat.completion.chunk", + ) + i = i + 1 + + async def _process_non_stream_response( + self, model: str, outstanding_responses: List[Awaitable[ChatCompletionResponse]] ) -> OpenAIChatCompletion: - raise ValueError(f"{self.__class__.__name__} doesn't support openai chat completion") + choices = [] + for outstanding_response in outstanding_responses: + response = await outstanding_response + completion_message = response.completion_message + message = await convert_message_to_openai_dict_new(completion_message) + finish_reason = _convert_stop_reason_to_openai_finish_reason(completion_message.stop_reason) + + choice = OpenAIChatCompletionChoice( + index=len(choices), + message=message, + finish_reason=finish_reason, + ) + choices.append(choice) + + return OpenAIChatCompletion( + id=f"chatcmpl-{uuid.uuid4()}", + choices=choices, + created=int(time.time()), + model=model, + object="chat.completion", + ) diff --git a/llama_stack/templates/dev/run.yaml b/llama_stack/templates/dev/run.yaml index ea3b7252a..0dd056405 100644 --- a/llama_stack/templates/dev/run.yaml +++ b/llama_stack/templates/dev/run.yaml @@ -386,6 +386,16 @@ models: provider_id: groq provider_model_id: groq/llama-4-scout-17b-16e-instruct model_type: llm +- metadata: {} + model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + model_type: llm +- metadata: {} + model_id: meta-llama/Llama-4-Scout-17B-16E-Instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + model_type: llm - metadata: {} model_id: groq/llama-4-maverick-17b-128e-instruct provider_id: groq @@ -396,6 +406,16 @@ models: provider_id: groq provider_model_id: groq/llama-4-maverick-17b-128e-instruct model_type: llm +- metadata: {} + model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + model_type: llm +- metadata: {} + model_id: meta-llama/Llama-4-Maverick-17B-128E-Instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + model_type: llm - metadata: embedding_dimension: 384 model_id: all-MiniLM-L6-v2 diff --git a/llama_stack/templates/groq/run.yaml b/llama_stack/templates/groq/run.yaml index f557e64fd..444452dcb 100644 --- a/llama_stack/templates/groq/run.yaml +++ b/llama_stack/templates/groq/run.yaml @@ -158,6 +158,16 @@ models: provider_id: groq provider_model_id: groq/llama-4-scout-17b-16e-instruct model_type: llm +- metadata: {} + model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + model_type: llm +- metadata: {} + model_id: meta-llama/Llama-4-Scout-17B-16E-Instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + model_type: llm - metadata: {} model_id: groq/llama-4-maverick-17b-128e-instruct provider_id: groq @@ -168,6 +178,16 @@ models: provider_id: groq provider_model_id: groq/llama-4-maverick-17b-128e-instruct model_type: llm +- metadata: {} + model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + model_type: llm +- metadata: {} + model_id: meta-llama/Llama-4-Maverick-17B-128E-Instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + model_type: llm - metadata: embedding_dimension: 384 model_id: all-MiniLM-L6-v2 diff --git a/llama_stack/templates/verification/run.yaml b/llama_stack/templates/verification/run.yaml index b6c2ca98d..454ecba5b 100644 --- a/llama_stack/templates/verification/run.yaml +++ b/llama_stack/templates/verification/run.yaml @@ -474,6 +474,16 @@ models: provider_id: groq-openai-compat provider_model_id: groq/llama-4-scout-17b-16e-instruct model_type: llm +- metadata: {} + model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + provider_id: groq-openai-compat + provider_model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + model_type: llm +- metadata: {} + model_id: meta-llama/Llama-4-Scout-17B-16E-Instruct + provider_id: groq-openai-compat + provider_model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + model_type: llm - metadata: {} model_id: groq/llama-4-maverick-17b-128e-instruct provider_id: groq-openai-compat @@ -484,6 +494,16 @@ models: provider_id: groq-openai-compat provider_model_id: groq/llama-4-maverick-17b-128e-instruct model_type: llm +- metadata: {} + model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + provider_id: groq-openai-compat + provider_model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + model_type: llm +- metadata: {} + model_id: meta-llama/Llama-4-Maverick-17B-128E-Instruct + provider_id: groq-openai-compat + provider_model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + model_type: llm - metadata: {} model_id: Meta-Llama-3.1-8B-Instruct provider_id: sambanova-openai-compat diff --git a/tests/integration/inference/test_openai_completion.py b/tests/integration/inference/test_openai_completion.py index 0905d5817..75b53100c 100644 --- a/tests/integration/inference/test_openai_completion.py +++ b/tests/integration/inference/test_openai_completion.py @@ -115,7 +115,7 @@ def test_openai_completion_streaming(openai_client, client_with_models, text_mod stream=True, max_tokens=50, ) - streamed_content = [chunk.choices[0].text for chunk in response] + streamed_content = [chunk.choices[0].text or "" for chunk in response] content_str = "".join(streamed_content).lower().strip() assert len(content_str) > 10 diff --git a/tests/verifications/conf/fireworks-llama-stack.yaml b/tests/verifications/conf/fireworks-llama-stack.yaml new file mode 100644 index 000000000..d91443dd9 --- /dev/null +++ b/tests/verifications/conf/fireworks-llama-stack.yaml @@ -0,0 +1,14 @@ +base_url: http://localhost:8321/v1/openai/v1 +api_key_var: FIREWORKS_API_KEY +models: +- fireworks/llama-v3p3-70b-instruct +- fireworks/llama4-scout-instruct-basic +- fireworks/llama4-maverick-instruct-basic +model_display_names: + fireworks/llama-v3p3-70b-instruct: Llama-3.3-70B-Instruct + fireworks/llama4-scout-instruct-basic: Llama-4-Scout-Instruct + fireworks/llama4-maverick-instruct-basic: Llama-4-Maverick-Instruct +test_exclusions: + fireworks/llama-v3p3-70b-instruct: + - test_chat_non_streaming_image + - test_chat_streaming_image diff --git a/tests/verifications/conf/groq-llama-stack.yaml b/tests/verifications/conf/groq-llama-stack.yaml new file mode 100644 index 000000000..fd5e9abec --- /dev/null +++ b/tests/verifications/conf/groq-llama-stack.yaml @@ -0,0 +1,14 @@ +base_url: http://localhost:8321/v1/openai/v1 +api_key_var: GROQ_API_KEY +models: +- groq/llama-3.3-70b-versatile +- groq/llama-4-scout-17b-16e-instruct +- groq/llama-4-maverick-17b-128e-instruct +model_display_names: + groq/llama-3.3-70b-versatile: Llama-3.3-70B-Instruct + groq/llama-4-scout-17b-16e-instruct: Llama-4-Scout-Instruct + groq/llama-4-maverick-17b-128e-instruct: Llama-4-Maverick-Instruct +test_exclusions: + groq/llama-3.3-70b-versatile: + - test_chat_non_streaming_image + - test_chat_streaming_image diff --git a/tests/verifications/conf/groq.yaml b/tests/verifications/conf/groq.yaml index 7871036dc..76b1244ae 100644 --- a/tests/verifications/conf/groq.yaml +++ b/tests/verifications/conf/groq.yaml @@ -2,12 +2,12 @@ base_url: https://api.groq.com/openai/v1 api_key_var: GROQ_API_KEY models: - llama-3.3-70b-versatile -- llama-4-scout-17b-16e-instruct -- llama-4-maverick-17b-128e-instruct +- meta-llama/llama-4-scout-17b-16e-instruct +- meta-llama/llama-4-maverick-17b-128e-instruct model_display_names: llama-3.3-70b-versatile: Llama-3.3-70B-Instruct - llama-4-scout-17b-16e-instruct: Llama-4-Scout-Instruct - llama-4-maverick-17b-128e-instruct: Llama-4-Maverick-Instruct + meta-llama/llama-4-scout-17b-16e-instruct: Llama-4-Scout-Instruct + meta-llama/llama-4-maverick-17b-128e-instruct: Llama-4-Maverick-Instruct test_exclusions: llama-3.3-70b-versatile: - test_chat_non_streaming_image diff --git a/tests/verifications/conf/openai-llama-stack.yaml b/tests/verifications/conf/openai-llama-stack.yaml new file mode 100644 index 000000000..de35439ae --- /dev/null +++ b/tests/verifications/conf/openai-llama-stack.yaml @@ -0,0 +1,9 @@ +base_url: http://localhost:8321/v1/openai/v1 +api_key_var: OPENAI_API_KEY +models: +- openai/gpt-4o +- openai/gpt-4o-mini +model_display_names: + openai/gpt-4o: gpt-4o + openai/gpt-4o-mini: gpt-4o-mini +test_exclusions: {} diff --git a/tests/verifications/conf/together-llama-stack.yaml b/tests/verifications/conf/together-llama-stack.yaml new file mode 100644 index 000000000..e49d82604 --- /dev/null +++ b/tests/verifications/conf/together-llama-stack.yaml @@ -0,0 +1,14 @@ +base_url: http://localhost:8321/v1/openai/v1 +api_key_var: TOGETHER_API_KEY +models: +- together/meta-llama/Llama-3.3-70B-Instruct-Turbo +- together/meta-llama/Llama-4-Scout-17B-16E-Instruct +- together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 +model_display_names: + together/meta-llama/Llama-3.3-70B-Instruct-Turbo: Llama-3.3-70B-Instruct + together/meta-llama/Llama-4-Scout-17B-16E-Instruct: Llama-4-Scout-Instruct + together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8: Llama-4-Maverick-Instruct +test_exclusions: + together/meta-llama/Llama-3.3-70B-Instruct-Turbo: + - test_chat_non_streaming_image + - test_chat_streaming_image diff --git a/tests/verifications/generate_report.py b/tests/verifications/generate_report.py index 6a7c39ee2..b39c3fd19 100755 --- a/tests/verifications/generate_report.py +++ b/tests/verifications/generate_report.py @@ -67,7 +67,17 @@ RESULTS_DIR.mkdir(exist_ok=True) # Maximum number of test result files to keep per provider MAX_RESULTS_PER_PROVIDER = 1 -PROVIDER_ORDER = ["together", "fireworks", "groq", "cerebras", "openai"] +PROVIDER_ORDER = [ + "together", + "fireworks", + "groq", + "cerebras", + "openai", + "together-llama-stack", + "fireworks-llama-stack", + "groq-llama-stack", + "openai-llama-stack", +] VERIFICATION_CONFIG = _load_all_verification_configs() diff --git a/tests/verifications/openai-api-verification-run.yaml b/tests/verifications/openai-api-verification-run.yaml new file mode 100644 index 000000000..71885d058 --- /dev/null +++ b/tests/verifications/openai-api-verification-run.yaml @@ -0,0 +1,146 @@ +version: '2' +image_name: openai-api-verification +apis: +- inference +- telemetry +- tool_runtime +- vector_io +providers: + inference: + - provider_id: together + provider_type: remote::together + config: + url: https://api.together.xyz/v1 + api_key: ${env.TOGETHER_API_KEY:} + - provider_id: fireworks + provider_type: remote::fireworks + config: + url: https://api.fireworks.ai/inference/v1 + api_key: ${env.FIREWORKS_API_KEY} + - provider_id: groq + provider_type: remote::groq + config: + url: https://api.groq.com + api_key: ${env.GROQ_API_KEY} + - provider_id: openai + provider_type: remote::openai + config: + url: https://api.openai.com/v1 + api_key: ${env.OPENAI_API_KEY:} + - provider_id: sentence-transformers + provider_type: inline::sentence-transformers + config: {} + vector_io: + - provider_id: faiss + provider_type: inline::faiss + config: + kvstore: + type: sqlite + namespace: null + db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/openai}/faiss_store.db + telemetry: + - provider_id: meta-reference + provider_type: inline::meta-reference + config: + service_name: "${env.OTEL_SERVICE_NAME:\u200B}" + sinks: ${env.TELEMETRY_SINKS:console,sqlite} + sqlite_db_path: ${env.SQLITE_DB_PATH:~/.llama/distributions/openai/trace_store.db} + tool_runtime: + - provider_id: brave-search + provider_type: remote::brave-search + config: + api_key: ${env.BRAVE_SEARCH_API_KEY:} + max_results: 3 + - provider_id: tavily-search + provider_type: remote::tavily-search + config: + api_key: ${env.TAVILY_SEARCH_API_KEY:} + max_results: 3 + - provider_id: code-interpreter + provider_type: inline::code-interpreter + config: {} + - provider_id: rag-runtime + provider_type: inline::rag-runtime + config: {} + - provider_id: model-context-protocol + provider_type: remote::model-context-protocol + config: {} + - provider_id: wolfram-alpha + provider_type: remote::wolfram-alpha + config: + api_key: ${env.WOLFRAM_ALPHA_API_KEY:} +metadata_store: + type: sqlite + db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/openai}/registry.db +models: +- metadata: {} + model_id: together/meta-llama/Llama-3.3-70B-Instruct-Turbo + provider_id: together + provider_model_id: meta-llama/Llama-3.3-70B-Instruct-Turbo + model_type: llm +- metadata: {} + model_id: together/meta-llama/Llama-4-Scout-17B-16E-Instruct + provider_id: together + provider_model_id: meta-llama/Llama-4-Scout-17B-16E-Instruct + model_type: llm +- metadata: {} + model_id: together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 + provider_id: together + provider_model_id: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 + model_type: llm +- metadata: {} + model_id: fireworks/llama-v3p3-70b-instruct + provider_id: fireworks + provider_model_id: accounts/fireworks/models/llama-v3p3-70b-instruct + model_type: llm +- metadata: {} + model_id: fireworks/llama4-scout-instruct-basic + provider_id: fireworks + provider_model_id: accounts/fireworks/models/llama4-scout-instruct-basic + model_type: llm +- metadata: {} + model_id: fireworks/llama4-maverick-instruct-basic + provider_id: fireworks + provider_model_id: accounts/fireworks/models/llama4-maverick-instruct-basic + model_type: llm +- metadata: {} + model_id: groq/llama-3.3-70b-versatile + provider_id: groq + provider_model_id: groq/llama-3.3-70b-versatile + model_type: llm +- metadata: {} + model_id: groq/llama-4-scout-17b-16e-instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct + model_type: llm +- metadata: {} + model_id: groq/llama-4-maverick-17b-128e-instruct + provider_id: groq + provider_model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct + model_type: llm +- metadata: {} + model_id: openai/gpt-4o + provider_id: openai + provider_model_id: openai/gpt-4o + model_type: llm +- metadata: {} + model_id: openai/gpt-4o-mini + provider_id: openai + provider_model_id: openai/gpt-4o-mini + model_type: llm +shields: [] +vector_dbs: [] +datasets: [] +scoring_fns: [] +benchmarks: [] +tool_groups: +- toolgroup_id: builtin::websearch + provider_id: tavily-search +- toolgroup_id: builtin::rag + provider_id: rag-runtime +- toolgroup_id: builtin::code_interpreter + provider_id: code-interpreter +- toolgroup_id: builtin::wolfram_alpha + provider_id: wolfram-alpha +server: + port: 8321 diff --git a/tests/verifications/openai_api/fixtures/fixtures.py b/tests/verifications/openai_api/fixtures/fixtures.py index 4f8c2e017..940b99b2a 100644 --- a/tests/verifications/openai_api/fixtures/fixtures.py +++ b/tests/verifications/openai_api/fixtures/fixtures.py @@ -99,6 +99,9 @@ def model_mapping(provider, providers_model_mapping): @pytest.fixture def openai_client(base_url, api_key): + # Simplify running against a local Llama Stack + if "localhost" in base_url and not api_key: + api_key = "empty" return OpenAI( base_url=base_url, api_key=api_key, From 3ed4316ed55816472a4c207e3805c008d258170b Mon Sep 17 00:00:00 2001 From: Ihar Hrachyshka Date: Mon, 14 Apr 2025 11:59:11 -0400 Subject: [PATCH 36/39] feat: Implement async job execution for torchtune training (#1437) # What does this PR do? Now a separate thread is started to execute training jobs. Training requests now return job ID before the job completes. (Which fixes API timeouts for any jobs that take longer than a minute.) Note: the scheduler code is meant to be spun out in the future into a common provider service that can be reused for different APIs and providers. It is also expected to back the /jobs API proposed here: https://github.com/meta-llama/llama-stack/discussions/1238 Hence its somewhat generalized form which is expected to simplify its adoption elsewhere in the future. Note: this patch doesn't attempt to implement missing APIs (e.g. cancel or job removal). This work will belong to follow-up PRs. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] Added unit tests for the scheduler module. For the API coverage, did manual testing and was able to run a training cycle on GPU. The initial call returned job ID before the training completed, as (now) expected. Artifacts are returned as expected. ``` JobArtifactsResponse(checkpoints=[{'identifier': 'meta-llama/Llama-3.2-3B-Instruct-sft-0', 'created_at': '2025-03-07T22:45:19.892714', 'epoch': 0, 'post_training_job_id': 'test-job2ee77104-2fd3-4a4e-84cf-f83f8b8f1f50', 'path': '/home/ec2-user/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0', 'training_metrics': None}], job_uuid='test-job2ee77104-2fd3-4a4e-84cf-f83f8b8f1f50') ``` The integration test is currently disabled for the provider. I will look into how it can be enabled in a different PR / issue context. [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka --- .../post_training/torchtune/post_training.py | 126 ++++++--- llama_stack/providers/utils/scheduler.py | 265 ++++++++++++++++++ tests/unit/providers/utils/test_scheduler.py | 120 ++++++++ 3 files changed, 472 insertions(+), 39 deletions(-) create mode 100644 llama_stack/providers/utils/scheduler.py create mode 100644 tests/unit/providers/utils/test_scheduler.py diff --git a/llama_stack/providers/inline/post_training/torchtune/post_training.py b/llama_stack/providers/inline/post_training/torchtune/post_training.py index 2c129ef41..cc1a6a5fe 100644 --- a/llama_stack/providers/inline/post_training/torchtune/post_training.py +++ b/llama_stack/providers/inline/post_training/torchtune/post_training.py @@ -3,13 +3,14 @@ # # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from datetime import datetime, timezone +from enum import Enum from typing import Any, Dict, Optional from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasets import Datasets from llama_stack.apis.post_training import ( AlgorithmConfig, + Checkpoint, DPOAlignmentConfig, JobStatus, ListPostTrainingJobsResponse, @@ -25,9 +26,19 @@ from llama_stack.providers.inline.post_training.torchtune.config import ( from llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device import ( LoraFinetuningSingleDevice, ) +from llama_stack.providers.utils.scheduler import JobArtifact, Scheduler +from llama_stack.providers.utils.scheduler import JobStatus as SchedulerJobStatus from llama_stack.schema_utils import webmethod +class TrainingArtifactType(Enum): + CHECKPOINT = "checkpoint" + RESOURCES_STATS = "resources_stats" + + +_JOB_TYPE_SUPERVISED_FINE_TUNE = "supervised-fine-tune" + + class TorchtunePostTrainingImpl: def __init__( self, @@ -38,13 +49,27 @@ class TorchtunePostTrainingImpl: self.config = config self.datasetio_api = datasetio_api self.datasets_api = datasets + self._scheduler = Scheduler() - # TODO: assume sync job, will need jobs API for async scheduling - self.jobs = {} - self.checkpoints_dict = {} + async def shutdown(self) -> None: + await self._scheduler.shutdown() - async def shutdown(self): - pass + @staticmethod + def _checkpoint_to_artifact(checkpoint: Checkpoint) -> JobArtifact: + return JobArtifact( + type=TrainingArtifactType.CHECKPOINT.value, + name=checkpoint.identifier, + uri=checkpoint.path, + metadata=dict(checkpoint), + ) + + @staticmethod + def _resources_stats_to_artifact(resources_stats: Dict[str, Any]) -> JobArtifact: + return JobArtifact( + type=TrainingArtifactType.RESOURCES_STATS.value, + name=TrainingArtifactType.RESOURCES_STATS.value, + metadata=resources_stats, + ) async def supervised_fine_tune( self, @@ -56,20 +81,11 @@ class TorchtunePostTrainingImpl: checkpoint_dir: Optional[str], algorithm_config: Optional[AlgorithmConfig], ) -> PostTrainingJob: - if job_uuid in self.jobs: - raise ValueError(f"Job {job_uuid} already exists") - - post_training_job = PostTrainingJob(job_uuid=job_uuid) - - job_status_response = PostTrainingJobStatusResponse( - job_uuid=job_uuid, - status=JobStatus.scheduled, - scheduled_at=datetime.now(timezone.utc), - ) - self.jobs[job_uuid] = job_status_response - if isinstance(algorithm_config, LoraFinetuningConfig): - try: + + async def handler(on_log_message_cb, on_status_change_cb, on_artifact_collected_cb): + on_log_message_cb("Starting Lora finetuning") + recipe = LoraFinetuningSingleDevice( self.config, job_uuid, @@ -82,26 +98,22 @@ class TorchtunePostTrainingImpl: self.datasetio_api, self.datasets_api, ) - - job_status_response.status = JobStatus.in_progress - job_status_response.started_at = datetime.now(timezone.utc) - await recipe.setup() + resources_allocated, checkpoints = await recipe.train() - self.checkpoints_dict[job_uuid] = checkpoints - job_status_response.resources_allocated = resources_allocated - job_status_response.checkpoints = checkpoints - job_status_response.status = JobStatus.completed - job_status_response.completed_at = datetime.now(timezone.utc) + on_artifact_collected_cb(self._resources_stats_to_artifact(resources_allocated)) + for checkpoint in checkpoints: + artifact = self._checkpoint_to_artifact(checkpoint) + on_artifact_collected_cb(artifact) - except Exception: - job_status_response.status = JobStatus.failed - raise + on_status_change_cb(SchedulerJobStatus.completed) + on_log_message_cb("Lora finetuning completed") else: raise NotImplementedError() - return post_training_job + job_uuid = self._scheduler.schedule(_JOB_TYPE_SUPERVISED_FINE_TUNE, job_uuid, handler) + return PostTrainingJob(job_uuid=job_uuid) async def preference_optimize( self, @@ -114,19 +126,55 @@ class TorchtunePostTrainingImpl: ) -> PostTrainingJob: ... async def get_training_jobs(self) -> ListPostTrainingJobsResponse: - return ListPostTrainingJobsResponse(data=[PostTrainingJob(job_uuid=uuid_) for uuid_ in self.jobs]) + return ListPostTrainingJobsResponse( + data=[PostTrainingJob(job_uuid=job.id) for job in self._scheduler.get_jobs()] + ) + + @staticmethod + def _get_artifacts_metadata_by_type(job, artifact_type): + return [artifact.metadata for artifact in job.artifacts if artifact.type == artifact_type] + + @classmethod + def _get_checkpoints(cls, job): + return cls._get_artifacts_metadata_by_type(job, TrainingArtifactType.CHECKPOINT.value) + + @classmethod + def _get_resources_allocated(cls, job): + data = cls._get_artifacts_metadata_by_type(job, TrainingArtifactType.RESOURCES_STATS.value) + return data[0] if data else None @webmethod(route="/post-training/job/status") async def get_training_job_status(self, job_uuid: str) -> Optional[PostTrainingJobStatusResponse]: - return self.jobs.get(job_uuid, None) + job = self._scheduler.get_job(job_uuid) + + match job.status: + # TODO: Add support for other statuses to API + case SchedulerJobStatus.new | SchedulerJobStatus.scheduled: + status = JobStatus.scheduled + case SchedulerJobStatus.running: + status = JobStatus.in_progress + case SchedulerJobStatus.completed: + status = JobStatus.completed + case SchedulerJobStatus.failed: + status = JobStatus.failed + case _: + raise NotImplementedError() + + return PostTrainingJobStatusResponse( + job_uuid=job_uuid, + status=status, + scheduled_at=job.scheduled_at, + started_at=job.started_at, + completed_at=job.completed_at, + checkpoints=self._get_checkpoints(job), + resources_allocated=self._get_resources_allocated(job), + ) @webmethod(route="/post-training/job/cancel") async def cancel_training_job(self, job_uuid: str) -> None: - raise NotImplementedError("Job cancel is not implemented yet") + self._scheduler.cancel(job_uuid) @webmethod(route="/post-training/job/artifacts") async def get_training_job_artifacts(self, job_uuid: str) -> Optional[PostTrainingJobArtifactsResponse]: - if job_uuid in self.checkpoints_dict: - checkpoints = self.checkpoints_dict.get(job_uuid, []) - return PostTrainingJobArtifactsResponse(job_uuid=job_uuid, checkpoints=checkpoints) - return None + job = self._scheduler.get_job(job_uuid) + return PostTrainingJobArtifactsResponse(job_uuid=job_uuid, checkpoints=self._get_checkpoints(job)) diff --git a/llama_stack/providers/utils/scheduler.py b/llama_stack/providers/utils/scheduler.py new file mode 100644 index 000000000..d4cffe605 --- /dev/null +++ b/llama_stack/providers/utils/scheduler.py @@ -0,0 +1,265 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import abc +import asyncio +import functools +import threading +from datetime import datetime, timezone +from enum import Enum +from typing import Any, Callable, Coroutine, Dict, Iterable, Tuple, TypeAlias + +from pydantic import BaseModel + +from llama_stack.log import get_logger + +logger = get_logger(name=__name__, category="scheduler") + + +# TODO: revisit the list of possible statuses when defining a more coherent +# Jobs API for all API flows; e.g. do we need new vs scheduled? +class JobStatus(Enum): + new = "new" + scheduled = "scheduled" + running = "running" + failed = "failed" + completed = "completed" + + +JobID: TypeAlias = str +JobType: TypeAlias = str + + +class JobArtifact(BaseModel): + type: JobType + name: str + # TODO: uri should be a reference to /files API; revisit when /files is implemented + uri: str | None = None + metadata: Dict[str, Any] + + +JobHandler = Callable[ + [Callable[[str], None], Callable[[JobStatus], None], Callable[[JobArtifact], None]], Coroutine[Any, Any, None] +] + + +LogMessage: TypeAlias = Tuple[datetime, str] + + +_COMPLETED_STATUSES = {JobStatus.completed, JobStatus.failed} + + +class Job: + def __init__(self, job_type: JobType, job_id: JobID, handler: JobHandler): + super().__init__() + self.id = job_id + self._type = job_type + self._handler = handler + self._artifacts: list[JobArtifact] = [] + self._logs: list[LogMessage] = [] + self._state_transitions: list[Tuple[datetime, JobStatus]] = [(datetime.now(timezone.utc), JobStatus.new)] + + @property + def handler(self) -> JobHandler: + return self._handler + + @property + def status(self) -> JobStatus: + return self._state_transitions[-1][1] + + @status.setter + def status(self, status: JobStatus): + if status in _COMPLETED_STATUSES and self.status in _COMPLETED_STATUSES: + raise ValueError(f"Job is already in a completed state ({self.status})") + if self.status == status: + return + self._state_transitions.append((datetime.now(timezone.utc), status)) + + @property + def artifacts(self) -> list[JobArtifact]: + return self._artifacts + + def register_artifact(self, artifact: JobArtifact) -> None: + self._artifacts.append(artifact) + + def _find_state_transition_date(self, status: Iterable[JobStatus]) -> datetime | None: + for date, s in reversed(self._state_transitions): + if s in status: + return date + return None + + @property + def scheduled_at(self) -> datetime | None: + return self._find_state_transition_date([JobStatus.scheduled]) + + @property + def started_at(self) -> datetime | None: + return self._find_state_transition_date([JobStatus.running]) + + @property + def completed_at(self) -> datetime | None: + return self._find_state_transition_date(_COMPLETED_STATUSES) + + @property + def logs(self) -> list[LogMessage]: + return self._logs[:] + + def append_log(self, message: LogMessage) -> None: + self._logs.append(message) + + # TODO: implement + def cancel(self) -> None: + raise NotImplementedError + + +class _SchedulerBackend(abc.ABC): + @abc.abstractmethod + def on_log_message_cb(self, job: Job, message: LogMessage) -> None: + raise NotImplementedError + + @abc.abstractmethod + def on_status_change_cb(self, job: Job, status: JobStatus) -> None: + raise NotImplementedError + + @abc.abstractmethod + def on_artifact_collected_cb(self, job: Job, artifact: JobArtifact) -> None: + raise NotImplementedError + + @abc.abstractmethod + async def shutdown(self) -> None: + raise NotImplementedError + + @abc.abstractmethod + def schedule( + self, + job: Job, + on_log_message_cb: Callable[[str], None], + on_status_change_cb: Callable[[JobStatus], None], + on_artifact_collected_cb: Callable[[JobArtifact], None], + ) -> None: + raise NotImplementedError + + +class _NaiveSchedulerBackend(_SchedulerBackend): + def __init__(self, timeout: int = 5): + self._timeout = timeout + self._loop = asyncio.new_event_loop() + # There may be performance implications of using threads due to Python + # GIL; may need to measure if it's a real problem though + self._thread = threading.Thread(target=self._run_loop, daemon=True) + self._thread.start() + + def _run_loop(self) -> None: + asyncio.set_event_loop(self._loop) + self._loop.run_forever() + + # When stopping the loop, give tasks a chance to finish + # TODO: should we explicitly inform jobs of pending stoppage? + for task in asyncio.all_tasks(self._loop): + self._loop.run_until_complete(task) + self._loop.close() + + async def shutdown(self) -> None: + self._loop.call_soon_threadsafe(self._loop.stop) + self._thread.join() + + # TODO: decouple scheduling and running the job + def schedule( + self, + job: Job, + on_log_message_cb: Callable[[str], None], + on_status_change_cb: Callable[[JobStatus], None], + on_artifact_collected_cb: Callable[[JobArtifact], None], + ) -> None: + async def do(): + try: + job.status = JobStatus.running + await job.handler(on_log_message_cb, on_status_change_cb, on_artifact_collected_cb) + except Exception as e: + on_log_message_cb(str(e)) + job.status = JobStatus.failed + logger.exception(f"Job {job.id} failed.") + + asyncio.run_coroutine_threadsafe(do(), self._loop) + + def on_log_message_cb(self, job: Job, message: LogMessage) -> None: + pass + + def on_status_change_cb(self, job: Job, status: JobStatus) -> None: + pass + + def on_artifact_collected_cb(self, job: Job, artifact: JobArtifact) -> None: + pass + + +_BACKENDS = { + "naive": _NaiveSchedulerBackend, +} + + +def _get_backend_impl(backend: str) -> _SchedulerBackend: + try: + return _BACKENDS[backend]() + except KeyError as e: + raise ValueError(f"Unknown backend {backend}") from e + + +class Scheduler: + def __init__(self, backend: str = "naive"): + # TODO: if server crashes, job states are lost; we need to persist jobs on disc + self._jobs: dict[JobID, Job] = {} + self._backend = _get_backend_impl(backend) + + def _on_log_message_cb(self, job: Job, message: str) -> None: + msg = (datetime.now(timezone.utc), message) + # At least for the time being, until there's a better way to expose + # logs to users, log messages on console + logger.info(f"Job {job.id}: {message}") + job.append_log(msg) + self._backend.on_log_message_cb(job, msg) + + def _on_status_change_cb(self, job: Job, status: JobStatus) -> None: + job.status = status + self._backend.on_status_change_cb(job, status) + + def _on_artifact_collected_cb(self, job: Job, artifact: JobArtifact) -> None: + job.register_artifact(artifact) + self._backend.on_artifact_collected_cb(job, artifact) + + def schedule(self, type_: JobType, job_id: JobID, handler: JobHandler) -> JobID: + job = Job(type_, job_id, handler) + if job.id in self._jobs: + raise ValueError(f"Job {job.id} already exists") + + self._jobs[job.id] = job + job.status = JobStatus.scheduled + self._backend.schedule( + job, + functools.partial(self._on_log_message_cb, job), + functools.partial(self._on_status_change_cb, job), + functools.partial(self._on_artifact_collected_cb, job), + ) + + return job.id + + def cancel(self, job_id: JobID) -> None: + self.get_job(job_id).cancel() + + def get_job(self, job_id: JobID) -> Job: + try: + return self._jobs[job_id] + except KeyError as e: + raise ValueError(f"Job {job_id} not found") from e + + def get_jobs(self, type_: JobType | None = None) -> list[Job]: + jobs = list(self._jobs.values()) + if type_: + jobs = [job for job in jobs if job._type == type_] + return jobs + + async def shutdown(self): + # TODO: also cancel jobs once implemented + await self._backend.shutdown() diff --git a/tests/unit/providers/utils/test_scheduler.py b/tests/unit/providers/utils/test_scheduler.py new file mode 100644 index 000000000..76f0da8ce --- /dev/null +++ b/tests/unit/providers/utils/test_scheduler.py @@ -0,0 +1,120 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import asyncio + +import pytest + +from llama_stack.providers.utils.scheduler import JobStatus, Scheduler + + +@pytest.mark.asyncio +async def test_scheduler_unknown_backend(): + with pytest.raises(ValueError): + Scheduler(backend="unknown") + + +@pytest.mark.asyncio +async def test_scheduler_naive(): + sched = Scheduler() + + # make sure the scheduler starts empty + with pytest.raises(ValueError): + sched.get_job("unknown") + assert sched.get_jobs() == [] + + called = False + + # schedule a job that will exercise the handlers + async def job_handler(on_log, on_status, on_artifact): + nonlocal called + called = True + # exercise the handlers + on_log("test log1") + on_log("test log2") + on_artifact({"type": "type1", "path": "path1"}) + on_artifact({"type": "type2", "path": "path2"}) + on_status(JobStatus.completed) + + job_id = "test_job_id" + job_type = "test_job_type" + sched.schedule(job_type, job_id, job_handler) + + # make sure the job was properly registered + with pytest.raises(ValueError): + sched.get_job("unknown") + assert sched.get_job(job_id) is not None + assert sched.get_jobs() == [sched.get_job(job_id)] + + assert sched.get_jobs("unknown") == [] + assert sched.get_jobs(job_type) == [sched.get_job(job_id)] + + # now shut the scheduler down and make sure the job ran + await sched.shutdown() + + assert called + + job = sched.get_job(job_id) + assert job is not None + + assert job.status == JobStatus.completed + + assert job.scheduled_at is not None + assert job.started_at is not None + assert job.completed_at is not None + assert job.scheduled_at < job.started_at < job.completed_at + + assert job.artifacts == [ + {"type": "type1", "path": "path1"}, + {"type": "type2", "path": "path2"}, + ] + assert [msg[1] for msg in job.logs] == ["test log1", "test log2"] + assert job.logs[0][0] < job.logs[1][0] + + +@pytest.mark.asyncio +async def test_scheduler_naive_handler_raises(): + sched = Scheduler() + + async def failing_job_handler(on_log, on_status, on_artifact): + on_status(JobStatus.running) + raise ValueError("test error") + + job_id = "test_job_id1" + job_type = "test_job_type" + sched.schedule(job_type, job_id, failing_job_handler) + + job = sched.get_job(job_id) + assert job is not None + + # confirm the exception made the job transition to failed state, even + # though it was set to `running` before the error + for _ in range(10): + if job.status == JobStatus.failed: + break + await asyncio.sleep(0.1) + assert job.status == JobStatus.failed + + # confirm that the raised error got registered in log + assert job.logs[0][1] == "test error" + + # even after failed job, we can schedule another one + called = False + + async def successful_job_handler(on_log, on_status, on_artifact): + nonlocal called + called = True + on_status(JobStatus.completed) + + job_id = "test_job_id2" + sched.schedule(job_type, job_id, successful_job_handler) + + await sched.shutdown() + + assert called + job = sched.get_job(job_id) + assert job is not None + assert job.status == JobStatus.completed From cf158f2cb95ca2b00713e66fbcbbc81064e579ec Mon Sep 17 00:00:00 2001 From: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com> Date: Mon, 14 Apr 2025 12:03:54 -0400 Subject: [PATCH 37/39] feat: allow ollama to use 'latest' if available but not specified (#1903) # What does this PR do? ollama's CLI supports running models via commands such as 'ollama run llama3.2' this syntax does not work with the INFERENCE_MODEL llamastack var as currently specifying a tag such as 'latest' is required this commit will check to see if the 'latest' model is available and use that model if a user passes a model name without a tag but the 'latest' is available in ollama ## Test Plan Behavior pre-code change ```bash $ INFERENCE_MODEL=llama3.2 llama stack build --template ollama --image-type venv --run ... INFO 2025-04-08 13:42:42,842 llama_stack.providers.remote.inference.ollama.ollama:80 inference: checking connectivity to Ollama at `http://beanlab1.bss.redhat.com:11434`... Traceback (most recent call last): File "", line 198, in _run_module_as_main File "", line 88, in _run_code File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/server/server.py", line 502, in main() File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/server/server.py", line 401, in main impls = asyncio.run(construct_stack(config)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/runners.py", line 195, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/base_events.py", line 691, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/stack.py", line 222, in construct_stack await register_resources(run_config, impls) File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/stack.py", line 99, in register_resources await method(**obj.model_dump()) File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper result = await method(self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 294, in register_model registered_model = await self.register_object(model) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 228, in register_object registered_obj = await register_object_with_provider(obj, p) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 77, in register_object_with_provider return await p.register_model(obj) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper result = await method(self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/providers/remote/inference/ollama/ollama.py", line 315, in register_model raise ValueError( ValueError: Model 'llama3.2' is not available in Ollama. Available models: llama3.2:latest ++ error_handler 108 ++ echo 'Error occurred in script at line: 108' Error occurred in script at line: 108 ++ exit 1 ``` Behavior post-code change ```bash $ INFERENCE_MODEL=llama3.2 llama stack build --template ollama --image-type venv --run ... INFO 2025-04-08 13:58:17,365 llama_stack.providers.remote.inference.ollama.ollama:80 inference: checking connectivity to Ollama at `http://beanlab1.bss.redhat.com:11434`... WARNING 2025-04-08 13:58:18,190 llama_stack.providers.remote.inference.ollama.ollama:317 inference: Imprecise provider resource id was used but 'latest' is available in Ollama - using 'llama3.2:latest' INFO 2025-04-08 13:58:18,191 llama_stack.providers.remote.inference.ollama.ollama:308 inference: Pulling embedding model `all-minilm:latest` if necessary... INFO 2025-04-08 13:58:18,799 __main__:478 server: Listening on ['::', '0.0.0.0']:8321 INFO: Started server process [28378] INFO: Waiting for application startup. INFO 2025-04-08 13:58:18,803 __main__:148 server: Starting up INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ... ``` ## Documentation Did not document this anywhere but happy to do so if there is an appropriate place Signed-off-by: Nathan Weinberg --- llama_stack/providers/remote/inference/ollama/ollama.py | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/llama_stack/providers/remote/inference/ollama/ollama.py b/llama_stack/providers/remote/inference/ollama/ollama.py index 804d7eab2..cdfe7b568 100644 --- a/llama_stack/providers/remote/inference/ollama/ollama.py +++ b/llama_stack/providers/remote/inference/ollama/ollama.py @@ -343,6 +343,12 @@ class OllamaInferenceAdapter( response = await self.client.list() available_models = [m["model"] for m in response["models"]] if model.provider_resource_id not in available_models: + available_models_latest = [m["model"].split(":latest")[0] for m in response["models"]] + if model.provider_resource_id in available_models_latest: + logger.warning( + f"Imprecise provider resource id was used but 'latest' is available in Ollama - using '{model.provider_resource_id}:latest'" + ) + return model raise ValueError( f"Model '{model.provider_resource_id}' is not available in Ollama. Available models: {', '.join(available_models)}" ) From 86c6f1f1122511fcb56a74a82b604af4e064c565 Mon Sep 17 00:00:00 2001 From: Peter Double <134428501+solaius@users.noreply.github.com> Date: Mon, 14 Apr 2025 13:28:25 -0400 Subject: [PATCH 38/39] =?UTF-8?q?fix:=20FastAPI=20built-in=20paths=20bypas?= =?UTF-8?q?s=20custom=20routing=20(Docs)=20and=20update=20r=E2=80=A6=20(#1?= =?UTF-8?q?841)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## What does this PR do? This PR improves the server's request routing logic by ensuring built-in FastAPI paths such as `/docs`, `/redoc`, `/openapi.json`, `/favicon.ico`, and `/static` bypass the custom `TracingMiddleware`. This prevents unnecessary tracing logic for documentation and static file requests, ensuring better performance and cleaner logs. Additionally, it adds proper metadata (`title`, `description`, and `version`) to the FastAPI application initialization and updates the requirements document accordingly. [//]: # (Closes #1822 ) --- ## Test Plan - Ran the server locally with `uvicorn` using the provided `run.yaml` config - Verified that: - FastAPI docs (`/docs`, `/redoc`) load correctly without triggering the custom tracing middleware - All other routes still go through the middleware and trace logic - Application metadata appears as expected in the OpenAPI docs To reproduce: 1. Start the server with `python server.py --template ` 2. Navigate to `/docs` and `/redoc` 3. Confirm that no extra trace headers are added for those routes 4. Confirm other API endpoints behave as expected and include `x-trace-id` in the response headers [//]: # (## Documentation) --- Froze the requirements file to include many of the other libraries that have been added in the past few releases to make install easier. --------- Co-authored-by: Sébastien Han --- llama_stack/distribution/server/server.py | 24 +++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py index d7ef37c26..9bbb2ce88 100644 --- a/llama_stack/distribution/server/server.py +++ b/llama_stack/distribution/server/server.py @@ -229,15 +229,30 @@ class TracingMiddleware: def __init__(self, app, impls): self.app = app self.impls = impls + # FastAPI built-in paths that should bypass custom routing + self.fastapi_paths = ("/docs", "/redoc", "/openapi.json", "/favicon.ico", "/static") async def __call__(self, scope, receive, send): if scope.get("type") == "lifespan": return await self.app(scope, receive, send) path = scope.get("path", "") + + # Check if the path is a FastAPI built-in path + if path.startswith(self.fastapi_paths): + # Pass through to FastAPI's built-in handlers + logger.debug(f"Bypassing custom routing for FastAPI built-in path: {path}") + return await self.app(scope, receive, send) + if not hasattr(self, "endpoint_impls"): self.endpoint_impls = initialize_endpoint_impls(self.impls) - _, _, trace_path = find_matching_endpoint(scope.get("method", "GET"), path, self.endpoint_impls) + + try: + _, _, trace_path = find_matching_endpoint(scope.get("method", "GET"), path, self.endpoint_impls) + except ValueError: + # If no matching endpoint is found, pass through to FastAPI + logger.debug(f"No matching endpoint found for path: {path}, falling back to FastAPI") + return await self.app(scope, receive, send) trace_context = await start_trace(trace_path, {"__location__": "server", "raw_path": path}) @@ -388,7 +403,12 @@ def main(args: Optional[argparse.Namespace] = None): safe_config = redact_sensitive_fields(config.model_dump()) logger.info(yaml.dump(safe_config, indent=2)) - app = FastAPI(lifespan=lifespan) + app = FastAPI( + lifespan=lifespan, + docs_url="/docs", + redoc_url="/redoc", + openapi_url="/openapi.json", + ) if not os.environ.get("LLAMA_STACK_DISABLE_VERSION_CHECK"): app.add_middleware(ClientVersionMiddleware) From 32e3da73921ab354fa2ea114f0b2912ea435b0b9 Mon Sep 17 00:00:00 2001 From: ehhuang Date: Mon, 14 Apr 2025 18:45:22 -0700 Subject: [PATCH 39/39] test(verification): more tests, multiturn tool use tests (#1954) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? ## Test Plan (myenv) ➜ llama-stack python tests/verifications/generate_report.py --providers fireworks,together,openai --run-tests https://github.com/meta-llama/llama-stack/blob/f27f61762980925330fb46da5e9e74e3a1b999a2/tests/verifications/REPORT.md --- tests/verifications/REPORT.md | 56 +- .../fixtures/test_cases/chat_completion.yaml | 218 ++ .../openai_api/test_chat_completion.py | 358 ++- .../verifications/test_results/fireworks.json | 2458 ++++++++++++++-- tests/verifications/test_results/openai.json | 1316 ++++++++- .../verifications/test_results/together.json | 2467 +++++++++++++++-- 6 files changed, 6274 insertions(+), 599 deletions(-) diff --git a/tests/verifications/REPORT.md b/tests/verifications/REPORT.md index 2309c6404..2dd0af41b 100644 --- a/tests/verifications/REPORT.md +++ b/tests/verifications/REPORT.md @@ -1,6 +1,6 @@ # Test Results Report -*Generated on: 2025-04-10 16:48:18* +*Generated on: 2025-04-14 18:11:37* *This report was generated by running `python tests/verifications/generate_report.py`* @@ -15,15 +15,15 @@ | Provider | Pass Rate | Tests Passed | Total Tests | | --- | --- | --- | --- | -| Together | 64.7% | 22 | 34 | -| Fireworks | 82.4% | 28 | 34 | -| Openai | 100.0% | 24 | 24 | +| Together | 48.7% | 37 | 76 | +| Fireworks | 47.4% | 36 | 76 | +| Openai | 100.0% | 52 | 52 | ## Together -*Tests run on: 2025-04-10 16:46:35* +*Tests run on: 2025-04-14 18:08:14* ```bash # Run all tests for this provider: @@ -48,19 +48,33 @@ pytest tests/verifications/openai_api/test_chat_completion.py --provider=togethe | test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ | | test_chat_non_streaming_basic (saturn) | ✅ | ✅ | ✅ | | test_chat_non_streaming_image | ⚪ | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (add_product_tool) | ✅ | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (compare_monthly_expense_tool) | ❌ | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (get_then_create_event_tool) | ✅ | ❌ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (text_then_weather_tool) | ❌ | ❌ | ❌ | +| test_chat_non_streaming_multi_turn_tool_calling (weather_tool_then_text) | ✅ | ✅ | ✅ | | test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | ✅ | | test_chat_non_streaming_structured_output (math) | ✅ | ✅ | ✅ | | test_chat_non_streaming_tool_calling | ✅ | ✅ | ✅ | +| test_chat_non_streaming_tool_choice_none | ❌ | ❌ | ❌ | +| test_chat_non_streaming_tool_choice_required | ✅ | ✅ | ✅ | | test_chat_streaming_basic (earth) | ✅ | ❌ | ❌ | | test_chat_streaming_basic (saturn) | ✅ | ❌ | ❌ | | test_chat_streaming_image | ⚪ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (add_product_tool) | ✅ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (compare_monthly_expense_tool) | ❌ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (get_then_create_event_tool) | ❌ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (text_then_weather_tool) | ❌ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (weather_tool_then_text) | ❌ | ❌ | ❌ | | test_chat_streaming_structured_output (calendar) | ✅ | ❌ | ❌ | | test_chat_streaming_structured_output (math) | ✅ | ❌ | ❌ | | test_chat_streaming_tool_calling | ✅ | ❌ | ❌ | +| test_chat_streaming_tool_choice_none | ❌ | ❌ | ❌ | +| test_chat_streaming_tool_choice_required | ✅ | ❌ | ❌ | ## Fireworks -*Tests run on: 2025-04-10 16:44:44* +*Tests run on: 2025-04-14 18:04:06* ```bash # Run all tests for this provider: @@ -85,19 +99,33 @@ pytest tests/verifications/openai_api/test_chat_completion.py --provider=firewor | test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ | | test_chat_non_streaming_basic (saturn) | ✅ | ✅ | ✅ | | test_chat_non_streaming_image | ⚪ | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (add_product_tool) | ❌ | ❌ | ❌ | +| test_chat_non_streaming_multi_turn_tool_calling (compare_monthly_expense_tool) | ❌ | ❌ | ❌ | +| test_chat_non_streaming_multi_turn_tool_calling (get_then_create_event_tool) | ❌ | ❌ | ❌ | +| test_chat_non_streaming_multi_turn_tool_calling (text_then_weather_tool) | ❌ | ❌ | ❌ | +| test_chat_non_streaming_multi_turn_tool_calling (weather_tool_then_text) | ❌ | ❌ | ❌ | | test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | ✅ | | test_chat_non_streaming_structured_output (math) | ✅ | ✅ | ✅ | | test_chat_non_streaming_tool_calling | ❌ | ❌ | ❌ | +| test_chat_non_streaming_tool_choice_none | ✅ | ✅ | ✅ | +| test_chat_non_streaming_tool_choice_required | ✅ | ❌ | ❌ | | test_chat_streaming_basic (earth) | ✅ | ✅ | ✅ | | test_chat_streaming_basic (saturn) | ✅ | ✅ | ✅ | | test_chat_streaming_image | ⚪ | ✅ | ✅ | +| test_chat_streaming_multi_turn_tool_calling (add_product_tool) | ❌ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (compare_monthly_expense_tool) | ❌ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (get_then_create_event_tool) | ❌ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (text_then_weather_tool) | ❌ | ❌ | ❌ | +| test_chat_streaming_multi_turn_tool_calling (weather_tool_then_text) | ❌ | ❌ | ❌ | | test_chat_streaming_structured_output (calendar) | ✅ | ✅ | ✅ | | test_chat_streaming_structured_output (math) | ✅ | ✅ | ✅ | | test_chat_streaming_tool_calling | ❌ | ❌ | ❌ | +| test_chat_streaming_tool_choice_none | ✅ | ✅ | ✅ | +| test_chat_streaming_tool_choice_required | ✅ | ❌ | ❌ | ## Openai -*Tests run on: 2025-04-10 16:47:28* +*Tests run on: 2025-04-14 18:09:51* ```bash # Run all tests for this provider: @@ -121,12 +149,26 @@ pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai | test_chat_non_streaming_basic (earth) | ✅ | ✅ | | test_chat_non_streaming_basic (saturn) | ✅ | ✅ | | test_chat_non_streaming_image | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (add_product_tool) | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (compare_monthly_expense_tool) | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (get_then_create_event_tool) | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (text_then_weather_tool) | ✅ | ✅ | +| test_chat_non_streaming_multi_turn_tool_calling (weather_tool_then_text) | ✅ | ✅ | | test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | | test_chat_non_streaming_structured_output (math) | ✅ | ✅ | | test_chat_non_streaming_tool_calling | ✅ | ✅ | +| test_chat_non_streaming_tool_choice_none | ✅ | ✅ | +| test_chat_non_streaming_tool_choice_required | ✅ | ✅ | | test_chat_streaming_basic (earth) | ✅ | ✅ | | test_chat_streaming_basic (saturn) | ✅ | ✅ | | test_chat_streaming_image | ✅ | ✅ | +| test_chat_streaming_multi_turn_tool_calling (add_product_tool) | ✅ | ✅ | +| test_chat_streaming_multi_turn_tool_calling (compare_monthly_expense_tool) | ✅ | ✅ | +| test_chat_streaming_multi_turn_tool_calling (get_then_create_event_tool) | ✅ | ✅ | +| test_chat_streaming_multi_turn_tool_calling (text_then_weather_tool) | ✅ | ✅ | +| test_chat_streaming_multi_turn_tool_calling (weather_tool_then_text) | ✅ | ✅ | | test_chat_streaming_structured_output (calendar) | ✅ | ✅ | | test_chat_streaming_structured_output (math) | ✅ | ✅ | | test_chat_streaming_tool_calling | ✅ | ✅ | +| test_chat_streaming_tool_choice_none | ✅ | ✅ | +| test_chat_streaming_tool_choice_required | ✅ | ✅ | diff --git a/tests/verifications/openai_api/fixtures/test_cases/chat_completion.yaml b/tests/verifications/openai_api/fixtures/test_cases/chat_completion.yaml index 78ea8245d..1ace76e34 100644 --- a/tests/verifications/openai_api/fixtures/test_cases/chat_completion.yaml +++ b/tests/verifications/openai_api/fixtures/test_cases/chat_completion.yaml @@ -131,3 +131,221 @@ test_tool_calling: type: object type: function output: get_weather_tool_call + +test_chat_multi_turn_tool_calling: + test_name: test_chat_multi_turn_tool_calling + test_params: + case: + - case_id: "text_then_weather_tool" + input: + messages: + - - role: user + content: "What's the name of the Sun in latin?" + - - role: user + content: "What's the weather like in San Francisco?" + tools: + - function: + description: Get the current weather + name: get_weather + parameters: + type: object + properties: + location: + description: "The city and state (both required), e.g. San Francisco, CA." + type: string + required: ["location"] + type: function + tool_responses: + - response: "{'response': '70 degrees and foggy'}" + expected: + - num_tool_calls: 0 + answer: ["sol"] + - num_tool_calls: 1 + tool_name: get_weather + tool_arguments: + location: "San Francisco, CA" + - num_tool_calls: 0 + answer: ["foggy", "70 degrees"] + - case_id: "weather_tool_then_text" + input: + messages: + - - role: user + content: "What's the weather like in San Francisco?" + tools: + - function: + description: Get the current weather + name: get_weather + parameters: + type: object + properties: + location: + description: "The city and state (both required), e.g. San Francisco, CA." + type: string + required: ["location"] + type: function + tool_responses: + - response: "{'response': '70 degrees and foggy'}" + expected: + - num_tool_calls: 1 + tool_name: get_weather + tool_arguments: + location: "San Francisco, CA" + - num_tool_calls: 0 + answer: ["foggy", "70 degrees"] + - case_id: "add_product_tool" + input: + messages: + - - role: user + content: "Please add a new product with name 'Widget', price 19.99, in stock, and tags ['new', 'sale'] and give me the product id." + tools: + - function: + description: Add a new product + name: addProduct + parameters: + type: object + properties: + name: + description: "Name of the product" + type: string + price: + description: "Price of the product" + type: number + inStock: + description: "Availability status of the product." + type: boolean + tags: + description: "List of product tags" + type: array + items: + type: string + required: ["name", "price", "inStock"] + type: function + tool_responses: + - response: "{'response': 'Successfully added product with id: 123'}" + expected: + - num_tool_calls: 1 + tool_name: addProduct + tool_arguments: + name: "Widget" + price: 19.99 + inStock: true + tags: + - "new" + - "sale" + - num_tool_calls: 0 + answer: ["123", "product id: 123"] + - case_id: "get_then_create_event_tool" + input: + messages: + - - role: system + content: "Todays date is 2025-03-01." + - role: user + content: "Do i have any meetings on March 3rd at 10 am? Yes or no?" + - - role: user + content: "Alright then, Create an event named 'Team Building', scheduled for that time same time, in the 'Main Conference Room' and add Alice, Bob, Charlie to it. Give me the created event id." + tools: + - function: + description: Create a new event + name: create_event + parameters: + type: object + properties: + name: + description: "Name of the event" + type: string + date: + description: "Date of the event in ISO format" + type: string + time: + description: "Event Time (HH:MM)" + type: string + location: + description: "Location of the event" + type: string + participants: + description: "List of participant names" + type: array + items: + type: string + required: ["name", "date", "time", "location", "participants"] + type: function + - function: + description: Get an event by date and time + name: get_event + parameters: + type: object + properties: + date: + description: "Date of the event in ISO format" + type: string + time: + description: "Event Time (HH:MM)" + type: string + required: ["date", "time"] + type: function + tool_responses: + - response: "{'response': 'No events found for 2025-03-03 at 10:00'}" + - response: "{'response': 'Successfully created new event with id: e_123'}" + expected: + - num_tool_calls: 1 + tool_name: get_event + tool_arguments: + date: "2025-03-03" + time: "10:00" + - num_tool_calls: 0 + answer: ["no", "no events found", "no meetings"] + - num_tool_calls: 1 + tool_name: create_event + tool_arguments: + name: "Team Building" + date: "2025-03-03" + time: "10:00" + location: "Main Conference Room" + participants: + - "Alice" + - "Bob" + - "Charlie" + - num_tool_calls: 0 + answer: ["e_123", "event id: e_123"] + - case_id: "compare_monthly_expense_tool" + input: + messages: + - - role: system + content: "Todays date is 2025-03-01." + - role: user + content: "what was my monthly expense in Jan of this year?" + - - role: user + content: "Was it less than Feb of last year? Only answer with yes or no." + tools: + - function: + description: Get monthly expense summary + name: getMonthlyExpenseSummary + parameters: + type: object + properties: + month: + description: "Month of the year (1-12)" + type: integer + year: + description: "Year" + type: integer + required: ["month", "year"] + type: function + tool_responses: + - response: "{'response': 'Total expenses for January 2025: $1000'}" + - response: "{'response': 'Total expenses for February 2024: $2000'}" + expected: + - num_tool_calls: 1 + tool_name: getMonthlyExpenseSummary + tool_arguments: + month: 1 + year: 2025 + - num_tool_calls: 0 + answer: ["1000", "$1,000", "1,000"] + - num_tool_calls: 1 + tool_name: getMonthlyExpenseSummary + tool_arguments: + month: 2 + year: 2024 + - num_tool_calls: 0 + answer: ["yes"] diff --git a/tests/verifications/openai_api/test_chat_completion.py b/tests/verifications/openai_api/test_chat_completion.py index 6aee29c3a..62a223afb 100644 --- a/tests/verifications/openai_api/test_chat_completion.py +++ b/tests/verifications/openai_api/test_chat_completion.py @@ -4,6 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. +import copy import json import re from typing import Any @@ -243,43 +244,294 @@ def test_chat_streaming_tool_calling(request, openai_client, model, provider, ve stream=True, ) - # Accumulate partial tool_calls here - tool_calls_buffer = {} - current_id = None - # Process streaming chunks - for chunk in stream: - choice = chunk.choices[0] - delta = choice.delta - - if delta.tool_calls is None: - continue - - for tool_call_delta in delta.tool_calls: - if tool_call_delta.id: - current_id = tool_call_delta.id - call_id = current_id - func_delta = tool_call_delta.function - - if call_id not in tool_calls_buffer: - tool_calls_buffer[call_id] = { - "id": call_id, - "type": tool_call_delta.type, - "name": func_delta.name, - "arguments": "", - } - - if func_delta.arguments: - tool_calls_buffer[call_id]["arguments"] += func_delta.arguments - + _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream) assert len(tool_calls_buffer) == 1 - for call in tool_calls_buffer.values(): + for call in tool_calls_buffer: assert len(call["id"]) > 0 - assert call["name"] == "get_weather" + function = call["function"] + assert function["name"] == "get_weather" - args_dict = json.loads(call["arguments"]) + args_dict = json.loads(function["arguments"]) assert "san francisco" in args_dict["location"].lower() +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_tool_calling"]["test_params"]["case"], # Reusing existing case for now + ids=case_id_generator, +) +def test_chat_non_streaming_tool_choice_required(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + tools=case["input"]["tools"], + tool_choice="required", # Force tool call + stream=False, + ) + print(response) + + assert response.choices[0].message.role == "assistant" + assert len(response.choices[0].message.tool_calls) > 0, "Expected tool call when tool_choice='required'" + expected_tool_name = case["input"]["tools"][0]["function"]["name"] + assert response.choices[0].message.tool_calls[0].function.name == expected_tool_name + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_tool_calling"]["test_params"]["case"], # Reusing existing case for now + ids=case_id_generator, +) +def test_chat_streaming_tool_choice_required(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + stream = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + tools=case["input"]["tools"], + tool_choice="required", # Force tool call + stream=True, + ) + + _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream) + + assert len(tool_calls_buffer) > 0, "Expected tool call when tool_choice='required'" + expected_tool_name = case["input"]["tools"][0]["function"]["name"] + assert any(call["function"]["name"] == expected_tool_name for call in tool_calls_buffer), ( + f"Expected tool call '{expected_tool_name}' not found in stream" + ) + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_tool_calling"]["test_params"]["case"], # Reusing existing case for now + ids=case_id_generator, +) +def test_chat_non_streaming_tool_choice_none(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + response = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + tools=case["input"]["tools"], + tool_choice="none", + stream=False, + ) + + assert response.choices[0].message.role == "assistant" + assert response.choices[0].message.tool_calls is None, "Expected no tool calls when tool_choice='none'" + assert response.choices[0].message.content is not None, "Expected content when tool_choice='none'" + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases["test_tool_calling"]["test_params"]["case"], # Reusing existing case for now + ids=case_id_generator, +) +def test_chat_streaming_tool_choice_none(request, openai_client, model, provider, verification_config, case): + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + stream = openai_client.chat.completions.create( + model=model, + messages=case["input"]["messages"], + tools=case["input"]["tools"], + tool_choice="none", + stream=True, + ) + + content = "" + for chunk in stream: + delta = chunk.choices[0].delta + if delta.content: + content += delta.content + assert not delta.tool_calls, "Expected no tool call chunks when tool_choice='none'" + + assert len(content) > 0, "Expected content when tool_choice='none'" + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases.get("test_chat_multi_turn_tool_calling", {}).get("test_params", {}).get("case", []), + ids=case_id_generator, +) +def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case): + """ + Test cases for multi-turn tool calling. + Tool calls are asserted. + Tool responses are provided in the test case. + Final response is asserted. + """ + + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + # Create a copy of the messages list to avoid modifying the original + messages = [] + tools = case["input"]["tools"] + # Use deepcopy to prevent modification across runs/parametrization + expected_results = copy.deepcopy(case["expected"]) + tool_responses = copy.deepcopy(case.get("tool_responses", [])) + input_messages_turns = copy.deepcopy(case["input"]["messages"]) + + # keep going until either + # 1. we have messages to test in multi-turn + # 2. no messages but last message is tool response + while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1]["role"] == "tool"): + # do not take new messages if last message is tool response + if len(messages) == 0 or messages[-1]["role"] != "tool": + new_messages = input_messages_turns.pop(0) + # Ensure new_messages is a list of message objects + if isinstance(new_messages, list): + messages.extend(new_messages) + else: + # If it's a single message object, add it directly + messages.append(new_messages) + + # --- API Call --- + response = openai_client.chat.completions.create( + model=model, + messages=messages, + tools=tools, + stream=False, + ) + + # --- Process Response --- + assistant_message = response.choices[0].message + messages.append(assistant_message.model_dump(exclude_unset=True)) + + assert assistant_message.role == "assistant" + + # Get the expected result data + expected = expected_results.pop(0) + num_tool_calls = expected["num_tool_calls"] + + # --- Assertions based on expected result --- + assert len(assistant_message.tool_calls or []) == num_tool_calls, ( + f"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}" + ) + + if num_tool_calls > 0: + tool_call = assistant_message.tool_calls[0] + assert tool_call.function.name == expected["tool_name"], ( + f"Expected tool '{expected['tool_name']}', got '{tool_call.function.name}'" + ) + # Parse the JSON string arguments before comparing + actual_arguments = json.loads(tool_call.function.arguments) + assert actual_arguments == expected["tool_arguments"], ( + f"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'" + ) + + # Prepare and append the tool response for the next turn + tool_response = tool_responses.pop(0) + messages.append( + { + "role": "tool", + "tool_call_id": tool_call.id, + "content": tool_response["response"], + } + ) + else: + assert assistant_message.content is not None, "Expected content, but none received." + expected_answers = expected["answer"] # This is now a list + content_lower = assistant_message.content.lower() + assert any(ans.lower() in content_lower for ans in expected_answers), ( + f"Expected one of {expected_answers} in content, but got: '{assistant_message.content}'" + ) + + +@pytest.mark.parametrize( + "case", + chat_completion_test_cases.get("test_chat_multi_turn_tool_calling", {}).get("test_params", {}).get("case", []), + ids=case_id_generator, +) +def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case): + """ """ + test_name_base = get_base_test_name(request) + if should_skip_test(verification_config, provider, model, test_name_base): + pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.") + + messages = [] + tools = case["input"]["tools"] + expected_results = copy.deepcopy(case["expected"]) + tool_responses = copy.deepcopy(case.get("tool_responses", [])) + input_messages_turns = copy.deepcopy(case["input"]["messages"]) + + while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1]["role"] == "tool"): + if len(messages) == 0 or messages[-1]["role"] != "tool": + new_messages = input_messages_turns.pop(0) + if isinstance(new_messages, list): + messages.extend(new_messages) + else: + messages.append(new_messages) + + # --- API Call (Streaming) --- + stream = openai_client.chat.completions.create( + model=model, + messages=messages, + tools=tools, + stream=True, + ) + + # --- Process Stream --- + accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream) + + # --- Construct Assistant Message for History --- + assistant_message_dict = {"role": "assistant"} + if accumulated_content: + assistant_message_dict["content"] = accumulated_content + if accumulated_tool_calls: + assistant_message_dict["tool_calls"] = accumulated_tool_calls + + messages.append(assistant_message_dict) + + # --- Assertions --- + expected = expected_results.pop(0) + num_tool_calls = expected["num_tool_calls"] + + assert len(accumulated_tool_calls or []) == num_tool_calls, ( + f"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}" + ) + + if num_tool_calls > 0: + # Use the first accumulated tool call for assertion + tool_call = accumulated_tool_calls[0] + assert tool_call["function"]["name"] == expected["tool_name"], ( + f"Expected tool '{expected['tool_name']}', got '{tool_call['function']['name']}'" + ) + # Parse the accumulated arguments string for comparison + actual_arguments = json.loads(tool_call["function"]["arguments"]) + assert actual_arguments == expected["tool_arguments"], ( + f"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'" + ) + + # Prepare and append the tool response for the next turn + tool_response = tool_responses.pop(0) + messages.append( + { + "role": "tool", + "tool_call_id": tool_call["id"], + "content": tool_response["response"], + } + ) + else: + assert accumulated_content is not None and accumulated_content != "", "Expected content, but none received." + expected_answers = expected["answer"] + content_lower = accumulated_content.lower() + assert any(ans.lower() in content_lower for ans in expected_answers), ( + f"Expected one of {expected_answers} in content, but got: '{accumulated_content}'" + ) + + # --- Helper functions (structured output validation) --- @@ -324,3 +576,47 @@ def validate_structured_output(maybe_json_content: str, schema_name: str) -> Non assert len(structured_output.participants) == 2 elif schema_name == "valid_math_reasoning": assert len(structured_output.final_answer) > 0 + + +def _accumulate_streaming_tool_calls(stream): + """Accumulates tool calls and content from a streaming ChatCompletion response.""" + tool_calls_buffer = {} + current_id = None + full_content = "" # Initialize content accumulator + # Process streaming chunks + for chunk in stream: + choice = chunk.choices[0] + delta = choice.delta + + # Accumulate content + if delta.content: + full_content += delta.content + + if delta.tool_calls is None: + continue + + for tool_call_delta in delta.tool_calls: + if tool_call_delta.id: + current_id = tool_call_delta.id + call_id = current_id + # Skip if no ID seen yet for this tool call delta + if not call_id: + continue + func_delta = tool_call_delta.function + + if call_id not in tool_calls_buffer: + tool_calls_buffer[call_id] = { + "id": call_id, + "type": "function", # Assume function type + "function": {"name": None, "arguments": ""}, # Nested structure + } + + # Accumulate name and arguments into the nested function dict + if func_delta: + if func_delta.name: + tool_calls_buffer[call_id]["function"]["name"] = func_delta.name + if func_delta.arguments: + tool_calls_buffer[call_id]["function"]["arguments"] += func_delta.arguments + + # Return content and tool calls as a list + return full_content, list(tool_calls_buffer.values()) diff --git a/tests/verifications/test_results/fireworks.json b/tests/verifications/test_results/fireworks.json index 061e44c08..1fb6cb1b4 100644 --- a/tests/verifications/test_results/fireworks.json +++ b/tests/verifications/test_results/fireworks.json @@ -1,15 +1,15 @@ { - "created": 1744328795.171092, - "duration": 107.57908606529236, + "created": 1744679294.344288, + "duration": 243.49469900131226, "exitcode": 1, "root": "/Users/erichuang/projects/llama-stack", "environment": {}, "summary": { - "passed": 28, + "passed": 36, "skipped": 2, - "failed": 6, - "total": 36, - "collected": 36 + "failed": 40, + "total": 78, + "collected": 78 }, "collectors": [ { @@ -29,182 +29,392 @@ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", "type": "Function", - "lineno": 116 + "lineno": 117 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", "type": "Function", - "lineno": 116 + "lineno": 117 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", "type": "Function", - "lineno": 116 + "lineno": 117 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", "type": "Function", - "lineno": 135 + "lineno": 136 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", "type": "Function", - "lineno": 135 + "lineno": 136 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", "type": "Function", - "lineno": 135 + "lineno": 136 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", "type": "Function", - "lineno": 204 + "lineno": 205 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", "type": "Function", - "lineno": 204 + "lineno": 205 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", "type": "Function", - "lineno": 204 + "lineno": 205 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", "type": "Function", - "lineno": 228 + "lineno": 229 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", "type": "Function", - "lineno": 228 + "lineno": 229 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", "type": "Function", - "lineno": 228 + "lineno": 229 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "type": "Function", + "lineno": 257 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "type": "Function", + "lineno": 257 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "type": "Function", + "lineno": 257 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "type": "Function", + "lineno": 281 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "type": "Function", + "lineno": 281 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "type": "Function", + "lineno": 281 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "type": "Function", + "lineno": 308 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "type": "Function", + "lineno": 308 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "type": "Function", + "lineno": 308 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "type": "Function", + "lineno": 331 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "type": "Function", + "lineno": 331 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "type": "Function", + "lineno": 331 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-text_then_weather_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-weather_tool_then_text]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-add_product_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-get_then_create_event_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-text_then_weather_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-weather_tool_then_text]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-add_product_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-get_then_create_event_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-text_then_weather_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-weather_tool_then_text]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-add_product_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-get_then_create_event_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-text_then_weather_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-weather_tool_then_text]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-add_product_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-get_then_create_event_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-text_then_weather_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-weather_tool_then_text]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-add_product_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-get_then_create_event_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-text_then_weather_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-weather_tool_then_text]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-add_product_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-get_then_create_event_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 450 } ] } @@ -212,7 +422,7 @@ "tests": [ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", @@ -231,21 +441,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.2175025000469759, + "duration": 0.2540216660127044, "outcome": "passed" }, "call": { - "duration": 0.7433859170414507, + "duration": 0.6861197501420975, "outcome": "passed" }, "teardown": { - "duration": 0.0001592918997630477, + "duration": 0.00015208404511213303, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", @@ -264,21 +474,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.007383499993011355, + "duration": 0.006722707999870181, "outcome": "passed" }, "call": { - "duration": 0.5949292909353971, + "duration": 0.5997684169560671, "outcome": "passed" }, "teardown": { - "duration": 0.00015891704242676497, + "duration": 0.0002298750914633274, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", @@ -297,21 +507,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.010730999987572432, + "duration": 0.015468083089217544, "outcome": "passed" }, "call": { - "duration": 0.8945954169612378, + "duration": 0.4625723329372704, "outcome": "passed" }, "teardown": { - "duration": 0.0003751249751076102, + "duration": 0.0003302919212728739, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", @@ -330,21 +540,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.01665666699409485, + "duration": 0.014780875062569976, "outcome": "passed" }, "call": { - "duration": 0.907927209045738, + "duration": 0.4616922920104116, "outcome": "passed" }, "teardown": { - "duration": 0.00024874997325241566, + "duration": 0.0004110001027584076, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", @@ -363,21 +573,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.01039199996739626, + "duration": 0.016551292035728693, "outcome": "passed" }, "call": { - "duration": 0.5971567500382662, + "duration": 0.9366653750184923, "outcome": "passed" }, "teardown": { - "duration": 0.0003488330403342843, + "duration": 0.00045104208402335644, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", @@ -396,21 +606,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.018627874902449548, + "duration": 0.043513541808351874, "outcome": "passed" }, "call": { - "duration": 2.0586736251134425, + "duration": 0.5119727500714362, "outcome": "passed" }, "teardown": { - "duration": 0.00046974990982562304, + "duration": 0.00016754190437495708, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-earth]", @@ -429,21 +639,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.01706262503284961, + "duration": 0.008419709047302604, "outcome": "passed" }, "call": { - "duration": 0.6679969580145553, + "duration": 0.7933078748174012, "outcome": "passed" }, "teardown": { - "duration": 0.0004670419730246067, + "duration": 0.00016583292745053768, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama-v3p3-70b-instruct-saturn]", @@ -462,21 +672,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.025956374942325056, + "duration": 0.013550583040341735, "outcome": "passed" }, "call": { - "duration": 2.052679874934256, + "duration": 0.6633435001131147, "outcome": "passed" }, "teardown": { - "duration": 0.00026958296075463295, + "duration": 0.00023925001733005047, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-earth]", @@ -495,21 +705,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.015856957994401455, + "duration": 0.007293834118172526, "outcome": "passed" }, "call": { - "duration": 0.3096678329166025, + "duration": 0.5193503750488162, "outcome": "passed" }, "teardown": { - "duration": 0.0007620420074090362, + "duration": 0.00018516601994633675, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama4-scout-instruct-basic-saturn]", @@ -528,21 +738,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.013509334065020084, + "duration": 0.009030540939420462, "outcome": "passed" }, "call": { - "duration": 0.5914681670255959, + "duration": 0.4338789170142263, "outcome": "passed" }, "teardown": { - "duration": 0.0002906669396907091, + "duration": 0.0004670829512178898, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-earth]", @@ -561,21 +771,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.013216375024057925, + "duration": 0.01854533306322992, "outcome": "passed" }, "call": { - "duration": 1.8804527079919353, + "duration": 1.0042304168455303, "outcome": "passed" }, "teardown": { - "duration": 0.0002026669681072235, + "duration": 0.0004844998475164175, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[accounts/fireworks/models/llama4-maverick-instruct-basic-saturn]", @@ -594,21 +804,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.00827441702131182, + "duration": 0.018001709133386612, "outcome": "passed" }, "call": { - "duration": 0.7407040420221165, + "duration": 0.5567380839493126, "outcome": "passed" }, "teardown": { - "duration": 0.0005084159784018993, + "duration": 0.00015412503853440285, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", - "lineno": 116, + "lineno": 117, "outcome": "skipped", "keywords": [ "test_chat_non_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", @@ -627,22 +837,22 @@ "case_id": "case0" }, "setup": { - "duration": 0.012424499960616231, + "duration": 0.008420375175774097, "outcome": "passed" }, "call": { - "duration": 0.00032762496266514063, + "duration": 0.00015591713599860668, "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 125, 'Skipped: Skipping test_chat_non_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 126, 'Skipped: Skipping test_chat_non_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" }, "teardown": { - "duration": 0.00032416603062301874, + "duration": 0.0001371251419186592, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", - "lineno": 116, + "lineno": 117, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", @@ -661,21 +871,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.02253958396613598, + "duration": 0.00672045792452991, "outcome": "passed" }, "call": { - "duration": 2.64042466704268, + "duration": 1.790064417058602, "outcome": "passed" }, "teardown": { - "duration": 0.0003636250039562583, + "duration": 0.0004657919052988291, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", - "lineno": 116, + "lineno": 117, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", @@ -694,21 +904,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.014634749968536198, + "duration": 0.015534916892647743, "outcome": "passed" }, "call": { - "duration": 5.126485540997237, + "duration": 3.2250108749140054, "outcome": "passed" }, "teardown": { - "duration": 0.0002988330088555813, + "duration": 0.00038420804776251316, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", - "lineno": 135, + "lineno": 136, "outcome": "skipped", "keywords": [ "test_chat_streaming_image[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", @@ -727,22 +937,22 @@ "case_id": "case0" }, "setup": { - "duration": 0.015854416065849364, + "duration": 0.03246337501332164, "outcome": "passed" }, "call": { - "duration": 0.00038058299105614424, + "duration": 0.0005176670383661985, "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 144, 'Skipped: Skipping test_chat_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 145, 'Skipped: Skipping test_chat_streaming_image for model accounts/fireworks/models/llama-v3p3-70b-instruct on provider fireworks based on config.')" }, "teardown": { - "duration": 0.0002689170651137829, + "duration": 0.0002715419977903366, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", - "lineno": 135, + "lineno": 136, "outcome": "passed", "keywords": [ "test_chat_streaming_image[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", @@ -761,21 +971,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.011205915943719447, + "duration": 0.12475762516260147, "outcome": "passed" }, "call": { - "duration": 3.2596546669956297, + "duration": 4.934706958010793, "outcome": "passed" }, "teardown": { - "duration": 0.0006222500232979655, + "duration": 0.00027604191564023495, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", - "lineno": 135, + "lineno": 136, "outcome": "passed", "keywords": [ "test_chat_streaming_image[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", @@ -794,21 +1004,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.016557667055167258, + "duration": 0.01025745808146894, "outcome": "passed" }, "call": { - "duration": 4.930164708988741, + "duration": 3.5653172079473734, "outcome": "passed" }, "teardown": { - "duration": 0.00048687495291233063, + "duration": 0.0005323749501258135, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", @@ -827,21 +1037,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.00886166701093316, + "duration": 0.0553184999153018, "outcome": "passed" }, "call": { - "duration": 0.8833738330285996, + "duration": 1.366144834086299, "outcome": "passed" }, "teardown": { - "duration": 0.00025583396200090647, + "duration": 0.00042316620238125324, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", @@ -860,21 +1070,21 @@ "case_id": "math" }, "setup": { - "duration": 0.01297520799562335, + "duration": 0.06981937494128942, "outcome": "passed" }, "call": { - "duration": 1.9960687910206616, + "duration": 2.829931082902476, "outcome": "passed" }, "teardown": { - "duration": 0.0005048330640420318, + "duration": 0.0003029161598533392, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", @@ -893,21 +1103,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.007275875075720251, + "duration": 0.0244335001334548, "outcome": "passed" }, "call": { - "duration": 0.9094266659813002, + "duration": 0.7541109579615295, "outcome": "passed" }, "teardown": { - "duration": 0.00028041598852723837, + "duration": 0.0004666249733418226, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", @@ -926,21 +1136,21 @@ "case_id": "math" }, "setup": { - "duration": 0.008899332955479622, + "duration": 0.016700832871720195, "outcome": "passed" }, "call": { - "duration": 3.117967874975875, + "duration": 2.208378749899566, "outcome": "passed" }, "teardown": { - "duration": 0.00017600005958229303, + "duration": 0.00016137491911649704, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", @@ -959,21 +1169,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.0073364999843761325, + "duration": 0.006982124876230955, "outcome": "passed" }, "call": { - "duration": 2.2714374579954892, + "duration": 0.6431179158389568, "outcome": "passed" }, "teardown": { - "duration": 0.0001814159331843257, + "duration": 0.00033412501215934753, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", @@ -992,21 +1202,21 @@ "case_id": "math" }, "setup": { - "duration": 0.010546459001488984, + "duration": 0.015676999930292368, "outcome": "passed" }, "call": { - "duration": 3.9954450000077486, + "duration": 4.404933541081846, "outcome": "passed" }, "teardown": { - "duration": 0.0002719159238040447, + "duration": 0.0002617498394101858, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-calendar]", @@ -1025,21 +1235,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.012508000014349818, + "duration": 0.07572970795445144, "outcome": "passed" }, "call": { - "duration": 9.095425167004578, + "duration": 1.1367775409016758, "outcome": "passed" }, "teardown": { - "duration": 0.00029200001154094934, + "duration": 0.0006681671366095543, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama-v3p3-70b-instruct-math]", @@ -1058,21 +1268,21 @@ "case_id": "math" }, "setup": { - "duration": 0.014769250061362982, + "duration": 0.028525790898129344, "outcome": "passed" }, "call": { - "duration": 1.9875252910424024, + "duration": 2.1424834579229355, "outcome": "passed" }, "teardown": { - "duration": 0.0006288329605013132, + "duration": 0.0003642500378191471, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-calendar]", @@ -1091,21 +1301,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.014440709026530385, + "duration": 0.0146782910451293, "outcome": "passed" }, "call": { - "duration": 1.2613736250204965, + "duration": 15.13383225002326, "outcome": "passed" }, "teardown": { - "duration": 0.0001937919296324253, + "duration": 0.00045950012281537056, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-scout-instruct-basic-math]", @@ -1124,21 +1334,21 @@ "case_id": "math" }, "setup": { - "duration": 0.0071510839043185115, + "duration": 0.01714799995534122, "outcome": "passed" }, "call": { - "duration": 2.2953888749470934, + "duration": 10.714752790983766, "outcome": "passed" }, "teardown": { - "duration": 0.00016245793085545301, + "duration": 0.00027029216289520264, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-calendar]", @@ -1157,21 +1367,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.007294666953384876, + "duration": 0.010765291983261704, "outcome": "passed" }, "call": { - "duration": 2.194703874993138, + "duration": 0.6682700838427991, "outcome": "passed" }, "teardown": { - "duration": 0.00017604196909815073, + "duration": 0.00015808409079909325, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[accounts/fireworks/models/llama4-maverick-instruct-basic-math]", @@ -1190,21 +1400,21 @@ "case_id": "math" }, "setup": { - "duration": 0.019950625021010637, + "duration": 0.0071080829948186874, "outcome": "passed" }, "call": { - "duration": 8.4994609169662, + "duration": 1.9725822920445353, "outcome": "passed" }, "teardown": { - "duration": 0.00026404205709695816, + "duration": 0.0004201668780297041, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", - "lineno": 204, + "lineno": 205, "outcome": "failed", "keywords": [ "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", @@ -1223,34 +1433,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.011928000021725893, + "duration": 0.013940333155915141, "outcome": "passed" }, "call": { - "duration": 0.5664792089955881, + "duration": 0.5732313331682235, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 223, + "lineno": 224, "message": "TypeError: object of type 'NoneType' has no len()" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 223, + "lineno": 224, "message": "TypeError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:223: TypeError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:224: TypeError" }, "teardown": { - "duration": 0.00023799994960427284, + "duration": 0.00022962503135204315, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", - "lineno": 204, + "lineno": 205, "outcome": "failed", "keywords": [ "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", @@ -1269,34 +1479,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.006813624990172684, + "duration": 0.006374292075634003, "outcome": "passed" }, "call": { - "duration": 3.170418416033499, + "duration": 7.2776273330673575, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 223, + "lineno": 224, "message": "TypeError: object of type 'NoneType' has no len()" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 223, + "lineno": 224, "message": "TypeError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:223: TypeError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:224: TypeError" }, "teardown": { - "duration": 0.0004129580920562148, + "duration": 0.0004100420046597719, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", - "lineno": 204, + "lineno": 205, "outcome": "failed", "keywords": [ "test_chat_non_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", @@ -1315,34 +1525,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.01656208303757012, + "duration": 0.012761292047798634, "outcome": "passed" }, "call": { - "duration": 22.76337137504015, + "duration": 0.8920639578718692, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 223, + "lineno": 224, "message": "TypeError: object of type 'NoneType' has no len()" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 223, + "lineno": 224, "message": "TypeError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:223: TypeError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:224: TypeError" }, "teardown": { - "duration": 0.00038704206235706806, + "duration": 0.0004124999977648258, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", - "lineno": 228, + "lineno": 229, "outcome": "failed", "keywords": [ "test_chat_streaming_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", @@ -1361,34 +1571,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.015727541991509497, + "duration": 0.013205124996602535, "outcome": "passed" }, "call": { - "duration": 0.5719050420448184, + "duration": 1.930448625003919, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 274, - "message": "assert 0 == 1\n + where 0 = len({})" + "lineno": 248, + "message": "assert 0 == 1\n + where 0 = len([])" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 274, + "lineno": 248, "message": "AssertionError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n choice = chunk.choices[0]\n delta = choice.delta\n \n if delta.tool_calls is None:\n continue\n \n for tool_call_delta in delta.tool_calls:\n if tool_call_delta.id:\n current_id = tool_call_delta.id\n call_id = current_id\n func_delta = tool_call_delta.function\n \n if call_id not in tool_calls_buffer:\n tool_calls_buffer[call_id] = {\n \"id\": call_id,\n \"type\": tool_call_delta.type,\n \"name\": func_delta.name,\n \"arguments\": \"\",\n }\n \n if func_delta.arguments:\n tool_calls_buffer[call_id][\"arguments\"] += func_delta.arguments\n \n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len({})\n\ntests/verifications/openai_api/test_chat_completion.py:274: AssertionError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len([])\n\ntests/verifications/openai_api/test_chat_completion.py:248: AssertionError" }, "teardown": { - "duration": 0.0003532909322530031, + "duration": 0.0005771249998360872, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", - "lineno": 228, + "lineno": 229, "outcome": "failed", "keywords": [ "test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", @@ -1407,34 +1617,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.011914041941054165, + "duration": 0.01408083294518292, "outcome": "passed" }, "call": { - "duration": 5.403063916950487, + "duration": 10.029349042102695, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 274, - "message": "assert 0 == 1\n + where 0 = len({})" + "lineno": 248, + "message": "assert 0 == 1\n + where 0 = len([])" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 274, + "lineno": 248, "message": "AssertionError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n choice = chunk.choices[0]\n delta = choice.delta\n \n if delta.tool_calls is None:\n continue\n \n for tool_call_delta in delta.tool_calls:\n if tool_call_delta.id:\n current_id = tool_call_delta.id\n call_id = current_id\n func_delta = tool_call_delta.function\n \n if call_id not in tool_calls_buffer:\n tool_calls_buffer[call_id] = {\n \"id\": call_id,\n \"type\": tool_call_delta.type,\n \"name\": func_delta.name,\n \"arguments\": \"\",\n }\n \n if func_delta.arguments:\n tool_calls_buffer[call_id][\"arguments\"] += func_delta.arguments\n \n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len({})\n\ntests/verifications/openai_api/test_chat_completion.py:274: AssertionError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len([])\n\ntests/verifications/openai_api/test_chat_completion.py:248: AssertionError" }, "teardown": { - "duration": 0.0005193749675527215, + "duration": 0.0004449589177966118, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", - "lineno": 228, + "lineno": 229, "outcome": "failed", "keywords": [ "test_chat_streaming_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", @@ -1453,31 +1663,1859 @@ "case_id": "case0" }, "setup": { - "duration": 0.012608832912519574, + "duration": 0.013213291997089982, "outcome": "passed" }, "call": { - "duration": 7.587262416025624, + "duration": 8.608150291023776, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 274, - "message": "assert 0 == 1\n + where 0 = len({})" + "lineno": 248, + "message": "assert 0 == 1\n + where 0 = len([])" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 274, + "lineno": 248, "message": "AssertionError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n choice = chunk.choices[0]\n delta = choice.delta\n \n if delta.tool_calls is None:\n continue\n \n for tool_call_delta in delta.tool_calls:\n if tool_call_delta.id:\n current_id = tool_call_delta.id\n call_id = current_id\n func_delta = tool_call_delta.function\n \n if call_id not in tool_calls_buffer:\n tool_calls_buffer[call_id] = {\n \"id\": call_id,\n \"type\": tool_call_delta.type,\n \"name\": func_delta.name,\n \"arguments\": \"\",\n }\n \n if func_delta.arguments:\n tool_calls_buffer[call_id][\"arguments\"] += func_delta.arguments\n \n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len({})\n\ntests/verifications/openai_api/test_chat_completion.py:274: AssertionError" + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n> assert len(tool_calls_buffer) == 1\nE assert 0 == 1\nE + where 0 = len([])\n\ntests/verifications/openai_api/test_chat_completion.py:248: AssertionError" }, "teardown": { - "duration": 0.0008685829816386104, + "duration": 0.0005860829260200262, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "lineno": 257, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.01437820796854794, + "outcome": "passed" + }, + "call": { + "duration": 0.7105170420836657, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00017283298075199127, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "lineno": 257, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.009220415959134698, + "outcome": "passed" + }, + "call": { + "duration": 5.718667333945632, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 277, + "message": "TypeError: object of type 'NoneType' has no len()" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 277, + "message": "TypeError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_choice_required(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"required\", # Force tool call\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0, \"Expected tool call when tool_choice='required'\"\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:277: TypeError" + }, + "teardown": { + "duration": 0.0003282078541815281, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "lineno": 257, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_tool_choice_required[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.014709000010043383, + "outcome": "passed" + }, + "call": { + "duration": 1.7260455000214279, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 277, + "message": "TypeError: object of type 'NoneType' has no len()" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 277, + "message": "TypeError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_choice_required(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"required\", # Force tool call\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert len(response.choices[0].message.tool_calls) > 0, \"Expected tool call when tool_choice='required'\"\nE TypeError: object of type 'NoneType' has no len()\n\ntests/verifications/openai_api/test_chat_completion.py:277: TypeError" + }, + "teardown": { + "duration": 0.00022012507542967796, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "lineno": 281, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.008183792000636458, + "outcome": "passed" + }, + "call": { + "duration": 1.9683502500411123, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0007690000347793102, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "lineno": 281, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.014906208030879498, + "outcome": "passed" + }, + "call": { + "duration": 11.76459054206498, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 302, + "message": "AssertionError: Expected tool call when tool_choice='required'\nassert 0 > 0\n + where 0 = len([])" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 302, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_choice_required(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"required\", # Force tool call\n stream=True,\n )\n \n _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n \n> assert len(tool_calls_buffer) > 0, \"Expected tool call when tool_choice='required'\"\nE AssertionError: Expected tool call when tool_choice='required'\nE assert 0 > 0\nE + where 0 = len([])\n\ntests/verifications/openai_api/test_chat_completion.py:302: AssertionError" + }, + "teardown": { + "duration": 0.0003086249344050884, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "lineno": 281, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_choice_required[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.021144041791558266, + "outcome": "passed" + }, + "call": { + "duration": 2.4300453749019653, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 302, + "message": "AssertionError: Expected tool call when tool_choice='required'\nassert 0 > 0\n + where 0 = len([])" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 302, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_choice_required(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"required\", # Force tool call\n stream=True,\n )\n \n _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n \n> assert len(tool_calls_buffer) > 0, \"Expected tool call when tool_choice='required'\"\nE AssertionError: Expected tool call when tool_choice='required'\nE assert 0 > 0\nE + where 0 = len([])\n\ntests/verifications/openai_api/test_chat_completion.py:302: AssertionError" + }, + "teardown": { + "duration": 0.00037800008431077003, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "lineno": 308, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.007929167011752725, + "outcome": "passed" + }, + "call": { + "duration": 1.0130669160280377, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004307499621063471, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "lineno": 308, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.010822792071849108, + "outcome": "passed" + }, + "call": { + "duration": 4.663267957977951, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0006220841314643621, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "lineno": 308, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_none[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.010691167088225484, + "outcome": "passed" + }, + "call": { + "duration": 3.383276625070721, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00047616707161068916, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "lineno": 331, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama-v3p3-70b-instruct-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.030178457964211702, + "outcome": "passed" + }, + "call": { + "duration": 0.4668415829073638, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0007963338866829872, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "lineno": 331, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama4-scout-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.011727249948307872, + "outcome": "passed" + }, + "call": { + "duration": 11.540696125011891, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0009242501109838486, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "lineno": 331, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_none[accounts/fireworks/models/llama4-maverick-instruct-basic-case0]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "case0" + }, + "setup": { + "duration": 0.008536209119483829, + "outcome": "passed" + }, + "call": { + "duration": 3.6622679999563843, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005495408549904823, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-text_then_weather_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-text_then_weather_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.017524708062410355, + "outcome": "passed" + }, + "call": { + "duration": 0.625571500044316, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 446, + "message": "AssertionError: Expected one of ['sol'] in content, but got: 'I am not able to execute this task as it exceeds the limitations of the functions I have been given.'\nassert False\n + where False = any(. at 0x1073e5cb0>)" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 446, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n tool_call = assistant_message.tool_calls[0]\n assert tool_call.function.name == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call.function.name}'\"\n )\n # Parse the JSON string arguments before comparing\n actual_arguments = json.loads(tool_call.function.arguments)\n assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\n \n # Prepare and append the tool response for the next turn\n tool_response = tool_responses.pop(0)\n messages.append(\n {\n \"role\": \"tool\",\n \"tool_call_id\": tool_call.id,\n \"content\": tool_response[\"response\"],\n }\n )\n else:\n assert assistant_message.content is not None, \"Expected content, but none received.\"\n expected_answers = expected[\"answer\"] # This is now a list\n content_lower = assistant_message.content.lower()\n> assert any(ans.lower() in content_lower for ans in expected_answers), (\n f\"Expected one of {expected_answers} in content, but got: '{assistant_message.content}'\"\n )\nE AssertionError: Expected one of ['sol'] in content, but got: 'I am not able to execute this task as it exceeds the limitations of the functions I have been given.'\nE assert False\nE + where False = any(. at 0x1073e5cb0>)\n\ntests/verifications/openai_api/test_chat_completion.py:446: AssertionError" + }, + "teardown": { + "duration": 0.00044062500819563866, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-weather_tool_then_text]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-weather_tool_then_text]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.01056775008328259, + "outcome": "passed" + }, + "call": { + "duration": 0.5624969999771565, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"get_weather\", \"parameters\": {\"location\": \"San Francisco, CA\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"get_weather\", \"parameters\": {\"location\": \"San Francisco, CA\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0004401658661663532, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-add_product_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-add_product_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.013444249983876944, + "outcome": "passed" + }, + "call": { + "duration": 0.8705885419622064, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"addProduct\", \"parameters\": {\"name\": \"Widget\", \"price\": \"19.99\", \"inStock\": \"true\", \"tags\": \"[\\\\\"new\\\\\", \\\\\"sale\\\\\"]\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'add_product_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'inStock': True, 'name': 'Widget...}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': 'Successfully added product with id: 123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"addProduct\", \"parameters\": {\"name\": \"Widget\", \"price\": \"19.99\", \"inStock\": \"true\", \"tags\": \"[\\\\\"new\\\\\", \\\\\"sale\\\\\"]\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0004647918976843357, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-get_then_create_event_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.013817500090226531, + "outcome": "passed" + }, + "call": { + "duration": 0.6882082498632371, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"get_event\", \"parameters\": {\"date\": \"2025-03-03\", \"time\": \"10:00\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"get_event\", \"parameters\": {\"date\": \"2025-03-03\", \"time\": \"10:00\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0005112909711897373, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-compare_monthly_expense_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.013548000017181039, + "outcome": "passed" + }, + "call": { + "duration": 0.5821714580524713, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": \"1\", \"year\": \"2025\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": \"1\", \"year\": \"2025\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.00021225004456937313, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-text_then_weather_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-text_then_weather_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.0070156671572476625, + "outcome": "passed" + }, + "call": { + "duration": 8.95718324999325, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='```\\n{\\n \"name\": \"get_weather\",\\n \"parameters\": {\\n \"description\": \"Get the current weather\",\\n \"parameters\": {\\n \"location\": {\\n \"description\": \"The city and state (both required)\",\\n \"type\": \"object\",\\n \"properties\": {\\n \"location\": {\\n \"description\": \"The city and state, e.g. San Francisco, CA.\",\\n \"type\": \"string\"\\n }\\n }\\n }\\n },\\n \"type\": \"object\",\\n \"properties\": {\\n \"location\": \"San Francisco, CA.\"\\n }\\n }\\n}\\n```', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='```\\n{\\n \"name\": \"get_weather\",\\n \"parameters\": {\\n \"description\": \"Get the current weather\",\\n \"parameters\": {\\n \"location\": {\\n \"description\": \"The city and state (both required)\",\\n \"type\": \"object\",\\n \"properties\": {\\n \"location\": {\\n \"description\": \"The city and state, e.g. San Francisco, CA.\",\\n \"type\": \"string\"\\n }\\n }\\n }\\n },\\n \"type\": \"object\",\\n \"properties\": {\\n \"location\": \"San Francisco, CA.\"\\n }\\n }\\n}\\n```', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.00045741605572402477, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-weather_tool_then_text]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-weather_tool_then_text]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.011042665923014283, + "outcome": "passed" + }, + "call": { + "duration": 3.372867708094418, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"name\": \"get_weather\", \"parameters\": {\"description\": \"Get the current weather\", \"parameters\": {\"type\": \"object\", \"properties\": {\"location\": {\"description\": \"The city and state (both required)\", \"type\": \"string\"}}}, \"required\": [\"location\"]}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"name\": \"get_weather\", \"parameters\": {\"description\": \"Get the current weather\", \"parameters\": {\"type\": \"object\", \"properties\": {\"location\": {\"description\": \"The city and state (both required)\", \"type\": \"string\"}}}, \"required\": [\"location\"]}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.00042333384044468403, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-add_product_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-add_product_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.01305404189042747, + "outcome": "passed" + }, + "call": { + "duration": 3.5883425418287516, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"name\": \"addProduct\", \"parameters\": {\"description\": \"Add a new product\", \"type\": \"object\", \"properties\": {\"name\": {\"description\": \"Name of the product\", \"type\": \"string\"}, \"price\": {\"description\": \"Price of the product\", \"type\": \"number\"}, \"inStock\": {\"description\": \"Availability status of the product\", \"type\": \"boolean\"}, \"tags\": {\"description\": \"List of product tags\", \"type\": \"array\", \"items\": {\"type\": \"string\"}}}}, \"required\": [\"name\", \"price\", \"inStock\", \"tags\"]}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'add_product_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'inStock': True, 'name': 'Widget...}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': 'Successfully added product with id: 123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"name\": \"addProduct\", \"parameters\": {\"description\": \"Add a new product\", \"type\": \"object\", \"properties\": {\"name\": {\"description\": \"Name of the product\", \"type\": \"string\"}, \"price\": {\"description\": \"Price of the product\", \"type\": \"number\"}, \"inStock\": {\"description\": \"Availability status of the product\", \"type\": \"boolean\"}, \"tags\": {\"description\": \"List of product tags\", \"type\": \"array\", \"items\": {\"type\": \"string\"}}}}, \"required\": [\"name\", \"price\", \"inStock\", \"tags\"]}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0005818749777972698, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-get_then_create_event_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.01428320910781622, + "outcome": "passed" + }, + "call": { + "duration": 15.402638916159049, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event...: \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event...: \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}assistant\\n\\n{\"name\": \"get_event\", \"parameters\": {\"date\": {\"description\": \"Date of the event in ISO format\", \"type\": \"string\", \"value\": \"2025-03-03\"}, \"time\": {\"description\": \"Event Time (HH:MM)\", \"type\": \"string\", \"value\": \"10:00\"}}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0004401251208037138, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-compare_monthly_expense_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.021037542028352618, + "outcome": "passed" + }, + "call": { + "duration": 6.548705333843827, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": {\"description\": \"Month of the year (1-12)\", \"type\": \"integer\"}, \"year\": {\"description\": \"Year\", \"type\": \"integer\"}}}assistant\\n\\n{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": {\"description\": \"Month of the year (1-12)\", \"type\": \"integer\"}, \"year\": {\"description\": \"Year\", \"type\": \"integer\"}}}assistant\\n\\n{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": {\"description\": \"Month of the year (1-12)\", \"type\": \"integer\"}, \"year\": {\"description\": \"Year\", \"type\": \"integer\"}}}assistant\\n\\n{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": {\"description\": \"Month of the year (1-12)\", \"type\": \"integer\", \"value\": 1}, \"year\": {\"description\": \"Year\", \"type\": \"integer\", \"value\": 2025}}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": {\"description\": \"Month of the year (1-12)\", \"type\": \"integer\"}, \"year\": {\"description\": \"Year\", \"type\": \"integer\"}}}assistant\\n\\n{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": {\"description\": \"Month of the year (1-12)\", \"type\": \"integer\"}, \"year\": {\"description\": \"Year\", \"type\": \"integer\"}}}assistant\\n\\n{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": {\"description\": \"Month of the year (1-12)\", \"type\": \"integer\"}, \"year\": {\"description\": \"Year\", \"type\": \"integer\"}}}assistant\\n\\n{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": {\"description\": \"Month of the year (1-12)\", \"type\": \"integer\", \"value\": 1}, \"year\": {\"description\": \"Year\", \"type\": \"integer\", \"value\": 2025}}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.00035033305175602436, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-text_then_weather_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-text_then_weather_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.00768870790489018, + "outcome": "passed" + }, + "call": { + "duration": 3.410787041997537, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='To answer the question about the weather in San Francisco, we can directly utilize the provided function `get_weather` as it matches the context of the query.\\n\\nThe function `get_weather` requires a `location` parameter. Given that San Francisco is a city and assuming California (CA) is the state, we can directly fit the query into the provided function format.\\n\\nHere\\'s the response in the required JSON format:\\n\\n```json\\n{\\n \"name\": \"get_weather\",\\n \"parameters\": {\\n \"location\": \"San Francisco, CA\"\\n }\\n}\\n```', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='To answer the question about the weather in San Francisco, we can directly utilize the provided function `get_weather` as it matches the context of the query.\\n\\nThe function `get_weather` requires a `location` parameter. Given that San Francisco is a city and assuming California (CA) is the state, we can directly fit the query into the provided function format.\\n\\nHere\\'s the response in the required JSON format:\\n\\n```json\\n{\\n \"name\": \"get_weather\",\\n \"parameters\": {\\n \"location\": \"San Francisco, CA\"\\n }\\n}\\n```', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0002946250606328249, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-weather_tool_then_text]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-weather_tool_then_text]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.009200166910886765, + "outcome": "passed" + }, + "call": { + "duration": 0.5177558751311153, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"name\": \"get_weather\", \"parameters\": {\"location\": \"San Francisco, CA\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"name\": \"get_weather\", \"parameters\": {\"location\": \"San Francisco, CA\"}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.00025020912289619446, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-add_product_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-add_product_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.007124624913558364, + "outcome": "passed" + }, + "call": { + "duration": 0.6132153749931604, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"name\": \"addProduct\", \"parameters\": {\"name\": \"Widget\", \"price\": 19.99, \"inStock\": true, \"tags\": [\"new\", \"sale\"]}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'add_product_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'inStock': True, 'name': 'Widget...}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': 'Successfully added product with id: 123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"name\": \"addProduct\", \"parameters\": {\"name\": \"Widget\", \"price\": 19.99, \"inStock\": true, \"tags\": [\"new\", \"sale\"]}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0003745418507605791, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-get_then_create_event_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.01410404103808105, + "outcome": "passed" + }, + "call": { + "duration": 1.3956649999599904, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"get_event\", \"parameters\": {\"date\": \"2025-03-03\", \"time\": \"10:00\"}}<|uniquepaddingtoken557|>---\"\"\"\"\"\"\"\"\"---\" \" \" \"\"\" \" \" \"Interaction\"\"\\n\\nI am unable to execute this task as it exceeds the limitations of the functions I have at hand.\"', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"type\": \"function\", \"name\": \"get_event\", \"parameters\": {\"date\": \"2025-03-03\", \"time\": \"10:00\"}}<|uniquepaddingtoken557|>---\"\"\"\"\"\"\"\"\"---\" \" \" \"\"\" \" \" \"Interaction\"\"\\n\\nI am unable to execute this task as it exceeds the limitations of the functions I have at hand.\"', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.00041033304296433926, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-compare_monthly_expense_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.027331124991178513, + "outcome": "passed" + }, + "call": { + "duration": 2.465563999954611, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len((None or []))\n + where None = ChatCompletionMessage(content='{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": 1, \"year\": 2024}}\"\\n\\nThe provided JSON describes a function `getMonthlyExpenseSummary` that takes two parameters: `month` and `year`. The prompt asks for the monthly expense in January of this year. Assuming the current year is 2024, we can fill in the appropriate values for `month` and `year`.\\n\\nThe value for `month` should be `1` (January is the first month), and the value for `year` should be `2024`.\\n\\nTherefore, the appropriate function call with its arguments is:assistant\\n\\nimport datetime\\n\\n# Get the current year\\ncurrent_year = datetime.datetime.now().year\\n\\n# The function call with its arguments\\nprint({\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": 1, \"year\": current_year}})\"{\\\\\"name\\\\\": \\\\\"getMonthlyExpenseSummary\\\\\", \\\\\"parameters\\\\\": {\\\\\"month\\\\\": 1, \\\\\"year\\\\\": 2024}}\"assistant\\n\\nThe final response is: {\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": 1, \"year\": 2024}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len((None or []))\nE + where None = ChatCompletionMessage(content='{\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": 1, \"year\": 2024}}\"\\n\\nThe provided JSON describes a function `getMonthlyExpenseSummary` that takes two parameters: `month` and `year`. The prompt asks for the monthly expense in January of this year. Assuming the current year is 2024, we can fill in the appropriate values for `month` and `year`.\\n\\nThe value for `month` should be `1` (January is the first month), and the value for `year` should be `2024`.\\n\\nTherefore, the appropriate function call with its arguments is:assistant\\n\\nimport datetime\\n\\n# Get the current year\\ncurrent_year = datetime.datetime.now().year\\n\\n# The function call with its arguments\\nprint({\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": 1, \"year\": current_year}})\"{\\\\\"name\\\\\": \\\\\"getMonthlyExpenseSummary\\\\\", \\\\\"parameters\\\\\": {\\\\\"month\\\\\": 1, \\\\\"year\\\\\": 2024}}\"assistant\\n\\nThe final response is: {\"name\": \"getMonthlyExpenseSummary\", \"parameters\": {\"month\": 1, \"year\": 2024}}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0005783340893685818, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-text_then_weather_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-text_then_weather_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.016343542141839862, + "outcome": "passed" + }, + "call": { + "duration": 0.6930254579056054, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 529, + "message": "AssertionError: Expected one of ['sol'] in content, but got: 'I cannot accomplish this task as it requires capabilities beyond those offered by the provided functions.'\nassert False\n + where False = any(. at 0x10738e0a0>)" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 529, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n # Use the first accumulated tool call for assertion\n tool_call = accumulated_tool_calls[0]\n assert tool_call[\"function\"][\"name\"] == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call['function']['name']}'\"\n )\n # Parse the accumulated arguments string for comparison\n actual_arguments = json.loads(tool_call[\"function\"][\"arguments\"])\n assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\n \n # Prepare and append the tool response for the next turn\n tool_response = tool_responses.pop(0)\n messages.append(\n {\n \"role\": \"tool\",\n \"tool_call_id\": tool_call[\"id\"],\n \"content\": tool_response[\"response\"],\n }\n )\n else:\n assert accumulated_content is not None and accumulated_content != \"\", \"Expected content, but none received.\"\n expected_answers = expected[\"answer\"]\n content_lower = accumulated_content.lower()\n> assert any(ans.lower() in content_lower for ans in expected_answers), (\n f\"Expected one of {expected_answers} in content, but got: '{accumulated_content}'\"\n )\nE AssertionError: Expected one of ['sol'] in content, but got: 'I cannot accomplish this task as it requires capabilities beyond those offered by the provided functions.'\nE assert False\nE + where False = any(. at 0x10738e0a0>)\n\ntests/verifications/openai_api/test_chat_completion.py:529: AssertionError" + }, + "teardown": { + "duration": 0.00024741701781749725, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-weather_tool_then_text]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-weather_tool_then_text]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.007791666081175208, + "outcome": "passed" + }, + "call": { + "duration": 0.4420052089262754, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.000628374982625246, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-add_product_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-add_product_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.013015333097428083, + "outcome": "passed" + }, + "call": { + "duration": 0.6754761249758303, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'add_product_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'inStock': True, 'name': 'Widget...}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': 'Successfully added product with id: 123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.000581083819270134, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-get_then_create_event_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.0128930420614779, + "outcome": "passed" + }, + "call": { + "duration": 0.367436750093475, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.00024812505580484867, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-compare_monthly_expense_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama-v3p3-70b-instruct-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama-v3p3-70b-instruct-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama-v3p3-70b-instruct", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.006677915807813406, + "outcome": "passed" + }, + "call": { + "duration": 0.5142939588986337, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama-v3p3-70b-instruct'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.0002248329110443592, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-text_then_weather_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-text_then_weather_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.008392333984375, + "outcome": "passed" + }, + "call": { + "duration": 9.519045708002523, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.00019570882432162762, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-weather_tool_then_text]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-weather_tool_then_text]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.009688499849289656, + "outcome": "passed" + }, + "call": { + "duration": 0.9869634578935802, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.0002135841641575098, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-add_product_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-add_product_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.007028624881058931, + "outcome": "passed" + }, + "call": { + "duration": 4.688094082986936, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'add_product_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'inStock': True, 'name': 'Widget...}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': 'Successfully added product with id: 123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.00026954198256134987, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-get_then_create_event_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.006646708119660616, + "outcome": "passed" + }, + "call": { + "duration": 15.899775499943644, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.0004787910729646683, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-compare_monthly_expense_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-scout-instruct-basic-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-scout-instruct-basic-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-scout-instruct-basic", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.016487207962200046, + "outcome": "passed" + }, + "call": { + "duration": 3.922360667027533, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-scout-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.00043979217298328876, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-text_then_weather_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-text_then_weather_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.013401374919340014, + "outcome": "passed" + }, + "call": { + "duration": 2.2223200001753867, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 529, + "message": "AssertionError: Expected one of ['sol'] in content, but got: '{\"name\": \"get_weather\", \"parameters\": {\"location\": \"Rome, Italy\"}} is not the best response here.\n \n Since we don't have a function that directly answers \"What's the name of the Sun in latin?\", a more appropriate response would be to say that there's no function available to answer this question. However, to follow the given format and assuming there's an implicit expectation to still attempt an answer or provide a closest match:\n \n {\"name\": \"get_weather\", \"parameters\": {\"location\": \"Invalid input, no relation to weather\"}} is still not a valid response.\n \n A correct response according to the given constraints isn't feasible. However, to fit the required format and indicating a function that could be related or a default, if there was a \"get_fact\" function:\n \n {\"name\": \"get_fact\", \"parameters\": {\"query\": \"Latin name of the Sun\"}} \n \n But since \"get_fact\" isn't defined in the prompt, and sticking strictly to the given function:\n \n There isn't a proper function to call.\n \n For the sake of compliance, let's assume an unrelated function was to be used due to lack of information.\n \n The best course of action is to indicate that the provided function definitions don't directly support answering the question about the Latin name of the Sun.'\nassert False\n + where False = any(. at 0x1074b9bd0>)" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 529, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n # Use the first accumulated tool call for assertion\n tool_call = accumulated_tool_calls[0]\n assert tool_call[\"function\"][\"name\"] == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call['function']['name']}'\"\n )\n # Parse the accumulated arguments string for comparison\n actual_arguments = json.loads(tool_call[\"function\"][\"arguments\"])\n assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\n \n # Prepare and append the tool response for the next turn\n tool_response = tool_responses.pop(0)\n messages.append(\n {\n \"role\": \"tool\",\n \"tool_call_id\": tool_call[\"id\"],\n \"content\": tool_response[\"response\"],\n }\n )\n else:\n assert accumulated_content is not None and accumulated_content != \"\", \"Expected content, but none received.\"\n expected_answers = expected[\"answer\"]\n content_lower = accumulated_content.lower()\n> assert any(ans.lower() in content_lower for ans in expected_answers), (\n f\"Expected one of {expected_answers} in content, but got: '{accumulated_content}'\"\n )\nE AssertionError: Expected one of ['sol'] in content, but got: '{\"name\": \"get_weather\", \"parameters\": {\"location\": \"Rome, Italy\"}} is not the best response here.\nE \nE Since we don't have a function that directly answers \"What's the name of the Sun in latin?\", a more appropriate response would be to say that there's no function available to answer this question. However, to follow the given format and assuming there's an implicit expectation to still attempt an answer or provide a closest match:\nE \nE {\"name\": \"get_weather\", \"parameters\": {\"location\": \"Invalid input, no relation to weather\"}} is still not a valid response.\nE \nE A correct response according to the given constraints isn't feasible. However, to fit the required format and indicating a function that could be related or a default, if there was a \"get_fact\" function:\nE \nE {\"name\": \"get_fact\", \"parameters\": {\"query\": \"Latin name of the Sun\"}} \nE \nE But since \"get_fact\" isn't defined in the prompt, and sticking strictly to the given function:\nE \nE There isn't a proper function to call.\nE \nE For the sake of compliance, let's assume an unrelated function was to be used due to lack of information.\nE \nE The best course of action is to indicate that the provided function definitions don't directly support answering the question about the Latin name of the Sun.'\nE assert False\nE + where False = any(. at 0x1074b9bd0>)\n\ntests/verifications/openai_api/test_chat_completion.py:529: AssertionError" + }, + "teardown": { + "duration": 0.00047154095955193043, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-weather_tool_then_text]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-weather_tool_then_text]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.01485933386720717, + "outcome": "passed" + }, + "call": { + "duration": 0.6193458330817521, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.000300833024084568, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-add_product_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-add_product_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.012684250017628074, + "outcome": "passed" + }, + "call": { + "duration": 0.5173197500407696, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'add_product_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'inStock': True, 'name': 'Widget...}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': 'Successfully added product with id: 123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.00047266692854464054, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-get_then_create_event_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.01282945810817182, + "outcome": "passed" + }, + "call": { + "duration": 2.990155333885923, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.00027558300644159317, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-compare_monthly_expense_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[accounts/fireworks/models/llama4-maverick-instruct-basic-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "accounts/fireworks/models/llama4-maverick-instruct-basic-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "accounts/fireworks/models/llama4-maverick-instruct-basic", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.008087666006758809, + "outcome": "passed" + }, + "call": { + "duration": 3.6024099169299006, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 1 tool calls, but got 0\nassert 0 == 1\n + where 0 = len(([] or []))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'accounts/fireworks/models/llama4-maverick-instruct-basic'\nprovider = 'fireworks'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 1 tool calls, but got 0\nE assert 0 == 1\nE + where 0 = len(([] or []))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.0010035419836640358, "outcome": "passed" } } ], - "run_timestamp": 1744328684 + "run_timestamp": 1744679046 } diff --git a/tests/verifications/test_results/openai.json b/tests/verifications/test_results/openai.json index 0c1892f7e..32a2a2b82 100644 --- a/tests/verifications/test_results/openai.json +++ b/tests/verifications/test_results/openai.json @@ -1,13 +1,13 @@ { - "created": 1744328898.0248861, - "duration": 47.561042070388794, + "created": 1744679497.440863, + "duration": 102.70424389839172, "exitcode": 0, "root": "/Users/erichuang/projects/llama-stack", "environment": {}, "summary": { - "passed": 24, - "total": 24, - "collected": 24 + "passed": 52, + "total": 52, + "collected": 52 }, "collectors": [ { @@ -27,122 +27,262 @@ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-saturn]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-earth]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-saturn]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-earth]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-saturn]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-earth]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-saturn]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-case0]", "type": "Function", - "lineno": 116 + "lineno": 117 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-mini-case0]", "type": "Function", - "lineno": 116 + "lineno": 117 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-case0]", "type": "Function", - "lineno": 135 + "lineno": 136 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-mini-case0]", "type": "Function", - "lineno": 135 + "lineno": 136 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-calendar]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-math]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-math]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-calendar]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-math]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-calendar]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-math]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-case0]", "type": "Function", - "lineno": 204 + "lineno": 205 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", "type": "Function", - "lineno": 204 + "lineno": 205 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[gpt-4o-case0]", "type": "Function", - "lineno": 228 + "lineno": 229 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[gpt-4o-mini-case0]", "type": "Function", - "lineno": 228 + "lineno": 229 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[gpt-4o-case0]", + "type": "Function", + "lineno": 257 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[gpt-4o-mini-case0]", + "type": "Function", + "lineno": 257 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[gpt-4o-case0]", + "type": "Function", + "lineno": 281 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[gpt-4o-mini-case0]", + "type": "Function", + "lineno": 281 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[gpt-4o-case0]", + "type": "Function", + "lineno": 308 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[gpt-4o-mini-case0]", + "type": "Function", + "lineno": 308 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[gpt-4o-case0]", + "type": "Function", + "lineno": 331 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[gpt-4o-mini-case0]", + "type": "Function", + "lineno": 331 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-text_then_weather_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-weather_tool_then_text]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-add_product_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-get_then_create_event_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-text_then_weather_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-weather_tool_then_text]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-add_product_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-get_then_create_event_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-text_then_weather_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-weather_tool_then_text]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-add_product_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-get_then_create_event_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-text_then_weather_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-weather_tool_then_text]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-add_product_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-get_then_create_event_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 450 } ] } @@ -150,7 +290,7 @@ "tests": [ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[gpt-4o-earth]", @@ -169,21 +309,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.0694252080284059, + "duration": 0.09044458298012614, "outcome": "passed" }, "call": { - "duration": 0.5709165419684723, + "duration": 1.3071064590476453, "outcome": "passed" }, "teardown": { - "duration": 0.0007626248989254236, + "duration": 0.0003990421537309885, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-saturn]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[gpt-4o-saturn]", @@ -202,21 +342,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.010281750001013279, + "duration": 0.015266708098351955, "outcome": "passed" }, "call": { - "duration": 0.6309260830748826, + "duration": 1.3942135840188712, "outcome": "passed" }, "teardown": { - "duration": 0.0001824579667299986, + "duration": 0.0006840829737484455, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-earth]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[gpt-4o-mini-earth]", @@ -235,21 +375,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.007922374992631376, + "duration": 0.028802334098145366, "outcome": "passed" }, "call": { - "duration": 0.31756504194345325, + "duration": 0.40633770800195634, "outcome": "passed" }, "teardown": { - "duration": 0.0005268750246614218, + "duration": 0.0006945421919226646, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-saturn]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[gpt-4o-mini-saturn]", @@ -268,21 +408,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.01643404201604426, + "duration": 0.01865937514230609, "outcome": "passed" }, "call": { - "duration": 0.7479908330133185, + "duration": 0.7515070410445333, "outcome": "passed" }, "teardown": { - "duration": 0.0004037501057609916, + "duration": 0.0002985831815749407, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-earth]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[gpt-4o-earth]", @@ -301,21 +441,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.021671707974746823, + "duration": 0.011108374921604991, "outcome": "passed" }, "call": { - "duration": 0.6701172919711098, + "duration": 0.3914629169739783, "outcome": "passed" }, "teardown": { - "duration": 0.0005569590721279383, + "duration": 0.0006979589816182852, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-saturn]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[gpt-4o-saturn]", @@ -334,21 +474,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.015847125090658665, + "duration": 0.02875337516888976, "outcome": "passed" }, "call": { - "duration": 0.636536999954842, + "duration": 0.5632798750884831, "outcome": "passed" }, "teardown": { - "duration": 0.00029395800083875656, + "duration": 0.004012458026409149, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-earth]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[gpt-4o-mini-earth]", @@ -367,21 +507,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.011792832985520363, + "duration": 0.0143584581092, "outcome": "passed" }, "call": { - "duration": 0.5610962919890881, + "duration": 0.36101250001229346, "outcome": "passed" }, "teardown": { - "duration": 0.0003578749019652605, + "duration": 0.0005384159740060568, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-saturn]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[gpt-4o-mini-saturn]", @@ -400,21 +540,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.016500207944773138, + "duration": 0.017127499915659428, "outcome": "passed" }, "call": { - "duration": 0.8060244580265135, + "duration": 0.8120857500471175, "outcome": "passed" }, "teardown": { - "duration": 0.0005296670133247972, + "duration": 0.0005928750615566969, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-case0]", - "lineno": 116, + "lineno": 117, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[gpt-4o-case0]", @@ -433,21 +573,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.008338792016729712, + "duration": 0.023183667100965977, "outcome": "passed" }, "call": { - "duration": 7.009252917021513, + "duration": 2.8612758750095963, "outcome": "passed" }, "teardown": { - "duration": 0.0003042910248041153, + "duration": 0.0005042918492108583, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-mini-case0]", - "lineno": 116, + "lineno": 117, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[gpt-4o-mini-case0]", @@ -466,21 +606,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.007238540914840996, + "duration": 0.007410250138491392, "outcome": "passed" }, "call": { - "duration": 3.134693874977529, + "duration": 2.3748936660122126, "outcome": "passed" }, "teardown": { - "duration": 0.0003104590578004718, + "duration": 0.00045658298768103123, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-case0]", - "lineno": 135, + "lineno": 136, "outcome": "passed", "keywords": [ "test_chat_streaming_image[gpt-4o-case0]", @@ -499,21 +639,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.0161851670127362, + "duration": 0.023792708991095424, "outcome": "passed" }, "call": { - "duration": 3.0745719589758664, + "duration": 3.1502402499318123, "outcome": "passed" }, "teardown": { - "duration": 0.00022620800882577896, + "duration": 0.0010152498725801706, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-mini-case0]", - "lineno": 135, + "lineno": 136, "outcome": "passed", "keywords": [ "test_chat_streaming_image[gpt-4o-mini-case0]", @@ -532,21 +672,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.013220708002336323, + "duration": 0.01887162495404482, "outcome": "passed" }, "call": { - "duration": 3.624867417034693, + "duration": 2.070013999938965, "outcome": "passed" }, "teardown": { - "duration": 0.00020633300300687551, + "duration": 0.0005797501653432846, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-calendar]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[gpt-4o-calendar]", @@ -565,21 +705,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.017596833989955485, + "duration": 0.017477875109761953, "outcome": "passed" }, "call": { - "duration": 1.248568250099197, + "duration": 0.7350135410670191, "outcome": "passed" }, "teardown": { - "duration": 0.0004248750628903508, + "duration": 0.00046616699546575546, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-math]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[gpt-4o-math]", @@ -598,21 +738,21 @@ "case_id": "math" }, "setup": { - "duration": 0.01512012502644211, + "duration": 0.033007249934598804, "outcome": "passed" }, "call": { - "duration": 8.170285542029887, + "duration": 5.031138291116804, "outcome": "passed" }, "teardown": { - "duration": 0.00043537491001188755, + "duration": 0.00032295798882842064, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", @@ -631,21 +771,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.010376665974035859, + "duration": 0.014672457939013839, "outcome": "passed" }, "call": { - "duration": 0.756480542011559, + "duration": 0.7515842081047595, "outcome": "passed" }, "teardown": { - "duration": 0.00025695806834846735, + "duration": 0.00034395791590213776, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-math]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[gpt-4o-mini-math]", @@ -664,21 +804,21 @@ "case_id": "math" }, "setup": { - "duration": 0.006846625008620322, + "duration": 0.02985133300535381, "outcome": "passed" }, "call": { - "duration": 2.6833953330060467, + "duration": 2.388004041975364, "outcome": "passed" }, "teardown": { - "duration": 0.00022558309137821198, + "duration": 0.00038116704672574997, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-calendar]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[gpt-4o-calendar]", @@ -697,21 +837,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.009646040969528258, + "duration": 0.017887332942336798, "outcome": "passed" }, "call": { - "duration": 0.6117532079806551, + "duration": 1.0018641669303179, "outcome": "passed" }, "teardown": { - "duration": 0.00015258300118148327, + "duration": 0.0005486670415848494, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-math]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[gpt-4o-math]", @@ -730,21 +870,21 @@ "case_id": "math" }, "setup": { - "duration": 0.012024458032101393, + "duration": 0.0158015841152519, "outcome": "passed" }, "call": { - "duration": 4.522625041077845, + "duration": 7.285852208966389, "outcome": "passed" }, "teardown": { - "duration": 0.0004230838967487216, + "duration": 0.0003417080733925104, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-calendar]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[gpt-4o-mini-calendar]", @@ -763,21 +903,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.009566582972183824, + "duration": 0.014434333890676498, "outcome": "passed" }, "call": { - "duration": 2.5591942919418216, + "duration": 0.9268912919797003, "outcome": "passed" }, "teardown": { - "duration": 0.0007555419579148293, + "duration": 0.00046200002543628216, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-math]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[gpt-4o-mini-math]", @@ -796,21 +936,21 @@ "case_id": "math" }, "setup": { - "duration": 0.010828875005245209, + "duration": 0.01635808404535055, "outcome": "passed" }, "call": { - "duration": 2.495122667052783, + "duration": 3.7341703751590103, "outcome": "passed" }, "teardown": { - "duration": 0.0002802090020850301, + "duration": 0.0004277920816093683, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-case0]", - "lineno": 204, + "lineno": 205, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[gpt-4o-case0]", @@ -829,21 +969,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.012762792059220374, + "duration": 0.021756208036094904, "outcome": "passed" }, "call": { - "duration": 0.5655921660363674, + "duration": 0.6105514578521252, "outcome": "passed" }, "teardown": { - "duration": 0.00022304197773337364, + "duration": 0.0004747910425066948, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", - "lineno": 204, + "lineno": 205, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", @@ -862,21 +1002,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.03188708401285112, + "duration": 0.015522167086601257, "outcome": "passed" }, "call": { - "duration": 0.6159415419679135, + "duration": 0.9731334580574185, "outcome": "passed" }, "teardown": { - "duration": 0.0005549580091610551, + "duration": 0.0003415420651435852, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[gpt-4o-case0]", - "lineno": 228, + "lineno": 229, "outcome": "passed", "keywords": [ "test_chat_streaming_tool_calling[gpt-4o-case0]", @@ -895,21 +1035,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.014768208027817309, + "duration": 0.014343583025038242, "outcome": "passed" }, "call": { - "duration": 0.47373537498060614, + "duration": 0.5453979168087244, "outcome": "passed" }, "teardown": { - "duration": 0.0005811670562252402, + "duration": 0.0011145840398967266, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[gpt-4o-mini-case0]", - "lineno": 228, + "lineno": 229, "outcome": "passed", "keywords": [ "test_chat_streaming_tool_calling[gpt-4o-mini-case0]", @@ -928,18 +1068,942 @@ "case_id": "case0" }, "setup": { - "duration": 0.010271625011228025, + "duration": 0.017669249791651964, "outcome": "passed" }, "call": { - "duration": 0.5656027499353513, + "duration": 0.6310562079306692, "outcome": "passed" }, "teardown": { - "duration": 0.0025699170073494315, + "duration": 0.0006836249958723783, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[gpt-4o-case0]", + "lineno": 257, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_required[gpt-4o-case0]", + "parametrize", + "pytestmark", + "gpt-4o-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "case0" + }, + "setup": { + "duration": 0.016614832915365696, + "outcome": "passed" + }, + "call": { + "duration": 0.6914504591841251, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004829999525099993, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[gpt-4o-mini-case0]", + "lineno": 257, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_required[gpt-4o-mini-case0]", + "parametrize", + "pytestmark", + "gpt-4o-mini-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "case0" + }, + "setup": { + "duration": 0.03217837493866682, + "outcome": "passed" + }, + "call": { + "duration": 0.4917086660861969, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005399580113589764, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[gpt-4o-case0]", + "lineno": 281, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_required[gpt-4o-case0]", + "parametrize", + "pytestmark", + "gpt-4o-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "case0" + }, + "setup": { + "duration": 0.01154208299703896, + "outcome": "passed" + }, + "call": { + "duration": 0.5663661658763885, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0008221250027418137, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[gpt-4o-mini-case0]", + "lineno": 281, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_required[gpt-4o-mini-case0]", + "parametrize", + "pytestmark", + "gpt-4o-mini-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "case0" + }, + "setup": { + "duration": 0.013238833984360099, + "outcome": "passed" + }, + "call": { + "duration": 0.6098562499973923, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00045654200948774815, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[gpt-4o-case0]", + "lineno": 308, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_none[gpt-4o-case0]", + "parametrize", + "pytestmark", + "gpt-4o-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "case0" + }, + "setup": { + "duration": 0.014951375080272555, + "outcome": "passed" + }, + "call": { + "duration": 0.5425659997854382, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0002112078946083784, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[gpt-4o-mini-case0]", + "lineno": 308, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_none[gpt-4o-mini-case0]", + "parametrize", + "pytestmark", + "gpt-4o-mini-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "case0" + }, + "setup": { + "duration": 0.010041083907708526, + "outcome": "passed" + }, + "call": { + "duration": 0.7337456250097603, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00042791711166501045, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[gpt-4o-case0]", + "lineno": 331, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_none[gpt-4o-case0]", + "parametrize", + "pytestmark", + "gpt-4o-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "case0" + }, + "setup": { + "duration": 0.007236667210236192, + "outcome": "passed" + }, + "call": { + "duration": 0.4192167909350246, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0010569579899311066, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[gpt-4o-mini-case0]", + "lineno": 331, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_none[gpt-4o-mini-case0]", + "parametrize", + "pytestmark", + "gpt-4o-mini-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "case0" + }, + "setup": { + "duration": 0.01997062494046986, + "outcome": "passed" + }, + "call": { + "duration": 0.6866283339913934, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0010521251242607832, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-text_then_weather_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-text_then_weather_tool]", + "parametrize", + "pytestmark", + "gpt-4o-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.017386124935001135, + "outcome": "passed" + }, + "call": { + "duration": 4.425433791941032, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00043645803816616535, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-weather_tool_then_text]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-weather_tool_then_text]", + "parametrize", + "pytestmark", + "gpt-4o-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.014067957876250148, + "outcome": "passed" + }, + "call": { + "duration": 1.205255625071004, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004651669878512621, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-add_product_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-add_product_tool]", + "parametrize", + "pytestmark", + "gpt-4o-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.016634040977805853, + "outcome": "passed" + }, + "call": { + "duration": 1.4360020828898996, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004704580642282963, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-get_then_create_event_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "gpt-4o-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.015702415956184268, + "outcome": "passed" + }, + "call": { + "duration": 5.882555708056316, + "outcome": "passed" + }, + "teardown": { + "duration": 0.003662874922156334, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-compare_monthly_expense_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "gpt-4o-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.020038041984662414, + "outcome": "passed" + }, + "call": { + "duration": 2.2738899998366833, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004929169081151485, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-text_then_weather_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-text_then_weather_tool]", + "parametrize", + "pytestmark", + "gpt-4o-mini-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.007982166949659586, + "outcome": "passed" + }, + "call": { + "duration": 1.7494398748967797, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005488330498337746, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-weather_tool_then_text]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-weather_tool_then_text]", + "parametrize", + "pytestmark", + "gpt-4o-mini-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.007455583196133375, + "outcome": "passed" + }, + "call": { + "duration": 5.338647875003517, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005507499445229769, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-add_product_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-add_product_tool]", + "parametrize", + "pytestmark", + "gpt-4o-mini-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.01675066608004272, + "outcome": "passed" + }, + "call": { + "duration": 4.016703582834452, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005397920031100512, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-get_then_create_event_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "gpt-4o-mini-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.009890957968309522, + "outcome": "passed" + }, + "call": { + "duration": 3.9003724998328835, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005802921950817108, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-compare_monthly_expense_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[gpt-4o-mini-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "gpt-4o-mini-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.021778207970783114, + "outcome": "passed" + }, + "call": { + "duration": 2.3824402918107808, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0008852919563651085, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-text_then_weather_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-text_then_weather_tool]", + "parametrize", + "pytestmark", + "gpt-4o-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.021121500059962273, + "outcome": "passed" + }, + "call": { + "duration": 2.362067250069231, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0007184590213000774, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-weather_tool_then_text]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-weather_tool_then_text]", + "parametrize", + "pytestmark", + "gpt-4o-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.01677604205906391, + "outcome": "passed" + }, + "call": { + "duration": 1.4576394581235945, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005367500707507133, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-add_product_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-add_product_tool]", + "parametrize", + "pytestmark", + "gpt-4o-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.010623916983604431, + "outcome": "passed" + }, + "call": { + "duration": 3.295967958169058, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0005429999437183142, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-get_then_create_event_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "gpt-4o-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.014912083046510816, + "outcome": "passed" + }, + "call": { + "duration": 2.7422334579750896, + "outcome": "passed" + }, + "teardown": { + "duration": 0.001017916016280651, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-compare_monthly_expense_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "gpt-4o-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.014568000100553036, + "outcome": "passed" + }, + "call": { + "duration": 2.4006296249572188, + "outcome": "passed" + }, + "teardown": { + "duration": 0.000492083141580224, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-text_then_weather_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-text_then_weather_tool]", + "parametrize", + "pytestmark", + "gpt-4o-mini-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.01243741693906486, + "outcome": "passed" + }, + "call": { + "duration": 1.858031083131209, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0012166248634457588, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-weather_tool_then_text]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-weather_tool_then_text]", + "parametrize", + "pytestmark", + "gpt-4o-mini-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.017216125037521124, + "outcome": "passed" + }, + "call": { + "duration": 1.4033057920169085, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00047016702592372894, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-add_product_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-add_product_tool]", + "parametrize", + "pytestmark", + "gpt-4o-mini-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.019779917085543275, + "outcome": "passed" + }, + "call": { + "duration": 1.5427470421418548, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0007832080591470003, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-get_then_create_event_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "gpt-4o-mini-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.019053417025133967, + "outcome": "passed" + }, + "call": { + "duration": 4.038398916134611, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00048545910976827145, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-compare_monthly_expense_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[gpt-4o-mini-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "gpt-4o-mini-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "gpt-4o-mini", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.01692862482741475, + "outcome": "passed" + }, + "call": { + "duration": 1.849576957989484, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0032055408228188753, "outcome": "passed" } } ], - "run_timestamp": 1744328848 + "run_timestamp": 1744679391 } diff --git a/tests/verifications/test_results/together.json b/tests/verifications/test_results/together.json index 2b23089e8..44e831936 100644 --- a/tests/verifications/test_results/together.json +++ b/tests/verifications/test_results/together.json @@ -1,15 +1,15 @@ { - "created": 1744328847.853437, - "duration": 49.9419469833374, + "created": 1744679387.346831, + "duration": 90.31976795196533, "exitcode": 1, "root": "/Users/erichuang/projects/llama-stack", "environment": {}, "summary": { - "passed": 22, - "failed": 12, + "passed": 37, + "failed": 39, "skipped": 2, - "total": 36, - "collected": 36 + "total": 78, + "collected": 78 }, "collectors": [ { @@ -29,182 +29,392 @@ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", "type": "Function", - "lineno": 73 + "lineno": 74 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", "type": "Function", - "lineno": 92 + "lineno": 93 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", "type": "Function", - "lineno": 116 + "lineno": 117 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", "type": "Function", - "lineno": 116 + "lineno": 117 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", "type": "Function", - "lineno": 116 + "lineno": 117 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", "type": "Function", - "lineno": 135 + "lineno": 136 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", "type": "Function", - "lineno": 135 + "lineno": 136 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", "type": "Function", - "lineno": 135 + "lineno": 136 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", "type": "Function", - "lineno": 159 + "lineno": 160 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", "type": "Function", - "lineno": 182 + "lineno": 183 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", "type": "Function", - "lineno": 204 + "lineno": 205 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", "type": "Function", - "lineno": 204 + "lineno": 205 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", "type": "Function", - "lineno": 204 + "lineno": 205 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", "type": "Function", - "lineno": 228 + "lineno": 229 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", "type": "Function", - "lineno": 228 + "lineno": 229 }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", "type": "Function", - "lineno": 228 + "lineno": 229 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "type": "Function", + "lineno": 257 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "type": "Function", + "lineno": 257 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "type": "Function", + "lineno": 257 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "type": "Function", + "lineno": 281 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "type": "Function", + "lineno": 281 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "type": "Function", + "lineno": 281 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "type": "Function", + "lineno": 308 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "type": "Function", + "lineno": 308 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "type": "Function", + "lineno": 308 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "type": "Function", + "lineno": 331 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "type": "Function", + "lineno": 331 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "type": "Function", + "lineno": 331 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-text_then_weather_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-weather_tool_then_text]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-add_product_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-get_then_create_event_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-text_then_weather_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-weather_tool_then_text]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-add_product_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-get_then_create_event_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-text_then_weather_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-weather_tool_then_text]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-add_product_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-get_then_create_event_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 359 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-text_then_weather_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-weather_tool_then_text]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-add_product_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-get_then_create_event_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-text_then_weather_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-weather_tool_then_text]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-add_product_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-get_then_create_event_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-text_then_weather_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-weather_tool_then_text]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-add_product_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-get_then_create_event_tool]", + "type": "Function", + "lineno": 450 + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-compare_monthly_expense_tool]", + "type": "Function", + "lineno": 450 } ] } @@ -212,7 +422,7 @@ "tests": [ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", @@ -231,21 +441,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.15774220903404057, + "duration": 0.1559112500399351, "outcome": "passed" }, "call": { - "duration": 0.5396400419995189, + "duration": 0.3692209171131253, "outcome": "passed" }, "teardown": { - "duration": 0.0002977499971166253, + "duration": 0.00021362490952014923, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", @@ -264,21 +474,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.015632833004929125, + "duration": 0.007326166843995452, "outcome": "passed" }, "call": { - "duration": 0.4675290420418605, + "duration": 0.49173945817165077, "outcome": "passed" }, "teardown": { - "duration": 0.00029129208996891975, + "duration": 0.00034487503580749035, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", @@ -297,21 +507,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.01530187507160008, + "duration": 0.021014458034187555, "outcome": "passed" }, "call": { - "duration": 0.501894542016089, + "duration": 0.36956487502902746, "outcome": "passed" }, "teardown": { - "duration": 0.0002060839906334877, + "duration": 0.0007119579240679741, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", @@ -330,21 +540,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.014841833035461605, + "duration": 0.011922625126317143, "outcome": "passed" }, "call": { - "duration": 0.4202229160582647, + "duration": 2.7763332079630345, "outcome": "passed" }, "teardown": { - "duration": 0.0005559159908443689, + "duration": 0.0004842919297516346, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", @@ -363,21 +573,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.008204624988138676, + "duration": 0.023896750062704086, "outcome": "passed" }, "call": { - "duration": 1.991508833016269, + "duration": 0.9817597079090774, "outcome": "passed" }, "teardown": { - "duration": 0.000539042055606842, + "duration": 0.0004768748767673969, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", - "lineno": 73, + "lineno": 74, "outcome": "passed", "keywords": [ "test_chat_non_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", @@ -396,21 +606,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.022528667002916336, + "duration": 0.07423937506973743, "outcome": "passed" }, "call": { - "duration": 0.37111237505450845, + "duration": 0.3721332079730928, "outcome": "passed" }, "teardown": { - "duration": 0.0005334159359335899, + "duration": 0.00020033284090459347, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-earth]", @@ -429,21 +639,21 @@ "case_id": "earth" }, "setup": { - "duration": 0.00922920904122293, + "duration": 0.010166750056669116, "outcome": "passed" }, "call": { - "duration": 1.1684916669037193, + "duration": 0.41266337502747774, "outcome": "passed" }, "teardown": { - "duration": 0.0002740409690886736, + "duration": 0.00034358282573521137, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", - "lineno": 92, + "lineno": 93, "outcome": "passed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-3.3-70B-Instruct-Turbo-saturn]", @@ -462,21 +672,21 @@ "case_id": "saturn" }, "setup": { - "duration": 0.010883333045057952, + "duration": 0.016687541967257857, "outcome": "passed" }, "call": { - "duration": 0.4275277080014348, + "duration": 0.7235856249462813, "outcome": "passed" }, "teardown": { - "duration": 0.00043112505227327347, + "duration": 0.00027179205790162086, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", - "lineno": 92, + "lineno": 93, "outcome": "failed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-earth]", @@ -495,34 +705,34 @@ "case_id": "earth" }, "setup": { - "duration": 0.012945958063937724, + "duration": 0.012556416913866997, "outcome": "passed" }, "call": { - "duration": 0.5551295839250088, + "duration": 0.27039612480439246, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 110, + "lineno": 111, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 110, + "lineno": 111, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:110: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:111: IndexError" }, "teardown": { - "duration": 0.0002744169905781746, + "duration": 0.0002312080468982458, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", - "lineno": 92, + "lineno": 93, "outcome": "failed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-4-Scout-17B-16E-Instruct-saturn]", @@ -541,34 +751,34 @@ "case_id": "saturn" }, "setup": { - "duration": 0.017372542060911655, + "duration": 0.006413874914869666, "outcome": "passed" }, "call": { - "duration": 0.3579877089941874, + "duration": 0.36463545891456306, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 110, + "lineno": 111, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 110, + "lineno": 111, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:110: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:111: IndexError" }, "teardown": { - "duration": 0.0005445419810712337, + "duration": 0.00023154192604124546, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", - "lineno": 92, + "lineno": 93, "outcome": "failed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-earth]", @@ -587,34 +797,34 @@ "case_id": "earth" }, "setup": { - "duration": 0.014297832967713475, + "duration": 0.015633082948625088, "outcome": "passed" }, "call": { - "duration": 0.8067362919682637, + "duration": 0.8896284159272909, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 110, + "lineno": 111, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 110, + "lineno": 111, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:110: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'earth', 'input': {'messages': [{'content': 'Which planet do humans live on?', 'role': 'user'}]}, 'output': 'Earth'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:111: IndexError" }, "teardown": { - "duration": 0.0003220830112695694, + "duration": 0.0006587498355656862, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", - "lineno": 92, + "lineno": 93, "outcome": "failed", "keywords": [ "test_chat_streaming_basic[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-saturn]", @@ -633,34 +843,34 @@ "case_id": "saturn" }, "setup": { - "duration": 0.008816750021651387, + "duration": 0.012669583084061742, "outcome": "passed" }, "call": { - "duration": 0.5383605000097305, + "duration": 0.3499396659899503, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 110, + "lineno": 111, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 110, + "lineno": 111, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:110: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'saturn', 'input': {'messages': [{'content': 'Which planet has rings around it with a name starting with letter S?', 'role': 'user'}]}, 'output': 'Saturn'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_basic\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:111: IndexError" }, "teardown": { - "duration": 0.00018316600471735, + "duration": 0.00024912506341934204, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", - "lineno": 116, + "lineno": 117, "outcome": "skipped", "keywords": [ "test_chat_non_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", @@ -679,22 +889,22 @@ "case_id": "case0" }, "setup": { - "duration": 0.0074389580404385924, + "duration": 0.0153201250359416, "outcome": "passed" }, "call": { - "duration": 0.00014933396596461535, + "duration": 0.0001901669893413782, "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 125, 'Skipped: Skipping test_chat_non_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 126, 'Skipped: Skipping test_chat_non_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" }, "teardown": { - "duration": 0.00012462493032217026, + "duration": 0.00012779212556779385, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", - "lineno": 116, + "lineno": 117, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", @@ -713,21 +923,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.013580625061877072, + "duration": 0.008855124935507774, "outcome": "passed" }, "call": { - "duration": 2.89831429196056, + "duration": 1.37906050006859, "outcome": "passed" }, "teardown": { - "duration": 0.000491458922624588, + "duration": 0.0004904591478407383, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", - "lineno": 116, + "lineno": 117, "outcome": "passed", "keywords": [ "test_chat_non_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", @@ -746,21 +956,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.008266666904091835, + "duration": 0.017166708130389452, "outcome": "passed" }, "call": { - "duration": 3.8873212080216035, + "duration": 4.003400916932151, "outcome": "passed" }, "teardown": { - "duration": 0.00016850000247359276, + "duration": 0.00042724981904029846, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", - "lineno": 135, + "lineno": 136, "outcome": "skipped", "keywords": [ "test_chat_streaming_image[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", @@ -779,22 +989,22 @@ "case_id": "case0" }, "setup": { - "duration": 0.0080461660400033, + "duration": 0.007232750067487359, "outcome": "passed" }, "call": { - "duration": 0.00014758307952433825, + "duration": 0.0001449580304324627, "outcome": "skipped", - "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 144, 'Skipped: Skipping test_chat_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" + "longrepr": "('/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py', 145, 'Skipped: Skipping test_chat_streaming_image for model meta-llama/Llama-3.3-70B-Instruct-Turbo on provider together based on config.')" }, "teardown": { - "duration": 0.00012695800978690386, + "duration": 0.0001349160447716713, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", - "lineno": 135, + "lineno": 136, "outcome": "failed", "keywords": [ "test_chat_streaming_image[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", @@ -813,34 +1023,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.00845700001809746, + "duration": 0.007052165921777487, "outcome": "passed" }, "call": { - "duration": 1.6604419159702957, + "duration": 1.4663615000899881, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 153, + "lineno": 154, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 153, + "lineno": 154, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:153: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:154: IndexError" }, "teardown": { - "duration": 0.00033458403777331114, + "duration": 0.0005696250591427088, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", - "lineno": 135, + "lineno": 136, "outcome": "failed", "keywords": [ "test_chat_streaming_image[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", @@ -859,34 +1069,34 @@ "case_id": "case0" }, "setup": { - "duration": 0.012580333976075053, + "duration": 0.01214433298446238, "outcome": "passed" }, "call": { - "duration": 4.728511792025529, + "duration": 3.902559082955122, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 153, + "lineno": 154, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 153, + "lineno": 154, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:153: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': [{'text': 'What is in this image?', 'type': 'text'}, {'image_url': {...}, 'type': 'image_url'}], 'role': 'user'}]}, 'output': 'llama'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_image\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n stream=True,\n )\n content = \"\"\n for chunk in response:\n> content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:154: IndexError" }, "teardown": { - "duration": 0.00023266696371138096, + "duration": 0.000591374933719635, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", @@ -905,21 +1115,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.011554082971997559, + "duration": 0.01478054211474955, "outcome": "passed" }, "call": { - "duration": 1.3857994999270886, + "duration": 0.569845792138949, "outcome": "passed" }, "teardown": { - "duration": 0.0003951250109821558, + "duration": 0.00038724998012185097, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", @@ -938,21 +1148,21 @@ "case_id": "math" }, "setup": { - "duration": 0.007673708954825997, + "duration": 0.014717916958034039, "outcome": "passed" }, "call": { - "duration": 3.082161583006382, + "duration": 1.1819656670559198, "outcome": "passed" }, "teardown": { - "duration": 0.0002532500075176358, + "duration": 0.0002410421147942543, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", @@ -971,21 +1181,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.014791041961871088, + "duration": 0.006486707832664251, "outcome": "passed" }, "call": { - "duration": 0.6918012499809265, + "duration": 0.5623017910402268, "outcome": "passed" }, "teardown": { - "duration": 0.00027070799842476845, + "duration": 0.00032504182308912277, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", @@ -1004,21 +1214,21 @@ "case_id": "math" }, "setup": { - "duration": 0.014746625092811882, + "duration": 0.009171125013381243, "outcome": "passed" }, "call": { - "duration": 3.5890139170223847, + "duration": 2.6005691669415683, "outcome": "passed" }, "teardown": { - "duration": 0.00030137505382299423, + "duration": 0.00023995805531740189, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", @@ -1037,21 +1247,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.036798374960199, + "duration": 0.009700333932414651, "outcome": "passed" }, "call": { - "duration": 0.6914895409718156, + "duration": 0.4192442081402987, "outcome": "passed" }, "teardown": { - "duration": 0.00023716699797660112, + "duration": 0.00040241610258817673, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", - "lineno": 159, + "lineno": 160, "outcome": "passed", "keywords": [ "test_chat_non_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", @@ -1070,21 +1280,21 @@ "case_id": "math" }, "setup": { - "duration": 0.05965254199691117, + "duration": 0.006938542006537318, "outcome": "passed" }, "call": { - "duration": 2.609581291093491, + "duration": 2.1736337919719517, "outcome": "passed" }, "teardown": { - "duration": 0.0002674580318853259, + "duration": 0.00019279099069535732, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-calendar]", @@ -1103,21 +1313,21 @@ "case_id": "calendar" }, "setup": { - "duration": 0.014533916022628546, + "duration": 0.008775749942287803, "outcome": "passed" }, "call": { - "duration": 0.6227063750848174, + "duration": 0.5588400410488248, "outcome": "passed" }, "teardown": { - "duration": 0.00019699998665601015, + "duration": 0.00040091690607368946, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", - "lineno": 182, + "lineno": 183, "outcome": "passed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-3.3-70B-Instruct-Turbo-math]", @@ -1136,21 +1346,21 @@ "case_id": "math" }, "setup": { - "duration": 0.009818125050514936, + "duration": 0.01844154205173254, "outcome": "passed" }, "call": { - "duration": 5.144610875053331, + "duration": 2.205772665794939, "outcome": "passed" }, "teardown": { - "duration": 0.00045220903120934963, + "duration": 0.00021091708913445473, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", - "lineno": 182, + "lineno": 183, "outcome": "failed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-calendar]", @@ -1169,34 +1379,34 @@ "case_id": "calendar" }, "setup": { - "duration": 0.012392290984280407, + "duration": 0.015595750184729695, "outcome": "passed" }, "call": { - "duration": 0.777625665999949, + "duration": 0.6904467919375747, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 201, + "lineno": 202, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 201, + "lineno": 202, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:201: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:202: IndexError" }, "teardown": { - "duration": 0.000559916952624917, + "duration": 0.0002907498273998499, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", - "lineno": 182, + "lineno": 183, "outcome": "failed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-4-Scout-17B-16E-Instruct-math]", @@ -1215,34 +1425,34 @@ "case_id": "math" }, "setup": { - "duration": 0.010390624986030161, + "duration": 0.008272957988083363, "outcome": "passed" }, "call": { - "duration": 2.680094916955568, + "duration": 3.499622541014105, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 201, + "lineno": 202, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 201, + "lineno": 202, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:201: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:202: IndexError" }, "teardown": { - "duration": 0.00041987502481788397, + "duration": 0.0005947079043835402, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", - "lineno": 182, + "lineno": 183, "outcome": "failed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-calendar]", @@ -1261,34 +1471,34 @@ "case_id": "calendar" }, "setup": { - "duration": 0.01190529193263501, + "duration": 0.013340875040739775, "outcome": "passed" }, "call": { - "duration": 0.6690819580107927, + "duration": 0.42789591709151864, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 201, + "lineno": 202, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 201, + "lineno": 202, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:201: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'calendar', 'input': {'messages': [{'content': 'Extract the event information.', 'role': 'system'}, {'cont...articipants'], 'title': 'CalendarEvent', 'type': 'object'}}, 'type': 'json_schema'}}, 'output': 'valid_calendar_event'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:202: IndexError" }, "teardown": { - "duration": 0.000247166957706213, + "duration": 0.0003039578441530466, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", - "lineno": 182, + "lineno": 183, "outcome": "failed", "keywords": [ "test_chat_streaming_structured_output[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-math]", @@ -1307,34 +1517,34 @@ "case_id": "math" }, "setup": { - "duration": 0.009588208980858326, + "duration": 0.01058275019749999, "outcome": "passed" }, "call": { - "duration": 2.4867218340514228, + "duration": 5.795635707909241, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 201, + "lineno": 202, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 201, + "lineno": 202, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:201: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'math', 'input': {'messages': [{'content': 'You are a helpful math tutor. Guide the user through the solut... ['steps', 'final_answer'], 'title': 'MathReasoning', ...}}, 'type': 'json_schema'}}, 'output': 'valid_math_reasoning'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_chat_structured_output\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n response_format=case[\"input\"][\"response_format\"],\n stream=True,\n )\n maybe_json_content = \"\"\n for chunk in response:\n> maybe_json_content += chunk.choices[0].delta.content or \"\"\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:202: IndexError" }, "teardown": { - "duration": 0.00022487505339086056, + "duration": 0.0005178749561309814, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", - "lineno": 204, + "lineno": 205, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", @@ -1353,21 +1563,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.008509417064487934, + "duration": 0.014336749911308289, "outcome": "passed" }, "call": { - "duration": 0.45511841599363834, + "duration": 0.451304541900754, "outcome": "passed" }, "teardown": { - "duration": 0.00031033402774482965, + "duration": 0.0004718329291790724, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", - "lineno": 204, + "lineno": 205, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", @@ -1386,21 +1596,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.01352791697718203, + "duration": 0.01625004201196134, "outcome": "passed" }, "call": { - "duration": 0.7166531670372933, + "duration": 0.5111537908669561, "outcome": "passed" }, "teardown": { - "duration": 0.00031470798421651125, + "duration": 0.00046774977818131447, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", - "lineno": 204, + "lineno": 205, "outcome": "passed", "keywords": [ "test_chat_non_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", @@ -1419,21 +1629,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.01369225000962615, + "duration": 0.015832332894206047, "outcome": "passed" }, "call": { - "duration": 0.34134254103992134, + "duration": 0.8238586660008878, "outcome": "passed" }, "teardown": { - "duration": 0.0002922919811680913, + "duration": 0.0006185418460518122, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", - "lineno": 228, + "lineno": 229, "outcome": "passed", "keywords": [ "test_chat_streaming_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", @@ -1452,21 +1662,21 @@ "case_id": "case0" }, "setup": { - "duration": 0.025748749962076545, + "duration": 0.007832166040316224, "outcome": "passed" }, "call": { - "duration": 0.7462511250050738, + "duration": 0.685583250131458, "outcome": "passed" }, "teardown": { - "duration": 0.00030449999030679464, + "duration": 0.0004414590075612068, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", - "lineno": 228, + "lineno": 229, "outcome": "failed", "keywords": [ "test_chat_streaming_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", @@ -1485,34 +1695,39 @@ "case_id": "case0" }, "setup": { - "duration": 0.015131957945413888, + "duration": 0.021764083998277783, "outcome": "passed" }, "call": { - "duration": 0.4556894999695942, + "duration": 0.35617320891469717, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 251, + "lineno": 587, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 251, + "lineno": 247, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:251: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n> _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:247: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" }, "teardown": { - "duration": 0.000539042055606842, + "duration": 0.0005425831768661737, "outcome": "passed" } }, { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", - "lineno": 228, + "lineno": 229, "outcome": "failed", "keywords": [ "test_chat_streaming_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", @@ -1531,31 +1746,1833 @@ "case_id": "case0" }, "setup": { - "duration": 0.016429082956165075, + "duration": 0.016708041075617075, "outcome": "passed" }, "call": { - "duration": 0.3677835420239717, + "duration": 0.49443637509830296, "outcome": "failed", "crash": { "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", - "lineno": 251, + "lineno": 587, "message": "IndexError: list index out of range" }, "traceback": [ { "path": "tests/verifications/openai_api/test_chat_completion.py", - "lineno": 251, + "lineno": 247, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, "message": "IndexError" } ], - "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...el_display_names': {'gpt-4o': 'gpt-4o', 'gpt-4o-mini': 'gpt-4o-mini'}, 'models': ['gpt-4o', 'gpt-4o-mini'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n # Accumulate partial tool_calls here\n tool_calls_buffer = {}\n current_id = None\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:251: IndexError" + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"],\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n stream=True,\n )\n \n> _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:247: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" }, "teardown": { - "duration": 0.001610000035725534, + "duration": 0.0002642078325152397, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "lineno": 257, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_required[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "case0" + }, + "setup": { + "duration": 0.009570583933964372, + "outcome": "passed" + }, + "call": { + "duration": 0.5232214999850839, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0006591668352484703, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "lineno": 257, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_required[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.01567283389158547, + "outcome": "passed" + }, + "call": { + "duration": 0.4465816249139607, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0003922500181943178, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_required[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "lineno": 257, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_tool_choice_required[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "case0" + }, + "setup": { + "duration": 0.021711332956328988, + "outcome": "passed" + }, + "call": { + "duration": 0.5361095829866827, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0003099590539932251, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "lineno": 281, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_tool_choice_required[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "case0" + }, + "setup": { + "duration": 0.009334125090390444, + "outcome": "passed" + }, + "call": { + "duration": 0.5789772500284016, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00037712487392127514, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "lineno": 281, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_choice_required[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.019614499993622303, + "outcome": "passed" + }, + "call": { + "duration": 0.444399792002514, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 300, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_choice_required(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"required\", # Force tool call\n stream=True,\n )\n \n> _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:300: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.0004192921333014965, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_required[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "lineno": 281, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_choice_required[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "case0" + }, + "setup": { + "duration": 0.012822834076359868, + "outcome": "passed" + }, + "call": { + "duration": 0.6777042911853641, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 300, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_choice_required(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"required\", # Force tool call\n stream=True,\n )\n \n> _, tool_calls_buffer = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:300: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.0004483328666538, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "lineno": 308, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_tool_choice_none[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "case0" + }, + "setup": { + "duration": 0.011924332939088345, + "outcome": "passed" + }, + "call": { + "duration": 0.4756374170538038, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 328, + "message": "AssertionError: Expected no tool calls when tool_choice='none'\nassert [ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)] is None\n + where [ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)]).tool_calls\n + where ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)]) = Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)]), seed=13421903014786785000).message" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 328, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-3.3-70B-Instruct-Turbo', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_choice_none(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"none\",\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert response.choices[0].message.tool_calls is None, \"Expected no tool calls when tool_choice='none'\"\nE AssertionError: Expected no tool calls when tool_choice='none'\nE assert [ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)] is None\nE + where [ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)]).tool_calls\nE + where ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)]) = Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_nfd8oz9wmhlwf4mqglcaokrs', function=Function(arguments='{\"location\":\"San Francisco, USA\"}', name='get_weather'), type='function', index=0)]), seed=13421903014786785000).message\n\ntests/verifications/openai_api/test_chat_completion.py:328: AssertionError" + }, + "teardown": { + "duration": 0.0004585420247167349, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "lineno": 308, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_tool_choice_none[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.013246082933619618, + "outcome": "passed" + }, + "call": { + "duration": 0.5618870409671217, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 328, + "message": "AssertionError: Expected no tool calls when tool_choice='none'\nassert [ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)] is None\n + where [ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)]).tool_calls\n + where ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)]) = Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)]), seed=None).message" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 328, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_choice_none(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"none\",\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert response.choices[0].message.tool_calls is None, \"Expected no tool calls when tool_choice='none'\"\nE AssertionError: Expected no tool calls when tool_choice='none'\nE assert [ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)] is None\nE + where [ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)]).tool_calls\nE + where ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)]) = Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_h5u55eksczab7xtg8oot43k5', function=Function(arguments='{\"location\":\"San Francisco, United States\"}', name='get_weather'), type='function', index=0)]), seed=None).message\n\ntests/verifications/openai_api/test_chat_completion.py:328: AssertionError" + }, + "teardown": { + "duration": 0.00025883293710649014, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_choice_none[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "lineno": 308, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_tool_choice_none[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "case0" + }, + "setup": { + "duration": 0.008055417099967599, + "outcome": "passed" + }, + "call": { + "duration": 0.32869591703638434, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 328, + "message": "AssertionError: Expected no tool calls when tool_choice='none'\nassert [ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)] is None\n + where [ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)]).tool_calls\n + where ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)]) = Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)]), seed=None).message" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 328, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_non_streaming_tool_choice_none(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n response = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"none\",\n stream=False,\n )\n \n assert response.choices[0].message.role == \"assistant\"\n> assert response.choices[0].message.tool_calls is None, \"Expected no tool calls when tool_choice='none'\"\nE AssertionError: Expected no tool calls when tool_choice='none'\nE assert [ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)] is None\nE + where [ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)]).tool_calls\nE + where ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)]) = Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_s1bz9v57b8uizqy2i81869pb', function=Function(arguments='{\"location\":\"San Francisco\"}', name='get_weather'), type='function', index=0)]), seed=None).message\n\ntests/verifications/openai_api/test_chat_completion.py:328: AssertionError" + }, + "teardown": { + "duration": 0.0003937501460313797, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "lineno": 331, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_choice_none[meta-llama/Llama-3.3-70B-Instruct-Turbo-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "case0" + }, + "setup": { + "duration": 0.013460749993100762, + "outcome": "passed" + }, + "call": { + "duration": 0.35879983310587704, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 355, + "message": "AssertionError: Expected no tool call chunks when tool_choice='none'\nassert not [ChoiceDeltaToolCall(index=0, id='call_q472clmnii99ps1fxqtv8qvr', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]\n + where [ChoiceDeltaToolCall(index=0, id='call_q472clmnii99ps1fxqtv8qvr', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')] = ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id='call_q472clmnii99ps1fxqtv8qvr', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 355, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-3.3-70B-Instruct-Turbo', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_choice_none(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"none\",\n stream=True,\n )\n \n content = \"\"\n for chunk in stream:\n delta = chunk.choices[0].delta\n if delta.content:\n content += delta.content\n> assert not delta.tool_calls, \"Expected no tool call chunks when tool_choice='none'\"\nE AssertionError: Expected no tool call chunks when tool_choice='none'\nE assert not [ChoiceDeltaToolCall(index=0, id='call_q472clmnii99ps1fxqtv8qvr', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]\nE + where [ChoiceDeltaToolCall(index=0, id='call_q472clmnii99ps1fxqtv8qvr', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')] = ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id='call_q472clmnii99ps1fxqtv8qvr', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:355: AssertionError" + }, + "teardown": { + "duration": 0.0002649170346558094, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "lineno": 331, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_choice_none[meta-llama/Llama-4-Scout-17B-16E-Instruct-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "case0" + }, + "setup": { + "duration": 0.0068365419283509254, + "outcome": "passed" + }, + "call": { + "duration": 0.5351063329726458, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 355, + "message": "AssertionError: Expected no tool call chunks when tool_choice='none'\nassert not [ChoiceDeltaToolCall(index=0, id='call_l3roc57o2pn9b70f0dcgil53', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]\n + where [ChoiceDeltaToolCall(index=0, id='call_l3roc57o2pn9b70f0dcgil53', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')] = ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id='call_l3roc57o2pn9b70f0dcgil53', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 355, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_choice_none(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"none\",\n stream=True,\n )\n \n content = \"\"\n for chunk in stream:\n delta = chunk.choices[0].delta\n if delta.content:\n content += delta.content\n> assert not delta.tool_calls, \"Expected no tool call chunks when tool_choice='none'\"\nE AssertionError: Expected no tool call chunks when tool_choice='none'\nE assert not [ChoiceDeltaToolCall(index=0, id='call_l3roc57o2pn9b70f0dcgil53', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]\nE + where [ChoiceDeltaToolCall(index=0, id='call_l3roc57o2pn9b70f0dcgil53', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')] = ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id='call_l3roc57o2pn9b70f0dcgil53', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:355: AssertionError" + }, + "teardown": { + "duration": 0.0004712918307632208, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_tool_choice_none[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "lineno": 331, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_tool_choice_none[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-case0", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "case0" + }, + "setup": { + "duration": 0.014073874801397324, + "outcome": "passed" + }, + "call": { + "duration": 0.6729549579322338, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 355, + "message": "AssertionError: Expected no tool call chunks when tool_choice='none'\nassert not [ChoiceDeltaToolCall(index=0, id='call_ktw831i0p838mzvnnaylf6fp', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]\n + where [ChoiceDeltaToolCall(index=0, id='call_ktw831i0p838mzvnnaylf6fp', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')] = ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id='call_ktw831i0p838mzvnnaylf6fp', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 355, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'input': {'messages': [{'content': 'You are a helpful assistant that can use tools to get information.', 'role': 'sys..., 'properties': {...}, 'required': [...], 'type': 'object'}}, 'type': 'function'}]}, 'output': 'get_weather_tool_call'}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases[\"test_tool_calling\"][\"test_params\"][\"case\"], # Reusing existing case for now\n ids=case_id_generator,\n )\n def test_chat_streaming_tool_choice_none(request, openai_client, model, provider, verification_config, case):\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n stream = openai_client.chat.completions.create(\n model=model,\n messages=case[\"input\"][\"messages\"],\n tools=case[\"input\"][\"tools\"],\n tool_choice=\"none\",\n stream=True,\n )\n \n content = \"\"\n for chunk in stream:\n delta = chunk.choices[0].delta\n if delta.content:\n content += delta.content\n> assert not delta.tool_calls, \"Expected no tool call chunks when tool_choice='none'\"\nE AssertionError: Expected no tool call chunks when tool_choice='none'\nE assert not [ChoiceDeltaToolCall(index=0, id='call_ktw831i0p838mzvnnaylf6fp', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]\nE + where [ChoiceDeltaToolCall(index=0, id='call_ktw831i0p838mzvnnaylf6fp', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')] = ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=[ChoiceDeltaToolCall(index=0, id='call_ktw831i0p838mzvnnaylf6fp', function=ChoiceDeltaToolCallFunction(arguments='', name='get_weather'), type='function')]).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:355: AssertionError" + }, + "teardown": { + "duration": 0.000251916004344821, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-text_then_weather_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-text_then_weather_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.009340125136077404, + "outcome": "passed" + }, + "call": { + "duration": 0.3328715830575675, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 0 tool calls, but got 1\nassert 1 == 0\n + where 1 = len(([ChatCompletionMessageToolCall(id='call_3rr948zuvun0533y4oyyep0z', function=Function(arguments='{\"location\":\"San Francisco, CA\"}', name='get_weather'), type='function', index=0)]))\n + where [ChatCompletionMessageToolCall(id='call_3rr948zuvun0533y4oyyep0z', function=Function(arguments='{\"location\":\"San Francisco, CA\"}', name='get_weather'), type='function', index=0)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_3rr948zuvun0533y4oyyep0z', function=Function(arguments='{\"location\":\"San Francisco, CA\"}', name='get_weather'), type='function', index=0)]).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-3.3-70B-Instruct-Turbo', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 0 tool calls, but got 1\nE assert 1 == 0\nE + where 1 = len(([ChatCompletionMessageToolCall(id='call_3rr948zuvun0533y4oyyep0z', function=Function(arguments='{\"location\":\"San Francisco, CA\"}', name='get_weather'), type='function', index=0)]))\nE + where [ChatCompletionMessageToolCall(id='call_3rr948zuvun0533y4oyyep0z', function=Function(arguments='{\"location\":\"San Francisco, CA\"}', name='get_weather'), type='function', index=0)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_3rr948zuvun0533y4oyyep0z', function=Function(arguments='{\"location\":\"San Francisco, CA\"}', name='get_weather'), type='function', index=0)]).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.00042020808905363083, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-weather_tool_then_text]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-weather_tool_then_text]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.01490145898424089, + "outcome": "passed" + }, + "call": { + "duration": 0.8346118750050664, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00034404080361127853, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-add_product_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-add_product_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.014493625145405531, + "outcome": "passed" + }, + "call": { + "duration": 0.8973606249783188, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00021345820277929306, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-get_then_create_event_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.009358166949823499, + "outcome": "passed" + }, + "call": { + "duration": 4.5295154170598835, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0002461671829223633, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-compare_monthly_expense_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.009552374947816133, + "outcome": "passed" + }, + "call": { + "duration": 0.34176899981684983, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 429, + "message": "AssertionError: Expected arguments '{'month': 1, 'year': 2025}', got '{'month': '1', 'year': '2025'}'\nassert {'month': '1', 'year': '2025'} == {'month': 1, 'year': 2025}\n \n Differing items:\n {'month': '1'} != {'month': 1}\n {'year': '2025'} != {'year': 2025}\n \n Full diff:\n {...\n \n ...Full output truncated (7 lines hidden), use '-vv' to show" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 429, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-3.3-70B-Instruct-Turbo', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n tool_call = assistant_message.tool_calls[0]\n assert tool_call.function.name == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call.function.name}'\"\n )\n # Parse the JSON string arguments before comparing\n actual_arguments = json.loads(tool_call.function.arguments)\n> assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\nE AssertionError: Expected arguments '{'month': 1, 'year': 2025}', got '{'month': '1', 'year': '2025'}'\nE assert {'month': '1', 'year': '2025'} == {'month': 1, 'year': 2025}\nE \nE Differing items:\nE {'month': '1'} != {'month': 1}\nE {'year': '2025'} != {'year': 2025}\nE \nE Full diff:\nE {...\nE \nE ...Full output truncated (7 lines hidden), use '-vv' to show\n\ntests/verifications/openai_api/test_chat_completion.py:429: AssertionError" + }, + "teardown": { + "duration": 0.000527665950357914, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-text_then_weather_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-text_then_weather_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.012501416960731149, + "outcome": "passed" + }, + "call": { + "duration": 1.585734374821186, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError: Expected 0 tool calls, but got 2\nassert 2 == 0\n + where 2 = len(([ChatCompletionMessageToolCall(id='call_4fm3kj059swz9no94n6fg54d', function=Function(arguments='{\"location\":\"Sun, NA\"}', name='get_weather'), type='function', index=0), ChatCompletionMessageToolCall(id='call_lzc5lo7y2p7wjyquvmvvzt64', function=Function(arguments='{\"name\":\"Sun\"}', name='get_latin_name'), type='function', index=1)]))\n + where [ChatCompletionMessageToolCall(id='call_4fm3kj059swz9no94n6fg54d', function=Function(arguments='{\"location\":\"Sun, NA\"}', name='get_weather'), type='function', index=0), ChatCompletionMessageToolCall(id='call_lzc5lo7y2p7wjyquvmvvzt64', function=Function(arguments='{\"name\":\"Sun\"}', name='get_latin_name'), type='function', index=1)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_4fm3kj059swz9no94n6fg54d', function=Function(arguments='{\"location\":\"Sun, NA\"}', name='get_weather'), type='function', index=0), ChatCompletionMessageToolCall(id='call_lzc5lo7y2p7wjyquvmvvzt64', function=Function(arguments='{\"name\":\"Sun\"}', name='get_latin_name'), type='function', index=1)]).tool_calls" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 418, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n> assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\nE AssertionError: Expected 0 tool calls, but got 2\nE assert 2 == 0\nE + where 2 = len(([ChatCompletionMessageToolCall(id='call_4fm3kj059swz9no94n6fg54d', function=Function(arguments='{\"location\":\"Sun, NA\"}', name='get_weather'), type='function', index=0), ChatCompletionMessageToolCall(id='call_lzc5lo7y2p7wjyquvmvvzt64', function=Function(arguments='{\"name\":\"Sun\"}', name='get_latin_name'), type='function', index=1)]))\nE + where [ChatCompletionMessageToolCall(id='call_4fm3kj059swz9no94n6fg54d', function=Function(arguments='{\"location\":\"Sun, NA\"}', name='get_weather'), type='function', index=0), ChatCompletionMessageToolCall(id='call_lzc5lo7y2p7wjyquvmvvzt64', function=Function(arguments='{\"name\":\"Sun\"}', name='get_latin_name'), type='function', index=1)] = ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_4fm3kj059swz9no94n6fg54d', function=Function(arguments='{\"location\":\"Sun, NA\"}', name='get_weather'), type='function', index=0), ChatCompletionMessageToolCall(id='call_lzc5lo7y2p7wjyquvmvvzt64', function=Function(arguments='{\"name\":\"Sun\"}', name='get_latin_name'), type='function', index=1)]).tool_calls\n\ntests/verifications/openai_api/test_chat_completion.py:418: AssertionError" + }, + "teardown": { + "duration": 0.0003941669128835201, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-weather_tool_then_text]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-weather_tool_then_text]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.014057958032935858, + "outcome": "passed" + }, + "call": { + "duration": 0.7121559998486191, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00048266700468957424, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-add_product_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-add_product_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.02072141715325415, + "outcome": "passed" + }, + "call": { + "duration": 1.0424797078594565, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004878339823335409, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-get_then_create_event_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.018570583080872893, + "outcome": "passed" + }, + "call": { + "duration": 3.4340267919469625, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00023016706109046936, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-compare_monthly_expense_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.009570334106683731, + "outcome": "passed" + }, + "call": { + "duration": 2.2068665840197355, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00051837507635355, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-text_then_weather_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-text_then_weather_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.01873366697691381, + "outcome": "passed" + }, + "call": { + "duration": 0.5193468749057502, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 446, + "message": "AssertionError: Expected one of ['sol'] in content, but got: '{\"name\": null, \"parameters\": null}'\nassert False\n + where False = any(. at 0x10e4c0f90>)" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 446, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n tool_call = assistant_message.tool_calls[0]\n assert tool_call.function.name == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call.function.name}'\"\n )\n # Parse the JSON string arguments before comparing\n actual_arguments = json.loads(tool_call.function.arguments)\n assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\n \n # Prepare and append the tool response for the next turn\n tool_response = tool_responses.pop(0)\n messages.append(\n {\n \"role\": \"tool\",\n \"tool_call_id\": tool_call.id,\n \"content\": tool_response[\"response\"],\n }\n )\n else:\n assert assistant_message.content is not None, \"Expected content, but none received.\"\n expected_answers = expected[\"answer\"] # This is now a list\n content_lower = assistant_message.content.lower()\n> assert any(ans.lower() in content_lower for ans in expected_answers), (\n f\"Expected one of {expected_answers} in content, but got: '{assistant_message.content}'\"\n )\nE AssertionError: Expected one of ['sol'] in content, but got: '{\"name\": null, \"parameters\": null}'\nE assert False\nE + where False = any(. at 0x10e4c0f90>)\n\ntests/verifications/openai_api/test_chat_completion.py:446: AssertionError" + }, + "teardown": { + "duration": 0.0004933748859912157, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-weather_tool_then_text]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-weather_tool_then_text]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.014272749889642, + "outcome": "passed" + }, + "call": { + "duration": 1.911199334077537, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00043049990199506283, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-add_product_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-add_product_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.031040542060509324, + "outcome": "passed" + }, + "call": { + "duration": 3.0026419160421938, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00045104208402335644, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-get_then_create_event_tool]", + "lineno": 359, + "outcome": "failed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.016529500018805265, + "outcome": "passed" + }, + "call": { + "duration": 2.7563346249517053, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 429, + "message": "AssertionError: Expected arguments '{'name': 'Team Building', 'date': '2025-03-03', 'time': '10:00', 'location': 'Main Conference Room', 'participants': ['Alice', 'Bob', 'Charlie']}', got '{'participants': '[\"Alice\", \"Bob\", \"Charlie\"]', 'location': 'Main Conference Room', 'name': 'Team Building', 'date': '2025-03-03', 'time': '10:00'}'\nassert {'date': '202...arlie\"]', ...} == {'date': '202...harlie'], ...}\n \n Omitting 4 identical items, use -vv to show\n Differing items:\n {'participants': '[\"Alice\", \"Bob\", \"Charlie\"]'} != {'participants': ['Alice', 'Bob', 'Charlie']}\n \n Full diff:\n {...\n \n ...Full output truncated (11 lines hidden), use '-vv' to show" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 429, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_non_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\"\n Test cases for multi-turn tool calling.\n Tool calls are asserted.\n Tool responses are provided in the test case.\n Final response is asserted.\n \"\"\"\n \n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n # Create a copy of the messages list to avoid modifying the original\n messages = []\n tools = case[\"input\"][\"tools\"]\n # Use deepcopy to prevent modification across runs/parametrization\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n # keep going until either\n # 1. we have messages to test in multi-turn\n # 2. no messages but last message is tool response\n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n # do not take new messages if last message is tool response\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n # Ensure new_messages is a list of message objects\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n # If it's a single message object, add it directly\n messages.append(new_messages)\n \n # --- API Call ---\n response = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=False,\n )\n \n # --- Process Response ---\n assistant_message = response.choices[0].message\n messages.append(assistant_message.model_dump(exclude_unset=True))\n \n assert assistant_message.role == \"assistant\"\n \n # Get the expected result data\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n # --- Assertions based on expected result ---\n assert len(assistant_message.tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(assistant_message.tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n tool_call = assistant_message.tool_calls[0]\n assert tool_call.function.name == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call.function.name}'\"\n )\n # Parse the JSON string arguments before comparing\n actual_arguments = json.loads(tool_call.function.arguments)\n> assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\nE AssertionError: Expected arguments '{'name': 'Team Building', 'date': '2025-03-03', 'time': '10:00', 'location': 'Main Conference Room', 'participants': ['Alice', 'Bob', 'Charlie']}', got '{'participants': '[\"Alice\", \"Bob\", \"Charlie\"]', 'location': 'Main Conference Room', 'name': 'Team Building', 'date': '2025-03-03', 'time': '10:00'}'\nE assert {'date': '202...arlie\"]', ...} == {'date': '202...harlie'], ...}\nE \nE Omitting 4 identical items, use -vv to show\nE Differing items:\nE {'participants': '[\"Alice\", \"Bob\", \"Charlie\"]'} != {'participants': ['Alice', 'Bob', 'Charlie']}\nE \nE Full diff:\nE {...\nE \nE ...Full output truncated (11 lines hidden), use '-vv' to show\n\ntests/verifications/openai_api/test_chat_completion.py:429: AssertionError" + }, + "teardown": { + "duration": 0.0005542081780731678, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-compare_monthly_expense_tool]", + "lineno": 359, + "outcome": "passed", + "keywords": [ + "test_chat_non_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.013607957866042852, + "outcome": "passed" + }, + "call": { + "duration": 3.0105869588442147, + "outcome": "passed" + }, + "teardown": { + "duration": 0.0004793750122189522, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-text_then_weather_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-text_then_weather_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.01806124998256564, + "outcome": "passed" + }, + "call": { + "duration": 0.3295827910769731, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError: Expected 0 tool calls, but got 1\nassert 1 == 0\n + where 1 = len(([{'function': {'arguments': '{\"location\":\"San Francisco, CA\"}', 'name': 'get_weather'}, 'id': 'call_l066e8oey2i8exeodczlv1mh', 'type': 'function'}]))" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 500, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-3.3-70B-Instruct-Turbo', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n> assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\nE AssertionError: Expected 0 tool calls, but got 1\nE assert 1 == 0\nE + where 1 = len(([{'function': {'arguments': '{\"location\":\"San Francisco, CA\"}', 'name': 'get_weather'}, 'id': 'call_l066e8oey2i8exeodczlv1mh', 'type': 'function'}]))\n\ntests/verifications/openai_api/test_chat_completion.py:500: AssertionError" + }, + "teardown": { + "duration": 0.0002942080609500408, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-weather_tool_then_text]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-weather_tool_then_text]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.007637625094503164, + "outcome": "passed" + }, + "call": { + "duration": 2.021851292112842, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 526, + "message": "AssertionError: Expected content, but none received.\nassert ('' is not None and '' != '')" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 526, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-3.3-70B-Instruct-Turbo', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n # Use the first accumulated tool call for assertion\n tool_call = accumulated_tool_calls[0]\n assert tool_call[\"function\"][\"name\"] == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call['function']['name']}'\"\n )\n # Parse the accumulated arguments string for comparison\n actual_arguments = json.loads(tool_call[\"function\"][\"arguments\"])\n assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\n \n # Prepare and append the tool response for the next turn\n tool_response = tool_responses.pop(0)\n messages.append(\n {\n \"role\": \"tool\",\n \"tool_call_id\": tool_call[\"id\"],\n \"content\": tool_response[\"response\"],\n }\n )\n else:\n> assert accumulated_content is not None and accumulated_content != \"\", \"Expected content, but none received.\"\nE AssertionError: Expected content, but none received.\nE assert ('' is not None and '' != '')\n\ntests/verifications/openai_api/test_chat_completion.py:526: AssertionError" + }, + "teardown": { + "duration": 0.00036791712045669556, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-add_product_tool]", + "lineno": 450, + "outcome": "passed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-add_product_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.013031583046540618, + "outcome": "passed" + }, + "call": { + "duration": 0.8596610419917852, + "outcome": "passed" + }, + "teardown": { + "duration": 0.00042829103767871857, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-get_then_create_event_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.015244666952639818, + "outcome": "passed" + }, + "call": { + "duration": 1.0227877080906183, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 526, + "message": "AssertionError: Expected content, but none received.\nassert ('' is not None and '' != '')" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 526, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-3.3-70B-Instruct-Turbo', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n # Use the first accumulated tool call for assertion\n tool_call = accumulated_tool_calls[0]\n assert tool_call[\"function\"][\"name\"] == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call['function']['name']}'\"\n )\n # Parse the accumulated arguments string for comparison\n actual_arguments = json.loads(tool_call[\"function\"][\"arguments\"])\n assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\n \n # Prepare and append the tool response for the next turn\n tool_response = tool_responses.pop(0)\n messages.append(\n {\n \"role\": \"tool\",\n \"tool_call_id\": tool_call[\"id\"],\n \"content\": tool_response[\"response\"],\n }\n )\n else:\n> assert accumulated_content is not None and accumulated_content != \"\", \"Expected content, but none received.\"\nE AssertionError: Expected content, but none received.\nE assert ('' is not None and '' != '')\n\ntests/verifications/openai_api/test_chat_completion.py:526: AssertionError" + }, + "teardown": { + "duration": 0.00024933391250669956, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-compare_monthly_expense_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-3.3-70B-Instruct-Turbo-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-3.3-70B-Instruct-Turbo-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.008626125054433942, + "outcome": "passed" + }, + "call": { + "duration": 0.3212552920449525, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 512, + "message": "AssertionError: Expected arguments '{'month': 1, 'year': 2025}', got '{'month': '1', 'year': '2025'}'\nassert {'month': '1', 'year': '2025'} == {'month': 1, 'year': 2025}\n \n Differing items:\n {'month': '1'} != {'month': 1}\n {'year': '2025'} != {'year': 2025}\n \n Full diff:\n {...\n \n ...Full output truncated (7 lines hidden), use '-vv' to show" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 512, + "message": "AssertionError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-3.3-70B-Instruct-Turbo', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n \n # --- Construct Assistant Message for History ---\n assistant_message_dict = {\"role\": \"assistant\"}\n if accumulated_content:\n assistant_message_dict[\"content\"] = accumulated_content\n if accumulated_tool_calls:\n assistant_message_dict[\"tool_calls\"] = accumulated_tool_calls\n \n messages.append(assistant_message_dict)\n \n # --- Assertions ---\n expected = expected_results.pop(0)\n num_tool_calls = expected[\"num_tool_calls\"]\n \n assert len(accumulated_tool_calls or []) == num_tool_calls, (\n f\"Expected {num_tool_calls} tool calls, but got {len(accumulated_tool_calls or [])}\"\n )\n \n if num_tool_calls > 0:\n # Use the first accumulated tool call for assertion\n tool_call = accumulated_tool_calls[0]\n assert tool_call[\"function\"][\"name\"] == expected[\"tool_name\"], (\n f\"Expected tool '{expected['tool_name']}', got '{tool_call['function']['name']}'\"\n )\n # Parse the accumulated arguments string for comparison\n actual_arguments = json.loads(tool_call[\"function\"][\"arguments\"])\n> assert actual_arguments == expected[\"tool_arguments\"], (\n f\"Expected arguments '{expected['tool_arguments']}', got '{actual_arguments}'\"\n )\nE AssertionError: Expected arguments '{'month': 1, 'year': 2025}', got '{'month': '1', 'year': '2025'}'\nE assert {'month': '1', 'year': '2025'} == {'month': 1, 'year': 2025}\nE \nE Differing items:\nE {'month': '1'} != {'month': 1}\nE {'year': '2025'} != {'year': 2025}\nE \nE Full diff:\nE {...\nE \nE ...Full output truncated (7 lines hidden), use '-vv' to show\n\ntests/verifications/openai_api/test_chat_completion.py:512: AssertionError" + }, + "teardown": { + "duration": 0.00020562508143484592, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-text_then_weather_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-text_then_weather_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.007338125025853515, + "outcome": "passed" + }, + "call": { + "duration": 0.4175920831039548, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.00023462506942451, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-weather_tool_then_text]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-weather_tool_then_text]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.007788832997903228, + "outcome": "passed" + }, + "call": { + "duration": 0.45610866602510214, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.00021450011990964413, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-add_product_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-add_product_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.006751166889443994, + "outcome": "passed" + }, + "call": { + "duration": 0.7053082089405507, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'add_product_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'inStock': True, 'name': 'Widget...}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': 'Successfully added product with id: 123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.00021783309057354927, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-get_then_create_event_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.008729791967198253, + "outcome": "passed" + }, + "call": { + "duration": 0.5665898330044001, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.0002288338728249073, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-compare_monthly_expense_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Scout-17B-16E-Instruct-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Scout-17B-16E-Instruct-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.009526000125333667, + "outcome": "passed" + }, + "call": { + "duration": 1.1714977910742164, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Scout-17B-16E-Instruct', provider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.00032483390532433987, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-text_then_weather_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-text_then_weather_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-text_then_weather_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "text_then_weather_tool" + }, + "setup": { + "duration": 0.010107750073075294, + "outcome": "passed" + }, + "call": { + "duration": 0.26202141703106463, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'text_then_weather_tool', 'expected': [{'answer': ['sol'], 'num_tool_calls': 0}, {'num_tool_calls': 1, 'to...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.00022558285854756832, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-weather_tool_then_text]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-weather_tool_then_text]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-weather_tool_then_text", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "weather_tool_then_text" + }, + "setup": { + "duration": 0.008256082888692617, + "outcome": "passed" + }, + "call": { + "duration": 0.3466235001105815, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'weather_tool_then_text', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'location': 'San Francisco...], 'type': 'object'}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': '70 degrees and foggy'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.000535458093509078, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-add_product_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-add_product_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-add_product_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "add_product_tool" + }, + "setup": { + "duration": 0.0180504999589175, + "outcome": "passed" + }, + "call": { + "duration": 1.8803812500555068, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'add_product_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'inStock': True, 'name': 'Widget...}}, 'type': 'function'}]}, 'tool_responses': [{'response': \"{'response': 'Successfully added product with id: 123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.00025062495842576027, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-get_then_create_event_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-get_then_create_event_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-get_then_create_event_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "get_then_create_event_tool" + }, + "setup": { + "duration": 0.00993091706186533, + "outcome": "passed" + }, + "call": { + "duration": 0.5258524999953806, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'get_then_create_event_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'date': '2025-03-03', ...ents found for 2025-03-03 at 10:00'}\"}, {'response': \"{'response': 'Successfully created new event with id: e_123'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.0002823749091476202, + "outcome": "passed" + } + }, + { + "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-compare_monthly_expense_tool]", + "lineno": 450, + "outcome": "failed", + "keywords": [ + "test_chat_streaming_multi_turn_tool_calling[meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-compare_monthly_expense_tool]", + "parametrize", + "pytestmark", + "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8-compare_monthly_expense_tool", + "test_chat_completion.py", + "openai_api", + "verifications", + "tests", + "llama-stack", + "" + ], + "metadata": { + "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", + "case_id": "compare_monthly_expense_tool" + }, + "setup": { + "duration": 0.047535917023196816, + "outcome": "passed" + }, + "call": { + "duration": 0.4426498331595212, + "outcome": "failed", + "crash": { + "path": "/Users/erichuang/projects/llama-stack/tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError: list index out of range" + }, + "traceback": [ + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 485, + "message": "" + }, + { + "path": "tests/verifications/openai_api/test_chat_completion.py", + "lineno": 587, + "message": "IndexError" + } + ], + "longrepr": "request = >\nopenai_client = \nmodel = 'meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8'\nprovider = 'together'\nverification_config = {'providers': {'cerebras': {'api_key_var': 'CEREBRAS_API_KEY', 'base_url': 'https://api.cerebras.ai/v1', 'model_displa...-versatile', 'meta-llama/llama-4-scout-17b-16e-instruct', 'meta-llama/llama-4-maverick-17b-128e-instruct'], ...}, ...}}\ncase = {'case_id': 'compare_monthly_expense_tool', 'expected': [{'num_tool_calls': 1, 'tool_arguments': {'month': 1, 'year': ... 'Total expenses for January 2025: $1000'}\"}, {'response': \"{'response': 'Total expenses for February 2024: $2000'}\"}]}\n\n @pytest.mark.parametrize(\n \"case\",\n chat_completion_test_cases.get(\"test_chat_multi_turn_tool_calling\", {}).get(\"test_params\", {}).get(\"case\", []),\n ids=case_id_generator,\n )\n def test_chat_streaming_multi_turn_tool_calling(request, openai_client, model, provider, verification_config, case):\n \"\"\" \"\"\"\n test_name_base = get_base_test_name(request)\n if should_skip_test(verification_config, provider, model, test_name_base):\n pytest.skip(f\"Skipping {test_name_base} for model {model} on provider {provider} based on config.\")\n \n messages = []\n tools = case[\"input\"][\"tools\"]\n expected_results = copy.deepcopy(case[\"expected\"])\n tool_responses = copy.deepcopy(case.get(\"tool_responses\", []))\n input_messages_turns = copy.deepcopy(case[\"input\"][\"messages\"])\n \n while len(input_messages_turns) > 0 or (len(messages) > 0 and messages[-1][\"role\"] == \"tool\"):\n if len(messages) == 0 or messages[-1][\"role\"] != \"tool\":\n new_messages = input_messages_turns.pop(0)\n if isinstance(new_messages, list):\n messages.extend(new_messages)\n else:\n messages.append(new_messages)\n \n # --- API Call (Streaming) ---\n stream = openai_client.chat.completions.create(\n model=model,\n messages=messages,\n tools=tools,\n stream=True,\n )\n \n # --- Process Stream ---\n> accumulated_content, accumulated_tool_calls = _accumulate_streaming_tool_calls(stream)\n\ntests/verifications/openai_api/test_chat_completion.py:485: \n_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ \n\nstream = \n\n def _accumulate_streaming_tool_calls(stream):\n \"\"\"Accumulates tool calls and content from a streaming ChatCompletion response.\"\"\"\n tool_calls_buffer = {}\n current_id = None\n full_content = \"\" # Initialize content accumulator\n # Process streaming chunks\n for chunk in stream:\n> choice = chunk.choices[0]\nE IndexError: list index out of range\n\ntests/verifications/openai_api/test_chat_completion.py:587: IndexError" + }, + "teardown": { + "duration": 0.0010368749499320984, "outcome": "passed" } } ], - "run_timestamp": 1744328795 + "run_timestamp": 1744679294 }