# Test Results Report

*Generated on: 2025-04-10 16:48:18*

*This report was generated by running `python tests/verifications/generate_report.py`*

## Legend

- ✅ - Test passed
- ❌ - Test failed
- ⚪ - Test not applicable or not run for this model

## Summary

| Provider | Pass Rate | Tests Passed | Total Tests |
| --- | --- | --- | --- |
| Together | 64.7% | 22 | 34 |
| Fireworks | 82.4% | 28 | 34 |
| OpenAI | 100.0% | 24 | 24 |

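
The pass rates in the summary follow from the per-provider tables: ⚪ cells are left out of the total, which is why the Llama providers show 34 tests rather than 36. The arithmetic can be sketched in Python as below, assuming the rate is passed / (passed + failed) rounded to one decimal place. The `pass_rate` helper is hypothetical and is not taken from `generate_report.py`; the counts are tallied by hand from the tables in this report.

```python
def pass_rate(passed: int, failed: int) -> str:
    """Format passed / (passed + failed) as a percentage with one decimal."""
    # Assumption: not-applicable / not-run cells are excluded from the total.
    total = passed + failed
    if total == 0:
        return "N/A"
    return f"{passed / total * 100:.1f}%"


# (passed, failed) counts tallied from the provider tables in this report,
# keyed by the value passed to --provider.
counts = {
    "together": (22, 12),
    "fireworks": (28, 6),
    "openai": (24, 0),
}

for provider, (passed, failed) in counts.items():
    total = passed + failed
    print(f"{provider}: {pass_rate(passed, failed)} ({passed}/{total})")
```

With these counts the sketch reproduces the three percentages shown in the summary table.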
## Together

*Tests run on: 2025-04-10 16:46:35*

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -k "test_chat_non_streaming_basic and earth"
```

**Model Key (Together)**

| Display Name | Full Model ID |
| --- | --- |
| Llama-3.3-70B-Instruct | `meta-llama/Llama-3.3-70B-Instruct-Turbo` |
| Llama-4-Maverick-Instruct | `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` |
| Llama-4-Scout-Instruct | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |

| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
| --- | --- | --- | --- |
| test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ |
| test_chat_non_streaming_basic (saturn) | ✅ | ✅ | ✅ |
| test_chat_non_streaming_image | ⚪ | ✅ | ✅ |
| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | ✅ |
| test_chat_non_streaming_structured_output (math) | ✅ | ✅ | ✅ |
| test_chat_non_streaming_tool_calling | ✅ | ✅ | ✅ |
| test_chat_streaming_basic (earth) | ✅ | ❌ | ❌ |
| test_chat_streaming_basic (saturn) | ✅ | ❌ | ❌ |
| test_chat_streaming_image | ⚪ | ❌ | ❌ |
| test_chat_streaming_structured_output (calendar) | ✅ | ❌ | ❌ |
| test_chat_streaming_structured_output (math) | ✅ | ❌ | ❌ |
| test_chat_streaming_tool_calling | ✅ | ❌ | ❌ |

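
To pull the failing cases out of a results table like the Together one, the rows can be parsed directly. The Python sketch below assumes the layout used in this report: the first column is the test name and each remaining column is a model. `split_row` and `failures_by_model` are hypothetical helpers, not part of the verification suite, and the sample holds only three rows copied from the Together table.

```python
from collections import defaultdict

# Three rows copied from the Together results table in this report.
TABLE = """\
| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
| --- | --- | --- | --- |
| test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ |
| test_chat_streaming_basic (earth) | ✅ | ❌ | ❌ |
| test_chat_streaming_image | ⚪ | ❌ | ❌ |
"""


def split_row(line: str) -> list[str]:
    """Split one Markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def failures_by_model(table: str) -> dict[str, list[str]]:
    """Map each model name to the tests marked as failed in its column."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    models = split_row(lines[0])[1:]  # header row minus the "Test" column
    failed = defaultdict(list)
    for line in lines[2:]:  # skip the header and the --- separator row
        test, *cells = split_row(line)
        for model, cell in zip(models, cells):
            if cell == "❌":
                failed[model].append(test)
    return dict(failed)


for model, tests in failures_by_model(TABLE).items():
    print(model, "->", tests)
```

Each reported name can then be rerun on its own with the `-k` filter shown in the example command for the provider, for instance `-k "test_chat_streaming_basic and earth"`.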
## Fireworks

*Tests run on: 2025-04-10 16:44:44*

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -k "test_chat_non_streaming_basic and earth"
```

**Model Key (Fireworks)**

| Display Name | Full Model ID |
| --- | --- |
| Llama-3.3-70B-Instruct | `accounts/fireworks/models/llama-v3p3-70b-instruct` |
| Llama-4-Maverick-Instruct | `accounts/fireworks/models/llama4-maverick-instruct-basic` |
| Llama-4-Scout-Instruct | `accounts/fireworks/models/llama4-scout-instruct-basic` |

| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
| --- | --- | --- | --- |
| test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ |
| test_chat_non_streaming_basic (saturn) | ✅ | ✅ | ✅ |
| test_chat_non_streaming_image | ⚪ | ✅ | ✅ |
| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | ✅ |
| test_chat_non_streaming_structured_output (math) | ✅ | ✅ | ✅ |
| test_chat_non_streaming_tool_calling | ❌ | ❌ | ❌ |
| test_chat_streaming_basic (earth) | ✅ | ✅ | ✅ |
| test_chat_streaming_basic (saturn) | ✅ | ✅ | ✅ |
| test_chat_streaming_image | ⚪ | ✅ | ✅ |
| test_chat_streaming_structured_output (calendar) | ✅ | ✅ | ✅ |
| test_chat_streaming_structured_output (math) | ✅ | ✅ | ✅ |
| test_chat_streaming_tool_calling | ❌ | ❌ | ❌ |

## OpenAI

*Tests run on: 2025-04-10 16:47:28*

```bash
# Run all tests for this provider:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -v

# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -k "test_chat_non_streaming_basic and earth"
```

**Model Key (OpenAI)**

| Display Name | Full Model ID |
| --- | --- |
| gpt-4o | `gpt-4o` |
| gpt-4o-mini | `gpt-4o-mini` |

| Test | gpt-4o | gpt-4o-mini |
| --- | --- | --- |
| test_chat_non_streaming_basic (earth) | ✅ | ✅ |
| test_chat_non_streaming_basic (saturn) | ✅ | ✅ |
| test_chat_non_streaming_image | ✅ | ✅ |
| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ |
| test_chat_non_streaming_structured_output (math) | ✅ | ✅ |
| test_chat_non_streaming_tool_calling | ✅ | ✅ |
| test_chat_streaming_basic (earth) | ✅ | ✅ |
| test_chat_streaming_basic (saturn) | ✅ | ✅ |
| test_chat_streaming_image | ✅ | ✅ |
| test_chat_streaming_structured_output (calendar) | ✅ | ✅ |
| test_chat_streaming_structured_output (math) | ✅ | ✅ |
| test_chat_streaming_tool_calling | ✅ | ✅ |